[Tutor] looking but not finding

ThreeBlindQuarks threesomequarks at proton.me
Wed Jul 12 15:09:27 EDT 2023


It looks like the data-gathering algorithm is fairly complex, using many parts, and that it also has to compare recent values to present ones to decide what, if anything, needs to be recorded.

My personal goal here is to help with some stumbling block the user has with the language rather than to help them build the whole thing. So smaller problems, asked carefully and clearly, are more likely to elicit a response.

But can I ask a design question? Is this a long-running program that checks things for days or months at a time? Must it log things almost immediately, or can it buffer data internally and periodically write something out? What happens if things crash? 

Some designs may work better than others.

Is anything else going to be reading these logs immediately, or can you make one set of logs that nobody reads that can later be massaged into a form suitable for sharing?

A simple-minded approach may be to log everything, using a design that writes out one or more records per event, linked by something like a shared date-time or sequence number. Don't worry about whether a measure has changed or not; just log it.

A second program can (maybe later) read in the contents of a log and apply many kinds of logic, such as ignoring repeated identical results, and perhaps use the logged data to populate a database of whatever kind so that it can be easily queried.
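As a rough illustration of that second program, here is a minimal sketch. It assumes each logged record is a (timestamp, item id, value) tuple, and the table name, column names and sample values are all invented for the example rather than taken from your system. It drops a reading when it repeats the previous value for the same item and loads the rest into SQLite, which ships with Python.

```python
import sqlite3

def drop_repeats(rows):
    """Yield only rows whose value differs from the previous one for the same item."""
    last_seen = {}
    for timestamp, item_id, value in rows:
        if last_seen.get(item_id) != value:
            last_seen[item_id] = value
            yield timestamp, item_id, value

def load_into_db(rows, conn):
    """Create the (hypothetical) readings table if needed and insert the rows."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS readings (ts TEXT, item_id TEXT, grams REAL)"
    )
    with conn:  # commits on success, rolls back on error
        conn.executemany("INSERT INTO readings VALUES (?, ?, ?)", rows)

# Made-up raw log, sampled every 0.5 seconds, with unchanged values repeated.
raw_log = [
    ("2023-07-12T15:00:00.0", "A1", 0.0),
    ("2023-07-12T15:00:00.5", "A1", 0.0),
    ("2023-07-12T15:00:01.0", "A1", 120.5),
    ("2023-07-12T15:00:01.5", "A1", 120.5),
    ("2023-07-12T15:00:02.0", "A1", 240.0),
]

conn = sqlite3.connect(":memory:")  # a real run would use a file path instead
load_into_db(drop_repeats(raw_log), conn)
kept = conn.execute("SELECT ts, grams FROM readings ORDER BY ts").fetchall()
```

The point of splitting things this way is that the logger stays dumb and fast, and the filtering rules can be changed later without losing any raw data.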

As to how to save data, there are many ways, and for some purposes the simplest ones could be best. If you have a fixed number of simple fields in a single record, writing them out as one comma-separated line appended to a CSV file may be adequate.
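A minimal sketch of that CSV idea, using only the standard library. The three field names, the file name and the helper append_reading are my own assumptions for illustration, so adjust them to whatever your record really holds.

```python
import csv
import os
import tempfile
from datetime import datetime, timezone

# Hypothetical fixed set of fields for one record.
FIELDS = ["timestamp", "item_id", "grams"]

def append_reading(path, item_id, grams, when=None):
    """Append one record to the CSV file, writing a header if the file is new."""
    when = when or datetime.now(timezone.utc)
    is_new = not os.path.exists(path) or os.path.getsize(path) == 0
    with open(path, "a", newline="") as f:
        writer = csv.writer(f)
        if is_new:
            writer.writerow(FIELDS)
        writer.writerow([when.isoformat(), item_id, grams])

# Demonstration in a scratch directory; a real logger would use a fixed path.
log_path = os.path.join(tempfile.mkdtemp(), "weights.csv")
append_reading(log_path, "A1", 0.0)
append_reading(log_path, "A1", 120.5)

with open(log_path, newline="") as f:
    rows = list(csv.reader(f))
```

Opening, appending and closing on every reading is slow but means very little is lost if the program crashes. Keeping the file open and flushing periodically is the other end of that trade-off.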

If you need to save multiple types of records, you could add them to different CSV files, much like tables, or have a wider record with room for many kinds of info and only populate the fields relevant for that item.
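The wider-record variant might look something like the sketch below. The column names and the two record types (a weight reading and a valve event) are guesses made up for the example. csv.DictWriter fills any column a record does not supply with the restval value, here an empty string.

```python
import csv
import io

# Hypothetical superset of columns covering every kind of record.
COLUMNS = ["timestamp", "record_type", "item_id", "grams", "valve", "valve_state"]

buffer = io.StringIO()  # stands in for an open file in this sketch
writer = csv.DictWriter(buffer, fieldnames=COLUMNS, restval="")
writer.writeheader()

# A weight record only fills the weight-related columns.
writer.writerow({
    "timestamp": "2023-07-12T15:00:01",
    "record_type": "weight",
    "item_id": "A1",
    "grams": 120.5,
})

# A valve record only fills the valve-related columns.
writer.writerow({
    "timestamp": "2023-07-12T15:03:30",
    "record_type": "valve",
    "valve": 1,
    "valve_state": "open",
})

lines = buffer.getvalue().splitlines()
```

A reader can then pick out one kind of record by looking at the record_type column and ignore the empty fields.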

Or, if your data can effectively be viewed as being encapsulated in some kind of object you build, as is done in some languages, you may convert it to JSON or the like and save that, or perhaps use other technologies that support object storage and retrieval.
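For the object route, one possible sketch uses a dataclass and the json module, writing one JSON document per line. The Reading class and its fields are invented for illustration and are not a claim about what your data should look like.

```python
import json
from dataclasses import dataclass, asdict

@dataclass
class Reading:
    """Hypothetical object holding one measurement."""
    timestamp: str
    item_id: str
    grams: float

def to_json_line(reading):
    """Serialize one Reading as a single line of JSON text."""
    return json.dumps(asdict(reading))

def from_json_line(line):
    """Rebuild a Reading from a line produced by to_json_line."""
    return Reading(**json.loads(line))

original = Reading("2023-07-12T15:00:01", "A1", 120.5)
line = to_json_line(original)      # this string is what gets appended to a file
restored = from_json_line(line)    # this is what a later reader gets back
```

One JSON object per line keeps the file appendable, and fields added to the class later do not break the layout of older lines the way a new CSV column can.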

Again, a less broad question well described may get better answers.






------- Original Message -------
On Wednesday, July 12th, 2023 at 11:26 AM, o1bigtenor <o1bigtenor at gmail.com> wrote:


> Thank you for your patience mr Alan.
> 
> What I'm finding is that the computing world uses terms in a different way than
> does the sensor world than does the storage world than does the rest of us.
> That seriously complicates things. (Details in the following.)
> 
> On Tue, Jul 11, 2023 at 8:15 PM Alan Gauld via Tutor tutor at python.org wrote:
> 
> > On 11/07/2023 22:08, o1bigtenor wrote:
> > 
> > > > > - fluid system is on a load cell system
> > > > 
> > > > I googled load cell system and it seems to be a hardware-based system,
> > > > The load cell system does NOT have to be proprietary.
> > > 
> > > Your response is headed in quite a different direction than my question.
> > 
> > Sorry, but it sounded like you were having trouble recording the data.
> > It was not obvious that you had managed to read the data and were
> > interested in storing it. Quite a different problem I agree.
> > 
> > > > > but I want to record that information (as well as other information around it).
> > > > > 
> > > > > I've tried looking for data log software
> > 
> > There are many, many different types of storage available
> > from structured files (shelves, JSON, YAML, XML, etc.) to
> > databases both SQL and NoSQL based.
> 
> 
> This is where I start getting 'lost'. I have no idea which of the three
> listed formats will work well with what I'm doing - - - dunno even how to
> find out either!
> 
> > Much depends on what you want to do with the data and
> > how regular it is. If its regular a SQL database may be
> > the best option - SQLite comes with python and is
> > lightweight and easily portable.
> 
> 
> The first point with the data is to store it. At the time of its collection
> the data will also be used to trigger other events so that will be (I think
> that is the proper place) be done in the capturing/organizing/supervising
> program (in python).
> 
> > If the data is irregular then a NoSQL database like
> > Mongo may be better. But if you only need the storage
> > for persistence then a structured file would be simpler.
> 
> 
> Went looking at MongoDB - - - honestly cannot tell what makes data
> 'irregular' - - - - perhaps you might advise - - - I cannot find a reasonable
> definition - - - -yet!
> 
> I am going to try to delineate at least some of what I'm thinking.
> 
> (Each point has data associated with it.)
> 1. reset weighing system
> 2. data is captured from the identification sub-system
> 3. data is captured from the calendar system (date/time)
> 4. (this one is not in initial plans but possibly in later
> test vacuum in system for level)
> 5. get weighing system value
> 6. zero value before starting measurement
> 7. poll sensor every 0.5 sec using step #8 if there is a change from #6
> 8. (assuming change in sensor value) - - store data
> 9. when difference between values shows that the value change is less
> than 250 g/min continue for another 20 seconds
> 10. after #9 is fulfilled issue 'final weight'
> 11. 2 seconds (may change) after #10 open valve #1
> 12. after 30 seconds (this value may change) close valve #1
> 13. after valve #1 closes over valve #2 connecting system to wash/rinse system
> 14. 30 sec after valve #2 is opened valve #3 is opened
> (This may be opened to the rinse system at the same time as valve #2 is
> design not yet complete - - - needs fabbing and testing!!!)
> 15. 5 seconds after rinse is complete valves connecting to the rinse system
> (and its drain) are closed.
> 16. see #1
> 
> Each weighing subsystem will have its own control. There will be multiple
> such subsystems (initially at least 3 more likely 12 - - - further
> (future) - - - up to
> a couple hundred all operating at the same time).
> 
> > > What I specifically need assistance on is software that enables one to log
> > > data. What I find mountainous reams of information is on syslog - - -
> > > useful - - - - but when it comes to logging data - - - - well - - - quite NOT.
> > 
> > The logging module and its cousins are intended for
> > logging program progress/output/errors etc not for
> > storing data. You could use them for that but its
> > not their designed purpose.
> > 
> > > > > stuff. Tried looking to see if there were python libraries that could
> > > > > be manipulated into service
> > 
> > It sounds like you either want to persist data between
> > program runs or to store data for processing/analysis.
> > The former is probably best met with a structured file
> > format such as JSON. (Or even good old CSV!)
> > 
> > If you want to slice/dice and perform calculations on
> > the data, especially if it's a large volume, you may be
> > better with a proper database - it's what they are
> > designed for after all.
> > 
> > If the data is regular go with SQL if irregular go NoSQL.
> 
> 
> Now if I only knew what regular and/or irregular data was.
> Have been considering using Postgresql as a storage engine.
> AFAIK it has the horsepower to deal with serious large amounts of data.
> This weighing routine will be running for up to 3.5 minutes. That doesn't
> sound like much - - - some 400 'chunks' (meaning all the data that comprises
> what is being stored) of data. (Then there may be a repeat of this function
> every 4.5 to 6 minutes for say up to 180 minutes - - - multiple times (varying
> up to 3x) per day. Its not serious huge data amounts but its definitely non-
> trivial.
> 
> > > Hardware should be immaterial.
> > > Sensors are available for a plethora of measurables.
> > > I want to grab a value from sensor A and deposit that value in a
> > > data log over there.
> > 
> > OK, but that can be as simple as writing a string to a file.
> > Or it can mean a complex SQL query into a database. We don't
> > know what your data looks like or what you intend to do
> > with it, or how much data there is to store. (We have a
> > clue in the data capture rate of 2 samples per second
> > (but from how many sensors?) but we don't know the length
> > of a measurement period - minutes, hours, days, "infinite"?
> 
> 
> Sorry - - - previous - - -
> 
> > If there are several sensor types are the sensor data similar
> > for each? Or do you need a separate data format(table or file?)
> > for each sensor type (or even each sensor!) And if that's
> > the case do you need to combine those data later for analysis,
> > based on which keys?
> 
> 
> This is part of what I hope I was able to answer in my list (1-16).
> 
> > Also will concurrent reading/writing be required? ie Will
> > there be reporting/analysis being performed at the same
> > time as the sensors are being read and their data stored?
> > What about locking issues?
> 
> 
> Here you have opened another pandora's box for me.
> Dunno much about any of this.
> Concurrent - - - yes, there are multiple sensors per station and
> multiple stations and hopefully they're all working (problems
> with the system if they're not!)
> 
> There is a small amount of analysis being done as the data is stored
> but that's to drive the system (ie at point x you stop this, at point
> y you do this, point y does this for z time then a happens etc - - - I
> do not consider that heavy duty analysis. That happens on a different
> system (storage happens first to the local governing system. After
> a round of data (one cycle) has completed that data is shipped in a burst
> to the long term storage system. (there may be a cascade of storage systems
> but I haven't got that far yet!) This system is quite capable of handling a LOT
> of information - - - or that's the goal anyway. Maybe One system will store the
> info and another will do analysis and then return that analysis to the storing
> system for accumulation - - - also not decided.)
> 
> Locking - - - how, when, where and why - - - the why I think I understand ( to
> make sure data doesn't get corrupted) but the rest - - - would love some
> guidance on.
> 
> > There are a myriad factors to consider when choosing
> > an appropriate storage mechanism.
> > 
> > > I understand how to read the sensors
> > 
> > That was the bit that was not apparent in your original
> > message, and usually the hardest bit by far!
> > 
> > > ... problems here would go to a much more
> > > hardware slanted list - - - - not software.
> > 
> > We have had several hardware interface discussions on
> > the tutor list. Often involving USB or RS232 serial
> > links and such. The list caters for experienced programmers
> > new to Python as well as new programmers.
> 
> 
> Hmmmmmmmmmmm - - - Modbus, Probus and RS-485 are all
> being considered - - - likely using a tcp/ip backbone.
> 
> > > The Python info https://docs.python.org/3/howto/logging.html is quite
> > > slanted toward syslog although it does not state that.
> > 
> > I agree, I don't think that's the best route for
> > what I believe you want. Logging is quite a well defined
> > operation in software engineering so I guess the authors
> > of the docs just assumed folks would think in terms
> > of system logs.
> 
> 
> Can understand that but think that those functions might also be used
> for what I'm trying to do - - - but I'm really not sure how to.
> 
> > > someone just might have produced something like pyplot for producing
> > > graphs or scipy which gets close to what I want in its statistics section
> > 
> > But this is a completely different issue again. This is
> > moving into how you analyse and report on the data
> > you have stored. That's a completely different toolset.
> 
> 
> snip
> The two mentioned items were examples of code (and/or analysis) - - - - not
> functions I wished to use.
> 
> > > but I haven't been able to find anything when
> > > I include the term data log (logger/logging etc).
> > 
> > I suspect what you are trying to do is not what
> > most programmers would think of as logging.
> > It sounds more like data storage.
> > 
> > > What I'm storing - - -
> > > 1. item identification
> > > 2. time/date (routine is called every 0.5 seconds at this point in the planning)
> > > 3. value (from the weighing sensor system)
> > > 
> > > above is data most likely shipping over tcp/ip.
> > 
> > See above. That looks simple and regular enough that
> > a simple formatted text string is all that is needed.
> > But for potential future complexity(*) I'd suggest
> > that a CSV or JSON file would be better for persistent
> > storage. A SQL database could handle it easily
> > too if you need more complex processing of the data
> > post storage, or concurrent access.
> > 
> > (*)From my experience of building telemetry applications
> > it isn't long before people want geographical location,
> > and error detection/correction keys etc. The amount
> > of data per event tends to grow over time.
> 
> Oh yes - - - - I'm quite greedy with trying to get as much data
> as I can. There are a lot of uses for such and if its easy to put
> together then there are less excuses for the not using.
> 
> I don't think I'll be starting with csv - - - - really need to figure out
> what I should be using for what you termed 'structured files'.
> Do you have any suggestions as to where I find information that
> will help me figure out a good or a great way to do that part of things?
> (Choosing yaml/xml/???? for starters.)
> 
> Thanking you for your patience and assistance.
> 
> Regards

