[Tutor] Reading .csv data vs. reading an array
Mats Wichmann
mats at wichmann.us
Mon Jul 15 15:03:10 EDT 2019
On 7/15/19 12:35 PM, Chip Wachob wrote:
> Oscar and Mats,
>
> Thank you for your comments and taking time to look at the snips.
>
> Yes, I think I had commented that the avg+trigger was = triggervolts in
> my original post.
>
> I did find that there was an intermediary process which I had forgotten
> to comment out that was adversely affecting the data in one instance and
> not the other. So it WAS a case of becoming code blind. But I didn't
> give y'all all of the code so you would not have known that. My apologies.
>
> Mats, I'd like to get a better handle on your suggestions about
> improving the code. Turns out, I've got another couple of 4GByte files
> to sift through, and they are less 'friendly' when it comes to
> determining the start and stop points. So, I have to basically redo
> about half of my code and I'd like to improve on my Python coding skills.
>
> Unfortunately, I have gaps in my coding time, and I end up forgetting
> the details of a particular language, especially a new language to me,
> Python.
>
> I'll admit that my 'C' background keeps me thinking of these data sets
> as arrays.. in fact they are lists, eg:
>
> [
> [t0, v0],
> [t1, v1],
> [t2, v2],
> .
> .
> .
> [tn, vn]
> ]
>
> Time and volts are floats and need to be converted from the csv file
> entries.
>
> I'm not sure that I follow the "unpack" assignment in your example of:
>
> for row in TrigWind:
>     time, voltage = row  # unpack
>
> I think I 'see' what is happening, but when I read up on unpacking, I
> see it referring to using * and ** when passing arguments to a
> function...
That's a different aspect of unpacking. This one is sequence unpacking,
sometimes called tuple (or sequence) assignment. In the official
Python docs it is described in the latter part of this section:
https://docs.python.org/3/tutorial/datastructures.html#tuples-and-sequences
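
A small sketch of the difference, in case it helps (the sample values
and the show() function are made up for illustration). Sequence
unpacking assigns the items of a sequence to the names on the left,
while * in a call spreads a sequence across a function's parameters:

```python
# Sequence unpacking: the right-hand side must have exactly as many
# items as there are names on the left.
row = ["-0.005640001706", "1.6363"]   # made-up sample row
time, voltage = row
print(time, voltage)

# The same thing works directly in a for statement.
rows = [["0.0", "1.5"], ["0.1", "1.7"]]   # made-up sample data
for time, voltage in rows:
    print(time, voltage)

# The * form is the other kind of unpacking: it spreads a sequence
# across the parameters of a function call.
def show(time, voltage):
    return f"{time} s, {voltage} V"

print(show(*row))
```

Note that unpacking does no type conversion: strings go in, strings
come out.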
> I tried it anyhow, with this being an example of my source data:
>
> "Record Length",2000002,"Points",-0.005640001706,1.6363
> "Sample Interval",5e-09,s,-0.005639996706,1.65291
> "Trigger Point",1128000,"Samples",-0.005639991706,1.65291
> "Trigger Time",0.341197,s,-0.005639986706,1.60309
> ,,,-0.005639981706,1.60309
> "Horizontal Offset",-0.00564,s,-0.005639976706,1.6363
> ,,,-0.005639971706,1.65291
> ,,,-0.005639966706,1.65291
> ,,,-0.005639961706,1.6363
> .
> .
> .
>
> Note that I want the items in the fourth and fifth columns of the csv
> file for my time and voltage.
>
> When I tried to use the unpack, they all came over as strings. I can't
> seem to convert them selectively..
That's what the csv module does, unless you tell it not to. Maybe this
will help:
https://docs.python.org/3/library/csv.html#csv.reader
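There's an option (quoting=csv.QUOTE_NONNUMERIC) to convert unquoted
values to floats, and leave quoted values alone as strings, which would
seem to match your data above quite well, except for the unquoted s
unit fields, which it would fail to convert.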
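
If you would rather not depend on the quoting being just right (the
unquoted s fields in your sample are not numbers), you can convert only
the two columns you want as each row is read. A minimal sketch, where
load_trigwind is a made-up name and an in-memory string stands in for
your file:

```python
import csv
import io

# A few lines shaped like the sample above; io.StringIO stands in for
# the real file, which you would open with open(path, newline="").
sample = io.StringIO(
    '"Record Length",2000002,"Points",-0.005640001706,1.6363\n'
    '"Sample Interval",5e-09,s,-0.005639996706,1.65291\n'
    ',,,-0.005639981706,1.60309\n'
)

def load_trigwind(csvfile):
    """Return [[time, voltage], ...] with both values already floats."""
    trigwind = []
    for row in csv.reader(csvfile):
        # Only the last two of the five columns are wanted.
        _desc1, _val1, _desc2, time, voltage = row
        trigwind.append([float(time), float(voltage)])
    return trigwind

TrigWind = load_trigwind(sample)
print(TrigWind[0])
```

That way the conversion happens once, in the same pass that reads the
file, and nothing downstream has to call float() again.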
> Desc1, Val1, Desc2, TimeVal, VoltVal = row
>
> TimeVal and VoltVal return type of str, which makes sense.
>
> Must I go through yet another iteration of scanning TimeVal and VoltVal
> and converting them using float() by saving them to another array?
>
>
> Thanks for your patience.
>
> Chip
>
> On Sat, Jul 13, 2019 at 9:36 AM Mats Wichmann <mats at wichmann.us>
> wrote:
>
> On 7/11/19 8:15 AM, Chip Wachob wrote:
>
> kinda restating what Oscar said, he came to the same conclusions, I'm
> just being a lot more wordy:
>
>
> > So, here's where it gets interesting. And, I'm presuming that someone
> > out there knows exactly what is going on and can help me get past
> > this hurdle.
>
> Well, each snippet has some "magic" variables (from our point of view,
> since we don't see where they are set up):
>
> 1: if(voltage > (avg + triglevel)
>
> 2: if((voltage > triggervolts)
>
> since the value you're comparing voltage to gates when you decide
> there's a transition, and thus what gets added to the transition list
> you're building, and the list size comes out different, and you claim
> the data are the same, then guess where a process of elimination
> suggests the difference is coming from?
>
> ===
>
> Stylistic comment, I know this wasn't your question.
>
> > for row in range (len(TrigWind)):
>
> Don't do this. It's not a coding error giving you wrong results, but
> it's not efficient and makes for harder to read code. You already have
> an iterable in TrigWind. You then find the size of the iterable and use
> that size to generate a range object, which you then iterate over,
> producing index values which you use to index into the original
> iterable. Why not skip all that? Just do
>
> for row in TrigWind:
>
> now row is actually a row, as the variable name suggests, rather than an
> index you use to go retrieve the row.
>
> Further, the "row" entries in TrigWind are lists (or tuples, or some
> other indexable iterable, we can't tell), which means you end up
> indexing into two things - into the "array" to get the row, then into
> the row to get the individual values. It's nicer if you unpack the rows
> into variables so they can have meaningful names - indeed you already do
> that with one of them. Lets you avoid code snips like "x[7][1]"
>
> Conceptually then, you can take this:
>
> for row in range(len(TrigWind)):
>     voltage = float(TrigWind[row][1])
>     ...
>     edgearray.append([float(TrigWind[row][0]),
>                       float(TrigWind[row][1])])
>     ...
>
> and change to this:
>
> for row in TrigWind:
>     time, voltage = row  # unpack
>     ...
>     edgearray.append([float(time), float(voltage)])
>
> or even more compactly you can unpack directly at the top:
>
> for time, voltage in TrigWind:
>     ...
>     edgearray.append([float(time), float(voltage)])
>     ...
>
> Now I left an issue to resolve with conversion - voltage is not
> converted before its use in the not-shown comparisons. Does it need to
> be? Every usage of the values from the individual rows here uses them
> immediately after converting them to float. It's usually better not to
> convert all over the place, and since the creation of TrigWind is under
> your own control, you should do that at the point the data enters the
> program - that is as TrigWind is created; then you just consume data
> from it in its intended form. But if not, just convert voltage before
> using, as your original code does. You don't then need to convert
> voltage a second time in the list append statements.
>
> for time, voltage in TrigWind:
>     voltage = float(voltage)
>     ...
>     edgearray.append([float(time), voltage])
>     ...
>
>
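
Putting the quoted suggestions together, here is a sketch of what the
loop can look like once TrigWind already holds floats. The threshold
test is only a stand-in for your real comparisons, which I haven't
seen, and the data and the triggervolts value are made up:

```python
# Made-up data: [time, voltage] pairs that are already floats.
TrigWind = [[0.0, 1.60], [0.1, 1.65], [0.2, 1.70], [0.3, 1.62]]
triggervolts = 1.64   # made-up threshold

edgearray = []
for time, voltage in TrigWind:      # unpack each row at the top
    if voltage > triggervolts:      # stand-in for the real test
        edgearray.append([time, voltage])

print(edgearray)
```

No float() calls and no double indexing are left in the loop body.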