[Tutor] Reading .csv data vs. reading an array

Chip Wachob wachobc at gmail.com
Mon Jul 15 15:59:01 EDT 2019


Mats,

Thank you!

So I added quoting=csv.QUOTE_NONNUMERIC to my csv.reader() call and it
almost worked.

Now, how wonderful that the scope's csv file simply wrote an s for seconds
and didn't include quotes.  Now Python tells me it can't create a float of
s.  Of course I can't edit a 4G file in any editor that I have installed,
so I have to work with the fact that there is a bit of text in there that
isn't quoted.
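
One way around the unquoted text, offered only as a minimal sketch: leave
the reader at its default quoting so every field arrives as a string, and
call float() on just the two columns that hold the samples. The function
name read_samples is hypothetical, and the sample rows are copied from the
scope data quoted later in this message.

```python
import csv
import io

# Sample rows copied from the scope export quoted later in this thread.
SAMPLE = '''"Record Length",2000002,"Points",-0.005640001706,1.6363
"Sample Interval",5e-09,s,-0.005639996706,1.65291
,,,-0.005639981706,1.60309
'''


def read_samples(csvfile):
    """Yield (time, volts) as floats from the last two columns of each row.

    read_samples is a hypothetical helper, not part of the csv module.
    """
    # Default quoting: every field is a str, so the bare "s" in the third
    # column is never passed to float() and cannot raise an error.
    for row in csv.reader(csvfile):
        yield float(row[3]), float(row[4])


samples = list(read_samples(io.StringIO(SAMPLE)))
```

For a real file the same function should work on an object returned by
open(path, newline=""), and because it is a generator it reads one row at a
time rather than loading the whole 4G file.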

Which leads me to another question related to working with these csv
files.

Is there a way for me to tell the reader to skip the first 'n' rows?  Or,
for that matter, skip rows in the middle of the file?
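
Skipping the first 'n' rows can probably be done with itertools.islice,
which wraps any iterator (a csv reader included) and discards a leading
slice without reading the rest of the file into memory. This is a sketch;
rows_after and the two-column sample text are made up for illustration.

```python
import csv
import io
from itertools import islice

# Made-up data: two header lines followed by two data lines.
SAMPLE = "h1,h2\nh3,h4\n1.0,2.0\n3.0,4.0\n"


def rows_after(csvfile, n):
    """Return an iterator over the csv rows, skipping the first n rows.

    rows_after is a hypothetical helper name.
    """
    reader = csv.reader(csvfile)
    # islice(iterable, start, stop): start at row n, no upper limit.
    return islice(reader, n, None)


rows = list(rows_after(io.StringIO(SAMPLE), 2))
```

The same call with a stop value, for example islice(reader, n, m), would
give only rows n up to but not including m.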

At this point, I think it may be less painful for me to just skip those few
lines that have text.  I don't believe there will be any loss of accuracy.

But, since row is not really an index, how does one conditionally skip a
given set of row entries?
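
A possible approach to conditional skipping, sketched under assumptions:
enumerate() pairs each row with a running number, so a row can be skipped
either by position or because its fields do not convert to float. The name
numeric_rows, the skip parameter, and the sample text are all hypothetical.

```python
import csv
import io

# Made-up data with a header line and a units line mixed into the numbers.
SAMPLE = (
    "Time,Volts\n"
    "0.0,1.5\n"
    "s,V\n"
    "1.0,1.75\n"
    "2.0,2.0\n"
)


def numeric_rows(csvfile, skip=()):
    """Yield (row_number, time, volts), skipping unwanted rows.

    numeric_rows and its skip argument are hypothetical names.
    """
    for number, row in enumerate(csv.reader(csvfile)):
        if number in skip:
            continue  # skipped by position
        try:
            time, volts = float(row[0]), float(row[1])
        except (ValueError, IndexError):
            continue  # skipped because the row is not two numbers
        yield number, time, volts


result = list(numeric_rows(io.StringIO(SAMPLE), skip={4}))
```

The continue statement simply moves on to the next row, so no index
arithmetic is needed even though row is not an index.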

I started following the link to iterables but quickly got lost in the
terminology.

Best,


On Mon, Jul 15, 2019 at 3:03 PM Mats Wichmann <mats at wichmann.us> wrote:

> On 7/15/19 12:35 PM, Chip Wachob wrote:
> > Oscar and Mats,
> >
> > Thank you for your comments and taking time to look at the snips.
> >
> > Yes, I think I had commented that the avg+trigger was = triggervolts in
> > my original post.
> >
> > I did find that there was an intermediary process which I had forgotten
> > to comment out that was adversely affecting the data in one instance and
> > not the other.  So it WAS a case of becoming code blind.  But I didn't
> > give y'all all of the code so you would not have known that.  My
> > apologies.
> >
> > Mats, I'd like to get a better handle on your suggestions about
> > improving the code.  Turns out, I've got another couple of 4GByte files
> > to sift through, and they are less 'friendly' when it comes to
> > determining the start and stop points.  So, I have to basically redo
> > about half of my code and I'd like to improve on my Python coding skills.
> >
> > Unfortunately, I have gaps in my coding time, and I end up forgetting
> > the details of a particular language, especially a new language to me,
> > Python.
> >
> > I'll admit that my 'C' background keeps me thinking of these data sets
> > as arrays; in fact they are lists, e.g.:
> >
> > [
> > [t0, v0],
> > [t1, v1],
> > [t2, v2],
> > .
> > .
> > .
> > [tn, vn]
> > ]
> >
> > Time and volts are floats and need to be converted from the csv file
> > entries.
> >
> > I'm not sure that I follow the "unpack" assignment in your example of:
> >
> > for row in TrigWind:
> >     time, voltage = row  # unpack
> >
> > I think I 'see' what is happening, but when I read up on unpacking, I
> > see that it refers to using * and ** when passing arguments to a
> > function...
>
> That's a different aspect of unpacking.  This one is sequence unpacking,
> sometimes called tuple (or sequence) assignment.  In the official
> Python docs it is described in the latter part of this section:
>
> https://docs.python.org/3/tutorial/datastructures.html#tuples-and-sequences
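
As a rough illustration of sequence unpacking (a sketch, not taken from the
linked tutorial): the names on the left of the = are matched one-to-one
with the items of the sequence on the right. The list trig_wind and its
values are made up here; the five-field row is copied from the sample data
in this thread.

```python
# Made-up stand-in for TrigWind: a list of [time, volts] pairs.
trig_wind = [[0.0, 1.6363], [5e-09, 1.65291]]

pairs = []
for row in trig_wind:
    time, voltage = row  # row must hold exactly two items
    pairs.append((time, voltage))

# The same unpacking can be written directly in the for statement.
voltages = [voltage for time, voltage in trig_wind]

# With more fields than names, a starred target collects the extras.
row = ["Record Length", "2000002", "Points", "-0.005640001706", "1.6363"]
*_, time_text, volt_text = row
```

If the number of names does not match the number of items and no starred
target is used, Python raises ValueError.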
>
>
> > I tried it anyhow, with this being an example of my source data:
> >
> > "Record Length",2000002,"Points",-0.005640001706,1.6363
> > "Sample Interval",5e-09,s,-0.005639996706,1.65291
> > "Trigger Point",1128000,"Samples",-0.005639991706,1.65291
> > "Trigger Time",0.341197,s,-0.005639986706,1.60309
> > ,,,-0.005639981706,1.60309
> > "Horizontal Offset",-0.00564,s,-0.005639976706,1.6363
> > ,,,-0.005639971706,1.65291
> > ,,,-0.005639966706,1.65291
> > ,,,-0.005639961706,1.6363
> > .
> > .
> > .
> >
> > Note that I want the items in the fourth and fifth columns of the csv
> > file for my time and voltage.
> >
> > When I tried to use the unpack, they all came over as strings.  I can't
> > seem to convert them selectively..
>
> That's what the csv module does, unless you tell it not to. Maybe this
> will help:
>
> https://docs.python.org/3/library/csv.html#csv.reader
>
> There's an option to convert unquoted values to floats, and leave quoted
> values alone as strings, which would seem to match your data above quite
> well.
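
The option referred to is the quoting argument of csv.reader. A minimal
sketch, assuming a row in which every unquoted field is numeric (the line
is the first row of the sample data above):

```python
import csv
import io

# First row of the sample data: quoted labels, unquoted numbers.
line = '"Record Length",2000002,"Points",-0.005640001706,1.6363\n'

# QUOTE_NONNUMERIC tells the reader to convert unquoted fields to float
# and to leave quoted fields as strings.
reader = csv.reader(io.StringIO(line), quoting=csv.QUOTE_NONNUMERIC)
desc1, val1, desc2, time_val, volt_val = next(reader)
```

An unquoted field that is not a number, such as a bare s, makes the reader
raise ValueError, so this setting only suits files where all text is
quoted.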
>
> > Desc1, Val1, Desc2, TimeVal, VoltVal = row
> >
> > TimeVal and VoltVal return type of str, which makes sense.
> >
> > Must I go through yet another iteration of scanning TimeVal and VoltVal
> > and converting them using float() by saving them to another array?
> >
> >
> > Thanks for your patience.
> >
> > Chip
> >
> > On Sat, Jul 13, 2019 at 9:36 AM Mats Wichmann <mats at wichmann.us> wrote:
> >
> >     On 7/11/19 8:15 AM, Chip Wachob wrote:
> >
> >     kinda restating what Oscar said, he came to the same conclusions, I'm
> >     just being a lot more wordy:
> >
> >
> >     > So, here's where it gets interesting.  And, I'm presuming that
> >     > someone out there knows exactly what is going on and can help me
> >     > get past this hurdle.
> >
> >     Well, each snippet has some "magic" variables (from our point of
> >     view, since we don't see where they are set up):
> >
> >     1: if(voltage > (avg + triglevel)
> >
> >     2: if((voltage > triggervolts)
> >
> >     since the value you're comparing voltage to gates when you decide
> >     there's a transition, and thus what gets added to the transition list
> >     you're building, and the list size comes out different, and you claim
> >     the data are the same, then guess where a process of elimination
> >     suggests the difference is coming from?
> >
> >     ===
> >
> >     Stylistic comment, I know this wasn't your question.
> >
> >     >         for row in range (len(TrigWind)):
> >
> >     Don't do this.  It's not a coding error giving you wrong results, but
> >     it's not efficient and makes for harder to read code.  You already
> >     have an iterable in TrigWind.  You then find the size of the iterable
> >     and use that size to generate a range object, which you then iterate
> >     over, producing index values which you use to index into the original
> >     iterable.  Why not skip all that?  Just do
> >
> >     for row in TrigWind:
> >
> >     now row is actually a row, as the variable name suggests, rather
> >     than an index you use to go retrieve the row.
> >
> >     Further, the "row" entries in TrigWind are lists (or tuples, or some
> >     other indexable iterable, we can't tell), which means you end up
> >     indexing into two things - into the "array" to get the row, then into
> >     the row to get the individual values. It's nicer if you unpack the
> >     rows into variables so they can have meaningful names - indeed you
> >     already do that with one of them. Lets you avoid code snips like
> >     "x[7][1]"
> >
> >     Conceptually then, you can take this:
> >
> >     for row in range(len(TrigWind)):
> >         voltage = float(TrigWind[row][1])
> >         ...
> >             edgearray.append([float(TrigWind[row][0]),
> >                               float(TrigWind[row][1])])
> >         ...
> >
> >     and change to this:
> >
> >     for row in TrigWind:
> >         time, voltage = row  # unpack
> >         ....
> >             edgearray.append([float(time), float(voltage)])
> >
> >     or even more compactly you can unpack directly at the top:
> >
> >     for time, voltage in TrigWind:
> >         ...
> >             edgearray.append([float(time), float(voltage)])
> >         ...
> >
> >     Now I left an issue to resolve with conversion - voltage is not
> >     converted before its use in the not-shown comparisons. Does it need
> >     to be? Every usage of the values from the individual rows here uses
> >     them immediately after converting them to float.  It's usually better
> >     not to convert all over the place, and since the creation of TrigWind
> >     is under your own control, you should do that at the point the data
> >     enters the program - that is as TrigWind is created; then you just
> >     consume data from it in its intended form.  But if not, just convert
> >     voltage before using, as your original code does. You don't then need
> >     to convert voltage a second time in the list append statements.
> >
> >     for time, voltage in TrigWind:
> >         voltage = float(voltage)
> >         ...
> >             edgearray.append([float(time), voltage])
> >         ...
> >
> >
> >     _______________________________________________
> >     Tutor maillist  -  Tutor at python.org
> >     To unsubscribe or change subscription options:
> >     https://mail.python.org/mailman/listinfo/tutor
> >
>

