[Tutor] Reading .csv data vs. reading an array
Chip Wachob
wachobc at gmail.com
Mon Jul 15 15:59:01 EDT 2019
Mats,
Thank you!
So I added QUOTE_NONNUMERIC to my csv.reader() call and it almost
worked.
Now, how wonderful that the scope's csv file simply wrote an s for seconds
and didn't include quotes. Now Python tells me it can't create a float of
s. Of course I can't edit a 4G file in any editor that I have installed,
so I have to work with the fact that there is a bit of text in there that
isn't quoted.
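
One possible workaround, sketched here on the assumption that only the last two fields of each row (time and voltage) are needed: read every field as a string with a plain csv.reader() and call float() on just those two fields, so the unquoted s is never converted. The helper name read_time_volts and the sample text are made up for illustration.

```python
import csv
import io

def read_time_volts(lines):
    """Yield (time, voltage) as floats from the last two fields of each row."""
    for row in csv.reader(lines):
        # Fields 0-2 hold the scope's header text (including the bare s).
        # They stay strings and are simply ignored here.
        yield float(row[3]), float(row[4])

# Hypothetical sample in the same layout as the scope's file.
sample = io.StringIO(
    '"Sample Interval",5e-09,s,-0.005639996706,1.65291\n'
    ',,,-0.005639981706,1.60309\n'
)
pairs = list(read_time_volts(sample))
```

With a real file, an open file object (opened with newline='') would take the place of the StringIO, and the rows could be consumed one at a time instead of being collected into a list.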
Which leads me to another question related to working with these csv
files.
Is there a way for me to tell the reader to skip the first 'n' rows? Or,
for that matter, skip rows in the middle of the file?
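
For the first part of that question, one approach that should work is itertools.islice, which can discard the first n items of any iterator, including a csv reader. This is only a sketch; the function name rows_after and the sample data are invented for the example.

```python
import csv
import io
from itertools import islice

def rows_after(lines, n):
    """Return an iterator over the csv rows, skipping the first n."""
    reader = csv.reader(lines)
    # islice(reader, n, None) consumes and discards n rows, then passes
    # the rest through one at a time without loading the whole file.
    return islice(reader, n, None)

sample = io.StringIO("a,1\nb,2\nc,3\nd,4\n")
remaining = list(rows_after(sample, 2))
```

The skipped rows are still read and parsed, so this does not save time on a large file, but it keeps the loop body free of row counting.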
At this point, I think it may be less painful for me to just skip those few
lines that have text. I don't believe there will be any loss of accuracy.
But, since row is not really an index, how does one conditionally skip a
given set of row entries?
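
A sketch of one way this might be done: enumerate() supplies an index alongside each row, and continue moves on to the next row whenever a condition says to skip. Here rows are skipped either by position or because a field will not convert to float. The name numeric_rows, its skip_rows parameter, and the sample data are assumptions made for the example.

```python
import csv
import io

def numeric_rows(lines, skip_rows=()):
    """Yield rows as lists of floats, skipping unwanted rows."""
    for index, row in enumerate(csv.reader(lines)):
        if index in skip_rows:
            continue  # skip by position; enumerate supplies the index
        try:
            values = [float(field) for field in row]
        except ValueError:
            continue  # skip rows containing text that is not a number
        yield values

sample = io.StringIO("1.0,2.0\ns,3.0\n4.0,5.0\n6.0,7.0\n")
kept = list(numeric_rows(sample, skip_rows={3}))
```

Note that this version also drops rows with empty fields, since float('') raises ValueError; for the scope data it would need to look only at the time and voltage fields.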
I started following the link to iterables but quickly got lost in the
terminology.
Best,
On Mon, Jul 15, 2019 at 3:03 PM Mats Wichmann <mats at wichmann.us> wrote:
> On 7/15/19 12:35 PM, Chip Wachob wrote:
> > Oscar and Mats,
> >
> > Thank you for your comments and taking time to look at the snips.
> >
> > Yes, I think I had commented that the avg+trigger was = triggervolts in
> > my original post.
> >
> > I did find that there was an intermediary process which I had forgotten
> > to comment out that was adversely affecting the data in one instance and
> > not the other. So it WAS a case of becoming code blind. But I didn't
> > give y'all all of the code so you would not have known that. My
> > apologies.
> >
> > Mats, I'd like to get a better handle on your suggestions about
> > improving the code. Turns out, I've got another couple of 4GByte files
> > to sift through, and they are less 'friendly' when it comes to
> > determining the start and stop points. So, I have to basically redo
> > about half of my code and I'd like to improve on my Python coding skills.
> >
> > Unfortunately, I have gaps in my coding time, and I end up forgetting
> > the details of a particular language, especially a new language to me,
> > Python.
> >
> > I'll admit that my 'C' background keeps me thinking of these data sets
> > as arrays, when in fact they are lists, e.g.:
> >
> > [
> > [t0, v0],
> > [t1, v1],
> > [t2, v2],
> > .
> > .
> > .
> > [tn, vn]
> > ]
> >
> > Time and volts are floats and need to be converted from the csv file
> > entries.
> >
> > I'm not sure that I follow the "unpack" assignment in your example of:
> >
> >     for row in TrigWind:
> >         time, voltage = row # unpack
> >
> > I think I 'see' what is happening, but when I read up on unpacking, I
> > see it referring to the use of * and ** when passing arguments to a
> > function...
>
> That's a different aspect of unpacking. This one is sequence unpacking,
> sometimes called tuple (or sequence) assignment. In the official
> Python docs it is described in the latter part of this section:
>
> https://docs.python.org/3/tutorial/datastructures.html#tuples-and-sequences
>
>
> > I tried it anyhow, with this being an example of my source data:
> >
> > "Record Length",2000002,"Points",-0.005640001706,1.6363
> > "Sample Interval",5e-09,s,-0.005639996706,1.65291
> > "Trigger Point",1128000,"Samples",-0.005639991706,1.65291
> > "Trigger Time",0.341197,s,-0.005639986706,1.60309
> > ,,,-0.005639981706,1.60309
> > "Horizontal Offset",-0.00564,s,-0.005639976706,1.6363
> > ,,,-0.005639971706,1.65291
> > ,,,-0.005639966706,1.65291
> > ,,,-0.005639961706,1.6363
> > .
> > .
> > .
> >
> > Note that I want the items in the fourth and fifth columns of the csv
> > file for my time and voltage.
> >
> > When I tried to use the unpack, they all came over as strings. I can't
> > seem to convert them selectively..
>
> That's what the csv module does, unless you tell it not to. Maybe this
> will help:
>
> https://docs.python.org/3/library/csv.html#csv.reader
>
> There's an option to convert unquoted values to floats, and leave quoted
> values alone as strings, which would seem to match your data above quite
> well.
>
> > Desc1, Val1, Desc2, TimeVal, VoltVal = row
> >
> > TimeVal and VoltVal return type of str, which makes sense.
> >
> > Must I go through yet another iteration of scanning TimeVal and VoltVal
> > and converting them using float() by saving them to another array?
> >
> >
> > Thanks for your patience.
> >
> > Chip
> >
> > On Sat, Jul 13, 2019 at 9:36 AM Mats Wichmann <mats at wichmann.us> wrote:
> >
> > On 7/11/19 8:15 AM, Chip Wachob wrote:
> >
> > kinda restating what Oscar said, he came to the same conclusions, I'm
> > just being a lot more wordy:
> >
> >
> > > So, here's where it gets interesting. And, I'm presuming that someone out
> > > there knows exactly what is going on and can help me get past this hurdle.
> >
> > Well, each snippet has some "magic" variables (from our point of view,
> > since we don't see where they are set up):
> >
> > 1: if(voltage > (avg + triglevel)
> >
> > 2: if((voltage > triggervolts)
> >
> > since the value you're comparing voltage to gates when you decide
> > there's a transition, and thus what gets added to the transition list
> > you're building, and the list size comes out different, and you claim
> > the data are the same, then guess where a process of elimination
> > suggests the difference is coming from?
> >
> > ===
> >
> > Stylistic comment, I know this wasn't your question.
> >
> > > for row in range (len(TrigWind)):
> >
> > Don't do this. It's not a coding error giving you wrong results, but
> > it's not efficient and makes for harder to read code. You already have
> > an iterable in TrigWind. You then find the size of the iterable and use
> > that size to generate a range object, which you then iterate over,
> > producing index values which you use to index into the original
> > iterable. Why not skip all that? Just do
> >
> > for row in TrigWind:
> >
> > now row is actually a row, as the variable name suggests, rather than an
> > index you use to go retrieve the row.
> >
> > Further, the "row" entries in TrigWind are lists (or tuples, or some
> > other indexable iterable, we can't tell), which means you end up
> > indexing into two things - into the "array" to get the row, then into
> > the row to get the individual values. It's nicer if you unpack the rows
> > into variables so they can have meaningful names - indeed you already do
> > that with one of them. Lets you avoid code snips like "x[7][1]"
> >
> > Conceptually then, you can take this:
> >
> >     for row in range(len(TrigWind)):
> >         voltage = float(TrigWind[row][1])
> >         ...
> >         edgearray.append([float(TrigWind[row][0]),
> >                           float(TrigWind[row][1])])
> >         ...
> >
> > and change to this:
> >
> >     for row in TrigWind:
> >         time, voltage = row # unpack
> >         ...
> >         edgearray.append([float(time), float(voltage)])
> >
> > or even more compactly you can unpack directly at the top:
> >
> >     for time, voltage in TrigWind:
> >         ...
> >         edgearray.append([float(time), float(voltage)])
> >         ...
> >
> > Now I left an issue to resolve with conversion - voltage is not
> > converted before its use in the not-shown comparisons. Does it need to
> > be? Every usage of the values from the individual rows here uses them
> > immediately after converting them to float. It's usually better not to
> > convert all over the place, and since the creation of TrigWind is under
> > your own control, you should do that at the point the data enters the
> > program - that is as TrigWind is created; then you just consume data
> > from it in its intended form. But if not, just convert voltage before
> > using, as your original code does. You don't then need to convert
> > voltage a second time in the list append statements.
> >
> >     for time, voltage in TrigWind:
> >         voltage = float(voltage)
> >         ...
> >         edgearray.append([float(time), voltage])
> >         ...
> >
> >
> > _______________________________________________
> > Tutor maillist - Tutor at python.org <mailto:Tutor at python.org>
> > To unsubscribe or change subscription options:
> > https://mail.python.org/mailman/listinfo/tutor
> >
>