[Tutor] regex and parsing through a semi-csv file

Marc Tompkins marc.tompkins at gmail.com
Wed Oct 19 21:06:00 CEST 2011


On Wed, Oct 5, 2011 at 11:12 AM, Mina Nozar <nozarm at triumf.ca> wrote:

> Now, I would like to parse through this code and fill out 3 lists: 1)
> activity_time, 2) activity, 3) error, and plot the activities as a function
> of time using matplotlib.  My question specifically is on how to parse
> through the lines containing the data (activity time, activity, error) for a
> given isotope, stopping before reaching the next isotope's info.


Regular expressions certainly are terse, but (IMHO) they're really, really
hard to debug and maintain; I find I have to get myself into a Zen state to
even unpack them, and that just doesn't feel very Pythonic.

Here's an approach I've used in similar situations (a file with arbitrary
sequences of differently-formatted lines, where one line determines the
"type" of the lines that follow):
-  create a couple of status variables: currentElement, currentIsotope
-  read each line and split it into a list, separating on the commas
-  look at the first item on the line: is it an element?  (You could use a
   list of the 118 element symbols, or you could just check whether it's
   alphabetic...)
   -  if the first item is an element, then set currentElement and
      currentIsotope, and move on to the next line.
   -  if the first item is NOT an element, then this is a data line.
      -  if currentElement and currentIsotope match what the user asked for,
         -  add time, activity, and error to the appropriate lists
      -  if not, move on.
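The steps above might look roughly like this in Python.  This is a minimal
sketch, not a reading of Mina's actual file: it assumes (hypothetically)
that a header line looks like "Ag,111" (element symbol, then mass number)
and that each data line is "time,activity,error" with numeric fields.  The
function name read_activities and the sample data are made up for
illustration.

```python
import csv
import io


def read_activities(lines, element, isotope):
    """Collect (time, activity, error) lists for one isotope.

    Hypothetical layout: header lines like "Ag,111", data lines like
    "0.0,100.0,5.0".  Adjust the parsing to match the real file.
    """
    current_element = None
    current_isotope = None
    activity_time, activity, error = [], [], []

    for row in csv.reader(lines):
        if not row:
            continue  # skip blank lines
        first = row[0].strip()
        if first.isalpha():
            # Header line: remember which isotope the following data belongs to.
            current_element = first
            current_isotope = row[1].strip()
        elif current_element == element and current_isotope == isotope:
            # Data line for the isotope the user asked for.
            activity_time.append(float(row[0]))
            activity.append(float(row[1]))
            error.append(float(row[2]))
        # Any other data line belongs to a different isotope: move on.

    return activity_time, activity, error


sample = io.StringIO(
    "Ag,111\n"
    "0.0,100.0,5.0\n"
    "1.0,90.0,4.5\n"
    "U,235\n"
    "0.0,2.0,0.1\n"
)
t, a, e = read_activities(sample, "Ag", "111")
print(t, a, e)  # [0.0, 1.0] [100.0, 90.0] [5.0, 4.5]
```

An open file object works the same way as the StringIO here, since
csv.reader accepts any iterable of lines.  The three returned lists are
what you would then hand to matplotlib for plotting.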

This approach also works if the data wasn't all collected in order, e.g.
there might be data for Ag111 followed by U235 followed by Ag111 again.
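Because every header line resets the current isotope, interleaved blocks
need no special handling.  If you would rather collect every isotope in a
single pass, a dictionary keyed by (element, isotope) does the job.  A
minimal sketch, assuming a hypothetical layout where header lines look like
"Ag,111" and data lines look like "time,activity,error"; the function name
group_by_isotope is also made up:

```python
import csv
import io
from collections import defaultdict


def group_by_isotope(lines):
    """Return {(element, isotope): ([times], [activities], [errors])}.

    Hypothetical layout: header lines like "Ag,111", data lines like
    "0.0,100.0,5.0".  Data lines before the first header are ignored.
    """
    # Each new key starts with three empty lists.
    data = defaultdict(lambda: ([], [], []))
    key = None
    for row in csv.reader(lines):
        if not row:
            continue  # skip blank lines
        first = row[0].strip()
        if first.isalpha():
            # Header line: switch to this isotope's lists.
            key = (first, row[1].strip())
        elif key is not None:
            times, activities, errors = data[key]
            times.append(float(row[0]))
            activities.append(float(row[1]))
            errors.append(float(row[2]))
    return dict(data)


sample = io.StringIO(
    "Ag,111\n0.0,100.0,5.0\n"
    "U,235\n0.0,2.0,0.1\n"
    "Ag,111\n1.0,90.0,4.5\n"
)
groups = group_by_isotope(sample)
print(groups[("Ag", "111")])  # ([0.0, 1.0], [100.0, 90.0], [5.0, 4.5])
```

The second Ag111 block is simply appended to the lists started by the
first one.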

> Note that the size of the lists will change depending on the number of
> activities for a given run of the simulation so I don't want to hard code
> '13' as the number of lines to read in followed by the line containing
> isotope_name, etc.
>

This should work for any number of lines or size of file, as long as the
data lines are all formatted as you expect.  Obviously a bit of
error-trapping would be a good thing....
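As one possible form of that error-trapping, each data line could go
through a small helper that reports and skips malformed rows instead of
crashing the whole run.  The name parse_data_row and the strict
three-field "time,activity,error" layout are assumptions for illustration;
a real file with trailing commas, say, would need a looser check.

```python
import sys


def parse_data_row(row, line_number):
    """Convert ['time', 'activity', 'error'] strings to floats.

    Returns a (time, activity, error) tuple, or None if the row is
    malformed (wrong number of fields or a non-numeric value).
    """
    if len(row) != 3:
        print(f"line {line_number}: expected 3 fields, got {len(row)}",
              file=sys.stderr)
        return None
    try:
        return tuple(float(field) for field in row)
    except ValueError:
        print(f"line {line_number}: non-numeric value in {row!r}",
              file=sys.stderr)
        return None


print(parse_data_row(["1.0", "90.0", "4.5"], 3))  # (1.0, 90.0, 4.5)
print(parse_data_row(["1.0", "n/a", "4.5"], 4))   # None
```

Inside the reading loop, enumerate(csv.reader(f), start=1) is one way to
supply the line numbers for the messages.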

> If there is a more graceful way of doing this, please let me know as well.
> I am new to python...
>

For me, readability and maintainability trump "grace" every time.  Nobody's
handing out awards for elegance (outside of the classroom), but complexity
gets punished (with bugs and wasted time).  More elegant solutions might
also run faster, but remember that premature optimization is a Bad Thing.