[Tutor] regex and parsing through a semi-csv file
Mina Nozar
nozarm at triumf.ca
Tue Oct 25 00:40:18 CEST 2011
Hi Marc,
Thank you. Following some of your suggestions, the rewrite below worked. I agree with your point on readability over
complexity. By "grace" I just meant simpler and less convoluted, that's all. As a beginner who doesn't know all the
existing functions, I sometimes end up re-inventing the wheel.
Cheers,
Mina
====
isotope_name, isotope_A = args.isotope.split('-')
print isotope_name, isotope_A

found_isotope = False
activity_time = []
activity = []
activity_err = []

f = open(args.fname, 'r')
lines = f.readlines()
f.close()

# Find the header line for the requested isotope; keep only what follows it.
for i, line in enumerate(lines):
    line = line.strip()
    if isotope_name in line and isotope_A in line:
        found_isotope = True
        print 'found isotope'
        #print line
        lines = lines[i+1:]
        break

# Read data lines until a blank line or the next isotope's header.
for line in lines:
    line = line.strip()
    if not line or not line[0].isdigit():
        break
    print 'found'
    words = line.split(',')
    activity_time.append(float(words[0]))
    activity.append(float(words[1]))
    activity_err.append(float(words[2]))
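
For comparison, here is a minimal sketch of the same idea wrapped in a function that reads the file lazily instead of calling readlines() and slicing. It is written for Python 3. The function name read_isotope_block and the SAMPLE layout (a header line such as "Ag,111" followed by time,activity,error lines) are assumptions made for illustration; the real file format is not shown in this message.

```python
import io

# Hypothetical sample data; the actual file layout is an assumption.
SAMPLE = """\
Ag,111
0.0,5.0,0.5
1.0,4.0,0.4
U,235
0.0,9.0,0.9
"""


def read_isotope_block(stream, isotope):
    """Return (times, activities, errors) for the first block of `isotope`.

    `isotope` is a string such as 'Ag-111'; `stream` is any iterable of lines.
    """
    name, mass = isotope.split('-')
    times, activities, errors = [], [], []
    found = False
    for line in stream:
        line = line.strip()
        if not found:
            # Still searching for the header line of the requested isotope.
            found = name in line and mass in line
            continue
        if not line or not line[0].isdigit():
            break  # a blank line or the next isotope's header ends the block
        t, a, e = line.split(',')[:3]
        times.append(float(t))
        activities.append(float(a))
        errors.append(float(e))
    return times, activities, errors


times, activities, errors = read_isotope_block(io.StringIO(SAMPLE), 'Ag-111')
```

With a real file, the stream would be the open file object, e.g. `with open(args.fname) as f: read_isotope_block(f, args.isotope)`. Note that the substring test used for the header (kept from the rewrite above) can match more than intended, so comparing split fields exactly is safer if the header layout is known.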
On 11-10-19 12:06 PM, Marc Tompkins wrote:
> On Wed, Oct 5, 2011 at 11:12 AM, Mina Nozar <nozarm at triumf.ca> wrote:
>
> > Now, I would like to parse through this code and fill out 3 lists: 1) activity_time, 2) activity, 3) error, and plot
> > the activities as a function of time using matplotlib. My question specifically is on how to parse through the
> > lines containing the data (activity time, activity, error) for a given isotope, stopping before reaching the next
> > isotope's info.
>
>
> Regular expressions certainly are terse, but (IMHO) they're really, really hard to debug and maintain; I find I have to
> get myself into a Zen state to even unpack them, and that just doesn't feel very Pythonic.
>
> Here's an approach I've used in similar situations (a file with arbitrary sequences of differently-formatted lines,
> where one line determines the "type" of the lines that follow):
> - create a couple of status variables: currentElement, currentIsotope
> - read each line and split it into a list, separating on the commas
> - look at the first item on the line: is it an element? (You could use a list of the 120 symbols, or you could just
>   check to see if it's alphabetic...)
> - if the first item is an element, then set currentElement and currentIsotope, move on to next line.
> - if the first item is NOT an element, then this is a data line.
>   - if currentElement and currentIsotope match what the user asked for,
>     - add time, activity, and error to the appropriate lists
>   - if not, move on.
>
> This approach also works in the event that the data wasn't all collected in order - i.e. there might be data for Ag111
> followed by U235 followed by Ag111 again.
>
> > Note that the size of the lists will change depending on the number of activities for a given run of the simulation
> > so I don't want to hard code '13' as the number of lines to read in followed by the line containing isotope_name, etc.
>
>
> This should work for any number of lines or size of file, as long as the data lines are all formatted as you expect.
> Obviously a bit of error-trapping would be a good thing....
>
> > If there is a more graceful way of doing this, please let me know as well. I am new to python...
>
> For me, readability and maintainability trump "grace" every time. Nobody's handing out awards for elegance (outside of
> the classroom), but complexity gets punished (with bugs and wasted time.) More elegant solutions might also run faster,
> but remember that premature optimization is a Bad Thing.
>
>
>
> _______________________________________________
> Tutor maillist - Tutor at python.org
> To unsubscribe or change subscription options:
> http://mail.python.org/mailman/listinfo/tutor
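
The status-variable approach Marc outlines in the quoted message above can be sketched roughly as follows (Python 3). This is an illustration under stated assumptions, not his code: the function name collect_activity is hypothetical, and the header layout (element symbol in the first field, mass number in the second) is assumed because the real file is not shown in this message.

```python
def collect_activity(lines, element, mass):
    """Gather every data line that follows a matching element/isotope header."""
    times, activities, errors = [], [], []
    current_element = current_isotope = None
    for raw in lines:
        fields = [field.strip() for field in raw.split(',')]
        if not fields[0]:
            continue  # skip blank lines
        if fields[0].isalpha():
            # Header line (assumed layout: element symbol, then mass number).
            current_element = fields[0]
            current_isotope = fields[1] if len(fields) > 1 else None
            continue
        # Anything else is treated as a data line: time, activity, error.
        if current_element == element and current_isotope == mass:
            times.append(float(fields[0]))
            activities.append(float(fields[1]))
            errors.append(float(fields[2]))
    return times, activities, errors


# Hypothetical data with Ag-111 appearing twice, split by a U-235 block.
sample = [
    "Ag,111",
    "0.0,5.0,0.5",
    "U,235",
    "0.0,9.0,0.9",
    "Ag,111",
    "1.0,4.0,0.4",
]
ag_times, ag_activities, ag_errors = collect_activity(sample, "Ag", "111")
```

Because the whole input is scanned and the status variables are updated at every header, data for the same isotope is collected even when its blocks are not contiguous. Error trapping for malformed data lines is left out here; a float() failure would raise ValueError.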