[Tutor] regex and parsing through a semi-csv file

Mina Nozar nozarm at triumf.ca
Tue Oct 25 00:40:18 CEST 2011


Hi Marc,

Thank you.  Following some of your suggestions, I rewrote the code as shown below, and it works.  I agree with your
point about readability over complexity.  By "grace" I only meant simpler, not convoluted.  That's all.  As a beginner,
I don't know all the existing functions, so I sometimes end up re-inventing the wheel.


Cheers,
Mina
====

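isotope_name, isotope_A = args.isotope.split('-')
print('%s %s' % (isotope_name, isotope_A))

found_isotope = False
activity_time = []
activity = []
activity_err = []

# Read the whole file; the with-block closes it even if an error occurs.
with open(args.fname, 'r') as f:
	lines = f.readlines()

# Find the header line for the requested isotope and keep only what follows it.
for i, line in enumerate(lines):
	if isotope_name in line and isotope_A in line:
		found_isotope = True
		print('found isotope')
		lines = lines[i+1:]
		break

# If the isotope was never found, do not read data from the top of the file.
if not found_isotope:
	lines = []

# Data lines start with a digit; the next isotope's header ends the block.
for line in lines:
	line = line.strip()
	if not line:
		continue	# skip blank lines instead of failing on line[0]
	if not line[0].isdigit():
		break
	print('found')
	words = line.split(',')
	activity_time.append(float(words[0]))
	activity.append(float(words[1]))
	activity_err.append(float(words[2]))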
	
On 11-10-19 12:06 PM, Marc Tompkins wrote:
> On Wed, Oct 5, 2011 at 11:12 AM, Mina Nozar <nozarm at triumf.ca> wrote:
>
>     Now, I would like to parse through this code and fill out 3 lists: 1) activity_time, 2) activity, 3) error, and plot
>     the activities as a function of time using matplotlib.  My question specifically is on how to parse through the
>     lines containing the data (activity time, activity, error) for a given isotope, stopping before reaching the next
>     isotope's info.
>
>
> Regular expressions certainly are terse, but (IMHO) they're really, really hard to debug and maintain; I find I have to
> get myself into a Zen state to even unpack them, and that just doesn't feel very Pythonic.
>
> Here's an approach I've used in similar situations (a file with arbitrary sequences of differently-formatted lines,
> where one line determines the "type" of the lines that follow):
> -  create a couple of status variables: currentElement, currentIsotope
> -  read each line and split it into a list, separating on the commas
> -  look at the first item on the line: is it an element?  (You could use a list of the known element symbols, or you
> could just check to see if it's alphabetic...)
>    -  if the first item is an element, then set currentElement and currentIsotope, move on to next line.
> -  if the first item is NOT an element, then this is a data line.
>    -  if currentElement and currentIsotope match what the user asked for,
>       -  add time, activity, and error to the appropriate lists
>    - if not, move on.
>
> This approach also works in the event that the data wasn't all collected in order - i.e. there might be data for Ag111
> followed by U235 followed by Ag111 again.
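
The steps above could be sketched roughly as follows.  This is only an illustration under assumptions: the file layout
(a header line such as "Ag,111" followed by "time,activity,error" data lines), the function name collect_activity, and
the sample data are all hypothetical and not taken from the real file.

```python
def collect_activity(lines, element, mass_number):
    """Gather time, activity and error values for one isotope.

    Assumed (hypothetical) layout, not taken from the real file:
      header line:  Ag,111
      data line:    0.5,1.2e3,35.0
    """
    times, activities, errors = [], [], []
    current_element = None   # status variables, updated at each header line
    current_isotope = None

    for raw in lines:
        fields = [f.strip() for f in raw.strip().split(',')]
        if not fields[0]:
            continue  # skip blank lines
        if fields[0].isalpha():
            # Header line: remember which isotope the following data belongs to.
            current_element = fields[0]
            current_isotope = fields[1] if len(fields) > 1 else None
            continue
        # Data line: keep it only if it belongs to the requested isotope.
        if current_element == element and current_isotope == mass_number:
            times.append(float(fields[0]))
            activities.append(float(fields[1]))
            errors.append(float(fields[2]))
    return times, activities, errors


# Made-up sample in which the Ag-111 data appears in two separate blocks.
sample = [
    "Ag,111",
    "0.0,10.0,1.0",
    "1.0,8.0,0.9",
    "U,235",
    "0.0,5.0,0.5",
    "Ag,111",
    "2.0,6.0,0.8",
]
times, activities, errors = collect_activity(sample, "Ag", "111")
```

Because the status variables are reset at every header line, the second Ag-111 block is picked up as well, which is
the out-of-order case described above.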
>
>     Note that the size of the lists will change depending on the number of activities for a given run of the simulation
>     so I don't want to hard code '13' as the number of lines to read in followed by the line containing isotope_name, etc.
>
>
> This should work for any number of lines or size of file, as long as the data lines are all formatted as you expect.
> Obviously a bit of error-trapping would be a good thing....
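
One possible form of that error-trapping is sketched here.  The helper name parse_data_line is hypothetical, and the
"time,activity,error" field order is assumed from the code earlier in this message; a malformed line yields None
instead of raising an exception.

```python
def parse_data_line(line):
    """Return (time, activity, error) as floats, or None if the line is malformed."""
    fields = line.strip().split(',')
    if len(fields) < 3:
        return None  # too few columns to be a data line
    try:
        return tuple(float(f) for f in fields[:3])
    except ValueError:
        return None  # at least one field is not a number
```

The caller can then decide whether a None result means "skip this line" or "stop and report the problem".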
>
>     If there is a more graceful way of doing this, please let me know as well.  I am new to python...
>
> For me, readability and maintainability trump "grace" every time.  Nobody's handing out awards for elegance (outside of
> the classroom), but complexity gets punished (with bugs and wasted time.)  More elegant solutions might also run faster,
> but remember that premature optimization is a Bad Thing.


