extract certain values from file with re

Fri Oct 6 13:01:50 EDT 2006

bearophileHUGS at lycos.com wrote:
   <a fine solution component for the second problem>

Use his solution like this:
     datafile = open(data_file_name, 'r')
     for line in datafile:
         if 'U-Mom' in line:
             print float(line.split("|")[4])
     datafile.close()

For the earlier problem:

     def data_specific(source):
         global headings   # in case some other bit wants to read them
         saw_top = False
         gen = iter(source)
         for line in gen:
             cut = line.split(None, 1)
             if len(cut) > 1 and (cut[0] == 'ITER'
                      and 'GLOBAL ABSOLUTE RESIDUAL' in cut[1]):
                 break  # found the banner line above the table
         else:
             return
         headings = gen.next().split()  # column headings
         starts = range(11, 74, 9) + range(75, 138, 9)  # for fixed-width
         for line in gen:
             data = line.split()
             if data and data != ['...']:  # skip blank and '...' lines
                 if data[0] == '&&&&&&':  # found the terminator
                     break
                 assert line[10] == ' ' and line[74] == ' '
                 yield [int(line[:10])] + [
                        float(line[n : n+9]) for n in starts]

     datafile = open(data_file_name, 'r')
     for row in data_specific(datafile):
         print row  # or row[headings.index('MASS')] or whatever
     datafile.close()
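
To see why the generator slices on fixed offsets instead of calling
split(), here is a small sketch.  The sample row is made up (the real
file is not shown in this thread), and make_line is a hypothetical
helper that exists only to build a row in the layout the generator
assumes: a 10-character integer, a blank, seven 9-character floats, a
blank, and seven more.  It sticks to syntax that should run on both
Python 2 and Python 3.

```python
# Hypothetical sample values; nothing here comes from the real data file.
left = [1.5, -2.5e-1, 3.0, 4.0, 5.0, 6.0, 7.0]
right = [8.0, 9.0, -1.0e1, 1.1e1, 1.2e1, 1.3e1, 1.4e1]


def make_line(number, left, right):
    """Build one row in the assumed fixed-width layout (hypothetical helper)."""
    return ("%10d" % number
            + " " + "".join("%9.2E" % v for v in left)
            + " " + "".join("%9.2E" % v for v in right))


line = make_line(42, left, right)

# Same start offsets as in data_specific; list() keeps the
# concatenation valid on Python 3 as well.
starts = list(range(11, 74, 9)) + list(range(75, 138, 9))

# Slice each 9-character field by position.
row = [int(line[:10])] + [float(line[n:n + 9]) for n in starts]

# Whitespace splitting glues a field to a negative neighbour,
# e.g. '1.50E+00-2.50E-01', so it yields fewer tokens than columns.
naive = line.split()

print(row)
print(naive)
```

When a negative value fills its whole field there is no blank in front
of it, so split() cannot tell where one number ends and the next
begins.  Slicing by position does not care.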

The general theme here is: don't use re unless it is a good solution.
Sometimes you know which columns things are in, sometimes you know the
separator, sometimes there is a mix, and sometimes you really do need a
regular expression.  Save re for when you need to do pattern matching.
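
A rough side-by-side of the three cases, as a sketch only.  All three
sample lines are invented for illustration, and the regular expression
is just one reasonable way to match a float in exponent notation.

```python
import re

# 1. Known columns: slice by position.
#    (Invented line: a name padded to 10 characters, then a number.)
fixed = "MASS".ljust(10) + "0.125"
name = fixed[:10].strip()
value = float(fixed[10:])

# 2. Known separator: split on it and pick the field by index.
#    (Invented line; index 4 matches the U-Mom snippet earlier in the post.)
delimited = "| U-Mom | 0.93 | 2.4E-04 | 1.1E-03 | 12.0 |"
sep_value = float(delimited.split("|")[4])

# 3. Neither position nor separator is reliable: pattern matching with re.
#    (Invented line; the number may appear anywhere after 'residual='.)
free_text = "converged after 37 iterations, residual=4.2e-05"
float_pattern = r"residual\s*=\s*([-+]?\d+(?:\.\d*)?(?:[eE][-+]?\d+)?)"
match = re.search(float_pattern, free_text)
residual = float(match.group(1)) if match else None

print(name)
print(value)
print(sep_value)
print(residual)
```

Only the third case has no structure to lean on besides the shape of
the text itself, which is where re is worth its cost.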

--Scott David Daniels
scott.daniels at acm.org