File Data Extraction Approach

Thu Nov 22 21:33:55 EST 2001

"Jim St.Cyr" wrote:
> 
> I have a multiline file of the format:
> 
> : Some Name = data : Another Name = data : etc = data:
> 
> Each line consists of 20 tagnames and associated data.  Some of the tagnames
> have spaces in them though most don't.  There is a space on each side of the
> equal sign and on each side of the colon which acts as a field separator.  I
> only need the data associated with 7 out of the 20 tagnames.
> 
> I was thinking about removing the whitespace from the line and then seeking
> the tagnames that I am interested in.  This strikes me as sort of brute
> force and I would like some help in formulating an approach that is a bit
> more elegant.

Just write code that reads *all* the tags and data from a line, then
ignore the ones you don't want.  Do the work once the data is in a
more tractable form (i.e. in memory) rather than trying to dissect
the file while it's still a file.

Slightly ugly code which makes a dictionary from each line and prints
it to stdout:

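>>> f = open('file')
>>> for line in f.readlines():
...   d = {}
...   # fields are separated by ':'; skip the empty ones at the line ends
...   for field in line.split(':'):
...     if field.strip():
...       # split on the first '=' only, then trim the surrounding spaces
...       tag, data = [s.strip() for s in field.split('=', 1)]
...       d[tag] = data
...   print('Result %s' % d)
...
>>> f.close()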
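
The "read everything, then ignore what you don't want" step could be
sketched roughly as below.  This is only an illustration, not part of
the session above: the names parse_line, pick and WANTED are made up
for the example, and the tag names in WANTED are placeholders taken
from the sample line, to be replaced with the 7 tags actually needed.
It also assumes that neither ':' nor the first '=' of a field ever
appears inside a tag name.

```python
# Hypothetical tag names; substitute the 7 you actually need.
WANTED = ("Some Name", "etc")


def parse_line(line):
    """Return a dict mapping every tag on one line to its data."""
    d = {}
    for field in line.split(":"):
        # Leading/trailing colons produce empty fields; skip them.
        if field.strip():
            # Split on the first '=' only, then trim surrounding spaces.
            tag, data = [s.strip() for s in field.split("=", 1)]
            d[tag] = data
    return d


def pick(d, wanted=WANTED):
    """Keep only the wanted tags; tags missing from the line are skipped."""
    return {tag: d[tag] for tag in wanted if tag in d}


line = ": Some Name = data : Another Name = data : etc = data:"
print(pick(parse_line(line)))
# → {'Some Name': 'data', 'etc': 'data'}
```

Parsing the whole line first keeps the selection logic down to a
dictionary lookup, so changing which tags are wanted means editing
WANTED only.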

-- 
----------------------
Peter Hansen, P.Eng.
peter at engcorp.com