* 'struct-like' list *

Raymond Hettinger python at rcn.com
Mon Feb 6 21:39:34 EST 2006


[Ernesto]
> I'm still fairly new to python, so I need some guidance here...
>
> I have a text file with lots of data.  I only need some of the data.  I
> want to put the useful data into an [array of] struct-like
> mechanism(s).  The text file looks something like this:
>
> [BUNCH OF NOT-USEFUL DATA....]
>
> Name:  David
> Age: 108   Birthday: 061095   SocialSecurity: 476892771999
>
> [MORE USELESS DATA....]
>
> Name........
>
> I would like to have an array of "structs."  Each struct has
>
> struct Person{
>     string Name;
>     int Age;
>     int Birhtday;
>     int SS;
> }
>
> I want to go through the file, filling up my list of structs.
>
> My problems are:
>
> 1.  How to search for the keywords "Name:", "Age:", etc. in the file...
> 2.  How to implement some organized "list of lists" for the data
> structure.

Since you're just starting out in Python, this problem presents an
excellent opportunity to learn Python's two basic approaches to text
parsing.

The first approach involves looping over the input lines, searching for
key phrases, and extracting them using string slicing and using
str.strip() to trim irregular length input fields.  The start/stop
logic is governed by the first and last key phrases and the results get
accumulated in a list.  This approach is easy to program, maintain, and
explain to others:

# Approach suitable for inputs with fixed input positions
result = []
for line in inputData.splitlines():
    if line.startswith('Name:'):
        name = line[7:].strip()
    elif line.startswith('Age:'):
        age = line[5:8].strip()
        bd = line[20:26]
        ssn = line[45:54]
        result.append((name, age, bd, ssn))
print result

The second approach uses regular expressions.  The pattern is to search
for a key phrase, skip over whitespace, and grab the data field in
parenthesized group.  Unlike slicing, this approach is tolerant of
loosely formatted data where the target fields do not always appear in
the same column position.  The trade-off is having less flexibility in
parsing logic (i.e. the target fields must arrive in a fixed order):

# Approach for more loosely formatted inputs
import re
pattern = '''(?x)
    Name:\s+(\w+)\s+
    Age:\s+(\d+)\s+
    Birthday:\s+(\d+)\s+
    SocialSecurity:\s+(\d+)
'''
print re.findall(pattern, inputData)

Other respondants have suggested the third-party PyParsing module which
provides a powerful general-purpose toolset for text parsing; however,
it is always worth mastering Python basics before moving on to special
purpose tools.  The above code fragements are easy to construct and not
hard to explain to others.  Maintenance is a breeze.


Raymond


P.S.  Once you've formed a list of tuples, it is trivial to create
Person objects for your pascal-like structure:

class Person(object):
    def __init__(self, (name, age, bd, ssn)):
        self.name=name; self.age=age; self.bd=bd; self.ssn=ssn

personlist = map(Person, result)
for p in personlist:
    print p.name, p.age, p.bd, p.ssn




More information about the Python-list mailing list