* 'struct-like' list *

Paul McGuire ptmcg at austin.rr._bogus_.com
Mon Feb 6 16:58:37 EST 2006


"Ernesto" <erniedude at gmail.com> wrote in message
news:1139245389.529742.317110 at g43g2000cwa.googlegroups.com...
> I'm still fairly new to python, so I need some guidance here...
>
> I have a text file with lots of data.  I only need some of the data.  I
> want to put the useful data into an [array of] struct-like
> mechanism(s).  The text file looks something like this:
>
> [BUNCH OF NOT-USEFUL DATA....]
>
> Name:  David
> Age: 108   Birthday: 061095   SocialSecurity: 476892771999
>
> [MORE USELESS DATA....]
>
> Name........
>
> I would like to have an array of "structs."  Each struct has
>
> struct Person{
>     string Name;
>     int Age;
>     int Birhtday;
>     int SS;
> }
>
> I want to go through the file, filling up my list of structs.
>
> My problems are:
>
> 1.  How to search for the keywords "Name:", "Age:", etc. in the file...
> 2.  How to implement some organized "list of lists" for the data
> structure.
>
> Any help is much appreciated.
>
Ernesto -

Since you are searching for keywords and matching fields, and trying to
populate data structures as you go, this sounds like a good fit for
pyparsing.  Pyparsing has built-in features for scanning through text and
extracting data, with suitably named data fields for accessing later.

Download pyparsing at http://pyparsing.sourceforge.net.

-- Paul

------------------------------------------------
from pyparsing import Word, nums, empty, restOfLine

inputData = """[BUNCH OF NOT-USEFUL DATA....]

Name:  David
Age: 108   Birthday: 061095   SocialSecurity: 476892771999

[MORE USELESS DATA....]

Name:  Fred
Age: 101   Birthday: 061065   SocialSecurity: 587903882000

[MORE USELESS DATA....]

Name:  Barney
Age: 99   Birthday: 061265   SocialSecurity: 698014993111

[MORE USELESS DATA....]

"""

dob = Word(nums,exact=6)
# this matches your sample data, but I think SSN's are only 9 digits long
socsecnum = Word(nums,exact=12)

# define the personalData pattern - use results names to associate
# field names with matched tokens, can then access data as if they were
# attributes on an object
personalData = ( "Name:" + empty + restOfLine.setResultsName("Name") +
                 "Age:" + Word(nums).setResultsName("Age") +
                 "Birthday:" + dob.setResultsName("Birthday") +
                 "SocialSecurity:"  + socsecnum.setResultsName("SS") )

# use personalData.scanString to scan through the input, returning the
# matching tokens, and their respective start/end locations in the string
for person,s,e in personalData.scanString(inputData):
    print "Name:", person.Name
    print "Age:",  person.Age
    print "DOB:",  person.Birthday
    print "SSN:",  person.SS
    print

# or use a list comp to scan the whole file, and return your Person data,
# giving you your requested array of "structs" - not really structs, but
# ParseResults objects
persons = [person for person,s,e in personalData.scanString(inputData)]

# or convert to Python dicts, which some people prefer to pyparsing's
# ParseResults
persons = [dict(p) for p,s,e in personalData.scanString(inputData)]
print persons[0]
print

# or create an array of Person objects, as suggested in previous postings
class Person(object):
    def __init__(self,parseResults):
        self.__dict__.update(dict(parseResults))

    def __str__(self):
        return "Person(%s, %s, %s, %s)" % (
            self.Name, self.Age, self.Birthday, self.SS)

persons = [Person(p) for p,s,e in personalData.scanString(inputData)]
for p in persons:
    print p.Name,"->",p

--------------------------------------
prints out:
Name: David
Age: 108
DOB: 061095
SSN: 476892771999

Name: Fred
Age: 101
DOB: 061065
SSN: 587903882000

Name: Barney
Age: 99
DOB: 061265
SSN: 698014993111

{'SS': '476892771999', 'Age': '108', 'Birthday': '061095', 'Name': 'David'}

David -> Person(David, 108, 061095, 476892771999)
Fred -> Person(Fred, 101, 061065, 587903882000)
Barney -> Person(Barney, 99, 061265, 698014993111)
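
If installing pyparsing is not an option, roughly the same scan can be
sketched with only the standard library.  This is not the pyparsing approach
above, just a regular-expression alternative written in Python 3 syntax.  The
names Person, RECORD and scan_persons are made up for illustration, and the
pattern assumes the "Age:" line follows the "Name:" line as in the sample
data.  Age is converted to an int as the original struct asked for, but
Birthday and SS are kept as strings so that leading zeros (as in 061095)
survive.

```python
import re
from collections import namedtuple

# Hypothetical record type standing in for the requested "struct".
Person = namedtuple("Person", ["name", "age", "birthday", "ss"])

# One pattern covering the two-line record; named groups play the role of
# pyparsing's results names.  Assumes the field layout of the sample data.
RECORD = re.compile(
    r"Name:\s*(?P<name>[^\n]*?)\s*\n"
    r"Age:\s*(?P<age>\d+)\s+"
    r"Birthday:\s*(?P<birthday>\d{6})\s+"
    r"SocialSecurity:\s*(?P<ss>\d+)"
)


def scan_persons(text):
    """Return a list of Person records found anywhere in text."""
    return [
        Person(m.group("name"), int(m.group("age")),
               m.group("birthday"), m.group("ss"))
        for m in RECORD.finditer(text)
    ]


sample = """[BUNCH OF NOT-USEFUL DATA....]

Name:  David
Age: 108   Birthday: 061095   SocialSecurity: 476892771999

[MORE USELESS DATA....]

Name:  Fred
Age: 101   Birthday: 061065   SocialSecurity: 587903882000
"""

persons = scan_persons(sample)
for p in persons:
    print(p.name, p.age, p.birthday, p.ss)
```

Like scanString, finditer skips over the not-useful text between records, so
no separate step is needed to find where each record starts.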





