* 'struct-like' list *

Bengt Richter bokr at oz.net
Tue Feb 7 13:10:05 EST 2006


On 6 Feb 2006 09:03:09 -0800, "Ernesto" <erniedude at gmail.com> wrote:

>I'm still fairly new to python, so I need some guidance here...
>
>I have a text file with lots of data.  I only need some of the data.  I
>want to put the useful data into an [array of] struct-like
>mechanism(s).  The text file looks something like this:
>
>[BUNCH OF NOT-USEFUL DATA....]
>
>Name:  David
>Age: 108   Birthday: 061095   SocialSecurity: 476892771999
>
>[MORE USELESS DATA....]
>
>Name........

Does the useful data always come in fixed-format pairs of lines, as in your example?
If so, you could just iterate through the lines of your text file as in the example at the end [1].

>
>I would like to have an array of "structs."  Each struct has
>
>struct Person{
>    string Name;
>    int Age;
>    int Birthday;
>    int SS;
>}
You don't normally want to do real structs in Python. You probably want to define
a class to contain the data, e.g., class Person in the example at the end [1].
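
For comparison, in Python 3.7 and later (well after this 2006 post) the
standard-library dataclasses module gives a struct-like class with named, typed
fields and a generated __init__ and __repr__. A minimal sketch, assuming
Python 3.7+; the field names mirror the C struct above and are otherwise
illustrative, and keeping the birthday as a string is an assumption of this
sketch, not something the original does:

```python
from dataclasses import dataclass

@dataclass
class Person:
    name: str
    age: int
    birthday: str  # kept as a string so the leading zero in "061095" survives
    ss: int

# The "array of structs" is simply a Python list of Person objects.
people = []
people.append(Person("David", 108, "061095", 476892771999))
print(people[0])      # uses the generated __repr__
print(people[0].age)  # fields are ordinary attributes
```

Storing the birthday as a string is a deliberate choice here: int('061095')
gives 61095, so the leading zero would be lost.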

>
>I want to go through the file, filling up my list of structs.
>
>My problems are:
>
>1.  How to search for the keywords "Name:", "Age:", etc. in the file...
>2.  How to implement some organized "list of lists" for the data
>structure.
>
It may be very easy if the format is fixed, space-separated, and line-paired
as in your example data; if not, you will have to tell us more.
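
For problem 1, an alternative to splitting each line is a regular expression
that recognizes the keywords directly. A sketch in Python 3 using the standard
re module, assuming each record is a "Name:" line followed by the
"Age: ... Birthday: ... SocialSecurity: ..." line (the layout in your example)
with arbitrary spacing; RECORD_RE, find_records and sample are illustrative
names, not from any library. It also answers problem 2 in the simplest way:
a plain list of dicts.

```python
import re

# Assumed layout: a "Name:" line immediately followed by the Age/Birthday/
# SocialSecurity line. Spacing between tokens may vary.
RECORD_RE = re.compile(
    r"^Name:\s*(?P<name>.+?)\s*\n"
    r"Age:\s*(?P<age>\d+)\s+"
    r"Birthday:\s*(?P<birthday>\d+)\s+"
    r"SocialSecurity:\s*(?P<ss>\d+)\s*$",
    re.MULTILINE,  # let ^ and $ match at every line boundary
)

def find_records(text):
    """Return a list of dicts, one per Name/Age record found in text."""
    records = []
    for m in RECORD_RE.finditer(text):
        rec = m.groupdict()           # every value is a string at this point
        rec["age"] = int(rec["age"])
        rec["ss"] = int(rec["ss"])    # birthday stays a string (leading zero)
        records.append(rec)
    return records

sample = """[BUNCH OF NOT-USEFUL DATA....]

Name:  David
Age: 108   Birthday: 061095   SocialSecurity: 476892771999

[MORE USELESS DATA....]

Name........
"""
print(find_records(sample))
```

Unlike the line-pair approach in the example at the end, a record whose
keywords are misspelled is silently skipped here rather than reported.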

[1] example:

----< ernesto.py >---------------------------------------------------------
class Person(object):
    def __init__(self, name):
        self.name = name
    def __repr__(self): return 'Person(%r)'%self.name

def extract_info(lineseq):
    lineiter = iter(lineseq) # normalize access to lines
    personlist = []
    for line in lineiter:
        substrings = line.split()
        if substrings and substrings[0] == 'Name:': # split() always returns a list
            try:
                name = ' '.join(substrings[1:]) # allow for names with spaces
                line = lineiter.next()
                age_hdr, age, bd_hdr, bd, ss_hdr, ss = line.split()
                assert age_hdr=='Age:' and bd_hdr=='Birthday:' and ss_hdr=='SocialSecurity:', \
                        'Bad second line after "Name: %s" line:\n    %r'%(name, line)
                person = Person(name)
                person.age = int(age); person.bd = int(bd); person.ss=int(ss)
                personlist.append(person)             
            except Exception,e:
                print '%s: %s'%(e.__class__.__name__, e)
    return personlist

def test():
    lines = """\
[BUNCH OF NOT-USEFUL DATA....]

Name:  David
Age: 108   Birthday: 061095   SocialSecurity: 476892771999

[MORE USELESS DATA....]

Name: Ernesto
Age: 25 Birthday: 040181  SocialSecurity: 123456789

Name: Ernesto
Age: 44 Brithdy: 040106 SocialSecurity: 123456789

Name........
"""
    persondata = extract_info(lines.splitlines())
    print persondata
    ssdict = {}
    for person in persondata:
        if person.ss in ssdict:
            print 'Rejecting %r with duplicate ss %s'%(person, person.ss)
        else:
            ssdict[person.ss] = person
    print 'ssdict keys: %s'%ssdict.keys()
    for ss, pers in sorted(ssdict.items(), key=lambda item:item[1].name): #sorted by name
        print 'Name: %s Age: %s SS: %s' % (pers.name, pers.age, pers.ss)

if __name__ == '__main__': test()
---------------------------------------------------------------------------

This produces the output:

[10:07] C:\pywk\clp>py24 ernesto.py
AssertionError: Bad second line after "Name: Ernesto" line:
    'Age: 44 Brithdy: 040106 SocialSecurity: 123456789'
[Person('David'), Person('Ernesto')]
ssdict keys: [123456789, 476892771999L]
Name: David Age: 108 SS: 476892771999
Name: Ernesto Age: 25 SS: 123456789

If you want to try this on a file (we'll use the source itself here,
since it includes valid example data lines), do something like:

 >>> import ernesto
 >>> info = ernesto.extract_info(open('ernesto.py'))
 AssertionError: Bad second line after "Name: Ernesto" line:
     'Age: 44 Brithdy: 040106 SocialSecurity: 123456789\n'
 >>> info
 [Person('David'), Person('Ernesto')]
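
Note that the repr in the file transcript ends in '\n': lines read from a file
keep their newline, while splitlines() in test() strips it. extract_info copes
with both because split() with no arguments discards surrounding whitespace,
and because a file object is its own iterator, so calling next on it consumes
the following line. A small Python 3 sketch of those two facts, using a
temporary file from the standard tempfile module (the temporary file is only
for illustration):

```python
import os
import tempfile

# Write one sample record to a temporary file.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("Name:  David\nAge: 108   Birthday: 061095   SocialSecurity: 476892771999\n")
    path = tmp.name

try:
    with open(path) as f:
        first = next(f)    # a file object is its own line iterator...
        second = next(f)   # ...so next() consumes the following line
    print(repr(first))     # the trailing '\n' is kept by the file iterator
    print(second.split())  # split() with no arguments drops it
finally:
    os.remove(path)
```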

tweak to taste ;-)
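
The code in [1] is Python 2 (the transcript shows it run with py24), and it
will not run under Python 3: the print statement and "except Exception,e" are
syntax errors there, and iterators have no .next() method. A minimal Python 3
sketch of the same extract_info follows. Two deliberate changes that are not in
the original: the header check raises ValueError instead of using assert,
because assertions are skipped when Python runs with -O, and the except clause
is narrowed to the errors the parsing can actually raise. So the rejected line
is reported as ValueError rather than AssertionError.

```python
class Person(object):
    def __init__(self, name):
        self.name = name

    def __repr__(self):
        return 'Person(%r)' % self.name

def extract_info(lineseq):
    lineiter = iter(lineseq)  # works for a list of strings or an open file
    personlist = []
    for line in lineiter:
        substrings = line.split()
        if substrings and substrings[0] == 'Name:':
            try:
                name = ' '.join(substrings[1:])  # allow for names with spaces
                line = next(lineiter)            # consume the line after "Name:"
                age_hdr, age, bd_hdr, bd, ss_hdr, ss = line.split()
                if (age_hdr, bd_hdr, ss_hdr) != ('Age:', 'Birthday:', 'SocialSecurity:'):
                    raise ValueError('Bad second line after "Name: %s" line:\n    %r'
                                     % (name, line))
                person = Person(name)
                person.age, person.bd, person.ss = int(age), int(bd), int(ss)
                personlist.append(person)
            except (StopIteration, ValueError) as e:
                # StopIteration: "Name:" was the last line;
                # ValueError: wrong field count, bad header, or non-numeric value.
                print('%s: %s' % (e.__class__.__name__, e))
    return personlist

lines = """Name:  David
Age: 108   Birthday: 061095   SocialSecurity: 476892771999

Name: Ernesto
Age: 44 Brithdy: 040106 SocialSecurity: 123456789
"""
people = extract_info(lines.splitlines())
print(people)
```

To read a file instead, pass the open file object, e.g.
"with open('data.txt') as f: people = extract_info(f)", where data.txt is a
placeholder name.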

Regards,
Bengt Richter


