* 'struct-like' list *
Bengt Richter
bokr at oz.net
Tue Feb 7 13:10:05 EST 2006
On 6 Feb 2006 09:03:09 -0800, "Ernesto" <erniedude at gmail.com> wrote:
>I'm still fairly new to python, so I need some guidance here...
>
>I have a text file with lots of data. I only need some of the data. I
>want to put the useful data into an [array of] struct-like
>mechanism(s). The text file looks something like this:
>
>[BUNCH OF NOT-USEFUL DATA....]
>
>Name: David
>Age: 108 Birthday: 061095 SocialSecurity: 476892771999
>
>[MORE USELESS DATA....]
>
>Name........
Does the useful data always come in fixed-format pairs of lines, as in your example?
If so, you could just iterate through the lines of your text file, as in the example at the end [1].
>
>I would like to have an array of "structs." Each struct has
>
>struct Person{
> string Name;
> int Age;
> int Birhtday;
> int SS;
>}
You don't normally want real structs in Python. You probably want to define
a class to hold the data, e.g., class Person in the example at the end [1].
>
>I want to go through the file, filling up my list of structs.
>
>My problems are:
>
>1. How to search for the keywords "Name:", "Age:", etc. in the file...
>2. How to implement some organized "list of lists" for the data
>structure.
>
It may be very easy, if the format is fixed and space-separated and line-paired
as in your example data, but you will have to tell us more if not.
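If the fields turn out not to be strictly space-separated or always in the same order, one possible way to search for the keywords is a regular expression that picks up every "Keyword: value" pair on a line. This is only a sketch written for Python 3; the pattern and the helper name parse_fields are my own assumptions, not something taken from your data. Note that it captures a single whitespace-free value per keyword, so a name containing spaces would need separate handling.

```python
import re

# Assumed pattern: a keyword of word characters, a colon, optional spaces,
# then one run of non-space characters as the value.
FIELD = re.compile(r'(\w+):\s*(\S+)')

def parse_fields(line):
    """Return a dict mapping each keyword found on the line to its value string."""
    return dict(FIELD.findall(line))

fields = parse_fields('Age: 108 Birthday: 061095 SocialSecurity: 476892771999')
print(fields)
# {'Age': '108', 'Birthday': '061095', 'SocialSecurity': '476892771999'}
```

The values stay as strings here, so converting them (or not) is left to the caller.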
[1] example:
----< ernesto.py >---------------------------------------------------------
class Person(object):
    def __init__(self, name):
        self.name = name
    def __repr__(self): return 'Person(%r)'%self.name

def extract_info(lineseq):
    lineiter = iter(lineseq) # normalize access to lines
    personlist = []
    for line in lineiter:
        substrings = line.split()
        if substrings and isinstance(substrings, list) and substrings[0] == 'Name:':
            try:
                name = ' '.join(substrings[1:]) # allow for names with spaces
                line = lineiter.next()
                age_hdr, age, bd_hdr, bd, ss_hdr, ss = line.split()
                assert age_hdr=='Age:' and bd_hdr=='Birthday:' and ss_hdr=='SocialSecurity:', \
                    'Bad second line after "Name: %s" line:\n %r'%(name, line)
                person = Person(name)
                person.age = int(age); person.bd = int(bd); person.ss=int(ss)
                personlist.append(person)
            except Exception,e:
                print '%s: %s'%(e.__class__.__name__, e)
    return personlist

def test():
    lines = """\
[BUNCH OF NOT-USEFUL DATA....]
Name: David
Age: 108 Birthday: 061095 SocialSecurity: 476892771999
[MORE USELESS DATA....]
Name: Ernesto
Age: 25 Birthday: 040181 SocialSecurity: 123456789
Name: Ernesto
Age: 44 Brithdy: 040106 SocialSecurity: 123456789
Name........
"""
    persondata = extract_info(lines.splitlines())
    print persondata
    ssdict = {}
    for person in persondata:
        if person.ss in ssdict:
            print 'Rejecting %r with duplicate ss %s'%(person, person.ss)
        else:
            ssdict[person.ss] = person
    print 'ssdict keys: %s'%ssdict.keys()
    for ss, pers in sorted(ssdict.items(), key=lambda item:item[1].name): #sorted by name
        print 'Name: %s Age: %s SS: %s' % (pers.name, pers.age, pers.ss)

if __name__ == '__main__': test()
---------------------------------------------------------------------------
This produces the output:
[10:07] C:\pywk\clp>py24 ernesto.py
AssertionError: Bad second line after "Name: Ernesto" line:
'Age: 44 Brithdy: 040106 SocialSecurity: 123456789'
[Person('David'), Person('Ernesto')]
ssdict keys: [123456789, 476892771999L]
Name: David Age: 108 SS: 476892771999
Name: Ernesto Age: 25 SS: 123456789
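One thing to watch in the example: converting the birthday with int() drops any leading zero, so '061095' is stored as 61095. If that matters, you could keep the field as a string or turn it into a date. The sketch below is for Python 3 and assumes the six digits mean MMDDYY, which is only a guess from the sample data; the helper name parse_birthday is hypothetical. A two-digit year is also ambiguous about the century, so check the result against your real data.

```python
from datetime import date, datetime

def parse_birthday(text):
    """Parse a six-digit birthday string; the MMDDYY layout is an assumption."""
    return datetime.strptime(text, '%m%d%y').date()

print(int('061095'))             # 61095: the leading zero is gone
print(parse_birthday('061095'))  # 1995-06-10 under the MMDDYY assumption
```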
If you want to try this on a file (we'll use the source itself here,
since it includes valid example data lines), do something like:
>>> import ernesto
>>> info = ernesto.extract_info(open('ernesto.py'))
AssertionError: Bad second line after "Name: Ernesto" line:
'Age: 44 Brithdy: 040106 SocialSecurity: 123456789\n'
>>> info
[Person('David'), Person('Ernesto')]
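The listing was run under Python 2.4, so it relies on the print statement, lineiter.next() and "except Exception,e", none of which work in Python 3. A rough Python 3 sketch of the same line-pair loop might look like the following. It is not a drop-in copy: it passes all fields to the constructor and raises ValueError instead of using assert for the header check, both of which are my own choices.

```python
class Person:
    """Plain container for one record; attribute names follow the original listing."""
    def __init__(self, name, age, bd, ss):
        self.name, self.age, self.bd, self.ss = name, age, bd, ss

    def __repr__(self):
        return 'Person(%r)' % self.name


def extract_info(lineseq):
    """Collect a Person for every well-formed Name:/Age: pair of lines."""
    lineiter = iter(lineseq)  # works for lists of lines and open files alike
    personlist = []
    for line in lineiter:
        substrings = line.split()
        if not substrings or substrings[0] != 'Name:':
            continue
        name = ' '.join(substrings[1:])  # allow for names with spaces
        second = next(lineiter, '')      # next() builtin replaces lineiter.next()
        try:
            # Unpacking raises ValueError if the line does not have six fields.
            age_hdr, age, bd_hdr, bd, ss_hdr, ss = second.split()
            if (age_hdr, bd_hdr, ss_hdr) != ('Age:', 'Birthday:', 'SocialSecurity:'):
                raise ValueError('Bad second line after "Name: %s" line: %r'
                                 % (name, second))
            personlist.append(Person(name, int(age), int(bd), int(ss)))
        except ValueError as e:          # "as e" replaces the old ",e" form
            print('%s: %s' % (e.__class__.__name__, e))
    return personlist
```

To read from a file, something like "with open('data.txt') as f: info = extract_info(f)" should do, where data.txt stands in for whatever your file is called.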
tweak to taste ;-)
Regards,
Bengt Richter