Multiline regex help

Kent Johnson kent37 at tds.net
Thu Mar 3 16:25:39 EST 2005


Here is another attempt. I'm still not sure I understand what form you want the data in. I made a 
dict -> dict -> list structure, so if you look up e.g. scores['10/11/04']['60'] you get a list of all 
the RelevantInfo2 values for RelevantInfo1='10/11/04' and RelevantInfo3='60'.
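
To make the dict -> dict -> list shape concrete, here is a minimal sketch in Python 3 syntax. The records list is an assumption for illustration: it is the four (info1, info2, info3) triples from the sample data, written out by hand instead of parsed.

```python
# Assumed input: the four records from the sample data as
# (info1, info2, info3) tuples, typed in by hand for illustration.
records = [
    ('10/10/04', '22', '23'),
    ('10/10/04', '22', '23'),
    ('10/10/04', '33', '44'),
    ('10/11/04', '45', '60'),
]

scores = {}
for info1, info2, info3 in records:
    # Outer key is RelevantInfo1, inner key is RelevantInfo3,
    # and the innermost list collects the RelevantInfo2 values.
    scores.setdefault(info1, {}).setdefault(info3, []).append(info2)

print(scores['10/10/04']['23'])  # ['22', '22']
print(scores['10/11/04']['60'])  # ['45']
```

setdefault creates the inner dict or list the first time a key is seen, so no "is the key there yet?" check is needed.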

The parser is a simple-minded state machine that will misbehave (for example, by storing None for a 
value it has not seen yet) if the input does not have entries in the order RelevantInfo1, 
RelevantInfo2, RelevantInfo3 (with as many intervening lines as you like).
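
If the ordering assumption worries you, one option is to check it while parsing. The following is only a sketch in Python 3 syntax, not the script from this post: parse_strict and MARKERS are hypothetical names, and raising ValueError is just one possible reaction to out-of-order input.

```python
import io

# Marker names in the order the parser expects to meet them.
MARKERS = ('RelevantInfo1', 'RelevantInfo2', 'RelevantInfo3')

def parse_strict(stream):
    """Return a list of (info1, info2, info3) tuples.

    Raises ValueError when a marker shows up out of order.
    An incomplete record at the end of the input is silently dropped.
    """
    stream = iter(stream)  # lets the loop and next() share one iterator
    records = []
    values = []            # values collected so far for the current record
    for line in stream:
        for index, marker in enumerate(MARKERS):
            if line.startswith(marker):
                if index != len(values):
                    raise ValueError('expected %s but found %s'
                                     % (MARKERS[len(values)], marker))
                # The value sits on the line right after the marker.
                values.append(next(stream, '').strip())
                break
        if len(values) == len(MARKERS):
            records.append(tuple(values))
            values = []
    return records

good = io.StringIO('junk\nRelevantInfo1\n10/10/04\nRelevantInfo2\n22\n'
                   'more junk\nRelevantInfo3\n23\n')
print(parse_strict(good))  # [('10/10/04', '22', '23')]
```

Input that starts with RelevantInfo2, or repeats RelevantInfo1 before the record is complete, raises ValueError instead of quietly producing a bad entry.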

All three values are available when RelevantInfo3 is detected, so you could do something else with 
them at that point if you want.
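
As one example of "something else", the same loop can be turned into a generator that hands back each triple as soon as RelevantInfo3 is read. This is a sketch in Python 3 syntax; iter_records and the short sample string are assumptions made up for illustration, not part of the original script.

```python
import io

def iter_records(stream):
    """Yield (info1, info2, info3) each time a RelevantInfo3 value is read."""
    stream = iter(stream)  # lets the loop and next() share one iterator
    info1 = info2 = None
    for line in stream:
        if line.startswith('RelevantInfo1'):
            info1 = next(stream, '').strip()
        elif line.startswith('RelevantInfo2'):
            info2 = next(stream, '').strip()
        elif line.startswith('RelevantInfo3'):
            info3 = next(stream, '').strip()
            # All three values are known here; let the caller decide what to do.
            yield info1, info2, info3
            info1 = info2 = None

# Hypothetical, shortened input in the same layout as the sample data.
sample = io.StringIO(
    'RelevantInfo1\n10/10/04\nfiller\nRelevantInfo2\n22\nRelevantInfo3\n23\n'
    'RelevantInfo1\n10/11/04\nRelevantInfo2\n45\nfiller\nRelevantInfo3\n60\n'
)

# One possible use: a flat list of rows, e.g. to pass to csv.writer.
rows = [list(record) for record in iter_records(sample)]
print(rows)  # [['10/10/04', '22', '23'], ['10/11/04', '45', '60']]
```

Like the original loop, this still trusts the input order; a value that has not been seen yet comes back as None.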

HTH
Kent

import cStringIO

raw_data = '''Gibberish
53
MoreGarbage
12
RelevantInfo1
10/10/04
NothingImportant
ThisDoesNotMatter
44
RelevantInfo2
22
BlahBlah
343
RelevantInfo3
23
Hubris
Crap
34

Gibberish
53
MoreGarbage
12
RelevantInfo1
10/10/04
NothingImportant
ThisDoesNotMatter
44
RelevantInfo2
22
BlahBlah
343
RelevantInfo3
23
Hubris
Crap
34

SecondSetofGarbage
2423
YouGetThePicture
342342
RelevantInfo1
10/10/04
HoHum
343
MoreStuffNotNeeded
232
RelevantInfo2
33
RelevantInfo3
44
sdfsdf
RelevantInfo1
10/11/04
InsertBoringFillerHere
43234
Stuff
MoreStuff
RelevantInfo2
45
ExcitingIsntIt
324234
RelevantInfo3
60
Lalala'''
raw_data = cStringIO.StringIO(raw_data)

scores = {}
info1 = info2 = info3 = None

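for line in raw_data:
    # Each marker line is followed by its value on the next line,
    # so pull that line from the same iterator.
    if line.startswith('RelevantInfo1'):
        info1 = raw_data.next().strip()
    elif line.startswith('RelevantInfo2'):
        info2 = raw_data.next().strip()
    elif line.startswith('RelevantInfo3'):
        info3 = raw_data.next().strip()
        # All three values are known here: file info2 under info1 -> info3.
        scores.setdefault(info1, {}).setdefault(info3, []).append(info2)
        info1 = info2 = info3 = None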

print scores
print scores['10/11/04']['60']
print scores['10/10/04']['23']

## prints:
{'10/10/04': {'44': ['33'], '23': ['22', '22']}, '10/11/04': {'60': ['45']}}
['45']
['22', '22']


