Multiline regex help

Thu Mar 3 07:14:50 EST 2005

Yatima wrote:
> Hey Folks,
> 
> I've got some info in a bunch of files that kind of looks like so:
> 
> Gibberish
> 53
> MoreGarbage
> 12
> RelevantInfo1
> 10/10/04
> NothingImportant
> ThisDoesNotMatter
> 44
> RelevantInfo2
> 22
> BlahBlah
> 343
> RelevantInfo3
> 23
> Hubris
> Crap
> 34
> 
> and so on...
> 
> Anyhow, these "fields" repeat several times in a given file (number of
> repetitions varies from file to file). The number on the line following the
> "RelevantInfo" lines is really what I'm after. Ideally, I would like to have
> something like so:
> 
> RelevantInfo1 = 10/10/04 # The variable name isn't actually important
> RelevantInfo3 = 23       # it's just there to illustrate what info I'm
>                          # trying to snag.

Here is a way to create a list of [RelevantInfo, value] pairs:

import io

raw_data = '''Gibberish
53
MoreGarbage
12
RelevantInfo1
10/10/04
NothingImportant
ThisDoesNotMatter
44
RelevantInfo2
22
BlahBlah
343
RelevantInfo3
23
Hubris
Crap
34'''
# Wrap the string so it can be read line by line, like an open file
raw_data = io.StringIO(raw_data)

data = []
for line in raw_data:
    if line.startswith('RelevantInfo'):
        key = line.strip()
        # The value is on the line after the key. next() advances the same
        # iterator the for loop uses, so the loop will not see that line again.
        value = next(raw_data).strip()
        data.append([key, value])

print(data)
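
Since the subject line asks about a multiline regex, the same pairs can also be pulled out with re.findall and the re.MULTILINE flag. This is only a sketch: it assumes the whole file fits in memory and that every key starts with the literal text RelevantInfo at the beginning of a line. The name sample_text and the shortened sample are made up for the example.

```python
import re

sample_text = '''Gibberish
53
RelevantInfo1
10/10/04
NothingImportant
RelevantInfo2
22
BlahBlah
343
RelevantInfo3
23
Hubris'''

# With re.MULTILINE, ^ and $ match at the start and end of each line.
# Group 1 is the key line, group 2 is the line that follows it.
pattern = re.compile(r'^(RelevantInfo\S*)[ \t]*\n(.*)$', re.MULTILINE)

pairs = [[key, value.strip()] for key, value in pattern.findall(sample_text)]
print(pairs)
```

The line-by-line loop is probably easier to read and works on files of any size; the regex version is mainly useful if the text is already in one string.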

> 
> Score[RelevantInfo1][RelevantInfo3] = 22 # The value from RelevantInfo2

I'm not sure what you mean by this. Do you want to build a Score dictionary as well?
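
If so, one possible reading is a nested dictionary keyed by the values of RelevantInfo1 and RelevantInfo3, holding the value of RelevantInfo2. That is a guess at what you want. The sketch below assumes the three fields always appear together and in that order within each repetition; build_scores is a hypothetical helper name, not something from your code.

```python
def build_scores(pairs):
    """Build scores[info1][info3] = info2 from a list of [key, value] pairs.

    Assumes the pairs come in groups of three, ordered
    RelevantInfo1, RelevantInfo2, RelevantInfo3.
    """
    scores = {}
    # Walk the list three pairs at a time, one group per repetition;
    # an incomplete trailing group is ignored.
    for i in range(0, len(pairs) - 2, 3):
        info1 = pairs[i][1]
        info2 = pairs[i + 1][1]
        info3 = pairs[i + 2][1]
        scores.setdefault(info1, {})[info3] = info2
    return scores


data = [['RelevantInfo1', '10/10/04'],
        ['RelevantInfo2', '22'],
        ['RelevantInfo3', '23']]

scores = build_scores(data)
print(scores['10/10/04']['23'])  # prints 22
```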

Kent

> 
> Collected from all of the files.
> 
> So, there would be several of these "scores" per file and there are a bunch
> of files. Ultimately, I am interested in printing them out as a csv file but
> that should be relatively easy once they are trapped in my array of doom
> <cue evil laughter>.
> 
> I've got a fairly ugly "solution" (I am using this term *very* loosely)
> using awk and his faithful companion sed, but I would prefer something in
> python.
> 
> Thanks for your time.
>
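
For the csv output mentioned in the quoted text, the standard csv module can write one row per score. A sketch, assuming a nested dictionary shaped like Score[RelevantInfo1][RelevantInfo3] = value. The second entry in scores, the column names, and the scores.csv filename in the comment are all invented for illustration.

```python
import csv
import io

# Hypothetical scores; only the first entry comes from the sample data
scores = {'10/10/04': {'23': '22'},
          '10/11/04': {'17': '31'}}

# An in-memory buffer stands in for a real file, which could be opened
# with open('scores.csv', 'w', newline='') instead.
out = io.StringIO()
writer = csv.writer(out)
writer.writerow(['info1', 'info3', 'score'])
for info1 in sorted(scores):
    for info3 in sorted(scores[info1]):
        writer.writerow([info1, info3, scores[info1][info3]])

csv_text = out.getvalue()
print(csv_text)
```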