Multiline regex help
Kent Johnson
kent37 at tds.net
Thu Mar 3 07:14:50 EST 2005
Yatima wrote:
> Hey Folks,
>
> I've got some info in a bunch of files that kind of looks like so:
>
> Gibberish
> 53
> MoreGarbage
> 12
> RelevantInfo1
> 10/10/04
> NothingImportant
> ThisDoesNotMatter
> 44
> RelevantInfo2
> 22
> BlahBlah
> 343
> RelevantInfo3
> 23
> Hubris
> Crap
> 34
>
> and so on...
>
> Anyhow, these "fields" repeat several times in a given file (number of
> repetitions varies from file to file). The number on the line following the
> "RelevantInfo" lines is really what I'm after. Ideally, I would like to have
> something like so:
>
> RelevantInfo1 = 10/10/04 # The variable name isn't actually important
> RelevantInfo3 = 23 # it's just there to illustrate what info I'm
> # trying to snag.
Here is a way to create a list of [RelevantInfo, value] pairs:
import cStringIO
raw_data = '''Gibberish
53
MoreGarbage
12
RelevantInfo1
10/10/04
NothingImportant
ThisDoesNotMatter
44
RelevantInfo2
22
BlahBlah
343
RelevantInfo3
23
Hubris
Crap
34'''
raw_data = cStringIO.StringIO(raw_data)
data = []
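for line in raw_data:
    if line.startswith('RelevantInfo'):
        key = line.strip()
        # The value sits on the line right after the key, so take it
        # from the same iterator; the for loop then resumes after it.
        value = raw_data.next().strip()
        data.append([key, value])
print data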
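
Since the subject line asks about a multiline regex, here is a rough sketch of the same extraction done with re.MULTILINE instead of a line loop. It is written for Python 3, unlike the snippet above. It assumes the key line holds nothing but the "RelevantInfo..." token and that the value is a single run of non-whitespace characters. The names PAIR_RE and extract_pairs are made up for illustration.

```python
import re

raw_data = """Gibberish
53
MoreGarbage
12
RelevantInfo1
10/10/04
NothingImportant
ThisDoesNotMatter
44
RelevantInfo2
22
BlahBlah
343
RelevantInfo3
23
Hubris
Crap
34"""

# With re.MULTILINE, ^ matches at the start of every line, not just at
# the start of the string. Group 1 is the key line, group 2 is the first
# whitespace-free token on the following line (an assumption about the data).
PAIR_RE = re.compile(r"^(RelevantInfo\S*)[ \t]*\r?\n[ \t]*(\S+)", re.MULTILINE)


def extract_pairs(text):
    """Return a list of [key, value] pairs found in text."""
    return [[key, value] for key, value in PAIR_RE.findall(text)]


pairs = extract_pairs(raw_data)
print(pairs)
# [['RelevantInfo1', '10/10/04'], ['RelevantInfo2', '22'], ['RelevantInfo3', '23']]
```

For whole files, extract_pairs(open(path).read()) would do the same job, at the cost of reading each file into memory at once.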
>
> Score[RelevantInfo1][RelevantInfo3] = 22 # The value from RelevantInfo2
I'm not sure what you mean by this. Do you want to build a Score dictionary as well?
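
If it is a dictionary you are after, one possible reading is sketched below (Python 3). This is a guess at the intent, not something the original post spells out. It assumes every record consists of a RelevantInfo1, RelevantInfo2, RelevantInfo3 triple in that order, and that the RelevantInfo2 value should be stored under the other two values as keys. build_scores is a hypothetical helper, and the second record in the sample is invented purely to show more than one entry.

```python
from collections import defaultdict


def build_scores(pairs):
    """Group [key, value] pairs into scores[info1][info3] = info2.

    Assumes each record ends with a RelevantInfo3 line; incomplete
    records are silently skipped.
    """
    scores = defaultdict(dict)
    record = {}
    for key, value in pairs:
        record[key] = value
        if key == "RelevantInfo3":
            if "RelevantInfo1" in record and "RelevantInfo2" in record:
                info1 = record["RelevantInfo1"]
                scores[info1][value] = record["RelevantInfo2"]
            record = {}  # start collecting the next record
    return dict(scores)


# First triple is from the sample data; the second is made up.
pairs = [
    ["RelevantInfo1", "10/10/04"],
    ["RelevantInfo2", "22"],
    ["RelevantInfo3", "23"],
    ["RelevantInfo1", "10/11/04"],
    ["RelevantInfo2", "17"],
    ["RelevantInfo3", "5"],
]
scores = build_scores(pairs)
print(scores)
# {'10/10/04': {'23': '22'}, '10/11/04': {'5': '17'}}
```

Note that two records sharing the same RelevantInfo1 and RelevantInfo3 values would overwrite each other here. If that can happen, a list of values per key pair would be safer.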
Kent
>
> Collected from all of the files.
>
> So, there would be several of these "scores" per file and there are a bunch
> of files. Ultimately, I am interested in printing them out as a csv file but
> that should be relatively easy once they are trapped in my array of doom
> <cue evil laughter>.
>
> I've got a fairly ugly "solution" (I am using this term *very* loosely)
> using awk and his faithful companion sed, but I would prefer something in
> python.
>
> Thanks for your time.
>
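
For the csv step, the standard csv module is probably enough. Below is a minimal Python 3 sketch, assuming the scores ended up in a nested dict of the form scores[info1][info3] = value. The column names and the scores_to_csv helper are placeholders, not anything from the original post.

```python
import csv
import io


def scores_to_csv(scores):
    """Render a nested scores dict as CSV text, one row per score."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["info1", "info3", "score"])  # placeholder header names
    for info1 in sorted(scores):
        for info3 in sorted(scores[info1]):
            writer.writerow([info1, info3, scores[info1][info3]])
    return buffer.getvalue()


sample = {"10/10/04": {"23": "22"}}
csv_text = scores_to_csv(sample)
print(csv_text)
```

To write straight to disk, pass a file opened with open(path, "w", newline="") to csv.writer instead of the StringIO buffer.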