Ideas for parsing this text?

Wed Apr 23 21:17:41 EDT 2008

On Apr 23, 9:00 pm, "Eric Wertman" <ewert... at gmail.com> wrote:

> I have a set of files with this kind of content (it's dumped from WebSphere):
>
> [snipped]
>
> I'm trying to figure out the best way to feed it all into
> dictionaries, without having to know exactly what the contents of the
> file are.  

It would be pretty pointless if you had to know the exact file content
in advance, but you still have to know the structure of the files,
that is, the grammar they conform to.

> Any ideas?  I was considering making a list of string combinations, like so:
>
> junk = ['[[','"[',']]']
>
> and just using re.sub to convert them into a single character that I
> could start to do split() actions on.  There must be something else I
> can do..

Yes, find out the formal grammar of these files and use a parser
generator [1] to specify it.
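
To make the idea concrete, here is a minimal hand-written sketch rather
than a real parser generator. The actual dump was snipped, so the grammar
below is only an assumption: nested [ ] lists, double-quoted strings, and
bare words. All names (tokenize, parse, to_dict) are hypothetical, and the
rule that a list of two-item lists becomes a dictionary is a heuristic,
not something the file format is known to guarantee.

```python
import re

# Assumed grammar (hypothetical; the real file content was not shown):
#   value  := list | string | atom
#   list   := '[' value* ']'
#   string := '"' characters other than '"' '"'
#   atom   := run of characters other than whitespace, brackets, quotes
TOKEN = re.compile(r'\s*(\[|\]|"[^"]*"|[^\s\[\]"]+)')

def tokenize(text):
    """Split the input into brackets, quoted strings and bare words."""
    text = text.rstrip()
    tokens, pos = [], 0
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected input at offset {pos}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens

def parse_value(tokens, i):
    """Parse one value starting at tokens[i]; return (value, next_index)."""
    if i >= len(tokens):
        raise ValueError("unexpected end of input")
    tok = tokens[i]
    if tok == "[":
        items = []
        i += 1
        while i < len(tokens) and tokens[i] != "]":
            item, i = parse_value(tokens, i)
            items.append(item)
        if i >= len(tokens):
            raise ValueError("missing closing bracket")
        return items, i + 1          # skip the closing bracket
    if tok == "]":
        raise ValueError("unexpected closing bracket")
    if tok.startswith('"'):
        return tok[1:-1], i + 1      # strip the surrounding quotes
    return tok, i + 1

def parse(text):
    """Parse a complete document consisting of one top-level value."""
    tokens = tokenize(text)
    value, i = parse_value(tokens, 0)
    if i != len(tokens):
        raise ValueError("trailing input after top-level value")
    return value

def to_dict(value):
    """Heuristic: a non-empty list of [key, value] pairs becomes a dict."""
    if isinstance(value, list):
        if value and all(isinstance(p, list) and len(p) == 2
                         and isinstance(p[0], str) for p in value):
            return {k: to_dict(v) for k, v in value}
        return [to_dict(v) for v in value]
    return value

if __name__ == "__main__":
    sample = '[[name server1] [ports [[http 9080] [https 9443]]]]'
    print(to_dict(parse(sample)))
```

Once the real grammar is known, the same structure (tokens, then a
recursive rule per construct) carries over directly to any of the
parser tools, with the grammar comment becoming the actual specification.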

HTH,
George

[1] http://wiki.python.org/moin/LanguageParsing