Converting LF/FF delimited logs to XML w/ Python?

kyosohma at gmail.com kyosohma at gmail.com
Wed Dec 5 16:55:53 EST 2007


On Dec 5, 3:19 pm, Kadin2048 <usenet.ka... at xoxy.net> wrote:
> This is a very noob-ish question so I apologize in advance, but I'm
> hoping to get some input and advice before I get too over my head.
>
> I'm trying to convert some log files from a formfeed- and
> linefeed-delimited form into XML.  I'd been thinking of using Python to
> do this, but I'll be honest and say that I'm very inexperienced with
> Python, so before I dive in I wanted to see whether some more
> experienced minds thought I was choosing the right tool.
>
> Basically, what I want to do is convert from instant messaging logs
> produced by CenterIM, which look like this (where "^L" represents ASCII
> 12, the formfeed character):
>
>     ^L
>     IN
>     MSG
>     1190126325
>     1190126325
>     hi
>     ^L
>     OUT
>     MSG
>     1190126383
>     1190126383
>     hello
>
> To an XML-based format* like this:
>
>     <chat account="joeblow" service="AIM" version="0.4">
>       <message sender="janedoe" time="1190126325">hi</message>
>       <message sender="joeblow" time="1190126383">hello</message>
>     </chat>
>
> Obviously there's information in the bottom example not present in the
> top (account names, protocol), but I'll grab those from the file name or
> prompt the user.
>
> Given that I'd be learning as I go along, is Python a good tool for
> doing this? (Am I totally insane to be trying this as a beginner?) And
> if so, where should I start?  I'd like to avoid massive
> wheel-reinvention if at all possible.
>
> I'm not afraid to RTFM but there's a lot of information around on Python
> and I'm not sure what's most relevant.  Suggestions on what to read,
> books to buy, etc., are all welcomed.
>
> Thanks in advance,
> Kadin.
>
> * For the curious, this is a sort of poor attempt at the "Universal Log
> Format" as used by Adium on OS X.
>
> --
> http://kadin.sdf-us.org/
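A minimal sketch of the reading side, using only string methods. It assumes the layout shown in the sample above: records separated by formfeeds ("\f"), and within each record a direction line (IN/OUT), a type line (MSG), two Unix timestamps, then the message body, which may run over several lines. The post doesn't say what distinguishes the two timestamps, so the sketch just keeps the first. The name `parse_centerim_log` is hypothetical, and Windows-style "\r\n" line endings are not handled.

```python
def parse_centerim_log(text):
    """Return one dict per formfeed-delimited record (assumed layout)."""
    records = []
    for chunk in text.split("\f"):
        lines = chunk.strip("\n").split("\n")
        if len(lines) < 5:
            # Skip the empty chunk before the first formfeed, and any
            # record that doesn't match the assumed five-or-more-line layout.
            continue
        direction, kind, first_time, _second_time = lines[:4]
        records.append({
            "direction": direction,        # "IN" or "OUT"
            "kind": kind,                  # e.g. "MSG"
            "time": int(first_time),       # Unix timestamp (first of the two)
            "text": "\n".join(lines[4:]),  # body may span several lines
        })
    return records


sample = ("\fIN\nMSG\n1190126325\n1190126325\nhi\n"
          "\fOUT\nMSG\n1190126383\n1190126383\nhello\n")
for rec in parse_centerim_log(sample):
    print(rec["direction"], rec["time"], rec["text"])
```

Reading a real file would just be `parse_centerim_log(open(path).read())`, with `path` taken from wherever the account name comes from.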

I've used lxml and DOM/minidom. Both took me a while to figure out, and
I still don't always understand them. Anyway, lxml is similar to the
method Chris mentioned.
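For the writing side, here is a minimal sketch with the standard library's xml.dom.minidom. It assumes the records have already been pulled out of the log as (direction, timestamp, text) tuples, and that OUT lines were sent by the account owner while IN lines came from the buddy. The function name `build_chat_xml` and its `buddy` parameter are hypothetical, and version="0.4" is simply copied from the target example in the question.

```python
from xml.dom import minidom


def build_chat_xml(records, account, service, buddy):
    """Build a <chat> document from (direction, timestamp, text) tuples."""
    doc = minidom.Document()
    chat = doc.createElement("chat")
    chat.setAttribute("account", account)
    chat.setAttribute("service", service)
    chat.setAttribute("version", "0.4")  # copied from the target example
    doc.appendChild(chat)

    for direction, timestamp, text in records:
        msg = doc.createElement("message")
        # Assumed mapping: OUT was sent by the account owner, IN by the buddy.
        msg.setAttribute("sender", account if direction == "OUT" else buddy)
        msg.setAttribute("time", str(timestamp))
        # minidom escapes &, < and > in text nodes when serializing.
        msg.appendChild(doc.createTextNode(text))
        chat.appendChild(msg)

    return doc.toprettyxml(indent="  ")


print(build_chat_xml(
    [("IN", 1190126325, "hi"), ("OUT", 1190126383, "hello")],
    account="joeblow", service="AIM", buddy="janedoe",
))
```

Building the tree with DOM calls, rather than gluing strings together, means a message containing "<" or "&" still comes out as well-formed XML.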

http://docs.python.org/lib/module-xml.dom.html
http://www.oreilly.com/catalog/pythonxml/chapter/ch01.html
http://pyxml.sourceforge.net/topics/

Mike
