Converting LF/FF delimited logs to XML w/ Python?
kyosohma at gmail.com
Wed Dec 5 16:55:53 EST 2007
On Dec 5, 3:19 pm, Kadin2048 <usenet.ka... at xoxy.net> wrote:
> This is a very noob-ish question so I apologize in advance, but I'm
> hoping to get some input and advice before I get too over my head.
>
> I'm trying to convert some log files from a formfeed- and
> linefeed-delimited form into XML. I'd been thinking of using Python to
> do this, but I'll be honest and say that I'm very inexperienced with
> Python, so before I dive in I wanted to see whether some more
> experienced minds thought I was choosing the right tool.
>
> Basically, what I want to do is convert from instant messaging logs
> produced by CenterIM, which look like this (Where "^L" represents ASCII
> 12, the formfeed character):
>
> ^L
> IN
> MSG
> 1190126325
> 1190126325
> hi
> ^L
> OUT
> MSG
> 1190126383
> 1190126383
> hello
>
> To an XML-based format* like this:
>
> <chat account="joeblow" service="AIM" version="0.4">
> <message sender="janedoe" time="1190126325">hi</message>
> <message sender="joeblow" time="1190126383">hello</message>
> </chat>
>
> Obviously there's information in the bottom example not present in the
> top (account names, protocol), but I'll grab those from the file name or
> prompt the user.
>
> Given that I'd be learning as I go along, is Python a good tool for
> doing this? (Am I totally insane to be trying this as a beginner?) And
> if so, where should I start? I'd like to avoid massive
> wheel-reinvention if at all possible.
>
> I'm not afraid to RTFM but there's a lot of information around on Python
> and I'm not sure what's most relevant. Suggestions on what to read,
> books to buy, etc., are all welcomed.
>
> Thanks in advance,
> Kadin.
>
> * For the curious, this is sort of a poor attempt at the "Universal Log
> Format" as used by Adium on OS X.
>
> --http://kadin.sdf-us.org/
I've used lxml and DOM/minidom. Both took me a while to figure out, and
I still don't always understand them. Anyway, lxml is similar to the
method Chris mentioned.
http://docs.python.org/lib/module-xml.dom.html
http://www.oreilly.com/catalog/pythonxml/chapter/ch01.html
http://pyxml.sourceforge.net/topics/
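
For what it's worth, here is a rough sketch of how the minidom route might
look for the sample log quoted above. It is only a starting point under some
assumptions of mine: each record is taken to be direction, type, two
timestamps, then the message text (possibly spanning several lines), and only
the first timestamp is used. The function name centerim_to_xml and its
account, buddy and service parameters are made up for illustration; they
stand in for the values you said you would take from the file name or the
user.

```python
from xml.dom import minidom


def centerim_to_xml(log_text, account, buddy, service):
    """Build a <chat> document from a CenterIM-style log string.

    Assumed record layout (taken from the quoted sample, not from any
    CenterIM documentation): direction, type, timestamp, timestamp, text.
    """
    doc = minidom.Document()
    chat = doc.createElement("chat")
    chat.setAttribute("account", account)
    chat.setAttribute("service", service)
    chat.setAttribute("version", "0.4")
    doc.appendChild(chat)

    # Records are separated by the formfeed character (ASCII 12, "\f").
    for record in log_text.split("\f"):
        lines = record.strip("\n").split("\n")
        if len(lines) < 5:
            continue  # empty or incomplete record
        direction, kind, timestamp = lines[0], lines[1], lines[2]
        if kind != "MSG":
            continue  # this sketch only handles plain messages
        # Assumption: everything after the two timestamps is message text.
        body = "\n".join(lines[4:])

        message = doc.createElement("message")
        # Incoming messages come from the buddy, outgoing ones from us.
        message.setAttribute("sender", buddy if direction == "IN" else account)
        message.setAttribute("time", timestamp)
        message.appendChild(doc.createTextNode(body))
        chat.appendChild(message)
    return doc


sample = ("\f\nIN\nMSG\n1190126325\n1190126325\nhi\n"
          "\f\nOUT\nMSG\n1190126383\n1190126383\nhello\n")
doc = centerim_to_xml(sample, "joeblow", "janedoe", "AIM")
print(doc.documentElement.toprettyxml(indent="  "))
```

minidom escapes characters such as < and & in the message text when it
writes the document out, which is the main reason to prefer it over pasting
strings together by hand.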
Mike