Converting LF/FF delimited logs to XML w/ Python?

Chris Mellon arkanes at gmail.com
Wed Dec 5 16:39:37 EST 2007


On Dec 5, 2007 3:19 PM, Kadin2048 <usenet.kadin at xoxy.net> wrote:
> This is a very noob-ish question so I apologize in advance, but I'm
> hoping to get some input and advice before I get too over my head.
>
> I'm trying to convert some log files from a formfeed- and
> linefeed-delimited form into XML.  I'd been thinking of using Python to
> do this, but I'll be honest and say that I'm very inexperienced with
> Python, so before I dive in I wanted to see whether some more
> experienced minds thought I was choosing the right tool.
>
> Basically, what I want to do is convert from instant messaging logs
> produced by CenterIM, which look like this (Where "^L" represents ASCII
> 12, the formfeed character):
>
>     ^L
>     IN
>     MSG
>     1190126325
>     1190126325
>     hi
>     ^L
>     OUT
>     MSG
>     1190126383
>     1190126383
>     hello
>
> To an XML-based format* like this:
>
>     <chat account="joeblow" service="AIM" version="0.4">
>       <message sender="janedoe" time="1190126325">hi</message>
>       <message sender="joeblow" time="1190126383">hello</message>
>     </chat>
>
> Obviously there's information in the bottom example not present in the
> top (account names, protocol), but I'll grab those from the file name or
> prompt the user.
>
> Given that I'd be learning as I go along, is Python a good tool for
> doing this? (Am I totally insane to be trying this as a beginner?) And
> if so, where should I start?  I'd like to avoid massive
> wheel-reinvention if at all possible.
>
> I'm not afraid to RTFM but there's a lot of information around on Python
> and I'm not sure what's most relevant.  Suggestions on what to read,
> books to buy, etc., are all welcomed.
>

This is a fairly simple problem and well suited to a beginner
project. The open() builtin (file() also works in Python 2) will get
you the data from your log file. Using the split() method of the string
object, you can break the log into records on the formfeed character
("\f") and then split each record into its lines.
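
As a rough sketch of that approach (not tested against real CenterIM
logs): the function name parse_log and the dictionary keys are made up
for illustration, and the field order is only inferred from the sample
above. What the two timestamps each mean is a guess.

```python
def parse_log(text):
    """Split formfeed-delimited log text into a list of record dicts."""
    records = []
    # Each record starts with a formfeed ("\f", ASCII 12).
    for chunk in text.split("\f"):
        lines = chunk.strip("\n").split("\n")
        if len(lines) < 4:
            # Empty chunk before the first formfeed, or a malformed record.
            continue
        # Assumed layout: direction, type, two timestamps, then the body.
        direction, kind, first_time, second_time = lines[:4]
        records.append({
            "direction": direction,          # "IN" or "OUT"
            "kind": kind,                    # e.g. "MSG"
            "timestamp": first_time,
            "timestamp2": second_time,
            "text": "\n".join(lines[4:]),    # body may span several lines
        })
    return records


sample = (
    "\f\nIN\nMSG\n1190126325\n1190126325\nhi\n"
    "\f\nOUT\nMSG\n1190126383\n1190126383\nhello\n"
)
records = parse_log(sample)
```

For a real file you would get the text with something like
open(path).read() and pass it to parse_log; the right encoding for
your logs is something you will have to check.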

There are a number of XML libraries in the standard library, but
xml.etree.ElementTree is my preferred one. It is documented in the
standard library docs and on the effbot site.
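
A minimal sketch of building the target format with ElementTree. The
function name build_chat, its parameters, and the record dictionaries
are hypothetical; I'm assuming "IN" means the message came from the
other party and "OUT" means it came from the account owner.

```python
import xml.etree.ElementTree as ET


def build_chat(records, account, buddy, service):
    """Build a <chat> element holding one <message> per record."""
    chat = ET.Element(
        "chat", {"account": account, "service": service, "version": "0.4"}
    )
    for rec in records:
        # Assumption: IN = received from buddy, OUT = sent by account.
        sender = buddy if rec["direction"] == "IN" else account
        msg = ET.SubElement(
            chat, "message", {"sender": sender, "time": rec["timestamp"]}
        )
        msg.text = rec["text"]  # ElementTree escapes <, > and & on output
    return chat


# Hand-written records standing in for whatever your parser produces.
records = [
    {"direction": "IN", "timestamp": "1190126325", "text": "hi"},
    {"direction": "OUT", "timestamp": "1190126383", "text": "hello"},
]
chat = build_chat(records, account="joeblow", buddy="janedoe", service="AIM")
xml_text = ET.tostring(chat, encoding="unicode")
print(xml_text)
```

The output is a single line with no indentation. Letting the library
do the escaping is the main reason to use it instead of gluing strings
together yourself.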


