Converting LF/FF delimited logs to XML w/ Python?

Kadin2048 usenet.kadin at xoxy.net
Wed Dec 5 16:19:00 EST 2007


This is a very noob-ish question, so I apologize in advance, but I'm 
hoping to get some input and advice before I get in over my head.

I'm trying to convert some log files from a formfeed- and 
linefeed-delimited form into XML.  I'd been thinking of using Python to 
do this, but I'll be honest and say that I'm very inexperienced with 
Python, so before I dive in I wanted to see whether some more 
experienced minds thought I was choosing the right tool.

Basically, what I want to do is convert from instant messaging logs 
produced by CenterIM, which look like this (where "^L" represents ASCII 
12, the formfeed character):

    ^L
    IN
    MSG
    1190126325
    1190126325
    hi
    ^L
    OUT
    MSG
    1190126383
    1190126383
    hello

To an XML-based format* like this:

    <chat account="joeblow" service="AIM" version="0.4">
      <message sender="janedoe" time="1190126325">hi</message>
      <message sender="joeblow" time="1190126383">hello</message>
    </chat>

Obviously there's information in the bottom example not present in the 
top (account names, protocol), but I'll grab those from the file name or 
prompt the user.
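
To make the task concrete, here is a minimal sketch of the conversion in Python, not a tested converter. It assumes each record is a formfeed followed by a direction line, a type line, two timestamp lines, and then the message text (possibly spanning several lines), that lines end in a plain linefeed, and that the first timestamp is the one to keep. The names parse_records and to_xml and the account, buddy, and service parameters are made up for illustration; only xml.etree.ElementTree comes from the standard library.

```python
import xml.etree.ElementTree as ET

FORM_FEED = "\x0c"  # ASCII 12, shown as ^L in the log excerpt


def parse_records(text):
    """Yield (direction, timestamp, body) for each formfeed-delimited record."""
    for chunk in text.split(FORM_FEED):
        # Split on "\n" explicitly: str.splitlines() would also treat a
        # formfeed as a line boundary.
        lines = chunk.strip("\n").split("\n")
        # Assumed layout: direction, type, timestamp, timestamp, then the text.
        if len(lines) < 5 or lines[1] != "MSG":
            continue  # skip empty chunks and anything that is not a message
        direction, _kind, first_ts, _second_ts = lines[:4]
        yield direction, first_ts, "\n".join(lines[4:])


def to_xml(text, account, buddy, service):
    """Build the <chat> document as a string (hypothetical helper)."""
    chat = ET.Element("chat", account=account, service=service, version="0.4")
    for direction, timestamp, body in parse_records(text):
        # Assumption: "IN" means the buddy wrote it, anything else is ours.
        sender = buddy if direction == "IN" else account
        message = ET.SubElement(chat, "message", sender=sender, time=timestamp)
        message.text = body  # ElementTree escapes &, < and > on output
    return ET.tostring(chat, encoding="unicode")
```

Calling to_xml(log_text, "joeblow", "janedoe", "AIM") on the sample log above should give a chat element with two message children. The output comes back on a single line; indentation would have to be added separately.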

Given that I'd be learning as I go along, is Python a good tool for 
doing this? (Am I totally insane to be trying this as a beginner?) And 
if so, where should I start?  I'd like to avoid massive 
wheel-reinvention if at all possible.

I'm not afraid to RTFM but there's a lot of information around on Python 
and I'm not sure what's most relevant.  Suggestions on what to read, 
books to buy, etc., are all welcomed.

Thanks in advance,
Kadin.

* For the curious, this is sort of a poor attempt at the "Universal Log 
Format" as used by Adium on OS X.

-- 
http://kadin.sdf-us.org/


