[XML-SIG] Creating XML with Python

Fredrik Lundh fredrik at pythonware.com
Thu Jul 24 14:04:58 CEST 2008


Eric Chao wrote:

> I've been trying to convert some text that has some odd coding to xml. I 
> am trying to use python to create a program that will process this text:
> 
> <BN>GENESIS</BN>
> <CN>CHAPTER 1</CN>
> <SH>The Creation</SH>
> <C>{{01:1}}1 <RA>In the beginning <RB>God <RC>created the heavens and 
> the earth.
> <V>{{01:1}}2 The earth was <$FOr {a waste and 
> emptiness}>><N1><RA>formless and void, and <RB>darkness was over the 
> <V>{{01:1}}3 Then <RA>God said, ``Let there be light"; and there was light.
> 
> to something like this:
> 
> <book osisID="Gen">
> <chapter sID="Gen.1"/>
> <p><verse sID="Gen.1.1"/>In the beginning God created the heaven and the 
> earth.<verse eID="Gen.1.1"/></p>
> <p><verse sID="Gen.1.2"/>And the earth was without form, and void; and 
> darkness was upon the face of the deep. And the Spirit of God moved upon 
> the face of the waters.<verse eID="Gen.1.2"/></p>
> <p><verse sID="Gen.1.3"/>And God said, Let there be light: and there was 
> light.<verse eID="Gen.1.3"/></p>
> 
> I am not very good with Python and I was hoping someone could offer some 
> advice on how to get started. I tried to write a program that produces 
> XML, but I think I need more of a find and replace type program. Thanks !

that looks a rather daunting task even for an experienced Python 
programmer (especially mapping between different translations ;-).

I'd concentrate on parsing the original file format first, before even 
thinking about how to write it out in XML.

it might be some kind of SGML, in which case the standard sgmllib 
library might be helpful:

     http://effbot.org/librarybook/sgmllib.htm
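
note that sgmllib only exists in Python 2 (it was removed in Python 3.0). as a rough illustration of the same tokenizing step, here's a minimal sketch that uses the re module instead. the tag pattern and the tokenize name are assumptions based on the sample above, not a description of the real file format, and footnote markup such as the <$F ...>> construct is deliberately left in the text untouched:

```python
import re

# Assumption: a tag is "<", an optional "/", an uppercase letter followed by
# uppercase letters or digits, then ">" (e.g. <BN>, </BN>, <RA>, <N1>).
# Anything else, including the <$F ...>> footnote markup, stays in the text.
TAG_RE = re.compile(r"<(/?[A-Z][A-Z0-9]*)>")

def tokenize(text):
    """Yield ("tag", name) and ("text", data) pairs in document order."""
    pos = 0
    for match in TAG_RE.finditer(text):
        if match.start() > pos:
            # plain text between the previous tag and this one
            yield ("text", text[pos:match.start()])
        yield ("tag", match.group(1))
        pos = match.end()
    if pos < len(text):
        # trailing text after the last tag
        yield ("text", text[pos:])
```

calling list(tokenize(line)) on each input line gives a flat event stream, which is roughly what the sgmllib handler callbacks would have delivered.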

if that seems to work, try building some suitable data structure from 
the incoming data (lists of strings might work, but you might want to 
create some simple container objects that hold the lists for you).
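
a simple container could look something like the sketch below. it assumes that a marker such as {{01:1}}2 means book 1, chapter 1, verse 2 (a guess from the three sample lines), and the Verse and collect_verses names are made up for illustration:

```python
import re
from dataclasses import dataclass

# Assumption: "{{BB:C}}V" encodes book number, chapter and verse number.
REF_RE = re.compile(r"\{\{(\d+):(\d+)\}\}(\d+)\s*")

@dataclass
class Verse:
    book: int
    chapter: int
    number: int
    text: str = ""

def collect_verses(chunks):
    """Group text chunks into Verse objects; a reference marker starts a new verse."""
    verses = []
    for chunk in chunks:
        match = REF_RE.match(chunk)
        if match:
            book, chapter, number = map(int, match.groups())
            # whatever follows the marker in this chunk begins the verse text
            verses.append(Verse(book, chapter, number, chunk[match.end():]))
        elif verses:
            # continuation text belongs to the most recent verse
            verses[-1].text += chunk
        # text seen before the first marker (titles, headings) is skipped here
    return verses
```

the input is just a list of text chunks with the tags already stripped out; headings such as the book and chapter titles would need their own containers.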

when you have all this in place, you can either just walk the data 
structure and create XML on the fly (don't forget to escape reserved 
characters; you can use cgi.escape for that), or build e.g. an 
ElementTree (xml.etree) and then ask that module to serialize the tree 
for you.
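
both output routes can be sketched as follows. this is only an illustration: cgi.escape is gone from current Python (removed in 3.8), so the sketch uses xml.sax.saxutils.escape for the manual route, and the verse_to_paragraph helper and the "Gen.1.3" identifier are assumptions modelled on the target format quoted above:

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

def verse_to_paragraph(osis_id, text):
    """Build <p><verse sID=.../>text<verse eID=.../></p> as an Element."""
    p = ET.Element("p")
    start = ET.SubElement(p, "verse", {"sID": osis_id})
    # the verse text sits after the empty start marker, so it goes in .tail
    start.tail = text
    ET.SubElement(p, "verse", {"eID": osis_id})
    return p

verse_text = 'God said, "Let there be light" & there was light.'

# route 1: let ElementTree do the escaping and serialization
p = verse_to_paragraph("Gen.1.3", verse_text)
xml_text = ET.tostring(p, encoding="unicode")

# route 2: write the markup by hand and escape the text yourself
manual = '<p><verse sID="Gen.1.3"/>%s<verse eID="Gen.1.3"/></p>' % escape(verse_text)
```

the ElementTree route is harder to get wrong, since the module takes care of escaping and of keeping the output well-formed.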

hope this helps, at least a little.

</F>


