[XML-SIG] Creating XML with Python
Fredrik Lundh
fredrik at pythonware.com
Thu Jul 24 14:04:58 CEST 2008
Eric Chao wrote:
> I've been trying to convert some text that has some odd coding to xml. I
> am trying to use python to create a program that will process this text:
>
> <BN>GENESIS</BN>
> <CN>CHAPTER 1</CN>
> <SH>The Creation</SH>
> <C>{{01:1}}1 <RA>In the beginning <RB>God <RC>created the heavens and
> the earth.
> <V>{{01:1}}2 The earth was <$FOr {a waste and
> emptiness}>><N1><RA>formless and void, and <RB>darkness was over the
> <V>{{01:1}}3 Then <RA>God said, ``Let there be light"; and there was light.
>
> to something like this:
>
> <book osisID="Gen">
> <chapter sID="Gen.1"/>
> <p><verse sID="Gen.1.1"/>In the beginning God created the heaven and the
> earth.<verse eID="Gen.1.1"/></p>
> <p><verse sID="Gen.1.2"/>And the earth was without form, and void; and
> darkness was upon the face of the deep. And the Spirit of God moved upon
> the face of the waters.<verse eID="Gen.1.2"/></p>
> <p><verse sID="Gen.1.3"/>And God said, Let there be light: and there was
> light.<verse eID="Gen.1.3"/></p>
>
> I am not very good with Python and I was hoping someone could offer some
> advice on how to get started. I tried to write a program that produces
> XML, but I think I need more of a find and replace type program. Thanks !
that looks like a rather daunting task even for an experienced Python
programmer (especially mapping between different translations ;-).
I'd concentrate on parsing the original file format first, before even
thinking about how to write it out in XML.
it might be some kind of SGML, in which case the standard sgmllib
library might be helpful:
http://effbot.org/librarybook/sgmllib.htm
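if sgmllib isn't available, or doesn't cope with the format, a small
hand-written tokenizer is another way to get started. the following is
only a rough sketch: the TAG pattern and the tokenize function are
made-up names, and the assumption that tag names consist of letters and
digits is a guess based on the sample above. odd markers such as
"<$FOr {...}>>" do not match the pattern and come through as plain text,
so they would need separate handling.

```python
import re

# Matches simple opening/closing tags such as <BN>, </BN>, <RA>, <N1>.
# Assumption: tag names are a letter followed by letters or digits.
TAG = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9]*)>")

def tokenize(text):
    """Yield ("start", name), ("end", name) and ("text", data) events."""
    pos = 0
    for m in TAG.finditer(text):
        if m.start() > pos:
            # Character data between the previous tag and this one.
            yield ("text", text[pos:m.start()])
        yield ("end" if m.group(1) else "start", m.group(2))
        pos = m.end()
    if pos < len(text):
        # Trailing character data after the last tag.
        yield ("text", text[pos:])

sample = "<BN>GENESIS</BN>\n<CN>CHAPTER 1</CN>"
events = list(tokenize(sample))
for event in events:
    print(event)
```

once the event stream looks right for a few chapters, it should be
easier to see which tags carry structure (BN, CN, C, V) and which are
just cross-reference markers (RA, RB, ...) that can be dropped.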
if that seems to work, try building some suitable data structure from
the incoming data (lists of strings might work, but you might want to
create some simple container objects that hold the lists for you).
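as an illustration, the container objects could be as simple as the
following. the class and field names (Book, Chapter, Verse, osis_id) are
invented for this sketch, and it assumes a Python version that has the
dataclasses module (3.7 or later); plain classes with an __init__
method would work just as well.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Verse:
    number: int
    text: str

@dataclass
class Chapter:
    number: int
    # Each chapter owns its own list of verses.
    verses: List[Verse] = field(default_factory=list)

@dataclass
class Book:
    osis_id: str
    # Each book owns its own list of chapters.
    chapters: List[Chapter] = field(default_factory=list)

# Build a tiny structure by hand; a real parser would fill this in
# while reading the input file.
book = Book("Gen")
chapter = Chapter(1)
chapter.verses.append(
    Verse(1, "In the beginning God created the heavens and the earth."))
book.chapters.append(chapter)
```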
when you have all this in place, you can either just walk the data
structure and create XML on the fly (don't forget to escape reserved
characters; you can use cgi.escape for that), or build e.g. an
ElementTree (xml.etree) and then ask that module to serialize the tree
for you.
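both approaches might look roughly like this. it is a sketch only: the
helper names (verse_to_xml, build_book) and the sample verse list are
made up, the element layout simply follows the target format quoted
above, and xml.sax.saxutils.escape is used for escaping because
cgi.escape is no longer available in recent Python versions.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Hypothetical input: (verse number, text) pairs for one chapter.
verses = [
    (1, "In the beginning God created the heavens and the earth."),
    (3, 'Then God said, "Let there be light"; and there was light.'),
]

def verse_to_xml(book_id, chapter_no, number, text):
    """Approach 1: build the markup by hand, escaping the text."""
    osis = "%s.%d.%d" % (book_id, chapter_no, number)
    return '<p><verse sID="%s"/>%s<verse eID="%s"/></p>' % (
        osis, escape(text), osis)

def build_book(book_id, chapter_no, verses):
    """Approach 2: build an ElementTree and let it do the escaping."""
    book = ET.Element("book", osisID=book_id)
    ET.SubElement(book, "chapter", sID="%s.%d" % (book_id, chapter_no))
    for number, text in verses:
        osis = "%s.%d.%d" % (book_id, chapter_no, number)
        p = ET.SubElement(book, "p")
        start = ET.SubElement(p, "verse", sID=osis)
        # The verse text follows the empty start marker, so it goes
        # into the marker's tail rather than into p.text.
        start.tail = text
        ET.SubElement(p, "verse", eID=osis)
    return book

print(verse_to_xml("Gen", 1, 1, verses[0][1]))
xml_text = ET.tostring(build_book("Gen", 1, verses), encoding="unicode")
print(xml_text)
```

the second approach takes a few more lines, but the module then takes
care of escaping and of producing well-formed output.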
hope this helps, at least a little.
</F>