[XML-SIG] Creating XML with Python
J. Clifford Dyer
jcd at unc.edu
Thu Jul 24 14:30:18 CEST 2008
On Thu, 2008-07-24 at 14:04 +0200, Fredrik Lundh wrote:
> Eric Chao wrote:
>
> > I've been trying to convert some text that has some odd coding to xml. I
> > am trying to use python to create a program that will process this text:
> >
> > <BN>GENESIS</BN>
> > <CN>CHAPTER 1</CN>
> > <SH>The Creation</SH>
> > <C>{{01:1}}1 <RA>In the beginning <RB>God <RC>created the heavens and
> > the earth.
> > <V>{{01:1}}2 The earth was <$FOr {a waste and
> > emptiness}>><N1><RA>formless and void, and <RB>darkness was over the
> > <V>{{01:1}}3 Then <RA>God said, ``Let there be light"; and there was light.
> >
> > to something like this:
> >
> > <book osisID="Gen">
> > <chapter sID="Gen.1"/>
> > <p><verse sID="Gen.1.1"/>In the beginning God created the heaven and the
> > earth.<verse eID="Gen.1.1"/></p>
> > <p><verse sID="Gen.1.2"/>And the earth was without form, and void; and
> > darkness was upon the face of the deep. And the Spirit of God moved upon
> > the face of the waters.<verse eID="Gen.1.2"/></p>
> > <p><verse sID="Gen.1.3"/>And God said, Let there be light: and there was
> > light.<verse eID="Gen.1.3"/></p>
> >
> > I am not very good with Python and I was hoping someone could offer some
> > advice on how to get started. I tried to write a program that produces
> > XML, but I think I need more of a find and replace type program. Thanks !
>
> that looks a rather daunting task even for an experienced Python
> programmer (especially mapping between different translations ;-).
>
> I'd concentrate on parsing the original file format first, before even
> thinking about how to write it out in XML.
>
> it might be some kind of SGML, in which case the standard sgmllib
> library might be helpful:
>
> http://effbot.org/librarybook/sgmllib.htm
>
> if that seems to work, try building some suitable data structure from
> the incoming data (lists of strings might work, but you might want to
> create some simple container objects that hold the lists for you).
If it turns out not to be valid SGML, you may need to look into
pyparsing instead. A recent issue of Python Magazine had a good
introduction to it, and there are plenty of tutorials online.
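
Before reaching for either library, it may help to see how far plain
regular expressions get you on the sample. The following is only a rough
sketch using the standard `re` module (not sgmllib or pyparsing), and it
rests on guesses drawn from the five sample lines: that `<C>` and `<V>`
both start a verse, that `{{01:1}}` means book 01, chapter 1, that
`<$F ... >>` is a footnote, and that `<RA>`, `<RB>`, `<N1>` and friends
are reference markers that can be dropped. The names `Verse`, `BOOK_IDS`,
`parse_verses` and `to_osis` are made up for illustration, and headings
such as `<CN>` are not handled.

```python
import re
from collections import namedtuple
from xml.sax.saxutils import escape

# Simple container for one parsed verse (hypothetical structure).
Verse = namedtuple("Verse", "book chapter verse text")

# Assumed verse start: <C> or <V>, then {{book:chapter}}, then the verse number.
VERSE_START = re.compile(r"<[CV]>\{\{(\d+):(\d+)\}\}(\d+)\s*")
# Assumed footnote form, guessed from the single example: <$F ... >>
FOOTNOTE = re.compile(r"<\$F.*?>>", re.DOTALL)
# Any remaining short upper-case marker such as <RA>, <RB>, <N1>.
MARKER = re.compile(r"<[A-Z][A-Z0-9]*>")

# Hypothetical mapping from the numeric book code to an OSIS book ID.
BOOK_IDS = {"01": "Gen"}


def parse_verses(text):
    """Split the raw text at verse starts and strip the inline markers."""
    verses = []
    starts = list(VERSE_START.finditer(text))
    for i, match in enumerate(starts):
        # A verse body runs until the next verse start (or end of input).
        end = starts[i + 1].start() if i + 1 < len(starts) else len(text)
        body = text[match.end():end]
        body = FOOTNOTE.sub("", body)   # drop footnotes first; they contain '>'
        body = MARKER.sub("", body)     # then drop the short markers
        body = " ".join(body.split())   # collapse line breaks and extra spaces
        verses.append(
            Verse(match.group(1), int(match.group(2)), int(match.group(3)), body)
        )
    return verses


def to_osis(verses):
    """Render verses as <p><verse sID=.../>text<verse eID=.../></p> lines."""
    lines = []
    for v in verses:
        osis_id = "%s.%d.%d" % (BOOK_IDS.get(v.book, v.book), v.chapter, v.verse)
        lines.append(
            '<p><verse sID="%s"/>%s<verse eID="%s"/></p>'
            % (osis_id, escape(v.text), osis_id)
        )
    return "\n".join(lines)
```

Calling `to_osis(parse_verses(open(path).read()))` on a file in the
source format would give you the `<p>` lines; the `<book>` and
`<chapter>` wrappers still need to be added around them. Keeping the
parsing step separate from the output step means you can swap the
regexes for a real parser later without touching the XML side.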
--
Oook!
J. Cliff Dyer
Carolina Digital Library and Archives
UNC Chapel Hill