Questions on XML

Emmanuel Surleau emmanuel.surleau at gmail.com
Sat Aug 22 02:59:56 EDT 2009


On Saturday 22 August 2009 08:13:33 joy99 wrote:
> On Aug 22, 10:53 am, Stefan Behnel <stefan... at behnel.de> wrote:
> > Rami Chowdhury wrote:
> > >> I am using primarily UTF-8 based strings, like Hindi or Bengali. Can I
> > >> use Python to help me in this regard?
> > >
> > > I can say from experience that Python on Windows (at least, Python 2.5
> > > on 32-bit Vista) works perfectly well with UTF-8 files containing
> > > Bangla. I have had trouble with working with the data in IDLE, however,
> > > which seems to prefer ASCII by default.
> >
> > Defaults almost never work for encodings. You have to be explicit: add an
> > encoding declaration to the top of your source file if you use encoded
> > literal strings in your code; use the codecs module with a suitable
> > encoding to read encoded text files, and use an XML parser when reading
> > XML.
> >
> > Stefan
>
> Dear Group,
> Thanks for your reply. Python works perfectly for Hindi and Bangla on
> Win XP. I have never had any trouble.
> Best Regards,
> Subhabrata.
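
To make Stefan's advice about being explicit a bit more concrete, here is a
minimal sketch: an encoding declaration at the top of the source file, plus
the codecs module for reading and writing a UTF-8 text file. The file name
"sample.txt", the helper names write_text/read_text and the sample string
are made up for illustration only.

```python
# -*- coding: utf-8 -*-
# The declaration above tells Python how the literals in this file are encoded.
import codecs
import os
import tempfile

# Hypothetical sample text containing Bangla and Hindi.
sample = u"বাংলা और हिन्दी"


def write_text(path, text, encoding="utf-8"):
    # Name the encoding explicitly instead of relying on a platform default.
    with codecs.open(path, "w", encoding=encoding) as f:
        f.write(text)


def read_text(path, encoding="utf-8"):
    # codecs.open decodes the bytes on the way in, so we get text back.
    with codecs.open(path, "r", encoding=encoding) as f:
        return f.read()


# Write to a throwaway directory and read the text back.
path = os.path.join(tempfile.mkdtemp(), "sample.txt")
write_text(path, sample)
roundtrip = read_text(path)
```

If the encoding really is UTF-8 on both sides, roundtrip should compare equal
to sample.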

You might also want to have a look at lxml (http://codespeak.net/lxml/). It 
can do much more than the XML modules in the standard library, offers an 
ElementTree-compatible API as well, and is backed by the kickass, fast libxml2 
library. It will allow you to use XSLT stylesheets, for instance. Whether or 
not you use lxml, have a look at etree.iterparse; it is invaluable when 
processing huge XML documents, since it hands you elements as they are parsed 
instead of requiring the whole tree up front.
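
A rough sketch of the iterparse idea, using the standard library's
xml.etree.ElementTree so it runs without lxml installed (lxml.etree has an
iterparse with a similar interface). The element names "words" and "word",
the helper name iter_texts and the tiny in-memory document are assumptions
made up for this example; a real file would be passed by name or as an open
file object.

```python
# -*- coding: utf-8 -*-
import io
import xml.etree.ElementTree as etree


def iter_texts(source, tag="word"):
    # "end" events fire once an element has been read completely.
    for event, elem in etree.iterparse(source, events=("end",)):
        if elem.tag == tag:
            yield elem.text
            # Drop the element's text and children so memory use stays flat.
            # The emptied element itself stays attached to its parent.
            elem.clear()


# Hypothetical UTF-8 document; the parser reads the encoding from the
# XML declaration, so no manual decoding is needed.
xml_bytes = (
    u'<?xml version="1.0" encoding="UTF-8"?>'
    u"<words><word>বাংলা</word><word>हिन्दी</word></words>"
).encode("utf-8")

texts = list(iter_texts(io.BytesIO(xml_bytes)))
```

Here texts ends up holding the two words as unicode strings. For a huge
document you would loop over iter_texts directly rather than building a list.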

Cheers,

Emm


