SAX-Parser entity

Jason Orendorff jason at jorendorff.com
Sat Mar 2 02:11:50 EST 2002


fabi kreutz wrote:
> where Character 19 in Row 29 is the occurrence of an ü.

Yep, if you are going to use non-ASCII characters in any encoding
other than UTF-8 or UTF-16 (the defaults an XML parser assumes), you
have to specify the encoding in the XML document itself, so the XML
parser knows what's going on.

> After browsing the FAQs I changed the default encoding in site.py to
> iso-8859-1, which had some nice effect, but not on minidom.

What you need is:

    <?xml version="1.0" encoding="ISO-8859-1"?>

It must be the very first thing in the XML document; i.e. the two
characters "<?" must be the first two bytes.
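
As a rough sketch of what the declaration buys you (written against
Python 3's xml.dom.minidom, with a made-up sample document and a
hypothetical helper called parses_ok), the same ISO-8859-1 bytes parse
with the declaration and are rejected without it:

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError

# Hypothetical sample data: "M\u00fcller" is "Mueller" spelled with u-umlaut.
# Encoded as ISO-8859-1, the umlaut is the single byte 0xFC.
body = "<name>M\u00fcller</name>".encode("iso-8859-1")
declaration = b'<?xml version="1.0" encoding="ISO-8859-1"?>\n'


def parses_ok(data):
    """Return True if minidom accepts the bytes, False on a parse error."""
    try:
        minidom.parseString(data)
        return True
    except ExpatError:
        return False


# With the declaration as the very first bytes, the parser decodes 0xFC.
doc = minidom.parseString(declaration + body)
name = "".join(node.data for node in doc.documentElement.childNodes)

# Without it, the parser assumes UTF-8, where a lone 0xFC byte is invalid.
with_declaration = parses_ok(declaration + body)
without_declaration = parses_ok(body)
```

Here name comes back as a Unicode string containing U+00FC, no matter
what the default encoding in site.py says.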

(Alternatively, ü can be written in XML as &#x00FC;.
See http://www.unicode.org/charts/PDF/U0080.pdf for more
codepoints and http://www.unicode.org/charts/ for oodles
more still.)
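
A small sketch of the character-reference route, again assuming
Python 3 and xml.dom.minidom (the helper name char_ref is made up for
illustration). The document is pure ASCII, so no encoding declaration
is needed at all:

```python
from xml.dom import minidom


def char_ref(ch):
    """Build a hexadecimal XML character reference for one character."""
    return "&#x%04X;" % ord(ch)


# "\u00fc" is u-umlaut; its reference is the ASCII text &#x00FC;
ref = char_ref("\u00fc")

# Hypothetical sample document containing only ASCII bytes.
xml_bytes = ("<name>M" + ref + "ller</name>").encode("ascii")

doc = minidom.parseString(xml_bytes)
# Join the text nodes in case the parser delivers the data in pieces.
name = "".join(node.data for node in doc.documentElement.childNodes)
```

The parser expands the reference for you, so name holds the real
character rather than the eight-character escape.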

*** In general, your life as a developer will be much easier
    once you grok Unicode.

(I don't know why people can't just read the XML standard and
figure this out for themselves.  I mean, come on, guys, it's only
40 pages of incredibly dense gibberish.  <wink>)

## Jason Orendorff    http://www.jorendorff.com/




