[XML-SIG] [URGENT] Problem with accent char

Lars Marius Garshol larsga@garshol.priv.no
10 Jan 2001 14:31:50 +0100


* Olivier Deckmyn
| 
| One can notice that there are accented chars (iso-8859-1) inside
| <Name> or <HeadLine> tags, with the encoding properly declared in
| the header...
| 
| If I parse this string (using Sax2.FromXml(...), getElementsByTagName() and
| nodes[0].firstChild.nodeValue), the <Headline> tag content becomes:
| """
| La pol\303\251mique loin d'\303\252tre apais\303\251e par l'annonce de tests
| \303\240 Londres
| """
| 
| Looks like there has been a Unicode (UTF-8?) conversion...

That is correct: pyexpat hands character data back as UTF-8, whatever
encoding the document declares.
 
| What can I do to prevent this conversion? I don't want the
| parser to modify my content !!!!
 
You can use xmlproc, you can convert the UTF-8 back to latin1 yourself,
or you can use Python 2.0, where you'd get Unicode strings.
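
A minimal sketch of the second option, converting back yourself. It is
written for present-day Python 3 rather than the interpreters discussed
in this thread, and the helper name utf8_to_latin1 is made up for
illustration. The byte string is the one quoted above, with the line
break replaced by a space.

```python
# UTF-8 bytes as reported in the quoted message (octal escapes kept as-is).
utf8_bytes = (
    b"La pol\303\251mique loin d'\303\252tre apais\303\251e "
    b"par l'annonce de tests \303\240 Londres"
)


def utf8_to_latin1(data: bytes) -> bytes:
    """Hypothetical helper: re-encode UTF-8 bytes as ISO-8859-1.

    Raises UnicodeEncodeError if the text contains a character
    that ISO-8859-1 cannot represent.
    """
    return data.decode("utf-8").encode("iso-8859-1")


latin1_bytes = utf8_to_latin1(utf8_bytes)  # one byte per accented char
text = utf8_bytes.decode("utf-8")          # the same content as a str

print(text)
print(latin1_bytes)
```

Each two-byte sequence such as \303\251 collapses to a single latin1
byte (here 0xE9). The conversion only works while every character in the
content exists in iso-8859-1, which is an assumption about the data.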

IMHO this is perfectly reasonable behaviour on the part of pyexpat.

--Lars M.