[XML-SIG] [URGENT] Problem with accent char

Olivier Deckmyn <odeckmyn@teaser.fr>
Wed, 10 Jan 2001 12:09:26 +0100


Hi all,

It looks like the parser modifies my content :(

I have the following XML string:
"""
<?xml version="1.0" encoding="iso-8859-1"?>
<Xafp type="multimedia" uno="afp_wbs_doc_010110105314.g5kw25ak">
  <Head>
    <Name>GB-OTAN-santé</Name>
    <DateReleased>20010110T105314Z</DateReleased>
    <Source>AFP</Source>
  </Head>
  <NewsLines>
    <HeadLine>La polémique loin d'être apaisée par l'annonce de tests à
Londres</HeadLine>
    <DateLine>LONDRES</DateLine>
  </NewsLines>
</Xafp>
"""

Note that there are accented characters (iso-8859-1) inside the <Name> and
<HeadLine> elements, and that the encoding is properly declared in the XML
header.

If I parse this string (using Sax2.FromXml(...), getElementsByTagName() and
nodes[0].firstChild.nodeValue), the content of the <HeadLine> element becomes:
"""
La pol\303\251mique loin d'\303\252tre apais\303\251e par l'annonce de tests
\303\240 Londres
"""

It looks like a Unicode (UTF-8?) conversion has taken place...
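
The octal escapes in the output are consistent with that guess: \303\251 is the two-byte UTF-8 sequence for "é", which is the single byte \351 in iso-8859-1. A minimal sketch that checks this, written for a current Python 3 interpreter (an assumption: Python 1.5.2 has no Unicode string type, so this will not run there as written); the variable names are illustrative only:

```python
# The start of the headline exactly as it came back from the parser,
# written with the same octal escapes.
parsed_bytes = b"La pol\303\251mique loin d'\303\252tre apais\303\251e"

# If the parser produced UTF-8, decoding as UTF-8 must give back the
# accented characters of the original headline.
text = parsed_bytes.decode("utf-8")

# The same text encoded as iso-8859-1 uses one byte per accented
# character, which is what the original document contained.
latin1_bytes = text.encode("iso-8859-1")

print(repr(text))
print(len(parsed_bytes), len(latin1_bytes))
```

The UTF-8 form is three bytes longer than the iso-8859-1 form, one extra byte for each of the three accented characters in this fragment.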

What can I do to prevent this conversion? I don't want the parser to
modify my content!

Thanks for your support.

I've tried with PyXML 0.5.1 and 0.6.2.

I use Python 1.5.2 on FreeBSD 4.2.

My imports (in case they help):
from xml import dom
from xml.dom.ext.reader import Sax2
from xml.dom import ext
from xml.dom.Node import Node
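
For comparison, here is a hedged sketch of the same parse-and-read steps using xml.dom.minidom from the standard library of a current Python 3. This is a swapped-in parser, not the Sax2 reader imported above, and the shortened document is only an illustration. It suggests one possible approach: let the DOM hand back Unicode text, and encode it as iso-8859-1 only at the point where bytes are needed.

```python
from xml.dom import minidom

# A shortened version of the document, as iso-8859-1 bytes
# (\xe9 is "é", \xea is "ê" in that encoding).
xml_bytes = (
    b'<?xml version="1.0" encoding="iso-8859-1"?>\n'
    b"<Xafp><NewsLines>"
    b"<HeadLine>La pol\xe9mique loin d'\xeatre apais\xe9e</HeadLine>"
    b"</NewsLines></Xafp>"
)

# The parser honours the declared encoding and builds the tree
# with Unicode strings.
doc = minidom.parseString(xml_bytes)
nodes = doc.getElementsByTagName("HeadLine")
value = nodes[0].firstChild.nodeValue

# Convert back to the original byte encoding on output.
headline_latin1 = value.encode("iso-8859-1")

print(repr(value))
print(repr(headline_latin1))
```

With this approach the content itself is unchanged; only its in-memory representation differs from the bytes of the input.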

Thanks again,

Olivier.

---
We are Micro$oft. You will be assimilated. Resistance is futile.