[XML-SIG] [URGENT] Problem with accent char
matt
matt@virtualspectator.com
Thu, 11 Jan 2001 00:29:38 +1300
Have a look through the mailing list archive ... I asked a whole lot of these
questions earlier ... anyway, comments below:
On Thu, 11 Jan 2001, Olivier Deckmyn wrote:
> Hi all,
>
> Looks like parser modifies my content :(
>
good .. it should ... see later
> I have the following "xml" string :
> """
> <?xml version="1.0" encoding="iso-8859-1"?>
> <Xafp type="multimedia" uno="afp_wbs_doc_010110105314.g5kw25ak">
> <Head>
> <Name>GB-OTAN-santé</Name>
> <DateReleased>20010110T105314Z</DateReleased>
> <Source>AFP</Source>
> </Head>
> <NewsLines>
> <HeadLine>La polémique loin d'être apaisée par l'annonce de tests à
> Londres</HeadLine>
> <DateLine>LONDRES</DateLine>
> </NewsLines>
> </Xafp>
> """
>
> One can notice that there are accents chars (iso-8859-1) inside <Name> or
> <HeadLine> tags ; with a well defined encoding value in header...
>
> If I parse this string (using Sax2.FromXml(...), getElementsByTagName() and
> nodes[0].firstChild.nodeValue) ; the <Headline> tag content becomes :
> """
> La pol\303\251mique loin d'\303\252tre apais\303\251e par l'annonce de tests
> \303\240 Londres
> """
>
> Looks like there has been a unicode (utf-8 ?) conversion ...
>
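Yes, that is correct, and it is the specified behaviour. An XML parser should
recognise the declared encoding and CONVERT the content to Unicode. What you
are seeing is that Unicode text encoded as UTF-8, the common flavour: each
accented character has become a two-byte sequence such as \303\251.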
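
A minimal sketch of what those escaped bytes are, assuming a modern Python 3
interpreter rather than the Python 1.5.2 mentioned in this thread (the variable
names are only illustrative). It decodes a fragment of the parser output as
UTF-8 and re-encodes it as ISO-8859-1:

```python
# Fragment of the parser output, with the octal escapes kept as raw bytes.
utf8_bytes = b"La pol\303\251mique loin d'\303\252tre apais\303\251e"

# Decoding as UTF-8 recovers the accented characters as Unicode text.
text = utf8_bytes.decode("utf-8")

# Re-encoding as ISO-8859-1 gives one byte per accented character again.
latin1_bytes = text.encode("iso-8859-1")

print(text)
print(latin1_bytes)
```

So the content has not been damaged; it is the same text in a different
encoding.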
> What can I do, not to have this conversion made ? I don't want the parser to
> modify my content !!!!
It's OK, you can get it back out nicely: serialise the document with the
encoding you want. Try the following little function I use:
import cStringIO
from xml.dom import ext

def retPrettyPrint(doc):
    # Serialise the DOM tree into an in-memory buffer as ISO-8859-1
    t = cStringIO.StringIO()
    ext.PrettyPrint(doc, t, encoding='ISO-8859-1')
    return t.getvalue()
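
For comparison, a rough sketch of the same round trip using the standard
library's xml.dom.minidom. This assumes Python 3 and is not the PyXML API used
above; the shortened sample document and the variable names are only
illustrative:

```python
from xml.dom import minidom

# A shortened ISO-8859-1 document, built as bytes the way it would sit on disk.
source = (
    '<?xml version="1.0" encoding="iso-8859-1"?>'
    '<Head><Name>GB-OTAN-sant\u00e9</Name></Head>'
).encode("iso-8859-1")

# The parser honours the declared encoding and hands back Unicode text.
doc = minidom.parseString(source)
name = doc.getElementsByTagName("Name")[0].firstChild.nodeValue

# Asking for ISO-8859-1 on output serialises the tree back to those bytes.
output = doc.toxml(encoding="iso-8859-1")

print(name)
print(output)
```

The idea is the same in both cases: the conversion to Unicode happens on
input, and you choose the byte encoding again when you write the document out.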
regards
Matt
>
> Thanx for your support...
>
> I've tried with py-xml 0.5.1 and 0.6.2
>
> I use python 1.5.2 under FreeBSD 4.2
>
> My imports (might help ?):
> from xml import dom
> from xml.dom.ext.reader import Sax2
> from xml.dom import ext
> from xml.dom.Node import Node
>
> Thanx again,
>
> Olivier.
--
Matt Halstead (PhD)
Research and development
VirtualSpectator
http://www.virtualspectator.com
ph 64-9-9136896