[XML-SIG] [URGENT] Problem with accent char

matt matt@virtualspectator.com
Thu, 11 Jan 2001 00:29:38 +1300


Have a look through the mailing list ... I asked a whole lot of these questions
earlier ... anyway, comments below:


On Thu, 11 Jan 2001, Olivier Deckmyn wrote:
> Hi all,
> 
> Looks like parser modifies my content :(
> 

Good ... it should. See below.


> I have the following "xml" string :
> """
> <?xml version="1.0" encoding="iso-8859-1"?>
> <Xafp type="multimedia" uno="afp_wbs_doc_010110105314.g5kw25ak">
>   <Head>
>     <Name>GB-OTAN-santé</Name>
>     <DateReleased>20010110T105314Z</DateReleased>
>     <Source>AFP</Source>
>   </Head>
>   <NewsLines>
>     <HeadLine>La polémique loin d'être apaisée par l'annonce de tests à
> Londres</HeadLine>
>     <DateLine>LONDRES</DateLine>
>   </NewsLines>
> </Xafp>
> """
> 
> One can notice that there are accents chars (iso-8859-1) inside <Name> or
> <HeadLine> tags ; with a well defined encoding value in header...
> 
> If I parse this string (using Sax2.FromXml(...), getElementsByTagName() and
> nodes[0].firstChild.nodeValue) ; the <Headline> tag content becomes :
> """
> La pol\303\251mique loin d'\303\252tre apais\303\251e par l'annonce de tests
> \303\240 Londres
> """
> 
> Looks like there has been a unicode (utf-8 ?) conversion ...
> 

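Yes, that is correct, as specified.  All XML parsers should recognise the
declared encoding and CONVERT the content to Unicode.  What you are seeing is
that Unicode text held as UTF-8, the common flavour: \303\251 is the two-byte
UTF-8 sequence for the single ISO-8859-1 character é.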
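
To make the conversion concrete, here is a minimal sketch. It assumes a
current Python 3 and the standard library's xml.dom.minidom rather than the
PyXML Sax2 reader used in this thread, and the sample document is a
hypothetical cut-down version of the one quoted above. It only illustrates
that the parser hands back Unicode text, which can be encoded either way:

```python
from xml.dom import minidom

# Hypothetical sample: a tiny ISO-8859-1 document.
# \xe9 is the single Latin-1 byte for the accented e in "santé".
raw = b'<?xml version="1.0" encoding="iso-8859-1"?><Name>GB-OTAN-sant\xe9</Name>'

doc = minidom.parseString(raw)
text = doc.documentElement.firstChild.nodeValue  # decoded Unicode text

as_utf8 = text.encode("utf-8")         # two bytes for the accent: \xc3\xa9
as_latin1 = text.encode("iso-8859-1")  # one byte for the accent: \xe9

print(repr(text))
print(as_utf8)
print(as_latin1)
```

The octal escapes \303\251 in the quoted output are the same two bytes as
\xc3\xa9 here, so nothing was lost; the text was only re-encoded.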


> What can I do, not to have this conversion made ? I don't want the parser to
> modify my content !!!!


It's OK, you can get it back out nicely in the original encoding when you
write the document out.

Try the following little function I use:

import cStringIO
from xml.dom import ext

def retPrettyPrint(doc):
    # Serialise the DOM to a string, encoded as ISO-8859-1 instead of UTF-8
    t = cStringIO.StringIO()
    ext.PrettyPrint(doc, t, encoding='ISO-8859-1')
    return t.getvalue()
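
For anyone without PyXML installed, a rough equivalent can be sketched with
the standard library. This assumes Python 3 and xml.dom.minidom, not the
xml.dom.ext module used above; the helper name to_latin1_xml and the sample
document are hypothetical, chosen only for illustration:

```python
from xml.dom import minidom


def to_latin1_xml(doc):
    """Serialise a minidom Document to ISO-8859-1 encoded bytes."""
    # With an encoding argument, toprettyxml returns bytes and writes a
    # matching encoding declaration at the top of the output.
    return doc.toprettyxml(indent="  ", encoding="ISO-8859-1")


# Hypothetical sample input, declared and encoded as ISO-8859-1.
raw = (b'<?xml version="1.0" encoding="iso-8859-1"?>'
       b'<Head><Name>GB-OTAN-sant\xe9</Name></Head>')

doc = minidom.parseString(raw)
out = to_latin1_xml(doc)
print(out)
```

The accented character comes back out as the single byte \xe9, so the
round trip through Unicode leaves the content as it went in.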


regards
Matt




> 
> Thanx for your support...
> 
> I've tried with py-xml 0.5.1 and 0.6.2
> 
> I use python 1.5.2 under FreeBSD 4.2
> 
> My imports (might help ?):
> from xml import dom
> from xml.dom.ext.reader import Sax2
> from xml.dom import ext
> from xml.dom.Node import Node
> 
> Thanx again,
> 
> Olivier.
> 
> ---
> We are Micro$oft. You will be assimilated. Resistance is futile.
-- 
Matt Halstead (PhD)
Research and development
VirtualSpectator
http://www.virtualspectator.com
ph 64-9-9136896