[XML-SIG] parsing chinese characters

Luis Miguel Morillas morillas at gmail.com
Mon Oct 22 22:52:16 CEST 2007


You must declare the correct encoding in the XML source file itself.

For example, using amara:

chinese.xml
<?xml version="1.0" encoding="utf-8"?>
<test>ÔuÔuà¢à¢²ÅÊDZw.¼ššìéLï³²ÅÊǐÛ</test>

>>> import amara
>>> doc = amara.parse('chinese.xml')
>>> print unicode(doc.test)
ÔuÔuà¢à¢²ÅÊDZw.¼ššìéLï³²ÅÊǐÛ
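
If amara is not at hand, the same point can be checked with the standard
library. This is only a sketch in Python 3 syntax, not part of the amara
session above; the sample text and the variable names are my own assumptions.
It parses the same UTF-8 bytes twice, once with a matching declaration and
once with a declaration that wrongly claims ISO-8859-1:

```python
import xml.etree.ElementTree as ET

# Two Chinese characters, chosen only for illustration.
text = "\u6f22\u4e2d"

# Correct: the bytes are UTF-8 and the declaration says so.
good = ('<?xml version="1.0" encoding="utf-8"?><test>%s</test>' % text).encode("utf-8")
parsed_good = ET.fromstring(good).text

# Wrong: the same UTF-8 bytes, but the declaration claims ISO-8859-1,
# so the parser decodes every byte as a separate Latin-1 character.
bad = ('<?xml version="1.0" encoding="iso-8859-1"?><test>%s</test>' % text).encode("utf-8")
parsed_bad = ET.fromstring(bad).text

print(parsed_good == text)  # the declared encoding matched the bytes
print(parsed_bad == text)   # the mismatch produced garbled text instead
```

The parser trusts the declaration, so a mismatch does not necessarily raise
an error; it can quietly give you the wrong characters.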

Big5 works without problems too:

>>> doc = amara.parse('http://xml.ascc.net/test/wfall/big5/test13.xml')
>>>
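
About the UnicodeEncodeError in the original question: an encode error of
this kind is normally raised when text is written out (for example printed to
a console), not while the XML is parsed. If the aim is just to skip the
characters the output encoding cannot represent, the error handlers of
str.encode can do that. A minimal sketch in Python 3 syntax, assuming the
console uses cp1252 (the question does not say which code page is involved)
and using made-up sample text:

```python
# Sample text mixing ASCII with two Chinese characters (illustrative only).
text = "abc \u4e2d\u6587 def"

# Assumed output encoding; the real code page is not given in the question.
target = "cp1252"

# Strict encoding fails, which is the situation described in the question.
try:
    text.encode(target)
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True

# errors="replace" substitutes "?" for each character that cannot be encoded.
replaced = text.encode(target, errors="replace").decode(target)

# errors="ignore" drops those characters entirely.
ignored = text.encode(target, errors="ignore").decode(target)

print(strict_failed)
print(replaced)
print(ignored)
```

Both handlers lose data, so they only make sense for display. Keep the parsed
unicode strings untouched for any further processing.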



2007/10/22, Fabian López <fabian at syameses.com>:
> Hi,
> I am parsing an XML file that includes Chinese characters, like
> ^ÔuÔuà¢à¢²ÅÊDZw.¼ššìéLï³²ÅÊÇÛ or ¥Ø¥¢¥¢¥¤¥í¥ó... The problem is that I get an error like:
> UnicodeEncodeError: 'charmap' codec can't encode characters in position....
> I would like to ignore them and parse all the characters
> except these. Could anyone help me? I suppose I could catch the
> exception and ignore it, or use some function that detects these
> Chinese characters and then skips them.
>
> Thanks!!
> Fabian
>


-- 
Regards,

--

Luis Miguel

