ElementTree cannot parse UTF-8 Unicode?
Fredrik Lundh
fredrik at pythonware.com
Thu Jan 20 02:17:03 EST 2005
Erik Bethke wrote:
> 2) You are right in that the print of the file read works just fine.
but what does it look like? I saved a raw copy of your original mail,
fixed the quoted-printable encoding, and got a UTF-8 encoded file
that works just fine. so the file you've been parsing must differ, in
some way, from what you cut and pasted into your mail.
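one way to see what a file really contains is to look at its raw bytes instead of the printed text. the following is only a sketch, in Python 3 syntax rather than the Python 2 used elsewhere in this thread; the `describe_bytes` helper and both sample documents are made up for illustration, and EUC-KR is just one guess at what a non-UTF-8 Korean file might be encoded as:

```python
def describe_bytes(data):
    """Say whether the bytes decode as UTF-8, and where they first fail."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as exc:
        # exc.start is the offset of the first byte that is not valid UTF-8
        return "not UTF-8: byte 0x%02x at offset %d" % (data[exc.start], exc.start)
    return "valid UTF-8"


# hypothetical sample: the same text saved with two different encodings
text = '<Word L1="\uc5b4\ub155"/>'
good = text.encode("utf-8")
bad = text.encode("euc-kr")   # assumption: a legacy Korean encoding

print(describe_bytes(good))
print(describe_bytes(bad))
```

for a real file, read it with `open(filename, "rb").read()` and pass the result to the helper; a file that prints fine in a terminal can still fail this check if the terminal happens to use the same legacy encoding.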
> 3) You are also right in that the digitally encoded unicode also works
> fine. However, this solution has two new problems:
that was just a test to make sure that your version of elementtree could
handle Unicode characters on your platform.
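a test of that kind can be sketched as follows (Python 3 syntax, using the `xml.etree.ElementTree` module from today's standard library rather than the separate elementtree package of the time; the `Words` root element and the shortened attribute value are assumptions made for the example). if the parser handles Unicode at all, character references and raw UTF-8 bytes should give the same attribute value:

```python
import xml.etree.ElementTree as ET

# same attribute value written two ways: as numeric character
# references (pure ASCII), and as raw UTF-8 encoded characters
escaped = b'<Words><Word L1="&#xC5B4;&#xB155;!"/></Words>'
raw = '<Words><Word L1="\uc5b4\ub155!"/></Words>'.encode("utf-8")

from_escaped = ET.fromstring(escaped).find("Word").get("L1")
from_raw = ET.fromstring(raw).find("Word").get("L1")

# both spellings should produce the identical Unicode string
print(from_escaped == from_raw)
```

if the escaped form parses but the raw form doesn't, the parser is fine and the bytes in the file are the thing to look at.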
> 1) The xml file is now not human readable
> 2) After ElementTree gets done parsing it, I am feeding the text to a
> wx.TextCtrl via .SetValue() but that is now giving me an error message
> of being unable to convert that style of string
on my machine, the L1 attribute contains a Unicode string:
>>> print repr(root.find("Word").get("L1"))
u'\uc5b4\ub155\ud558\uc138\uc694!'
what does it give you on your machine? (looks like wxPython cannot handle
Unicode strings, but can that really be true?)
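for anyone repeating this check on Python 3: `repr` there prints the Korean characters themselves, so `ascii()` is the closer match to the escaped output shown above. a minimal sketch, assuming a `Words` root element (the real file's root is not shown in this thread) and the same attribute value:

```python
import xml.etree.ElementTree as ET

# hypothetical document; only the Word element and its L1 attribute
# are taken from the thread
doc = '<Words><Word L1="\uc5b4\ub155\ud558\uc138\uc694!"/></Words>'.encode("utf-8")

root = ET.fromstring(doc)
value = root.find("Word").get("L1")

print(type(value).__name__)   # the attribute's type
print(ascii(value))           # escaped form, safe to paste into a mail
```

posting the `ascii()` output avoids any damage the mail software might do to the characters along the way.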
> So it seems to me, that ElementTree is just not expecting to run into
> the Korean characters for it is at column 16 that these begin. Am I
> formatting the XML properly?
nobody knows... (the copy in your mail parses fine here, and that's the
only copy anyone else has seen.)
</F>