ElementTree cannot parse UTF-8 Unicode?

Fredrik Lundh fredrik at pythonware.com
Thu Jan 20 02:17:03 EST 2005


Erik Bethke wrote:

> 2) You are right in that the print of the file read works just fine.

but what does it look like?  I saved a raw copy of your original mail,
fixed the quoted-printable encoding, and got a UTF-8 encoded file
that works just fine.  the thing you've been parsing, and that you've
cut and pasted into your mail, must be different in some way.
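
one way to see what the file really contains is to look at the raw bytes
instead of the printed text.  the following is only a sketch, written for
Python 3; `describe_bytes` is a made-up helper (not part of ElementTree),
and the sample data is built in memory rather than read from your file.
the EUC-KR sample is just an assumed example of bytes that aren't UTF-8:

```python
# Hypothetical helper: report what a chunk of raw file bytes looks like.
def describe_bytes(data):
    report = {
        # a UTF-8 byte order mark at the start of the data
        "has_utf8_bom": data.startswith(b"\xef\xbb\xbf"),
        # the first bytes, shown with escapes so nothing is hidden
        "head": repr(data[:40]),
    }
    try:
        data.decode("utf-8")
        report["valid_utf8"] = True
    except UnicodeDecodeError as exc:
        report["valid_utf8"] = False
        # byte offset of the first sequence that is not valid UTF-8
        report["bad_offset"] = exc.start
    return report

# the same text, encoded two different ways (assumed sample data)
text = u'<Word L1="\uc5b4\ub155"/>'
good = text.encode("utf-8")
bad = text.encode("euc-kr")

print(describe_bytes(good))
print(describe_bytes(bad))
```

for a real file, read it in binary mode (`open(filename, "rb").read()`) and
pass the result to the helper.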

> 3) You are also right in that the digitally encoded unicode also works
> fine.  However, this solution has two new problems:

that was just a test to make sure that your version of elementtree could
handle Unicode characters on your platform.
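
to illustrate the point of that test, here is a rough sketch using
`xml.etree.ElementTree` from the Python 3 standard library (not the
standalone elementtree package discussed in this thread).  the `Words`
root element is an assumption made for the example; `Word` and `L1` are
the names used elsewhere in this thread.  the same attribute value is
written once as numeric character references and once as literal UTF-8,
and both should parse to the same Unicode string:

```python
import xml.etree.ElementTree as ET

expected = u"\uc5b4\ub155\ud558\uc138\uc694!"

# the attribute written as numeric character references (pure ASCII file)
ascii_doc = (b'<Words><Word L1="&#xc5b4;&#xb155;&#xd558;'
             b'&#xc138;&#xc694;!"/></Words>')

# the attribute written as literal characters in a UTF-8 encoded file
utf8_doc = (u'<?xml version="1.0" encoding="utf-8"?>'
            u'<Words><Word L1="%s"/></Words>' % expected).encode("utf-8")

for doc in (ascii_doc, utf8_doc):
    root = ET.fromstring(doc)
    value = root.find("Word").get("L1")
    # show the parsed value and whether it matches the expected string
    print(ascii(value), value == expected)
```

if the first form parses but the second does not, the parser is fine and
the bytes in the file are the thing to look at.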

> 1) The xml file is now not human readable
> 2) After ElementTree gets done parsing it, I am feeding the text to a
> wx.TextCtrl via .SetValue() but that is now giving me an error message
> of being unable to convert that style of string

on my machine, the L1 attribute contains a Unicode string:

    >>> print repr(root.find("Word").get("L1"))
    u'\uc5b4\ub155\ud558\uc138\uc694!'

what does it give you on your machine?  (looks like wxPython cannot handle
Unicode strings, but can that really be true?)

> So it seems to me, that ElementTree is just not expecting to run into
> the Korean characters for it is at column 16 that these begin.  Am I
> formatting the XML properly?

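nobody can tell without seeing the exact bytes in the file you're parsing;
the copy in your mail works just fine here.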

</F> 





