ElementTree cannot parse UTF-8 Unicode?

Wed Jan 19 15:50:57 EST 2005

Erik Bethke wrote:

> I am getting a "not well-formed" error at the beginning of the Korean
> text in the second example.  Am I doing something wrong with how I am
> encoding my Korean?  Do I need more of a wrapper around it than simple
> quotes?  Is there some sort of XML syntax for indicating a Unicode
> string, or does the ElementTree library just not support reading of
> Unicode?

XML is Unicode, and ElementTree supports all common encodings just
fine (including UTF-8).
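
for illustration, here is a minimal sketch of that claim, written for a
current Python 3 with the standard-library xml.etree.ElementTree (an
assumption; the thread predates it). the document mirrors the failing
example, the Korean sample string is only a stand-in, and the names doc,
data and parsed are made up for this sketch.

```python
import xml.etree.ElementTree as ET

# Stand-in Korean text; any non-ASCII string would do for this check.
greeting = "어녕하세요!"

# Same shape as the failing example, with an explicit UTF-8 declaration.
doc = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    "<Vocab>\n"
    '   <Word L1="' + greeting + '"></Word>\n'
    "</Vocab>\n"
)

# Encode to real UTF-8 bytes, as a correctly saved file would contain.
data = doc.encode("utf-8")

root = ET.fromstring(data)
parsed = root.find("Word").get("L1")

# ascii() keeps the output readable on terminals that cannot show Hangul.
print(ascii(parsed))
```

if the bytes really are UTF-8, the attribute value comes back as an
ordinary Unicode string.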

> this one fails:
> <?xml version="1.0" encoding="UTF-8"?>
> <Vocab>
>    <Word L1="?????!"></Word>
> </Vocab>

this works just fine on my machine.

what's the exact error message?

what does

    print repr(open("test2.xml").read())

print on your machine?
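
the point of looking at the repr is to see the raw bytes in the file
rather than what an editor displays. a rough Python 3 sketch of that
check follows. it assumes, purely as one possible cause that nothing in
this thread confirms, that the file was saved in a legacy encoding
(EUC-KR is used here as an example) while the declaration says UTF-8.
TEMPLATE, check and results are hypothetical names.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

TEMPLATE = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    "<Vocab>\n"
    '   <Word L1="{}"></Word>\n'
    "</Vocab>\n"
)
text = TEMPLATE.format("어녕하세요!")


def check(raw):
    """Return 'ok', or the parser's error message, for raw XML bytes."""
    try:
        ET.fromstring(raw)
        return "ok"
    except ET.ParseError as exc:
        return str(exc)


results = {}
for codec in ("utf-8", "euc-kr"):
    # Save the same text under each encoding; the declaration always
    # claims UTF-8, so only the first file is telling the truth.
    fd, path = tempfile.mkstemp(suffix=".xml")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(text.encode(codec))
        # Read in binary mode so no decoding hides what is on disk.
        with open(path, "rb") as f:
            raw = f.read()
    finally:
        os.remove(path)
    print(codec, repr(raw))
    results[codec] = check(raw)

print(results)
```

comparing the two repr lines shows different byte sequences for the same
text, and only the copy whose bytes match the declared encoding parses.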

what happens if you attempt to parse

<Vocab>
    <Word L1="어녕하세요!" />
</Vocab>

?
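
a related test that takes the file encoding out of the picture is to
write the same text as numeric character references, so the file is
plain ASCII. this is a sketch of that idea in Python 3, not something
taken from the thread; refs, raw and value are made-up names.

```python
import xml.etree.ElementTree as ET

greeting = "어녕하세요!"

# Turn every character into a hexadecimal character reference,
# e.g. the first one becomes &#xC5B4;
refs = "".join("&#x{:X};".format(ord(ch)) for ch in greeting)

doc = '<Vocab>\n    <Word L1="{}" />\n</Vocab>\n'.format(refs)

# The document is now pure ASCII, so this encode cannot fail and the
# bytes are the same under ASCII, UTF-8 and most legacy encodings.
raw = doc.encode("ascii")

root = ET.fromstring(raw)
value = root.find("Word").get("L1")
print(ascii(value))
```

if this form parses while the literal form does not, the parser is fine
and the problem is in how the file's bytes were written.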

</F>