Python parsing iTunes XML/COM

Stefan Behnel stefan_ml at behnel.de
Thu Jul 31 02:10:41 EDT 2008


william tanksley wrote:
> william tanksley <wtanksle... at gmail.com> wrote:
>> I'm still puzzled why I'm getting some non-Unicode out of an
>> ElementTree's text, though.
> 
> Now I know.
> 
> Okay, my answer is that cElementTree (in Python 2.5) is simply
> deranged when it comes to Unicode. It assumes everything's ASCII.

It does not "assume" that. It *requires* byte strings to be ASCII. If it
didn't enforce that, how could it possibly know what encoding they were using,
i.e. what they were supposed to mean at all? Read the Zen of Python: in the face
of ambiguity, ElementTree refuses the temptation to guess. Python 2.x does
exactly the same thing when it comes to implicit conversion between encoded
strings and Unicode strings.
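
A rough sketch of that rule, assuming Python 3 syntax (the thread is about
Python 2.5, where the same ASCII decode happens implicitly whenever a byte
string meets a Unicode string). The helper name decode_strict_ascii is made up
for illustration only.

```python
# UTF-8 bytes for "café". Without knowing the encoding, they are just bytes.
raw = b"caf\xc3\xa9"


def decode_strict_ascii(data):
    """Return the decoded text, or None if the bytes are not pure ASCII."""
    try:
        return data.decode("ascii")
    except UnicodeDecodeError:
        # Refuse to guess what the non-ASCII bytes were supposed to mean.
        return None


ascii_result = decode_strict_ascii(raw)       # non-ASCII input is rejected
plain_result = decode_strict_ascii(b"cafe")   # pure ASCII is unambiguous
explicit_result = raw.decode("utf-8")         # the caller names the encoding
```

Only the last call recovers the accented character, because only there did
the caller state which encoding the bytes use.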

If you want to pass plain ASCII strings, you can either pass a byte string or
a Unicode string (that's a plain convenience feature). If you want to pass
anything that's not ASCII, you *must* pass a Unicode string.
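
As a minimal sketch of the above, assuming Python 3 and the standard library's
xml.etree.ElementTree (in Python 2.x the decode step would give you a unicode
object instead): decode the bytes yourself, then hand the resulting Unicode
string to the element. The element name "track" and the sample bytes are
assumptions for illustration.

```python
import xml.etree.ElementTree as ET

# Bytes from some source that is known (here: assumed) to be UTF-8.
raw = b"Bj\xc3\xb6rk"

# Decode explicitly, so the tree only ever sees a Unicode string.
text = raw.decode("utf-8")

track = ET.Element("track")   # hypothetical element name
track.text = text

# Serialise to bytes and parse back to check that nothing was lost.
serialized = ET.tostring(track, encoding="utf-8")
roundtrip = ET.fromstring(serialized).text
```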


> Reference: http://codespeak.net/lxml/compatibility.html
> 
> (Note that the lxml version also doesn't handle Unicode correctly; it
> errors when XML declares its encoding.)

It definitely does "handle Unicode correctly". Let me guess, you tried passing
XML as a Unicode string into the parser, and your XML declared itself as
having a byte encoding (<?xml version="1.0" encoding="..."?>). How can that
*not* be an error?
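
A sketch of the unambiguous way to do it, assuming Python 3 and using the
standard library's xml.etree.ElementTree in place of lxml so that the example
stays self-contained: when the XML declares a byte encoding, give the parser
bytes in that encoding and let it do the decoding. The sample document is made
up for illustration.

```python
import xml.etree.ElementTree as ET

# A document that declares itself as ISO-8859-1 (sample data, assumed).
document = (
    '<?xml version="1.0" encoding="iso-8859-1"?>'
    "<artist>Bj\u00f6rk</artist>"
)

# On disk or on the wire this is a byte sequence in the declared encoding.
xml_bytes = document.encode("iso-8859-1")

# The parser reads the declaration and decodes the bytes accordingly.
root = ET.fromstring(xml_bytes)
artist = root.text
```

The declaration describes the bytes, so it only makes sense while the data
still is bytes.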


> This is unpleasant, but at least now I know WHY it was driving me
> insane.

You should *really* read a bit about Unicode and byte encodings. Not
understanding a topic is not a good excuse for complaining about it being
broken for you.

Stefan


