[Python-3000] XML as bytes or unicode?

"Martin v. Löwis" martin at v.loewis.de
Mon Aug 25 07:37:41 CEST 2008


> Well, does the parser handle it or should the code that got the XML in
> the first place handle it?

The parser handles encodings in XML; XML parsing is "bytes in, pieces of
Unicode out".
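To illustrate "bytes in, pieces of Unicode out", here is a minimal sketch. It uses the standard library's xml.parsers.expat module and is not code from this thread; the helper parse_pieces and the sample document are hypothetical. The parser receives raw iso-8859-1 bytes and reads the encoding from the declaration. Every name and text piece it reports back is already a str.

```python
import xml.parsers.expat

def parse_pieces(data: bytes) -> list:
    """Hypothetical helper: feed raw bytes to expat, collect the str pieces it reports."""
    pieces = []
    parser = xml.parsers.expat.ParserCreate()  # no encoding given: use the XML declaration
    parser.buffer_text = True  # merge adjacent character data into a single callback
    # Each callback argument arrives already decoded to str.
    parser.StartElementHandler = lambda name, attrs: pieces.append(("start", name))
    parser.EndElementHandler = lambda name: pieces.append(("end", name))
    parser.CharacterDataHandler = lambda text: pieces.append(("text", text))
    parser.Parse(data, True)  # True marks this as the final chunk
    return pieces

# The document is encoded exactly as its declaration says (iso-8859-1).
doc = "<?xml version='1.0' encoding='iso-8859-1'?><author>Löwis</author>".encode("iso-8859-1")
for piece in parse_pieces(doc):
    print(piece)
```

The caller never decodes anything. The encoding declaration travels with the bytes, and the parser uses it.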

> Apparently whoever wrote the parsers originally thought it was not
> the parser's job. =)

Why do you think so? In Python, the XML parsers have always supported
encoding declarations.

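Parsing XML that has already been decoded into a Unicode string isn't very
meaningful: any encoding declaration inside the document no longer
describes the data being parsed.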

> If someone wanted to, you could possibly dispatch on bytes to some code
> that tried to determine the encoding and do the proper decode before
> proceeding.

That's the parser's job (and one that expat does correctly).
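As a rough check that no separate detect-and-decode step is needed, the sketch below encodes the same document in two encodings that expat supports natively and passes the raw bytes to xml.etree.ElementTree.fromstring. The helper parse_text and the TEMPLATE string are hypothetical and only serve the illustration. Both calls should produce the same Unicode text.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; {} is filled with the encoding name.
TEMPLATE = "<?xml version='1.0' encoding='{}'?><name>Löwis</name>"

def parse_text(encoding: str) -> str:
    # Encode the document exactly as its declaration claims, then hand
    # the raw bytes to the parser without decoding them ourselves.
    data = TEMPLATE.format(encoding).encode(encoding)
    return ET.fromstring(data).text

results = {enc: parse_text(enc) for enc in ("utf-8", "iso-8859-1")}
print(results)
```

The byte sequences differ, since 'ö' is two bytes in UTF-8 and one byte in iso-8859-1. The parser works this out from the declaration, so calling code can pass the bytes through unchanged.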

Regards,
Martin
