[Python-Dev] XML codec?

Walter Dörwald walter at livinglogic.de
Fri Nov 9 14:44:30 CET 2007


M.-A. Lemburg wrote:

> On 2007-11-09 14:10, Walter Dörwald wrote:
>> Martin v. Löwis wrote:
>>>>> Yes, an XML parser should be able to use UTF-8, UTF-16, UTF-32, etc.
>>>>> codecs to do the encoding.  There's no need to create a magical
>>>>> mystery codec to pick out which though.
>>>> So the code is good, if it is inside an XML parser, and it's bad if it
>>>> is inside a codec?
>>> Exactly so. This functionality just *isn't* a codec - there is no
>>> encoding. Instead, it is an algorithm for *detecting* an encoding.
>> And what do you do once you've detected the encoding? You decode the
>> input, so why not combine both into an XML decoder?
> 
> FWIW: I'm +1 on adding such a codec.
> 
> It makes working with XML data a lot easier: you simply don't have to
> bother with the encoding of the XML data anymore and can just let the
> codec figure out the details. The XML parser can then work directly
> on the Unicode data.

Exactly. I have a version of sgmlop lying around that does that.
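
For illustration, the detect-then-decode step could be sketched in pure
Python roughly as follows. This is neither the C code under discussion nor
the sgmlop version; the names detect_xml_encoding and decode_xml are made up
for the example. It assumes the input either starts with a BOM or carries an
XML declaration in an ASCII-compatible encoding, and falls back to UTF-8
otherwise (BOM-less UTF-16/UTF-32 input is not handled).

```python
import codecs
import re

# BOMs are checked longest first so that UTF-32 LE (ff fe 00 00) is not
# mistaken for UTF-16 LE (ff fe). The codec names chosen here consume the
# BOM while decoding.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]

# Matches an XML declaration at the very start of the data and captures
# the value of its encoding pseudo-attribute.
_DECL = re.compile(
    rb"""<\?xml[^>]*?encoding\s*=\s*["']([A-Za-z][A-Za-z0-9._-]*)["']"""
)


def detect_xml_encoding(data, default="utf-8"):
    """Guess the encoding of XML bytes from a BOM or the XML declaration."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    match = _DECL.match(data)
    if match:
        return match.group(1).decode("ascii")
    return default


def decode_xml(data):
    """Detect the encoding, then decode the bytes to a Unicode string."""
    return data.decode(detect_xml_encoding(data))
```

With something like this, a parser can call decode_xml(raw_bytes) once and
work on the resulting Unicode string; a real codec would additionally have
to do the detection incrementally, since the declaration may arrive split
across several chunks.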

> Whether it needs to be in C or not is another question (I would have
> done this in Python since performance is not really an issue), but since
> the code is already written, why not use it?

Servus,
   Walter

