[Python-Dev] XML codec?

"Martin v. Löwis" martin at v.loewis.de
Sun Nov 11 14:40:44 CET 2007


> I don't know. Is an XML document ill-formed if it doesn't contain an
> XML declaration, is not in UTF-8 or UTF-16, but there's external
> encoding info?

If there is external encoding info, matching the actual encoding,
it would be well-formed. Of course, preserving that information would
be up to the application.

> This looks good. Now we would have to extend the code to detect and
> replace the encoding in the XML declaration too.

I'm still opposed to making this a codec. Right - for a pure Python
solution, the processing of the XML declaration would still need to
be implemented.
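
As a rough illustration of what that declaration processing might look
like as a plain function (not a codec), here is a minimal sketch. The
name fix_encoding() comes from the thread, but its signature and
behaviour here are assumptions: it takes already-decoded text, only
looks at a declaration at the very start of the input, and does not
validate the version or encoding values.

```python
import re

# Matches '<?xml version=...' plus an optional encoding pseudo-attribute.
# Hypothetical helper pattern; it is deliberately lenient about values.
_DECL = re.compile(
    r"""(<\?xml\s+version\s*=\s*(?:"[^"]*"|'[^']*'))"""
    r"""(\s+encoding\s*=\s*(?:"[^"]*"|'[^']*'))?"""
)


def fix_encoding(text, encoding):
    """Replace (or add) the encoding in a leading XML declaration.

    Text without an XML declaration is returned unchanged.
    """
    match = _DECL.match(text)
    if match is None:
        return text
    # Keep everything up to the end of the version pseudo-attribute, emit
    # the new encoding, then resume after the old encoding (if there was one).
    head = text[:match.end(1)]
    tail = text[match.end():]
    return head + ' encoding="%s"' % encoding + tail
```

Inserting the encoding directly after the version keeps it ahead of any
standalone pseudo-attribute, which is the order the XML declaration
grammar requires.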

>> I think there could be a much simpler routine to have the same 
>> effect. - if it's less than 4 bytes, answer "need more data".
> 
> Can there be an XML document that is less than 4 bytes? I guess not.

No, the smallest document has exactly 4 characters (e.g. "<f/>").
However, external entities may be smaller, such as "x".
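
The "need more data" rule could be sketched as below. This is only an
outline under stated assumptions: the name detect_encoding() comes from
the thread, while the `final` flag (to allow short external entities
such as "x"), the returned codec names, and the decision to ignore
EBCDIC are mine. A result of "utf-8" means only "UTF-8 or some
ASCII-compatible encoding"; the XML declaration would still have to be
read to find out which.

```python
import codecs


def detect_encoding(data, final=False):
    """Guess the encoding family from the first bytes of an XML entity.

    Returns None when fewer than 4 bytes are available and more may
    still arrive ("need more data").
    """
    if len(data) < 4 and not final:
        return None
    # Test the 4-byte BOMs first: the UTF-32-LE BOM begins with the
    # same two bytes as the UTF-16-LE BOM.
    if data.startswith((codecs.BOM_UTF32_BE, codecs.BOM_UTF32_LE)):
        return "utf-32"
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    if data.startswith((codecs.BOM_UTF16_BE, codecs.BOM_UTF16_LE)):
        return "utf-16"
    # No BOM: see how a leading "<" (or "<?") would be laid out in memory.
    patterns = {
        b"\x00\x00\x00<": "utf-32-be",
        b"<\x00\x00\x00": "utf-32-le",
        b"\x00<\x00?": "utf-16-be",
        b"<\x00?\x00": "utf-16-le",
    }
    return patterns.get(data[:4], "utf-8")
```

A caller would keep feeding bytes until the result is no longer None,
passing final=True once the input is exhausted.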

> But anyway: would a Python implementation of these two functions
> (detect_encoding()/fix_encoding()) be accepted?

I could agree to a Python implementation of this algorithm as long
as it's not packaged as a codec.

Regards,
Martin


