[XML-SIG] PyExpat changes for encoding (was: XML support in Python 1.6)

Walter Underwood wunder@ultraseek.com
Mon, 05 Jun 2000 14:19:33 -0700


--On Friday, June 02, 2000 8:17 AM -0700 Greg Stein <gstein@lyra.org>
wrote:
> On Fri, 2 Jun 2000, Andrew M. Kuchling wrote:
>> ...
>> parser.nativeEncoding() -> returns "UTF-8" or "UTF-16"
> 
> pyexpat.native_encoding as a readonly attribute. I see no particular
> use in making it a function. (Note the module-level, too!)

I like it, but "unicode" is not an encoding. The proper Unicode 3.0 name
for this is "UTF-16". In Unicode 2.x, it was called "UCS-2". If there is
no byte-order mark (BOM), then it should identify as little- or
big-endian, that is, UTF-16LE or UTF-16BE.
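
To illustrate the naming rule above, here is a minimal sketch using
Python's standard codecs as found in present-day Python 3 (an assumption;
this is not the pyexpat interface under discussion). The helper
native_utf16_name() is a hypothetical name introduced only for this
example.

```python
import codecs
import sys

text = "expat"

# The plain "utf-16" codec writes a byte-order mark first, then the data
# in the platform's native byte order.
with_bom = text.encode("utf-16")
has_bom = with_bom[:2] in (codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)

# The explicit-endian codecs write no BOM, so the label itself has to
# carry the byte order.
le = text.encode("utf-16-le")
be = text.encode("utf-16-be")


def native_utf16_name():
    """Hypothetical helper: label for BOM-less UTF-16 in native order."""
    return "UTF-16LE" if sys.byteorder == "little" else "UTF-16BE"


native = le if sys.byteorder == "little" else be
```

With a BOM present, "UTF-16" alone is enough for a reader to decode the
data; without one, only the LE/BE label says which byte comes first.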

But I'm strongly in favor of Expat returning UTF-16 in native byte order,
and the Python interface returning Python unicode objects. Relying on
locally-installed copies of Expat would be a support nightmare for us.
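
As a rough sketch of what "the Python interface returning Python unicode
objects" looks like from the caller's side, the following uses the
xml.parsers.expat module bundled with present-day Python 3 (an
assumption; it is not the 1.6-era interface being designed in this
thread). The native_encoding attribute is looked up defensively, since
the name was still only a proposal here.

```python
from xml.parsers import expat

collected = []

parser = expat.ParserCreate()
# Character data may arrive in several chunks, so collect and join them.
parser.CharacterDataHandler = collected.append

# Feed UTF-8 bytes; the handler receives decoded string objects, so the
# caller never sees Expat's internal encoding.
parser.Parse("<doc>caf\u00e9</doc>".encode("utf-8"), True)
text = "".join(collected)

# Module-level, read-only description of the encoding Expat uses
# internally; None if the bundled module does not expose it.
native = getattr(expat, "native_encoding", None)
```

Because the handler is handed strings rather than raw bytes, calling
code does not depend on which encoding a particular Expat build was
compiled to emit.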

wunder
--
Walter R. Underwood
Senior Staff Engineer, Infoseek Software
http://software.infoseek.com/