[XML-SIG] PyExpat encoding

Paul Prescod paul@prescod.net
Thu, 01 Jun 2000 22:54:59 -0500


"Andrew M. Kuchling" wrote:
> 
> ...
>
> On the other hand, that means you can't use the system's copy of
> Expat, since who knows what it was compiled with?  Actually, this
> seems like a bug in Expat; if I have an Expat library, I have no way
> of figuring out what it'll be outputting: 

Adding this feature doesn't sound too tough. We should concentrate on
what we want because the implementation doesn't sound too brutal.

I don't see how we can in good conscience choose not to use Python's
Unicode type. I am not averse, however, to a flag that returns 8-bit
strings instead. We can use the Unicode object's features to do that
easily.
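
As a rough illustration of the conversion such a flag would perform
internally (written in modern Python spelling, where the Unicode type is
`str` and an 8-bit string is `bytes`; the helper name `to_8bit` is made up
for this sketch and is not part of any existing module):

```python
def to_8bit(text, encoding="utf-8"):
    """Encode a Unicode string into an 8-bit string (hypothetical helper)."""
    return text.encode(encoding)


sample = "na\u00efve"          # Unicode text containing one non-ASCII character
encoded = to_8bit(sample)      # 8-bit UTF-8 representation of the same text
print(repr(encoded))
```

The point is only that the Unicode object already knows how to encode
itself, so the 8-bit path needs no extra machinery.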

So how about this: we ask Expat 1.1000000001 (our new version) what
encoding it was compiled with. We can even expose this to the Python
programmer. 

parser.nativeEncoding() -> returns "UTF-8" or "UTF-16"

There is an independent flag that controls the encoding and type of the
returned objects. You get Unicode objects by default. If you want 8-bit
strings, you specifically ask for them. 

parser.requestUTF8( )

97% of programmers will never ask Expat what encoding it is using under
the covers, nor will they change the flag to get 8-bit strings. Docs say:
"Unless you know what you are doing, leave these methods alone. They are
only for performance freaks who know what they are doing."

A performance freak would probably write code like this:

if parser.nativeEncoding() == "UTF-8":
    parser.requestUTF8()

Now managing the internationalization of the code is their problem.
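
To make the proposal a bit more concrete, here is a toy sketch of how the
two methods might interact. Only `nativeEncoding` and `requestUTF8` come
from the proposal above; the class name `SketchParser`, its constructor
argument, and the `deliver` method are invented for illustration, and none
of this is the real pyexpat API.

```python
class SketchParser:
    """Toy stand-in for the proposed parser; not the real pyexpat API."""

    def __init__(self, native_encoding="UTF-8"):
        # Assumption: the (hypothetical) new Expat can report how it was built.
        if native_encoding not in ("UTF-8", "UTF-16"):
            raise ValueError("unsupported native encoding: %r" % native_encoding)
        self._native = native_encoding
        self._want_utf8 = False  # default: hand back Unicode objects

    def nativeEncoding(self):
        """Return the encoding Expat was compiled to emit."""
        return self._native

    def requestUTF8(self):
        """Ask for 8-bit UTF-8 strings instead of Unicode objects."""
        self._want_utf8 = True

    def deliver(self, raw):
        """Convert raw bytes in the native encoding to what the caller asked for."""
        if self._want_utf8:
            if self._native == "UTF-8":
                return raw  # fast path: no conversion at all
            # Slow path: decode from UTF-16, then re-encode as UTF-8.
            return raw.decode(self._native).encode("utf-8")
        return raw.decode(self._native)  # default: a Unicode object


# The "performance freak" pattern: only ask for UTF-8 when it is free.
fast = SketchParser("UTF-8")
if fast.nativeEncoding() == "UTF-8":
    fast.requestUTF8()
```

In this sketch the fast path only pays off when the native encoding is
already UTF-8, which is why the caller checks `nativeEncoding()` first.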

The Windows binaries should come with a 16-bit-returning Expat.

Still and all, this is getting more complex than just bundling our
favorite version of Expat with the compile flags set the way we want
them!!!

-- 
 Paul Prescod  - ISOGEN Consulting Engineer speaking for himself
Simplicity does not precede complexity, but follows it. 
	- http://www.cs.yale.edu/~perlis-alan/quotes.html