[Python-checkins] CVS: python/dist/src/Objects unicodeobject.c,2.127,2.128

M.-A. Lemburg mal@lemburg.com
Thu, 07 Feb 2002 13:57:25 +0100


[UTF-8 codec changes]

Thinking about this some more, I am not that confident anymore
about making this a bug fix candidate. The fix of the encoder
output affects PYC files as well as tests which encode
Unicode strings containing unpaired high surrogates (e.g.
ones which construct Unicode strings using range(0x10000)).

About the PYC issue:
Byte code from source files containing '\uD800' cannot be read 
back into the interpreter; as a result importing such modules
fails if Python finds a valid PYC file for it. The fix would
resolve this error, but users would have to manually delete
the PYC files since we cannot change the PYC magic in patch
releases.
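
For anyone who ends up having to do the manual cleanup, here is a rough
sketch in present-day Python 3 of removing stale PYC files under a
directory. The helper name remove_stale_pyc and the file names in the
demonstration are made up for illustration and are not part of the
checkin; it deletes every *.pyc it finds, so point it only at a tree
you are willing to recompile.

```python
import pathlib
import tempfile


def remove_stale_pyc(root):
    """Delete every *.pyc file under root and return the removed paths.

    Hypothetical helper, not part of Python itself.
    """
    removed = []
    for pyc in sorted(pathlib.Path(root).rglob("*.pyc")):
        pyc.unlink()
        removed.append(pyc)
    return removed


# Demonstration on a throwaway directory (file names are made up).
with tempfile.TemporaryDirectory() as tmp:
    tmp_path = pathlib.Path(tmp)
    (tmp_path / "mod.py").write_text("s = u'\\uD800'\n")
    (tmp_path / "mod.pyc").write_bytes(b"stale")
    removed_names = [p.name for p in remove_stale_pyc(tmp_path)]
    remaining_names = sorted(p.name for p in tmp_path.iterdir())
```

The interpreter regenerates the byte code on the next import, so only
the cached files need to go; the sources are left untouched.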

About the UTF-8 encoding output:
Prior to the fix, u'\uD800'.encode('utf-8') gave '\xa0\x80',
now it returns '\xed\xa0\x80' which is correct, but breaks
any code relying on the old behaviour. OTOH, '\xa0\x80' cannot
be decoded back into Unicode, so this may be a non-issue.
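
To make the byte layout concrete, here is a small sketch of the
three-byte UTF-8 form applied to U+D800. It is written in present-day
Python 3 rather than the C code in unicodeobject.c, and the helper
names utf8_three_byte and decodes are made up for illustration.
Python 3 rejects lone surrogates by default, so the sketch uses the
'surrogatepass' error handler when comparing against the built-in codec.

```python
def utf8_three_byte(code_point):
    """Encode a code point in U+0800..U+FFFF as three UTF-8 bytes."""
    if not 0x0800 <= code_point <= 0xFFFF:
        raise ValueError("three-byte form covers U+0800..U+FFFF only")
    lead = 0xE0 | (code_point >> 12)           # 1110xxxx
    mid = 0x80 | ((code_point >> 6) & 0x3F)    # 10xxxxxx
    last = 0x80 | (code_point & 0x3F)          # 10xxxxxx
    return bytes([lead, mid, last])


def decodes(data):
    """Return True if data decodes as UTF-8, allowing lone surrogates."""
    try:
        data.decode("utf-8", "surrogatepass")
        return True
    except UnicodeDecodeError:
        return False


fixed = utf8_three_byte(0xD800)   # lead byte 0xED followed by 0xA0 0x80
old_output = fixed[1:]            # the pre-fix output: lead byte missing

# The hand-rolled result agrees with the built-in codec.
assert fixed == "\ud800".encode("utf-8", "surrogatepass")
```

Here decodes(fixed) is true while decodes(old_output) is false, since
the old output starts with a continuation byte and has no lead byte.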

-- 
Marc-Andre Lemburg
CEO eGenix.com Software GmbH
______________________________________________________________________
Company & Consulting:                           http://www.egenix.com/
Python Software:                   http://www.egenix.com/files/python/