[Python-Dev] 2.2.1 issues

M.-A. Lemburg mal@lemburg.com
Tue, 19 Feb 2002 15:34:24 +0100


Michael Hudson wrote:
> 
> Well, we have the first 2.2 bugfix that isn't a no-brainer to port to
> 2.2.1.  This is to do with the
> 
> [ #495401 ] Build troubles: --with-pymalloc
> 
> bug.
> 
> As far as I understand it, there were two problems.
> 
> 1) with wide unicode characters, some function in unicodeobject.c to
>    do with interpreting escape codes could write into memory it didn't
>    own.
> 
> 2) something to do with the handling of "unpaired high surrogates" in
>    the utf-8 codec.
> 
> Were these problems related?  I think they got fixed at the same time,
> but I may have gotten confused.

Right. 1) was caused by 2). Both are fixed now.
 
> 1) shouldn't be too much of an issue to get into 2.2.1 (there was some
> contention about which fix performed better, but for 2.2.1 I don't
> care too much).
> 
> 2) is more troublesome, because to fix it properly breaks .pycs, in
> turn because marshal uses the utf-8 codec to store unicode string
> constants, and this is a no-no according to PEP 6.
> 
> Is it possible to worm around 2) by reconstructing valid strings from
> the bad marshal data, or has information been lost?  How severe is the
> bug?  Maybe it would be best to leave it unfixed in 2.2.1.

Well, I posted a message about this to either python-dev or the
checkins list (I don't remember which). The situation is basically
like this:

In Python <= 2.2.0, you could write

u = u"\uD800"

in a .py file. The first time you import this file, Python will
create a .pyc file for it using the broken UTF-8 encoding. The
import will succeed. The second time you import the module,
Python will try to use the .pyc file. Reading that file back
in now fails with a UnicodeError, and Python does not fall back
to the .py file.

As a result, modules using unpaired surrogates in Unicode
literals are simply broken in Python <= 2.2.0.
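
To illustrate the asymmetry, here is a rough sketch in a modern
Python 3 interpreter (not 2.2, whose codec behaved differently). The
"surrogatepass" error handler only stands in for the old lenient
encoder; that substitution is my assumption for the demo, not what
2.2.0 actually ran:

```python
# Sketch for a modern Python 3 interpreter; Python 2.2 itself differed.
lone = "\ud800"  # an unpaired high surrogate, as in the literal above

# A lenient encoder writes the surrogate out as three UTF-8-style bytes.
# "surrogatepass" is used here only to imitate that lenient behaviour.
data = lone.encode("utf-8", "surrogatepass")

# A strict decoder refuses to read those same bytes back in.
try:
    data.decode("utf-8")
    strict_decode_ok = True
except UnicodeDecodeError:
    strict_decode_ok = False

print(data, strict_decode_ok)
```

Writing succeeds and reading fails, which is the same shape as the
.pyc problem: the file gets created on first import and cannot be
loaded on the second.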

The problem with backporting this patch is that in order
for Python to properly recompile any broken module, the
.pyc magic number will have to be changed. The question is
whether this is a reasonable thing to do in a patch level release...
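
For reference, the magic check amounts to comparing the first bytes
of the .pyc with the interpreter's own value. A minimal sketch for a
modern Python 3, where the value is exposed as
importlib.util.MAGIC_NUMBER (the helper name pyc_magic_matches is
made up for this example, and the real import machinery checks more
than this):

```python
import importlib.util


def pyc_magic_matches(path):
    """Return True if the file at `path` starts with this interpreter's magic number.

    Hypothetical helper for illustration only; the real import system
    also validates the rest of the .pyc header.
    """
    magic = importlib.util.MAGIC_NUMBER
    with open(path, "rb") as f:
        header = f.read(len(magic))
    return header == magic
```

A .pyc whose magic does not match is ignored and the module is
recompiled from the .py file, which is why bumping the magic would
force the broken files to be regenerated.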

-- 
Marc-Andre Lemburg
CEO eGenix.com Software GmbH