[Python-Dev] 2.2.1 issues
M.-A. Lemburg
mal@lemburg.com
Tue, 19 Feb 2002 15:34:24 +0100
Michael Hudson wrote:
>
> Well, we have the first 2.2 bugfix that isn't a no-brainer to port to
> 2.2.1. This is to do with the
>
> [ #495401 ] Build troubles: --with-pymalloc
>
> bug.
>
> As far as I understand it, there were two problems.
>
> 1) with wide unicode characters, some function in unicodeobject.c to
> do with interpreting escape codes could write into memory it didn't
> own.
>
> 2) something to do with the handling of "unpaired high surrogates" in
> the utf-8 codec.
>
> Were these problems related? I think they got fixed at the same time,
> but I may have gotten confused.
Right. 1) was caused by 2). Both are fixed now.
> 1) shouldn't be too much of an issue to get into 2.2.1 (there was some
> contention about which fix performed better, but for 2.2.1 I don't
> care too much).
>
> 2) is more troublesome, because to fix it properly breaks .pycs, in
> turn because marshal uses the utf-8 codec to store unicode string
> constants, and this is a no-no according to PEP 6.
>
> Is it possible to work around 2) by reconstructing valid strings from
> the bad marshal data, or has information been lost? How severe is the
> bug? Maybe it would be best to leave it unfixed in 2.2.1.
Well, I posted a message about this to python-dev or the checkins
list (I don't remember which). The situation is basically this:
In Python <= 2.2.0, you could write
u = u"\uD800"
in a .py file. The first time you import this file, Python will
create a .pyc file for it using the broken UTF-8 encoding. The
import will succeed. The second time you import the module,
Python will try to use the .pyc file. Reading that file back
in fails with a UnicodeError, and Python does not fall back
to the .py file either.
As a result, modules using unpaired surrogates in Unicode
literals are simply broken in Python <= 2.2.0.
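For illustration only, here is a minimal sketch of why an unpaired
surrogate is awkward for the UTF-8 codec. It assumes a present-day
Python 3 interpreter, not the 2.2 codec discussed above, whose behaviour
differs. The names `lone` and `strict_encode_fails` are made up for the
example.

```python
import marshal

# An unpaired high surrogate, as in the u"\uD800" literal above.
lone = "\ud800"


def strict_encode_fails(text):
    """Return True if the strict UTF-8 codec refuses to encode `text`."""
    try:
        text.encode("utf-8")
    except UnicodeEncodeError:
        return True
    return False


# Python 3 only encodes a lone surrogate when asked explicitly to pass it through.
encoded = lone.encode("utf-8", "surrogatepass")
decoded = encoded.decode("utf-8", "surrogatepass")

# Modern marshal round-trips the string, so the cached constant can be read back.
restored = marshal.loads(marshal.dumps(lone))
```

The point of the sketch is only that writer and reader must agree on how
surrogates are handled. In Python <= 2.2.0 they did not, which is what
makes the .pyc unreadable.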
The problem with backporting this patch is that in order
for Python to properly recompile any broken module, the
.pyc magic number will have to be changed. The question is
whether this is a reasonable thing to do in a patch-level release...
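As a rough sketch of why a magic change forces recompilation: a .pyc
file starts with the interpreter's magic number, and a cached file whose
leading bytes do not match is treated as stale, so the source gets
compiled again. The snippet assumes modern Python 3 and its
`importlib.util.MAGIC_NUMBER`, not anything available in 2.2, and
`has_current_magic` is a hypothetical helper.

```python
import importlib.util


def has_current_magic(pyc_bytes):
    """Return True if `pyc_bytes` starts with this interpreter's magic number."""
    magic = importlib.util.MAGIC_NUMBER
    return pyc_bytes[:len(magic)] == magic


# A header written by this interpreter matches; any other leading bytes do not.
fresh_header = importlib.util.MAGIC_NUMBER + b"\x00" * 12
stale_header = b"\x00\x00\x00\x00" + b"\x00" * 12
```

With a bumped magic, every .pyc written by 2.2.0 would fail this kind of
check and be regenerated, including the broken ones.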
--
Marc-Andre Lemburg
CEO eGenix.com Software GmbH
______________________________________________________________________
Company & Consulting: http://www.egenix.com/
Python Software: http://www.egenix.com/files/python/