[Python-Dev] Fix Unicode-disabled build of Python 2.7

Serhiy Storchaka storchaka at gmail.com
Wed Jun 25 14:55:35 CEST 2014


25.06.14 00:03, Jim J. Jewett wrote:
> It would be good to fix the tests (and actual library issues).
> Unfortunately, some of the specifically proposed changes (such as
> defining and using _unicode instead of unicode within python code)
> look to me as though they would trigger problems in the normal build
> (where the unicode object *does* exist, but would no longer be used).

This idiom is recommended by MvL [1] and is widely used (19 times in the
source code).

[1] http://bugs.python.org/issue8767#msg159473

> Other changes, such as the use of \x escapes, appear correct, but make
> the tests harder to read -- and might end up removing a test for
> correct unicode functionality across different spellings.


>
> Even if we assume that the tests are fine, and I'm just an idiot who
> misread them, the fact that there is any confusion means that these
> particular changes may be tricky enough to be a bad tradeoff for 2.7.
>
> It *might* work if you could make a more focused change.  For example,
> instead of leaving the 'unicode' name unbound, provide an object that
> simply returns false for isinstance and raises a UnicodeError for any
> other method call.  Even *this* might be too aggressive for 2.7, but the
> fact that it would only appear in the --disable-unicode builds, and
> would make them more similar to the regular build are points in its
> favor.

No, the existing code uses a different approach. "unicode" doesn't exist,
while the encode/decode methods exist but are useless. If my memory
doesn't fail me, there is even a special explanatory comment about this
historical decision somewhere. This decision was made many years ago.

> Before doing that, though, please document what the --disable-unicode
> mode is actually *supposed* to do when interacting with byte-streams
> that a standard defines as UTF-8.  (For example, are the changes to
> _xml_dumps and _xml_loads at
>      http://bugs.python.org/file35758/multiprocessing.patch
> correct, or do those functions assume they get bytes as input, or
> should the functions raise an exception any time they are called?)

Looking more carefully, I see that there is a bug in the unicode-enabled
build (an incorrect backport from 3.x). In 2.x, xmlrpclib.dumps already
produces a UTF-8 encoded string, while in 3.x xmlrpc.client.dumps
produces a unicode string. multiprocessing should therefore fail when
given a non-ASCII str or unicode object.
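
A rough sketch of the difference, not the actual patch: the helper name
dumps_as_bytes is hypothetical. It encodes the result only when dumps
returned text, so an already encoded 2.x str is passed through untouched
instead of being implicitly decoded as ASCII by a second .encode() call.

```python
try:
    import xmlrpclib  # Python 2.x
except ImportError:
    import xmlrpc.client as xmlrpclib  # Python 3.x


def dumps_as_bytes(obj):
    """Hypothetical helper: marshal obj and always return a byte string."""
    payload = xmlrpclib.dumps((obj,), None, None, None, 1)
    if isinstance(payload, bytes):
        # 2.x: dumps already returned a UTF-8 encoded str.
        return payload
    # 3.x: dumps returned text, so encode it exactly once.
    return payload.encode('utf-8')


data = dumps_as_bytes(u'\xe9')  # a single non-ASCII character
```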

A side benefit of my patches is that they expose existing errors in the
unicode-enabled build.


