[Python-Dev] len(chr(i)) = 2?

Victor Stinner victor.stinner at haypocalc.com
Fri Nov 19 21:23:14 CET 2010


Hi,

On Friday 19 November 2010 17:53:58 Alexander Belopolsky wrote:
> I was recently surprised to learn that chr(i) can produce a string of
> length 2 in python 3.x.

Yes, but only on a narrow build. E.g. Debian and Ubuntu compile Python 3.1 in
wide mode (sys.maxunicode == 1114111).
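A quick way to check which kind of build you are running, assuming Python 3.
This is a minimal sketch; the helper names build_kind() and non_bmp_char_len()
are hypothetical, for illustration only:

```python
import sys

# Assumes Python 3. build_kind() and non_bmp_char_len() are hypothetical
# helper names used only for this illustration.

def build_kind() -> str:
    # sys.maxunicode is 0xFFFF (65535) on a narrow build
    # and 0x10FFFF (1114111) on a wide build.
    return "narrow" if sys.maxunicode == 0xFFFF else "wide"

def non_bmp_char_len() -> int:
    # U+10000 is the first code point outside the BMP: a narrow build
    # stores it as a surrogate pair (length 2), a wide build as a single
    # character (length 1).
    return len(chr(0x10000))

if __name__ == "__main__":
    print(build_kind(), sys.maxunicode, non_bmp_char_len())
```

Note that on CPython 3.3 and later (PEP 393) the narrow/wide distinction no
longer exists: sys.maxunicode is always 1114111, so this prints "wide 1114111 1".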

> I suspect that I am not alone finding this behavior non-obvious 
> given that a mistake in Python manual stating the contrary survived 
> several releases.  [1]

It was a documentation bug and you fixed it. Non-BMP characters are rare, so
few people (maybe only you?) noticed the documentation bug. I consider the
behaviour an improvement to Python 3's non-BMP support.

Python is unclear about non-BMP characters: the narrow build was called "ucs2"
for a long time, even though it is UTF-16 (each character is encoded as one or
two UTF-16 code units). Python 2 accepts non-BMP characters with the \U syntax,
but not with unichr(). This is inconsistent and I see it as a bug. But I don't
want to change how Python 2 handles non-BMP characters, and the "bug" is
already fixed in Python 3!
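To make "one or two UTF-16 code units" concrete, here is a minimal sketch of
the standard UTF-16 surrogate-pair arithmetic. A code point above U+FFFF is
split into a high and a low surrogate; a narrow build stores both units, which
is why len() counts 2. The function names are hypothetical, and the codec
cross-check relies only on Python's built-in "utf-16-be" encoding:

```python
def to_utf16_units(code_point: int) -> list[int]:
    """Split a code point into UTF-16 code units, as a narrow build stores it."""
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("code point out of range: %#x" % code_point)
    if code_point <= 0xFFFF:
        return [code_point]              # BMP: a single 16-bit unit
    offset = code_point - 0x10000        # 20-bit value
    high = 0xD800 + (offset >> 10)       # high (lead) surrogate: top 10 bits
    low = 0xDC00 + (offset & 0x3FF)      # low (trail) surrogate: bottom 10 bits
    return [high, low]


def to_utf16_units_via_codec(code_point: int) -> list[int]:
    """Cross-check with Python's UTF-16 codec (big-endian, no BOM).

    Not valid for lone surrogates (U+D800..U+DFFF): the codec refuses them.
    """
    data = chr(code_point).encode("utf-16-be")
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]


for cp in (0x41, 0x20AC, 0x1D11E):
    units = to_utf16_units(cp)
    print("U+%04X ->" % cp, " ".join("%04X" % u for u in units))
```

For example, U+1D11E (MUSICAL SYMBOL G CLEF) becomes the pair D834 DD1E, while
U+0041 and U+20AC each stay a single unit.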

> I do believe, however that a change like
> this [2] and its consequences should be better publicized.

The change was made before the release of Python 3.0. Do you want to patch the
"What's New in Python 3.0" document?

> I have not
> found any discussion of this change in PEPs or "What's new" documents.
>  The closest find was a mentioning of a related issue #3280 in the 3.0
> NEWS file. [3]  Since this feature will be first documented in the
> Library Reference in 3.2, I wonder if it will be appropriate to
> mention it in "What's new in 3.2"?

In my opinion, the real question is rather why it was not fixed in Python 2. I
suppose the answer is something ugly like "backward compatibility" or
"historical reasons" :-)

Victor

