Unicode support in Python 2.7.8 - 16 bit

Terry Reedy tjreedy at udel.edu
Tue Mar 7 17:31:43 EST 2017


On 3/7/2017 5:05 PM, John Nagle wrote:
>    How do I test if a Python 2.7.8 build was built for 32-bit
> Unicode?  (I'm dealing with shared hosting, and I'm stuck
> with their provided versions.)
>
> If I give this to Python 2.7.x:
>
>     sy = u'\U0001f60f'
>
> len(sy) is 1 on a Ubuntu 14.04LTS machine, but 2 on the
> Red Hat shared hosting machine.  I assume "1" indicates
> 32-bit Unicode capability, and "2" indicates 16-bit.

Correct
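
A more direct check than len() is sys.maxunicode, which is 0xFFFF on a narrow (16-bit) build and 0x10FFFF on a wide (32-bit) build. A minimal sketch; the helper name is_wide_build is only for illustration:

```python
import sys


def is_wide_build():
    """Return True if this interpreter can store any code point in one unit."""
    # Narrow builds report 0xFFFF here; wide builds (and every
    # Python 3.3+ build) report 0x10FFFF.
    return sys.maxunicode > 0xFFFF


if __name__ == "__main__":
    print("wide" if is_wide_build() else "narrow")
```

This should run unchanged on 2.7 and 3.x, so it can be dropped onto the shared host as-is.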

> It looks like  Python 2.x in 16-bit mode is using a UTF-16
> pair encoding, like Java. Is that right?  Is it documented
> somewhere?

Yes, surrogate pairs. It is probably documented somewhere.
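
To see where the length of 2 comes from, here is a rough sketch of the standard UTF-16 split of a code point above U+FFFF into a high and a low surrogate. The function name to_surrogate_pair is made up for this example; it is not something the interpreter exposes:

```python
def to_surrogate_pair(code_point):
    """Split a code point above U+FFFF into UTF-16 (high, low) surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("code point is not in the range U+10000..U+10FFFF")
    offset = code_point - 0x10000        # leaves a 20-bit value
    high = 0xD800 + (offset >> 10)       # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)      # bottom 10 bits
    return high, low


high, low = to_surrogate_pair(0x1F60F)
print("%04X %04X" % (high, low))  # D83D DE0F
```

On a narrow build, u'\U0001f60f' is stored as those two code units, which is why len() reports 2.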

> (Annoyingly, while the shared host has a Python 3, it's
> 3.2.3, which rejects "u" Unicode string constants and
> has other problems in the MySQL area.)

Very annoying. 3.2 on *nix can also be either a narrow or a wide build.
3.3+ uses the new flexible string representation on all platforms.


-- 
Terry Jan Reedy



