Unicode support in Python 2.7.8 - 16 bit

Chris Angelico rosuav at gmail.com
Tue Mar 7 17:21:14 EST 2017


On Wed, Mar 8, 2017 at 9:05 AM, John Nagle <nagle at animats.com> wrote:
>    How do I test if a Python 2.7.8 build was built for 32-bit
> Unicode?  (I'm dealing with shared hosting, and I'm stuck
> with their provided versions.)
>
> If I give this to Python 2.7.x:
>
>     sy = u'\U0001f60f'
>
> len(sy) is 1 on a Ubuntu 14.04LTS machine, but 2 on the
> Red Hat shared hosting machine.  I assume "1" indicates
> 32-bit Unicode capability, and "2" indicates 16-bit.
> It looks like  Python 2.x in 16-bit mode is using a UTF-16
> pair encoding, like Java. Is that right?  Is it documented
> somewhere?

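That's correct. A narrow build stores that character as a pair of
surrogates, which is why len() gives 2. You can also check the build
directly this way:

>>> import sys
>>> sys.maxunicode
1114111

That's what a wide build reports (0x10FFFF). A narrow build reports
65535 (0xFFFF).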
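
If you want that check in a script, here's a rough sketch rather than
anything official. The names is_narrow_build and count_code_points are
made up for illustration. The second function counts a high surrogate
followed by a low surrogate as one character, so it should give the
same answer on narrow and wide builds. It uses ord() instead of u"..."
literals so that it also parses on 3.2.

```python
import sys


def is_narrow_build():
    # Hypothetical helper: narrow builds cap sys.maxunicode at 0xFFFF,
    # wide builds (and every Python from 3.3 on) report 0x10FFFF.
    return sys.maxunicode == 0xFFFF


def count_code_points(text):
    # Hypothetical helper: walk the string and treat a high surrogate
    # (U+D800..U+DBFF) followed by a low surrogate (U+DC00..U+DFFF)
    # as a single code point.
    count = 0
    i = 0
    n = len(text)
    while i < n:
        is_high = 0xD800 <= ord(text[i]) <= 0xDBFF
        if is_high and i + 1 < n and 0xDC00 <= ord(text[i + 1]) <= 0xDFFF:
            i += 2  # skip both halves of the pair
        else:
            i += 1
        count += 1
    return count
```

On the narrow build, count_code_points(sy) should come out as 1 even
though len(sy) is 2.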

> (Annoyingly, while the shared host has a Python 3, it's
> 3.2.3, which rejects "u" Unicode string constants and
> has other problems in the MySQL area.)

Yeah, you'll do well to get a newer Py3 than that. Fortunately, any
Linux old enough to be shipping 3.2 is likely to not depend on it in
any way, so you can install a new Py3 (maybe even 3.6) and shadow the
name "python3" with that. That's what I did when I was on Debian
(Squeeze, I think?) and nothing newer than 3.2 was available.

As soon as you hit 3.3, the u"..." prefix becomes legal again (PEP 414),
and subsequent versions have added even more Python 2 compatibility.

ChrisA


