[Python-Dev] New Py_UNICODE doc

Shane Hathaway shane at hathawaymix.org
Fri May 6 01:55:23 CEST 2005


Nicholas Bastin wrote:
> 
> On May 4, 2005, at 6:20 PM, Shane Hathaway wrote:
>> On a related note, it would be helpful if the documentation provided a
>> little more background on unicode encoding.  Specifically, that UCS-2 is
>> not the same as UTF-16, even though they're both two bytes wide and most
>> of the characters are the same.  UTF-16 can encode 4 byte characters,
>> while UCS-2 can't.  A Py_UNICODE is either UCS-2 or UCS-4.  It took me
> 
> 
> I'm not sure the Python documentation is the place to teach someone
> about Unicode.  ISO 10646 pretty clearly defines UCS-2 as only
> containing characters in the BMP (plane zero).  On the other hand, I
> don't know why Python lets you choose UCS-2 anyhow, since it's almost
> never what you want.

Then something in the Python docs ought to say why UCS-2 is not what you
want.  I still don't know; I've heard differing opinions on the subject.
 Some say you'll never need more than what UCS-2 provides.  Is that
incorrect?
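
For what it's worth, the difference only shows up for characters outside
the BMP.  Here is a rough sketch in Python 3 of what UTF-16 does with one
of them.  The helper name utf16_code_units is made up for illustration; it
is not part of any Python API, and the two sample characters are just
convenient examples.

```python
def utf16_code_units(ch):
    """Return the 16-bit UTF-16 code units needed to encode one character."""
    data = ch.encode("utf-16-be")  # big-endian, no byte order mark
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]


bmp_char = "\u20ac"          # EURO SIGN, inside the BMP (plane zero)
astral_char = "\U0001d11e"   # MUSICAL SYMBOL G CLEF, outside the BMP

bmp_units = utf16_code_units(bmp_char)        # one code unit
astral_units = utf16_code_units(astral_char)  # a surrogate pair (two units)

print([hex(u) for u in bmp_units])
print([hex(u) for u in astral_units])
```

UTF-16 spends two 16-bit code units (a surrogate pair) on the second
character.  Strict UCS-2 has no surrogate mechanism, so it cannot
represent that character at all.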

More generally, how should a non-unicode-expert writing Python extension
code find out the minimum they need to know about unicode to use the
Python unicode API?  The API reference [1] ought to at least have a list
of background links.  I had to hunt everywhere.

.. [1] http://docs.python.org/api/unicodeObjects.html

Shane

