What encoding does u'...' syntax use?

"Martin v. Löwis" martin at v.loewis.de
Sat Feb 21 15:45:09 EST 2009


>> Indeed. As Python *can* encode all characters even in 2-byte mode
>> (since PEP 261), it seems clear that Python's Unicode representation
>> is *not* strictly UCS-2 anymore.
> 
> Since we're already discussing this, I'm curious - why was UCS-2
> chosen over plain UTF-16 or UTF-8 in the first place for Python's
> internal storage?

You mean, originally? Originally, the choice was only between UCS-2
and UCS-4; UCS-2 was chosen because of size concerns.
UTF-8 was ruled out easily because it doesn't allow constant-size
indexing; UTF-16 essentially for the same reason (plus there was
no point to UTF-16 at the time, since there were no assigned
characters outside the BMP).
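
To make the indexing argument concrete, here is a small sketch in
present-day Python 3 (not the interpreter's actual internals). The
helper names utf16_units and fixed_width_index are made up for this
illustration. It shows that UTF-8 characters vary in byte width, that
a fixed-width encoding lets you reach the i-th character by offset
arithmetic alone, and that UTF-16 needs two code units for a character
outside the BMP.

```python
import struct

def utf16_units(s):
    """Return the UTF-16 code units of s as a list of ints (hypothetical helper)."""
    data = s.encode("utf-16-le")
    return list(struct.unpack("<%dH" % (len(data) // 2), data))

def fixed_width_index(data, i, width):
    """Fetch the i-th fixed-width unit by offset arithmetic (hypothetical helper)."""
    return data[i * width:(i + 1) * width]

bmp_text = "L\u00f6wis"          # every character is inside the BMP
astral_text = "a\U0001D11Eb"     # U+1D11E lies outside the BMP

# UTF-8: characters occupy 1 to 4 bytes, so a byte offset is not a
# character index and finding character i requires a scan.
utf8_widths = [len(ch.encode("utf-8")) for ch in bmp_text]

# UTF-32 (UCS-4): every character is exactly 4 bytes, so character i
# starts at byte 4 * i.
ucs4 = astral_text.encode("utf-32-le")
third_char = fixed_width_index(ucs4, 2, 4).decode("utf-32-le")

# UTF-16: BMP characters take one 16-bit unit, anything else takes a
# surrogate pair, so the unit count can exceed the character count.
units = utf16_units(astral_text)

print(utf8_widths)
print(third_char)
print([hex(u) for u in units])
```

With only BMP characters in play, UCS-2 and UTF-16 produce the same
code units, which is why UTF-16 offered nothing extra back then.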

Regards,
Martin