Flexible string representation, unicode, typography, ...

Steven D'Aprano steve+comp.lang.python at pearwood.info
Thu Aug 30 02:55:01 EDT 2012


On Wed, 29 Aug 2012 08:43:05 -0700, wxjmfauth wrote:

> I can hit the nail a little more.
> I have even a better idea and I'm serious.
> 
> If "Python" has found a new way to cover the set of the Unicode
> characters, why not proposing it to the Unicode consortium?

Because the implementation of the str datatype in a programming language 
has nothing to do with the Unicode consortium. You might as well propose 
it to the International Union of Railway Engineers.


> Unicode has already three schemes covering practically all cases: memory
> consumption, maximum flexibility and an intermediate solution.

And Python's solution is built from the same kind of pieces: fixed-width 
storage of 1 byte (Latin-1), 2 bytes (UCS-2) or 4 bytes (UCS-4) per 
character.
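For comparison, here is how one short string fares under the three encoding forms the Unicode standard itself defines. The sample string is arbitrary; it mixes an ASCII letter, a character from the Basic Multilingual Plane and one from outside it:

```python
sample = "a\u20ac\U0001F600"  # 'a', EURO SIGN, GRINNING FACE

# Byte counts under Unicode's three encoding forms.
# The -le codecs are used so that no byte-order mark is prepended.
sizes = {enc: len(sample.encode(enc)) for enc in ("utf-8", "utf-16-le", "utf-32-le")}
print(sizes)  # {'utf-8': 8, 'utf-16-le': 8, 'utf-32-le': 12}
```

UTF-8 spends 1 to 4 bytes per character, UTF-16 spends 2 or 4 (a surrogate pair outside the Basic Multilingual Plane), and UTF-32 always spends 4.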

The only innovative thing here is that instead of the Python build 
declaring that "all strings will be stored in UCS-2" (or UCS-4, on a wide 
build), the interpreter chooses a representation for each string as it is 
created, based on the widest character in it. So some strings will be 
stored internally as UCS-4, some as UCS-2, and some as Latin-1, one byte 
per character, with pure-ASCII strings getting a slightly more compact 
layout. (Latin-1 and ASCII are standards, but not the Unicode consortium's.)

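(The PEP also lets a string cache a UTF-8 copy of itself, created on 
demand, but that is an extra representation kept alongside the primary 
one, not a fourth storage format.)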
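A rough model of that per-string choice, assuming the rule in PEP 393 (the widest code point in the string decides how many bytes every character gets). `bytes_per_char` and `observed_growth` are hypothetical helpers written for this illustration, and the measurement relies on `sys.getsizeof`, so it is only meaningful on CPython 3.3 or later:

```python
import sys

def bytes_per_char(s: str) -> int:
    """Hypothetical helper modelling PEP 393's rule: the widest code
    point in the string decides the width used for every character."""
    widest = max(map(ord, s), default=0)
    if widest < 0x100:      # fits in Latin-1 (this includes pure ASCII)
        return 1
    if widest < 0x10000:    # fits in the Basic Multilingual Plane: UCS-2
        return 2
    return 4                # needs code points beyond U+FFFF: UCS-4

def observed_growth(ch: str) -> int:
    """Hypothetical helper: extra bytes CPython reports when one more
    `ch` is added. Relies on sys.getsizeof, so it is CPython-specific."""
    return sys.getsizeof(ch * 101) - sys.getsizeof(ch * 100)

for ch in ("a", "\u00e9", "\u20ac", "\U0001F600"):
    print(repr(ch), bytes_per_char(ch), observed_growth(ch))
```

On CPython 3.3+ the two numbers should agree for each character. Absolute sizes include a fixed header that varies between versions, which is why the sketch measures growth per character instead.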

There's nothing radical here, honest.



-- 
Steven


