[Python-Dev] Re: Re: Alternative Implementation for PEP 292: Simple String Substitutions

Tue Sep 14 20:12:35 CEST 2004

On Sep 14, 2004, at 2:54 AM, Terry Reedy wrote:
> This is why I am not especially enamored of Unicode and the prospect of
> Python becoming married to it.  It is heavily weighted in favor of
> efficiently representing Chinese and inefficiently representing English.
> To give English equivalent treatment, the 20,000 or so most common words,
> roots, prefixes, and suffixes would each get its own codepoint.

Of course, it is perfectly possible for the Python Unicode 
implementation to represent some Unicode strings with only 8 bits per 
character. There is no (conceptual) reason it could not represent 
(u'a' * 8) with 8 bytes plus object header overhead. That is simply an 
implementation detail and really has nothing to do with Unicode 
itself.
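
As a rough illustration of that idea, here is a minimal sketch in 
modern Python of picking the narrowest fixed width per string. The 
names pack() and char_at() are hypothetical, and this is not how the 
interpreter stores strings internally; it only shows that the storage 
width can be chosen per string while indexing stays constant time.

```python
def pack(text):
    """Store text with the narrowest fixed width that fits every code point.

    Hypothetical helper for illustration; returns (bytes_per_char, data).
    """
    highest = max(map(ord, text), default=0)
    if highest < 0x100:
        width = 1
    elif highest < 0x10000:
        width = 2
    else:
        width = 4
    data = b"".join(ord(ch).to_bytes(width, "little") for ch in text)
    return width, data


def char_at(packed, index):
    """Constant-time lookup: the byte offset is simply index * width."""
    width, data = packed
    start = index * width
    if index < 0 or start >= len(data):
        raise IndexError(index)
    return chr(int.from_bytes(data[start:start + width], "little"))


width, data = pack("a" * 8)
print(width, len(data))  # the eight-character ASCII string needs 8 bytes
```

A string containing even one character above U+00FF would be stored 
at 2 or 4 bytes per character under this scheme, so the saving applies 
only to strings that happen to fit the narrow form.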

It would also be possible to use UTF-8 string storage, although this 
has the tradeoff that indexing an element takes linear time w.r.t. 
position instead of constant time.
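
To make that tradeoff concrete, here is a small sketch (again in 
modern Python, with a hypothetical function name) of indexing by 
character into UTF-8 bytes. Because characters occupy one to four 
bytes, the only way to find character N without an auxiliary index is 
to scan from the start and count lead bytes.

```python
def utf8_char_at(data, index):
    """Return the character at position `index` in valid UTF-8 bytes `data`.

    Hypothetical helper for illustration; scans from the start, so the
    cost grows linearly with `index`.
    """
    if index < 0:
        raise IndexError(index)
    seen = -1      # index of the most recent character start found
    start = None   # byte offset where the wanted character begins
    for offset, byte in enumerate(data):
        # Continuation bytes look like 10xxxxxx; anything else starts
        # a new character.
        if byte & 0xC0 != 0x80:
            seen += 1
            if seen == index:
                start = offset
            elif seen == index + 1:
                return data[start:offset].decode("utf-8")
    if start is None:
        raise IndexError(index)
    # The wanted character is the last one in the buffer.
    return data[start:].decode("utf-8")


encoded = "a\u00f1\u4e2d".encode("utf-8")  # 1-, 2-, and 3-byte characters
print(len(encoded), utf8_char_at(encoded, 2))
```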

James