[Python-Dev] Re: Re: Alternative Implementation for PEP 292: Simple String Substitutions
James Y Knight
foom at fuhm.net
Tue Sep 14 20:12:35 CEST 2004
On Sep 14, 2004, at 2:54 AM, Terry Reedy wrote:
> This is why I am not especially enamored of Unicode and the prospect of
> Python becoming married to it. It is heavily weighted in favor of
> efficiently representing Chinese and inefficiently representing
> English.
> To give English equivalent treatment, the 20,000 or so most common
> words, roots, prefixes, and suffixes would each get its own codepoint.
Of course it is perfectly possible to have the Python unicode
implementation choose to represent some unicode strings with only 8
bits per character. There is no (conceptual) reason it could not
represent (u'a' * 8) with 8 bytes + class header overhead. That is
simply an implementation detail and really has nothing to do with
Unicode itself.
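As a rough sketch of that idea (written in present-day Python 3 syntax, and using a hypothetical CompactString class that does not correspond to any actual interpreter internals), a string type could pick the narrowest fixed width that fits its widest code point and still index in constant time:

```python
class CompactString:
    """Hypothetical illustration: fixed-width storage, as narrow as the text allows."""

    def __init__(self, text):
        codes = [ord(ch) for ch in text]
        widest = max(codes, default=0)
        # Choose 1, 2, or 4 bytes per character based on the largest code point.
        if widest < 0x100:
            self.width = 1
        elif widest < 0x10000:
            self.width = 2
        else:
            self.width = 4
        self._data = b"".join(c.to_bytes(self.width, "little") for c in codes)

    def __len__(self):
        return len(self._data) // self.width

    def __getitem__(self, index):
        # Constant-time lookup: every character occupies exactly self.width bytes.
        if not 0 <= index < len(self):
            raise IndexError("CompactString index out of range")
        start = index * self.width
        unit = self._data[start:start + self.width]
        return chr(int.from_bytes(unit, "little"))

    def nbytes(self):
        """Size of the character payload, excluding any object header."""
        return len(self._data)
```

With this sketch, CompactString("a" * 8).nbytes() is 8, while a string containing a single CJK character widens every element to 2 bytes. The caller cannot tell the difference except through memory use.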
It would also be possible to use UTF-8 string storage, although this
has the tradeoff that indexing an element takes linear time w.r.t.
position instead of constant time.
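To make the tradeoff concrete, here is a minimal sketch (again Python 3 syntax, with hypothetical helper names utf8_char_at and _sequence_length) of what indexing into UTF-8 storage involves. Because characters occupy 1 to 4 bytes, finding character number n means stepping over the n characters before it:

```python
def _sequence_length(lead_byte):
    """Number of bytes in the UTF-8 sequence that starts with lead_byte."""
    if lead_byte < 0x80:
        return 1
    if lead_byte >> 5 == 0b110:
        return 2
    if lead_byte >> 4 == 0b1110:
        return 3
    if lead_byte >> 3 == 0b11110:
        return 4
    raise ValueError("not a valid UTF-8 lead byte")


def utf8_char_at(data, index):
    """Return the character at position index in UTF-8 encoded bytes.

    Assumes data is well-formed UTF-8. The loop runs once per preceding
    character, so the cost grows linearly with index.
    """
    if index < 0:
        raise IndexError("negative index not supported in this sketch")
    pos = 0
    for _ in range(index):
        if pos >= len(data):
            raise IndexError("index out of range")
        pos += _sequence_length(data[pos])
    if pos >= len(data):
        raise IndexError("index out of range")
    end = pos + _sequence_length(data[pos])
    return data[pos:end].decode("utf-8")
```

A real implementation could soften this with cached offsets or similar tricks, but the basic scan shows where the linear cost comes from.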
James