[Python-Dev] Support for "wide" Unicode characters

Martin von Loewis loewis@informatik.hu-berlin.de
Sun, 1 Jul 2001 15:52:58 +0200 (MEST)


> The problem I have with this PEP is that it is a compile time option
> which makes it hard to work with both 32 bit and 16 bit strings in
> one program.

Can you elaborate why you think this is a problem?
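(For what it's worth, a program that cares can already tell at run
time which kind of build it is running on by checking sys.maxunicode;
a minimal sketch:)

```python
import sys

# On a "narrow" (UCS-2) build, sys.maxunicode is 0xFFFF; on a "wide"
# (UCS-4) build it is 0x10FFFF, so code can branch on the build type.
is_wide_build = sys.maxunicode > 0xFFFF
print(is_wide_build)
```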

> Can not the 32 bit string type be introduced as an additional type?

Yes, but not just "like that". You'd have to define an API for
creating values of this type, you'd have to teach every function that
ought to accept it how to process it, you'd have to define conversion
operations and all that: in short, you'd have to go through all the
trouble that the introduction of the Unicode type gave us, all over
again. Also, I cannot see any advantage in introducing yet another type.

Implementing this PEP is straightforward, and has almost no visible
effect on Python programs.

People have suggested making it a run-time decision, with the
internal representation switching on demand, but that would be an API
nightmare for C code that has to access such values.

> u[i] is a character. If u is Unicode, then u[i] is a Python Unicode
> character.

>  This wasn't usefully true in the past for DBCS strings and is not the
> right way to think of either narrow or wide strings now. The idea
> that strings are arrays of characters gets in the way of dealing
> with many encodings and is the primary difficulty in localising
> software for Japanese.

While I don't know much about localising software for Japanese (*), I
agree that 'u[i] is a character' isn't useful to say in many cases. If
this is the old Python string type, I'd much prefer calling u[i] a
'byte'.
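To illustrate why "u[i] is a character" breaks down on a narrow
build: a character outside the Basic Multilingual Plane is stored as
two 16-bit code units (a surrogate pair), so indexing yields a
surrogate, not the character. A sketch, using the UTF-16 encoding to
make the two code units visible (on a wide build the same string is a
single element):

```python
import struct

s = "\U00010000"  # the first character outside the BMP

# Encoded as UTF-16, this one character needs two 16-bit code units --
# exactly what a narrow (UCS-2) build stores internally, so there
# s[0] would be the high surrogate rather than the character itself.
high, low = struct.unpack("<2H", s.encode("utf-16-le"))
print(hex(high), hex(low))  # the surrogate pair 0xd800 0xdc00
```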

Regards,
Martin

(*) Methinks that the primary difficulty is still translating all the
documentation and messages. Actually, keeping the translations
up-to-date is even more challenging.