[I18n-sig] Unicode surrogates: just say no!

Toby Dickenson tdickenson@geminidataloggers.com
Tue, 26 Jun 2001 13:49:12 +0100


On Tue, 26 Jun 2001 04:51:38 -0400, Guido van Rossum
<guido@digicool.com> wrote:

>I see only one remaining argument against choosing 3 over 2: FUD about
>disk and primary memory space usage.

In a previous discussion about unifying plain strings and unicode
strings, someone (I forget who, sorry) proposed a unified string
type that would store its data in arrays of either 1 or 2 byte
elements (depending on what was efficient for each string) but provide
a unified interface independent of the storage option.

Could the same approach be used to support an option E: individual
strings use UCS-4 if they have to, but otherwise gain the space
advantages of UCS-2?
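
To make the idea concrete, here is a rough sketch of such a type, written
in present-day Python purely as an illustration. The class name FlexString
and its width attribute are made up for this example and are not part of
any proposal in this thread. It assumes each string picks one element
width (1, 2 or 4 bytes) from its highest code point, and that callers only
ever see characters, never the storage.

```python
class FlexString:
    """Hypothetical string type: one fixed element width per string."""

    def __init__(self, text):
        # Pick the narrowest element width that fits every code point.
        highest = max(map(ord, text), default=0)
        if highest < 0x100:
            self.width = 1      # 1-byte elements
        elif highest < 0x10000:
            self.width = 2      # UCS-2 sized elements
        else:
            self.width = 4      # UCS-4 sized elements
        # Store every character at the chosen width (little-endian here,
        # an arbitrary choice for the sketch).
        self._data = b"".join(
            ord(ch).to_bytes(self.width, "little") for ch in text
        )

    def __len__(self):
        # Length is counted in characters, whatever the storage width.
        return len(self._data) // self.width

    def __getitem__(self, index):
        if not isinstance(index, int):
            raise TypeError("FlexString indices must be integers")
        size = len(self)
        if index < 0:
            index += size
        if not 0 <= index < size:
            raise IndexError("FlexString index out of range")
        start = index * self.width
        element = self._data[start:start + self.width]
        return chr(int.from_bytes(element, "little"))

    def __str__(self):
        return "".join(self[i] for i in range(len(self)))


if __name__ == "__main__":
    for sample in ("spam", "\u20ac5", "a\U00010000"):
        s = FlexString(sample)
        print(len(s), "chars,", s.width, "byte(s) per element")
```

In this sketch only a string that actually contains a character outside
the 16-bit range pays for 4-byte elements, and indexing stays a simple
offset calculation because the width is fixed within any one string.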

>
>A. At some Python version, we switch.
>
>B. Choose between 1 and 3 based on the platform.
>
>C. Make it a configuration-time choice.
>
>D. Make it a run-time choice.

Toby Dickenson
tdickenson@geminidataloggers.com