[Python-3000] string module trimming

Jim Jewett jimjjewett at gmail.com
Wed Apr 18 23:59:40 CEST 2007


On 4/18/07, Guido van Rossum <guido at python.org> wrote:
> On 4/18/07, Jim Jewett <jimjjewett at gmail.com> wrote:
> > On 4/17/07, Guido van Rossum <guido at python.org> wrote:
> > > The locale module doesn't deal with Unicode, only with 8-bit characters (not
> > > multi-byte characters). You'll lose this anyway. Certainly
> > > string.letters is not going to provide this functionality.

> > But for languages in Latin1, 8-bit characters are sufficient --
> > anything with more than 8 bits is by definition not a (local) letter.

> Latin-1 is just another encoding (and not a very useful one given that
> it can't encode all of Unicode). I don't want to define a feature that
> only works for Latin-1.

Today, string.letters works most easily with ASCII supersets, and is
effectively limited to 8-bit encodings. Once every string is Unicode, I
don't think that 8-bit restriction should apply any more.
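For comparison, in present-day Python 3, `string.letters` was removed and only the ASCII-only `string.ascii_letters` remains. A letter test there need not be tied to any 8-bit encoding. The sketch below assumes Python 3 `str` semantics and uses `str.isalpha()`, which consults Unicode character properties. `is_word` is a hypothetical helper for illustration.

```python
import string

def is_word(text: str) -> bool:
    """Hypothetical helper: True if text is non-empty and all of it is letters.

    str.isalpha() consults Unicode character properties, so it is not
    limited to the 256 code points an 8-bit encoding can represent.
    """
    return text.isalpha()

# string.ascii_letters is the ASCII-only, locale-independent constant.
print(set("café") <= set(string.ascii_letters))  # False: 'é' is not ASCII
print(is_word("café"))  # True
print(is_word("мир"))   # True: Cyrillic letters, beyond Latin-1
print(is_word("ab1"))   # False: '1' is not a letter
```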

> > I won't swear that localizations currently replace string.letters with
> > the appropriately ordered (slight) superset, but it is a valid use
> > case, and string* (or text*) is clearly the right place.

> The right solution for locale-dependent collation for sure isn't
> having a string containing all the letters in the right order. There
> are plenty of languages where that approach doesn't even work.

Theoretically, English is one of those non-working languages. (Names
in bibliographic entries are supposed to be alphabetized according to
their language of origin.)

In practice, ordered-list-of-chars works well enough, often enough.
It often works better than sorting by code point, which is the only
obvious alternative.
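To make the comparison concrete, here is a minimal sketch of the ordered-list-of-chars approach next to plain code point sorting. The `ALPHABET` string and the `alphabet_key` helper are hypothetical. They assume a language whose dictionary order puts 'ä' right after 'a', and they are not part of the `string` or `locale` modules.

```python
# Hypothetical ordered alphabet: lowercase before uppercase, and each
# umlauted vowel placed right after its base vowel instead of after 'z'.
ALPHABET = "aAäÄbBcCdDeEfFgGhHiIjJkKlLmMnNoOöÖpPqQrRsStTuUüÜvVwWxXyYzZ"
RANK = {ch: i for i, ch in enumerate(ALPHABET)}

def alphabet_key(word: str) -> tuple:
    """Sort key: the position of each character in ALPHABET.

    Characters missing from ALPHABET rank after every listed letter,
    ordered among themselves by code point.
    """
    return tuple(RANK.get(ch, len(ALPHABET) + ord(ch)) for ch in word)

words = ["zebra", "Apfel", "äpfel", "apfel", "Zug"]
print(sorted(words))                    # ['Apfel', 'Zug', 'apfel', 'zebra', 'äpfel']
print(sorted(words, key=alphabet_key))  # ['apfel', 'Apfel', 'äpfel', 'zebra', 'Zug']
```

Sorting by code point splits upper- and lowercase apart and pushes 'ä' past 'z'; the ordered list keeps related letters together. Because each character maps to exactly one rank, though, this approach cannot express multi-character collation units, which is why it only works well enough, often enough.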

Unless I missed it (and I may have), Unicode itself sort of ducks the
question of how to sort strings. Python really needs to provide
*an* answer, but I'm not sure it is possible to provide the (single)
correct answer.

string.letters is one workaround, and I don't think we should remove
it until a better solution (or workaround) is available.

-jJ
