[Python-Dev] Unicode mapping tables

M.-A. Lemburg mal@lemburg.com
Wed, 01 Mar 2000 14:32:02 +0100


Guido van Rossum wrote:
> 
> > Here's what I'll do:
> >
> > * implement .capitalize() in the traditional way for Unicode
> >   objects (simply convert the first char to uppercase)
> > * implement u.title() to mean the same as Java's toTitleCase()
> > * don't implement s.title(): the reasoning here is that it would
> >   confuse the user when she gets different return values for
> >   the same string (titlecase chars usually live in higher Unicode
> >   code ranges not reachable in Latin-1)
> 
> Huh?  For ASCII at least, titlecase seems to map to ASCII; in your
> current implementation, only two Latin-1 characters (u'\265' and
> u'\377', I have no easy way to show them in Latin-1) map outside the
> Latin-1 range.

You're right, sorry for the confusion. I was thinking of other
encodings such as cp437, which have corresponding characters
in the higher Unicode ranges.
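
A small illustration of both points, written as a sketch against
present-day Python 3 rather than the implementation discussed in this
thread. It assumes the built-in str.title() and the standard cp437
codec; the hex escapes are just another spelling of the octal ones
quoted above.

```python
# The two Latin-1 characters mentioned above, written as hex escapes:
# u'\265' is U+00B5 (MICRO SIGN), u'\377' is U+00FF (y with diaeresis).
for ch in ('\xb5', '\xff'):
    titled = ch.title()
    # The titlecase form lies above U+00FF, so Latin-1 cannot encode it.
    print('U+%04X -> U+%04X' % (ord(ch), ord(titled)))

# A cp437 byte whose character sits in a higher Unicode range:
# 0xE0 decodes to GREEK SMALL LETTER ALPHA (U+03B1).
alpha = b'\xe0'.decode('cp437')
print('cp437 0xE0 -> U+%04X' % ord(alpha))
```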

> Anyway, I would suggest adding a title() call to 8-bit strings as
> well; then we can do away with string.capwords(), which does something
> similar but different, mostly by accident.

Ok, I'll do it this way then: s.title() will use C's toupper() and
tolower() for case mapping and u.title() the Unicode routines.

This will be in sync with the rest of the 8-bit string world
(which is locale aware on many platforms AFAIK), even though
it might not return the same string as the corresponding
u.title() call.
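
The kind of mismatch meant here can be sketched in present-day
Python 3, with the caveat that it is only an analogue: bytes.title()
there uses ASCII-only case rules, not the locale-aware C toupper() and
tolower() described above. The sample word is made up for the example.

```python
word = 'w\xf6rld'          # "woerld" spelled with U+00F6 (o with diaeresis)

# Unicode-aware mapping: U+00F6 is a cased letter, so the word continues.
as_text = word.title()

# Byte-oriented mapping: 0xF6 is not an ASCII letter, so it is treated as
# uncased and the following letter starts a new "word".
as_bytes = word.encode('latin-1').title()

print(repr(as_text))       # 'W\xf6rld' (shown as 'Wörld')
print(repr(as_bytes))      # b'W\xf6Rld'
```

The two results differ in the letter that follows the non-ASCII
character, which is the sort of divergence a user would see between
the 8-bit and the Unicode method.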

u.capwords() will be disabled in the Unicode implementation...
it wasn't even implemented for the string implementation,
so there's no breakage ;-)
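
For reference, here is how string.capwords() and title() differ, shown
as a sketch against present-day Python 3 (where string.capwords() still
exists), not the code under discussion. The sample text is made up.

```python
import string

text = "hello   world's end"

# capwords() splits on whitespace, capitalizes each word, and rejoins
# the words with single spaces, so runs of spaces collapse.
print(string.capwords(text))   # Hello World's End

# title() leaves the spacing alone but starts a new word after any
# non-letter, including the apostrophe.
print(text.title())            # Hello   World'S End
```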

-- 
Marc-Andre Lemburg
______________________________________________________________________
Business:                                      http://www.lemburg.com/
Python Pages:                           http://www.lemburg.com/python/