String methods understanding anything but ASCII?

Magnus Lie Hetland mlh at furu.idi.ntnu.no
Sun Jan 19 16:04:31 EST 2003


I just wondered: is there any hope that string methods such as upper()
or capitalize() will ever understand anything other than ASCII? What
about, e.g., ISO 8859-1 (which does seem to be the default encoding)? As
a Scandinavian, I'd love to see 'ø'.upper() == 'Ø', for example.
It seems that the only way of achieving this sort of thing at present
is through ugly hacks of various kinds (especially with methods such
as capitalize() and title(), where a plain translate() can't do the
job, since whether a letter gets upper- or lowercased depends on its
position in the word).
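To make the translate() hack concrete, here is a minimal sketch in
modern Python 3, assuming Latin-1 encoded bytes as a stand-in for the
byte strings discussed here (bytes.upper() only changes ASCII letters).
The helper name upper_latin1 is hypothetical, not a library function.
It uppercases the ASCII letters with upper() and then patches up æ, ø
and å with a translation table:

```python
# Latin-1 encoded bytes stand in for the byte strings of the time:
# bytes.upper() only changes the ASCII letters a-z.
LOWER = "æøå".encode("latin-1")
UPPER = "ÆØÅ".encode("latin-1")

# Translation table mapping æ, ø, å to Æ, Ø, Å; every other byte maps to itself.
TO_UPPER = bytes.maketrans(LOWER, UPPER)


def upper_latin1(s: bytes) -> bytes:
    """Hypothetical helper: ASCII upper() first, then patch the Nordic letters."""
    return s.upper().translate(TO_UPPER)


word = "blåbærsyltetøy".encode("latin-1")
print(word.upper().decode("latin-1"))        # BLåBæRSYLTETøY (non-ASCII letters untouched)
print(upper_latin1(word).decode("latin-1"))  # BLÅBÆRSYLTETØY
```

This only works because upper() maps every character independently of
its neighbours; a single table cannot express "uppercase the first
letter of each word".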

So -- is there any hope of this? Or are there any convenient ways of
dealing with it, or perhaps a library somewhere?

Perhaps some extra parameters to the methods would help, like the new
optional chars argument to strip() and friends? For example, the
methods could take two optional strings of matching lower- and
uppercase letters as extra arguments. E.g.:

>>> print 'ø'.upper()
ø
>>> print 'ø'.upper('æøå','ÆØÅ')
Ø

The same parameter scheme could be used for capitalize(), title(),
istitle(), and so on, and so forth.
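A rough sketch of how the proposed parameters might behave, written as
free functions over Latin-1 bytes in Python 3. The function names and
the lowers/uppers parameters are hypothetical; the real string methods
take no such arguments. The two extra strings are paired position by
position, as in the 'æøå'/'ÆØÅ' example:

```python
# Hypothetical free-function versions of the proposed API, operating on
# Latin-1 encoded bytes (whose built-in case methods only know ASCII).
NORDIC_LOWER = "æøå".encode("latin-1")
NORDIC_UPPER = "ÆØÅ".encode("latin-1")


def upper(s: bytes, lowers: bytes = b"", uppers: bytes = b"") -> bytes:
    """ASCII upper(), then map each byte in lowers to its partner in uppers."""
    return s.upper().translate(bytes.maketrans(lowers, uppers))


def lower(s: bytes, lowers: bytes = b"", uppers: bytes = b"") -> bytes:
    """ASCII lower(), then map each byte in uppers to its partner in lowers."""
    return s.lower().translate(bytes.maketrans(uppers, lowers))


def capitalize(s: bytes, lowers: bytes = b"", uppers: bytes = b"") -> bytes:
    """First byte uppercased, the rest lowercased, like bytes.capitalize()."""
    return upper(s[:1], lowers, uppers) + lower(s[1:], lowers, uppers)


def title(s: bytes, lowers: bytes = b"", uppers: bytes = b"") -> bytes:
    """Uppercase the first letter of each word and lowercase the rest.

    A byte counts as a letter if it is an ASCII letter or appears in
    lowers/uppers; a word starts at any byte that follows a non-letter.
    """
    extra = set(lowers) | set(uppers)  # iterating bytes yields ints
    out = bytearray()
    prev_is_letter = False
    for b in s:
        ch = bytes([b])
        out += lower(ch, lowers, uppers) if prev_is_letter else upper(ch, lowers, uppers)
        prev_is_letter = ch.isalpha() or b in extra
    return bytes(out)


word = "blåbærsyltetøy".encode("latin-1")
print(upper("ø".encode("latin-1"), NORDIC_LOWER, NORDIC_UPPER).decode("latin-1"))  # Ø
print(title(word).decode("latin-1"))                              # BlåBæRsyltetøY
print(title(word, NORDIC_LOWER, NORDIC_UPPER).decode("latin-1"))  # Blåbærsyltetøy
```

Without the extra letters, title() treats å, æ and ø as word
boundaries; once they are supplied, the word comes out right.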

Seems like a nice compromise to me: you don't have to add domain
knowledge about the various encodings to Python itself, but you do
have a means of supplying some of it to the methods...

Opinions? Does this warrant a PEP? Are there applications and/or
problems I haven't thought of?

-- 
Magnus Lie Hetland
http://hetland.org