[Python-Dev] Unicode mapping tables

M.-A. Lemburg mal@lemburg.com
Tue, 29 Feb 2000 15:03:14 +0100


I am just coding the translate method for Unicode objects and
have come across a design question that may have some importance
with respect to speed and memory allocation size.

Currently, mapping tables map characters to Unicode characters
and vice-versa. Now the .translate method will use a different
kind of table: mapping integer ordinals to integer ordinals.

Question: Which is more efficient: having lots of integers
in a dictionary or lots of characters?

Another aspect of this question is: the translate method
will be able to handle sequences *and* mappings because it
looks up integers which can be interpreted as indexes as well
as dictionary keys. The character mapping codec uses characters
as keys and thus only allows dictionaries to be used (the reason
is that in some future version it should be possible to
map single characters to multiple characters or even combinations
to new combinations).
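
To illustrate why integer lookups let one code path serve both sequences
and mappings, here is a minimal sketch in present-day Python. The helper
name translate_sketch and the two example tables are hypothetical and only
mimic the lookup described above; this is not the actual C implementation.

```python
def translate_sketch(text, table):
    """Hypothetical helper: map each character's ordinal through `table`."""
    out = []
    for ch in text:
        try:
            # The same subscript works as a dict key and as a list index.
            target = table[ord(ch)]
        except LookupError:
            # KeyError (mapping) or IndexError (sequence): copy unchanged.
            out.append(ch)
        else:
            out.append(chr(target))
    return "".join(out)


# Mapping form: only the entries that change are stored.
as_dict = {ord("a"): ord("A")}

# Sequence form: position i holds the replacement ordinal for ordinal i.
as_list = list(range(128))
as_list[ord("a")] = ord("A")

print(translate_sketch("banana", as_dict))  # bAnAnA
print(translate_sketch("banana", as_list))  # bAnAnA
```

A character-keyed table could not be written as a list at all, which is
the restriction mentioned for the character mapping codec.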

BTW, I dropped the deletions argument from the translate method:
it is not needed, since a mapping to None will have the same effect.
Note that not specifying a mapping causes the characters to be
copied as-is. This has the nice side-effect of greatly reducing
the mapping table's size.
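
As a small sketch of these two rules, the snippet below uses str.translate
from present-day Python 3, which accepts a table indexed by ordinals and
treats None as deletion. The sample string and table are made up for
illustration.

```python
# Only the characters that change need an entry in the table.
table = {
    ord("-"): None,      # mapped to None: the character is deleted
    ord("a"): ord("4"),  # ordinal mapped to ordinal
}

result = "a-b-c".translate(table)
print(result)      # 4bc
print(len(table))  # 2
```

The characters "b" and "c" have no entry and pass through unchanged, so
the table stays at two entries rather than one per possible character.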

Note that there will be no .maketrans() method. The same functionality
can easily be coded in Python if needed and doesn't fit into the
OO-style nature of string and Unicode objects anymore.
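
For illustration, a helper along these lines could be written in a few
lines of Python. The name make_table is hypothetical, and the sketch
assumes the ordinal-to-ordinal table format described above; it is shown
with str.translate from present-day Python 3.

```python
def make_table(source, target):
    """Hypothetical helper: build an ordinal-to-ordinal translation table."""
    if len(source) != len(target):
        raise ValueError("source and target must have the same length")
    # Pair the characters up position by position and store their ordinals.
    return {ord(s): ord(t) for s, t in zip(source, target)}


table = make_table("abc", "xyz")
print("aabbcc".translate(table))  # xxyyzz
```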

--

Something else that changed is the way .capitalize() works. The
Unicode version uses the Unicode algorithm for it (see TechRep. 13
on the www.unicode.org site). Here's the new doc string:

S.capitalize() -> unicode

Return a capitalized version of S, i.e. words start with title case
characters, all remaining cased characters have lower case.

Note that *all* characters are touched, not just the first one.
The change was needed to get it in sync with the .iscapitalized()
method which is based on the Unicode algorithm too.
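
A rough sketch of the behaviour the doc string describes, written in
present-day Python. The function name capitalize_sketch is hypothetical,
and treating any non-alphabetic character as a word boundary is an
assumption of this sketch, not something the doc string specifies.

```python
def capitalize_sketch(s):
    """Hypothetical sketch: title-case word starts, lower-case the rest."""
    out = []
    at_word_start = True
    for ch in s:
        if ch.isalpha():
            # First letter of a word gets title case, later letters lower case.
            out.append(ch.title() if at_word_start else ch.lower())
            at_word_start = False
        else:
            # Assumption: any non-letter ends the current word.
            out.append(ch)
            at_word_start = True
    return "".join(out)


print(capitalize_sketch("hELLO wORLD"))  # Hello World
```

Every letter is visited, which matches the note that all characters are
touched rather than only the first one.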

Should this change be propagated to the string implementation?

-- 
Marc-Andre Lemburg
______________________________________________________________________
Business:                                      http://www.lemburg.com/
Python Pages:                           http://www.lemburg.com/python/