utf - string translation

Fredrik Lundh fredrik at pythonware.com
Wed Nov 29 17:11:27 EST 2006


John Machin wrote:

> Another point: there are many non-latin1 characters that could be
> mapped to ASCII. For example:
>     u"\u0141ukasziewicz".translate(unaccented_map())
> doesn't work unless an entry is added to the no-decomposition table:
>     0x0141: u"L", # LATIN CAPITAL LETTER L WITH STROKE
> 
> It looks like generating extra entries like that could be done, with
> the aid of unicodedata.name():
> 
> LATIN CAPITAL LETTER X WITH blahblah -> "X"
> LATIN SMALL LETTER X WITH blahblah -> "X".lower()
> 
> This would require a fair bit of care -- obviously there are special
> cases like LATIN CAPITAL LETTER O WITH STROKE. Eyeballing by regional
> experts is probably required.

see the comments over at

     http://effbot.org/zone/unicode-convert.htm

for an extended table, eyeballed by a regional expert (and since he 
makes the same point about OE vs Oe as you do, I'll probably have to 
change the code ;-)
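
for what it's worth, the unicodedata.name() idea quoted above can be sketched roughly as follows. this is only a sketch, not the code from the effbot page: guess_base_letter and build_candidate_map are hypothetical names made up for illustration, the 0x00C0-0x024F range is an arbitrary assumption, and the result is a list of candidates that still needs eyeballing (it happily maps O WITH STROKE to "O", for example). written for Python 3.

```python
import re
import unicodedata

# Matches names such as "LATIN CAPITAL LETTER L WITH STROKE".
# Only single-letter names are accepted, so digraphs like "DZ" are skipped.
_NAME_RE = re.compile(r"^LATIN (CAPITAL|SMALL) LETTER ([A-Z]) WITH ")

def guess_base_letter(ch):
    """Return the ASCII letter named in ch's Unicode name, or None.

    Hypothetical helper: a naive guess that knows nothing about
    regional conventions.
    """
    m = _NAME_RE.match(unicodedata.name(ch, ""))
    if m is None:
        return None
    case, letter = m.groups()
    return letter if case == "CAPITAL" else letter.lower()

def build_candidate_map(first=0x00C0, last=0x024F):
    """Collect name-derived candidates for characters that have no
    decomposition (the range is an assumption, chosen for illustration)."""
    candidates = {}
    for code in range(first, last + 1):
        ch = chr(code)
        if unicodedata.decomposition(ch):
            continue  # a decomposition-based table already covers these
        base = guess_base_letter(ch)
        if base is not None:
            candidates[code] = base
    return candidates

candidates = build_candidate_map()
print(candidates[0x0141])
print("\u0141ukasziewicz".translate(candidates))
```

the generated dict is meant to be reviewed and then merged into the no-decomposition table by hand, not used blindly.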

</F>



