Convert Unicode characters to visually similar ASCII characters

Terry Reedy tjreedy at udel.edu
Tue Jul 1 15:46:45 EDT 2008



Peter Bulychev wrote:
> Hello.
> 
> I want to convert Unicode characters into ASCII ones.
> The method ".encode('ASCII')" can convert only those Unicode 
> characters that fall in the range 0..127.
> 
> But there are still lots of characters beyond this range that could be 
> manually converted to visually similar ASCII characters. For 
> instance, Unicode has several quotation marks, all of which could be 
> converted into the ASCII quotation mark.
> 
> Can this conversion be performed automatically? After googling, 
> I've only found that there is a Unicode database that stores 
> human-readable information on the names of all Unicode characters 
> (ftp://ftp.unicode.org/Public/UNIDATA/UnicodeData.txt), and that there 
> is a Python adapter for this database 
> (http://docs.python.org/lib/module-unicodedata.html). Using this 
> database I can do something like `if 
> notation.find('QUOTATION')!=-1:\n\treturn "'"`. I believe there is a more 
> elegant way. Am I right?

I believe you will have to make up your own translation dictionary for 
the translations *you* want.  You should then be able to use that with 
the .translate() method.
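
A minimal sketch of that idea, assuming Python 3, where str.translate() 
accepts a dict keyed by code point. The table ASCII_LOOKALIKES and the 
helper to_ascii() are hypothetical names made up for illustration, and 
the handful of entries is only a starting point; extend it with whatever 
replacements you consider acceptable.

```python
# Hand-picked mapping from Unicode code points to ASCII look-alikes.
# Illustrative only: the entries are a small sample, not a complete table.
ASCII_LOOKALIKES = {
    0x2018: "'",    # LEFT SINGLE QUOTATION MARK
    0x2019: "'",    # RIGHT SINGLE QUOTATION MARK
    0x201C: '"',    # LEFT DOUBLE QUOTATION MARK
    0x201D: '"',    # RIGHT DOUBLE QUOTATION MARK
    0x2013: "-",    # EN DASH
    0x2014: "-",    # EM DASH
    0x2026: "...",  # HORIZONTAL ELLIPSIS (one character may map to several)
}


def to_ascii(text, table=ASCII_LOOKALIKES):
    """Apply the look-alike table, then replace any remaining
    non-ASCII character with '?' so the result is pure ASCII."""
    translated = text.translate(table)
    return translated.encode("ascii", "replace").decode("ascii")


sample = "\u201cHello\u201d \u2013 it\u2019s caf\u00e9\u2026"
print(to_ascii(sample))  # prints: "Hello" - it's caf?...
```

Characters missing from the table (the accented "e" above) fall through 
to the 'replace' error handler and come out as '?'. Whether that is 
acceptable, or whether such characters should get their own entries, is 
a choice only you can make for your data.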

tjr



