convert unicode characters to visibly similar ascii characters

John Machin sjmachin at lexicon.net
Tue Jul 1 20:29:52 EDT 2008


On Jul 2, 9:55 am, Jim <jim.heffe... at gmail.com> wrote:
> Peter Bulychev wrote:
> > I want to convert unicode character into ascii one.
>
> You have to make some arbitrary choices of what to translate.  Based
> on some materials on  effbot's site, and a recipe, I made
>  ftp://alan.smcvt.edu/hefferon/unicode2ascii.py
> which has at least some of what you are looking for.
>   $ grep HYPHEN unicode2ascii.py
>     u'\N{SOFT HYPHEN}':u'-',
>     u'\N{HYPHEN}':u'-',
>     u'\N{NON-BREAKING HYPHEN}':u'-',
>     u'\N{SOFT HYPHEN}': '-',
> No doubt I have some terrible gaffes and some things missing.
> Corrections appreciated.

Comments on the above grep output:
1. You have SOFT HYPHEN twice, mapping it to u'-' and '-'. If those
are entries in a single dict literal, the later one silently wins.
2. The idea of a soft hyphen is as a hint to a hyphenator about where
to insert a hyphen if one is necessary and the hyphenator is suspected
of acting cluelessly without the hint. It is not meant to show up
unless a line actually breaks there, so IMHO asciification should
substitute u'', not u'-'.
3. Read PEP 8: put a space after the colon in each dict entry, i.e.
s/:/: /
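
To make points 1 to 3 concrete, here is a minimal sketch of what the
hyphen entries might look like after those changes. It is not the
contents of unicode2ascii.py; the names HYPHEN_MAP, TABLE and
asciify_hyphens are made up for illustration, and it is written for
Python 3, where str.translate accepts a dict keyed by code point.

```python
# Hypothetical table for illustration only; not taken from unicode2ascii.py.
# Each character is listed once, with a space after the colon (PEP 8).
HYPHEN_MAP = {
    '\N{SOFT HYPHEN}': '',            # invisible hint, so drop it
    '\N{HYPHEN}': '-',
    '\N{NON-BREAKING HYPHEN}': '-',
}

# str.translate expects integer code points as keys.
TABLE = {ord(char): repl for char, repl in HYPHEN_MAP.items()}


def asciify_hyphens(text):
    """Replace the hyphen-like characters listed in HYPHEN_MAP."""
    return text.translate(TABLE)


sample = ('co\N{SOFT HYPHEN}operate, well\N{HYPHEN}known, '
          'non\N{NON-BREAKING HYPHEN}breaking')
print(asciify_hyphens(sample))  # cooperate, well-known, non-breaking
```

Mapping SOFT HYPHEN to the empty string means "co<soft hyphen>operate"
comes out as "cooperate" rather than "co-operate", which is what a
reader of the original text would have seen on an unbroken line.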

Cheers,
John


