string conversion latin2 to ascii

John Machin sjmachin at lexicon.net
Tue Nov 27 18:08:25 EST 2007


On Nov 28, 8:45 am, kyoso... at gmail.com wrote:
> On Nov 27, 3:35 pm, Martin Landa <landa.mar... at gmail.com> wrote:
>
> > Hi all,
>
> > sorry for a newbie question. I have a unicode string (or, better said,
> > a latin2-encoded string) containing non-ascii characters, e.g.
>
> > s = "Ukázka_možnosti_využití_programu_OpenJUMP_v_SOA"
>
> > I would like to convert this string to plain ascii (using some lookup
> > table for latin2)
>
> > to get
>
> > -> Ukazka_moznosti_vyuziti_programu_OpenJUMP_v_SOA
>
> > Thanks for any hints! Regards, Martin Landa
>
> With a little googling, I found this:
>
> http://www.peterbe.com/plog/unicode-to-ascii

and if the OP has the patience to read *ALL* the comments on that blog
entry, he will find that comment[-2] points to

http://effbot.python-hosting.com/file/stuff/sandbox/text/unaccent.py

and comment[-1] (from the blog owner) is "Brilliant! Thank you."

The bottom line is that there is no universal easy solution; you need
to handcraft a translation table suited to your particular purpose
(e.g. do you want u-with-umlaut to become u or ue?). The
unicodedata.normalize function is useful for off-line preparation of a
set of candidate mappings for that table; it should not be applied
either on-line or blindly.
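
A rough sketch of that workflow, assuming Python 3 (where str is
already unicode). The helper name candidate_mappings and the
u-with-umlaut override are made up for illustration; they come from
neither the blog entry nor unaccent.py. The normalize step only
proposes mappings, which you then review and override by hand before
building the table that is actually used at run time:

```python
import unicodedata


def candidate_mappings(chars):
    """Off-line helper (hypothetical name): propose an ASCII replacement
    for each character by NFKD-decomposing it and dropping combining
    marks. The proposals are candidates only and need review by hand."""
    candidates = {}
    for ch in chars:
        decomposed = unicodedata.normalize("NFKD", ch)
        stripped = "".join(
            c for c in decomposed if not unicodedata.combining(c)
        )
        # Keep a proposal only if it is non-empty and pure ASCII.
        if stripped and all(ord(c) < 128 for c in stripped):
            candidates[ch] = stripped
    return candidates


s = "Ukázka_možnosti_využití_programu_OpenJUMP_v_SOA"

# Step 1 (off-line): generate candidates for the characters you care about.
proposed = candidate_mappings({c for c in s if ord(c) > 127})

# Step 2 (by hand): review the candidates and add or override entries.
# Mapping u-with-umlaut to "ue" is an illustrative choice, not a rule.
table = dict(proposed)
table["ü"] = "ue"

# Step 3 (on-line): apply only the reviewed table.
translation = str.maketrans(table)
result = s.translate(translation)
print(result)
```

Note that a character such as l-with-stroke has no decomposition, so
candidate_mappings proposes nothing for it; entries like that have to
be added to the table by hand, which is one reason not to apply
normalize blindly.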

Cheers,
John


