string conversion latin2 to ascii

kyosohma at gmail.com
Wed Nov 28 10:04:02 EST 2007


On Nov 27, 5:08 pm, John Machin <sjmac... at lexicon.net> wrote:
> On Nov 28, 8:45 am, kyoso... at gmail.com wrote:
>
> > On Nov 27, 3:35 pm, Martin Landa <landa.mar... at gmail.com> wrote:
>
> > > Hi all,
>
> > > sorry for a newbie question. I have a unicode string (or, better
> > > said, a latin2-encoded one) containing non-ASCII characters, e.g.
>
> > > s = "Ukázka_možnosti_využití_programu_OpenJUMP_v_SOA"
>
> > > I would like to convert this string to plain ascii (using some lookup
> > > table for latin2)
>
> > > to get
>
> > > -> Ukazka_moznosti_vyuziti_programu_OpenJUMP_v_SOA
>
> > > Thanks for any hints! Regards, Martin Landa
>
> > With a little googling, I found this:
>
> > http://www.peterbe.com/plog/unicode-to-ascii
>
> and if the OP has the patience to read *ALL* the comments on that blog
> entry, he will find that comment[-2] points to
>
> http://effbot.python-hosting.com/file/stuff/sandbox/text/unaccent.py
>
> and comment[-1] (from the blog owner) is "Brilliant! Thank you."
>
> The bottom line is that there is no universal easy solution; you need
> to handcraft a translation table suited to your particular purpose
> (e.g. do you want u-with-umlaut to become u or ue?). The
> unicodedata.normalize function is useful for off-line preparation of a
> set of candidate mappings for that table; it should not be applied
> either on-line or blindly.
>
> Cheers,
> John

Sorry...I didn't know about translation tables or I would have
mentioned that instead. My bad.
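
For anyone following along, here is a rough sketch of the handcrafted
translation table John describes. It is only an illustration, not the
code from either link: it assumes Python 3 (where str is already
unicode), and the names LATIN2_TO_ASCII, to_ascii and
candidate_mappings are made up for this example. The table covers just
the three accented letters in Martin's sample string, so a real one
would need more entries. The helper uses unicodedata.normalize only to
suggest candidate mappings for a human to review off-line, as John
recommends.

```python
import unicodedata

# Hypothetical hand-made table: covers only the accented letters that
# occur in the sample string. Extend it for your own data.
LATIN2_TO_ASCII = {
    ord("\u00e1"): "a",  # a with acute
    ord("\u017e"): "z",  # z with caron
    ord("\u00ed"): "i",  # i with acute
}


def to_ascii(text, table=LATIN2_TO_ASCII):
    """Replace characters found in the table; others pass through unchanged."""
    return text.translate(table)


def candidate_mappings(chars):
    """Off-line helper: propose a base letter for each character.

    Decomposes each character (NFKD) and drops the combining marks.
    The result is a suggestion to review by hand, not a final table.
    """
    proposals = {}
    for ch in chars:
        decomposed = unicodedata.normalize("NFKD", ch)
        base = "".join(c for c in decomposed if not unicodedata.combining(c))
        proposals[ch] = base
    return proposals


# Martin's example, written with escapes so the source file stays ASCII.
s = "Uk\u00e1zka_mo\u017enosti_vyu\u017eit\u00ed_programu_OpenJUMP_v_SOA"
print(to_ascii(s))

# u-with-umlaut is proposed as "u"; whether you want "u" or "ue" is
# exactly the kind of decision that has to be made by hand.
print(candidate_mappings("\u00fc"))
```

Anything missing from the table is left as it is, so it is worth
checking the output for leftover non-ASCII characters before relying
on it.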

Mike


