ascii to latin1

Robert Kern robert.kern at gmail.com
Mon May 8 20:25:13 EDT 2006


Luis P. Mendes wrote:

> example:
> if the word searched is 'televisão', I want that a search by either
> 'televisao', 'televisão' or even 'télévisao' (this last one doesn't
> exist in Portuguese) is successful.

The ICU library can transliterate strings according to rule sets. One such
rule set would map all of the above to 'televisao'. Applied to both the stored
words and the search term, that transliteration acts as a normalization step
akin to stemming.

There are one or two Python bindings out there. Google for PyICU. I don't recall
if it exposes the transliteration API or not.
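
If ICU turns out to be unavailable, the same normalize-then-compare idea can be
sketched with the standard library alone. This is not ICU and not its
transliteration rules: it swaps in Unicode NFD decomposition followed by
dropping the combining marks, which only handles accents that decompose that
way. It assumes Python 3, and the names strip_accents and search_key are
made up for the illustration.

```python
import unicodedata


def strip_accents(text):
    """Return text with combining marks (accents, tildes) removed."""
    # NFD splits each accented character into a base character
    # followed by one or more combining marks, e.g. 'ã' -> 'a' + U+0303.
    decomposed = unicodedata.normalize("NFD", text)
    # Keep only the base characters; combining() is non-zero for the marks.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def search_key(word):
    """Hypothetical helper: the key used for both stored words and queries."""
    return strip_accents(word).lower()


# The three spellings from the question collapse to a single key.
queries = ["televisao", "televisão", "télévisao"]
keys = {search_key(q) for q in queries}
print(keys)  # {'televisao'}
```

Comparing search_key(stored_word) with search_key(query) then makes all three
spellings match. Characters with no canonical decomposition (for example 'ø'
or 'ß') pass through unchanged, which is where a real transliteration rule set
would do better.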

-- 
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma
 that is made terrible by our own mad attempt to interpret it as though it had
 an underlying truth."
  -- Umberto Eco



