ascii to latin1
Robert Kern
robert.kern at gmail.com
Mon May 8 20:25:13 EDT 2006
Luis P. Mendes wrote:
> example:
> if the word searched is 'televisão', I want that a search by either
> 'televisao', 'televisão' or even 'télévisao' (this last one doesn't
> exist in Portuguese) is successful.
The ICU library has the capability to transliterate strings via certain
rulesets. One such ruleset would transliterate all of the above to 'televisao'.
That transliteration could act as a normalization step akin to stemming.
There are one or two Python bindings out there. Google for PyICU. I don't recall
if it exposes the transliteration API or not.
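
As a rough sketch of the normalization idea, here is a version that does not use
ICU at all. It swaps in the standard library's unicodedata module, decomposing
each character and dropping the combining marks. The names strip_accents and
search_key are made up for illustration, and this only approximates what an ICU
ruleset would do (it handles accents, not general transliteration).

```python
import unicodedata


def strip_accents(text):
    """Return text with combining marks (accents, tildes, etc.) removed."""
    # NFD splits a character such as 'ã' into the base letter 'a'
    # followed by a separate combining tilde.
    decomposed = unicodedata.normalize("NFD", text)
    # unicodedata.combining() is non-zero for combining marks, so keep
    # only the characters for which it returns 0.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def search_key(word):
    """Hypothetical helper: the normalized form used for comparisons."""
    return strip_accents(word).lower()


# The three spellings from the question collapse to a single key.
words = ["televisao", "televisão", "télévisao"]
keys = {search_key(w) for w in words}
```

Both the indexed words and the search term would be passed through search_key
before comparing, in the same way a stemmer is applied to both sides.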
--
Robert Kern
"I have come to believe that the whole world is an enigma, a harmless enigma
that is made terrible by our own mad attempt to interpret it as though it had
an underlying truth."
-- Umberto Eco