trying to strip out non ascii.. or rather convert non ascii

Michael Torrie torriem at gmail.com
Wed Oct 30 13:54:05 EDT 2013


On 10/30/2013 10:08 AM, wxjmfauth at gmail.com wrote:
> My comment had nothing to do with Python, it was a
> general comment. A diacritical mark just makes a letter
> a different letter; a "ï" and a "i" are "as
> different" as a "a" from a "z". A diacritical mark
> is more than a simple ornamentation.

That's nice, but you didn't actually read what Ned (or the OP) said.
The OP doesn't care that "ï" and "i" are as different as "a" and "z".
For the purposes of his search he wants them treated as the same
letter.  A fuzzy search treats them all the same.  For example, a
search for "Godel, Escher, Bach" should find "Gödel, Escher, Bach" just
fine, even though "o" and "ö" are different characters.  And lo and
behold, Google actually does this!  Try it.  It's nice for those of us
who want to find something when our US keyboards don't have the right
marks.

https://www.google.ca/search?q=godel+escher+bach

After all this nonsense, that's what the original poster is looking for
(I think... can't be sure since it's been so many days now).  Seems to
me a Python module does this quite nicely:

https://pypi.python.org/pypi/Unidecode
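
For what it's worth, here is a rough sketch of the accent-insensitive
comparison described above.  Note that it does not use Unidecode; it
swaps in the standard library's unicodedata module so it runs without
installing anything.  The helper names strip_marks and fuzzy_equal are
made up for illustration, and the code assumes Python 3.

```python
import unicodedata

def strip_marks(text):
    """Return text with combining diacritical marks removed."""
    # NFKD splits a character such as "ö" into "o" plus a combining mark.
    decomposed = unicodedata.normalize("NFKD", text)
    # Keep only the characters that are not combining marks.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

def fuzzy_equal(a, b):
    """Compare two strings while ignoring diacritics and case."""
    return strip_marks(a).casefold() == strip_marks(b).casefold()

print(strip_marks("Gödel, Escher, Bach"))   # Godel, Escher, Bach
print(fuzzy_equal("godel", "Gödel"))        # True
```

This only handles letters that decompose into a base letter plus a
mark.  Letters such as "ø" have no such decomposition and pass through
unchanged, which is one reason to prefer Unidecode: as far as I know,
its unidecode() function transliterates those as well.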


