trying to strip out non ascii.. or rather convert non ascii

wxjmfauth at gmail.com wxjmfauth at gmail.com
Wed Oct 30 14:38:31 EDT 2013


On Wednesday, 30 October 2013 18:54:05 UTC+1, Michael Torrie wrote:
> On 10/30/2013 10:08 AM, wxjmfauth at gmail.com wrote:
> > My comment had nothing to do with Python, it was a
> > general comment. A diacritical mark just makes a letter
> > a different letter; an "ï" and an "i" are "as
> > different" as an "a" from a "z". A diacritical mark
> > is more than a simple ornamentation.
>
> That's nice, but you didn't actually read what Ned said (or the OP).
> The OP doesn't care that "ï" and "i" are as different as "a" and "z".
> For the purposes of his search he wants them treated as the same
> letter. A fuzzy search treats them all the same. For example, a
> search for "Godel, Escher, Bach" should find "Gödel, Escher, Bach" just
> fine, even though "o" and "ö" are different characters. And lo and
> behold, Google actually does this! Try it. It's nice for those of us
> who want to find something and whose US keyboards don't have the right marks.
>
> https://www.google.ca/search?q=godel+escher+bach
>
> After all this nonsense, that's what the original poster is looking for
> (I think... can't be sure since it's been so many days now). Seems to
> me a Python module does this quite nicely:
>
> https://pypi.python.org/pypi/Unidecode
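For the narrower case of accented Latin letters, roughly the same effect can be had with the standard library alone. This is a minimal sketch, not Unidecode's API: the function name `strip_diacritics` is hypothetical. It decomposes text with `unicodedata.normalize("NFKD", ...)`, which splits "ö" into "o" plus a combining diaeresis, and then drops the combining marks. Unlike Unidecode, it does not transliterate letters that have no decomposition, so "ø" or "ß" pass through unchanged.

```python
import unicodedata

def strip_diacritics(text: str) -> str:
    """Hypothetical helper: drop combining marks, e.g. 'Gödel' -> 'Godel'."""
    # NFKD splits a precomposed letter such as 'ö' (U+00F6) into
    # 'o' (U+006F) followed by a combining diaeresis (U+0308).
    decomposed = unicodedata.normalize("NFKD", text)
    # Keep only the base characters; combining() is nonzero for marks.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

print(strip_diacritics("Gödel, Escher, Bach"))  # Godel, Escher, Bach
print(strip_diacritics("naïve"))                # naive
```

Letters without a Unicode decomposition are the main reason to reach for Unidecode instead of this approach.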


OK. You are right. I recognize my mistake. Quite apart
from the original poster's task, I simply had not
understood it that way.

Let's say it depends on the context. For a general
search engine, it's good that diacritics are ignored.
For, let's say, a text processing system, it's good
to have only precise matches. That does not rule out
other matching possibilities.
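The two policies described above can be sketched as one search function with a switch. This is a hypothetical illustration: the names `contains` and `ignore_diacritics` are made up, and diacritic folding again uses the standard library's `unicodedata` (NFKD plus dropping combining marks) rather than Unidecode.

```python
import unicodedata

def strip_diacritics(text: str) -> str:
    # NFKD decomposition, then drop combining marks ('ö' -> 'o').
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

def contains(text: str, query: str, ignore_diacritics: bool = True) -> bool:
    """Return True if query occurs in text.

    ignore_diacritics=True: search-engine style, 'Godel' matches 'Gödel'.
    ignore_diacritics=False: exact letters, 'Godel' does not match 'Gödel'.
    """
    if ignore_diacritics:
        return strip_diacritics(query) in strip_diacritics(text)
    # Exact mode still normalizes both sides to NFC, so a precomposed
    # 'ö' (U+00F6) and 'o' + U+0308 count as the same letter.
    return unicodedata.normalize("NFC", query) in unicodedata.normalize("NFC", text)

title = "Gödel, Escher, Bach"
print(contains(title, "Godel"))                                 # True
print(contains(title, "Godel", ignore_diacritics=False))        # False
print(contains(title, "Go\u0308del", ignore_diacritics=False))  # True
```

Normalizing to NFC in exact mode keeps the "precise match" strict about which letter is meant without being fooled by two encodings of the same letter.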

jmf



