trying to strip out non ascii.. or rather convert non ascii

wxjmfauth at gmail.com wxjmfauth at gmail.com
Wed Oct 30 04:49:28 EDT 2013


On Tuesday, 29 October 2013 06:24:50 UTC+1, Steven D'Aprano wrote:
> On Mon, 28 Oct 2013 09:23:41 -0500, Tim Chase wrote:
>
> > On 2013-10-28 07:01, wxjmfauth at gmail.com wrote:
> >>> Simply ignoring diacritics won't get you very far.
> >>
> >> Right. As an example, these four French words: cote, côte, coté, côté.
> >
> > Distinct words with distinct meanings, sure.
> >
> > But when a naïve (naive? ☺) person or one without the easy ability to
> > enter characters with diacritics searches for "cote", I want to return
> > possible matches containing any of your 4 examples.  It's slightly
> > fuzzier if they search for "coté", in which case they may mean "coté" or
> > they might be unable to figure out how to add a hat and want to
> > type "côté". Though I'd rather get more results, even if it has some
> > that only match fuzzily.
>
> The right solution to that is to treat it no differently from other fuzzy
> searches. A good search engine should be tolerant of spelling errors and
> alternative spellings for any letter, not just those with diacritics.
> Ideally, a good search engine would successfully match all three of
> "naïve", "naive" and "niave", and it shouldn't rely on special handling
> of diacritics.
------

This is nonsense. The purpose of a diacritical mark is to
make a letter a different letter. If a tool is supposed to
match an ô, there is absolutely no reason for it to match
anything else.

jmf
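
For readers following the thread, the two positions can be contrasted with
a rough Python sketch. It is only an illustration under stated assumptions,
not code posted by anyone in the discussion: the helper names
strip_diacritics, exact_matches and folded_matches are hypothetical, and
"folding" here simply means dropping combining marks after NFD
decomposition with the standard unicodedata module.

```python
import unicodedata

# The four French words used as the example in this thread.
WORDS = ["cote", "côte", "coté", "côté"]


def strip_diacritics(text):
    """Hypothetical helper: decompose to NFD, then drop combining marks."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def exact_matches(query, words):
    """Strict view: ô only matches ô.

    Both sides are normalized to NFC so that a precomposed ô and
    o + COMBINING CIRCUMFLEX ACCENT compare equal.
    """
    q = unicodedata.normalize("NFC", query)
    return [w for w in words if unicodedata.normalize("NFC", w) == q]


def folded_matches(query, words):
    """Lenient view: compare after removing diacritics on both sides."""
    q = strip_diacritics(query)
    return [w for w in words if strip_diacritics(w) == q]


print(exact_matches("côte", WORDS))
print(folded_matches("cote", WORDS))
```

With the strict comparison, "côte" matches only itself; with folding, a
query of "cote" returns all four words. Neither function attempts the
general fuzzy matching (for example "niave" for "naive") that the quoted
message asks for; that would need an edit-distance or similar approach.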



