trying to strip out non ascii.. or rather convert non ascii

Nobody nobody at nowhere.com
Sat Oct 26 23:21:46 EDT 2013


On Sat, 26 Oct 2013 20:41:58 -0500, Tim Chase wrote:

> I'd be just as happy if Python provided a "sloppy string compare"
> that ignored case, diacritical marks, and the like.

Simply ignoring diacritics won't get you very far.
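For illustration, here is a minimal sketch of the kind of "sloppy" compare Tim describes. It uses only the standard library's unicodedata module, and the function names are hypothetical. It shows the limitation: "Müller" matches "MULLER" but not the common transliteration "Mueller".

```python
import unicodedata

def strip_diacritics(s: str) -> str:
    # NFKD splits accented letters into base letter + combining mark(s);
    # dropping the combining marks leaves the bare base letters.
    decomposed = unicodedata.normalize("NFKD", s)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

def sloppy_equal(a: str, b: str) -> bool:
    # Ignore case (via casefold) and diacritical marks, nothing more.
    return strip_diacritics(a).casefold() == strip_diacritics(b).casefold()

print(sloppy_equal("Müller", "MULLER"))   # True
print(sloppy_equal("Müller", "Mueller"))  # False: the "ue" spelling is missed
```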

Most languages which use diacritics have standard conversions, e.g.
ö -> oe, which are likely to be used by anyone familiar with the
language, e.g. when using software (or a keyboard) which can't handle
diacritics.

OTOH, others (particularly native English speakers) may simply discard the
diacritic. So to be of much use, a fuzzy match needs to handle either
possibility.
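A rough sketch of a fuzzy match that accepts either convention, assuming Python 3: each string is reduced to a small set of candidate keys (one with a transliteration table applied, one with the diacritics simply dropped), and two strings match if their key sets overlap. The TRANSLIT table and all function names are hypothetical; the table covers only a few German-style conversions, and since conventions differ between languages a practical version would need per-language tables.

```python
import unicodedata

# Hypothetical, deliberately tiny transliteration table (German-style only).
TRANSLIT = str.maketrans({"ä": "ae", "ö": "oe", "ü": "ue", "ß": "ss"})

def strip_diacritics(s: str) -> str:
    # Decompose, then drop combining marks: "ü" -> "u".
    decomposed = unicodedata.normalize("NFKD", s)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

def match_keys(s: str) -> set:
    # Compose first so "u" + combining diaeresis becomes "ü" and the table
    # can see it; casefold so "Ü" is handled as "ü".
    s = unicodedata.normalize("NFC", s).casefold()
    transliterated = strip_diacritics(s.translate(TRANSLIT))  # "ü" -> "ue"
    discarded = strip_diacritics(s)                           # "ü" -> "u"
    return {transliterated, discarded}

def fuzzy_equal(a: str, b: str) -> bool:
    # Match if any spelling convention makes the two strings agree.
    return bool(match_keys(a) & match_keys(b))

print(fuzzy_equal("Müller", "Mueller"))  # True (transliterated)
print(fuzzy_equal("Müller", "muller"))   # True (diacritic discarded)
print(fuzzy_equal("Müller", "Mahler"))   # False
```

This trades precision for recall: "Muller" and "Müller" will match even when they really are different names.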