trying to strip out non ascii.. or rather convert non ascii

wxjmfauth at gmail.com wxjmfauth at gmail.com
Mon Oct 28 10:01:16 EDT 2013


On Sunday, 27 October 2013 04:21:46 UTC+1, Nobody wrote:
> Simply ignoring diacritics won't get you very far.

Right. Take, for example, these four distinct French words:
cote, côte, coté, côté. Drop the accents and all four collapse into "cote".
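To make the point concrete, here is a minimal sketch of the naive approach (the helper name `strip_diacritics` is hypothetical): decompose with `unicodedata.normalize("NFD", ...)`, then drop every combining mark (Unicode category `Mn`). Applied to the four words above, it maps all of them to the same string.

```python
import unicodedata

def strip_diacritics(text: str) -> str:
    """Naively drop accents: decompose (NFD), then discard combining marks."""
    decomposed = unicodedata.normalize("NFD", text)
    # Keep every code point that is not a nonspacing combining mark (Mn).
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")

words = ["cote", "côte", "coté", "côté"]
stripped = [strip_diacritics(w) for w in words]
print(stripped)                             # ['cote', 'cote', 'cote', 'cote']
print(len(set(words)), len(set(stripped)))  # 4 1
```

Four different words in, one string out: the information carried by the accents is simply lost.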

> Most languages which use diacritics have standard conversions, e.g.
> ö -> oe, which are likely to be used by anyone familiar with the
> language, e.g. when using software (or a keyboard) which can't handle
> diacritics.
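A minimal sketch of the convention-based approach, assuming a hand-written German table (the names `GERMAN_TRANSLIT` and `to_ascii_german` are hypothetical, and the table covers only ä, ö, ü and ß). The language-specific substitutions run first; anything still non-ASCII falls back to NFKD decomposition with the remainder dropped.

```python
import unicodedata

# Hypothetical table covering only the German conventions (ö -> oe, etc.);
# other languages would need their own tables.
GERMAN_TRANSLIT = str.maketrans({
    "ä": "ae", "ö": "oe", "ü": "ue",
    "Ä": "Ae", "Ö": "Oe", "Ü": "Ue",
    "ß": "ss",
})

def to_ascii_german(text: str) -> str:
    # NFC first, so a decomposed "o" + U+0308 is still matched as "ö".
    text = unicodedata.normalize("NFC", text).translate(GERMAN_TRANSLIT)
    # Fallback: decompose whatever is left and drop the non-ASCII remainder.
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", "ignore").decode("ascii")

print(to_ascii_german("Größe"))   # Groesse
print(to_ascii_german("Müller"))  # Mueller
print(to_ascii_german("Café"))    # Cafe
```

The table encodes knowledge of one language; the fallback is the same lossy stripping as before, so it only handles characters the table does not know about.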

I'm quite comfortable with Unicode, especially with the
Latin blocks.
Apart from this German case (I remember very old typewriters),
which other languages have this kind of accepted
convention?

Just as a reminder: there are 1272 characters considered
Latin characters (counting them is not a simple
task), and, if my knowledge is correct, they cover,
or are there to cover, 17 languages; to be exact,
the 17 European languages based on a Latin alphabet that
cannot be covered by ISO-8859-1.
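ISO-8859-1 maps one-to-one onto the code points U+0000 to U+00FF, so whether a character falls outside it depends only on its code point. The helper `outside_latin1` and the sample letters (French œ, Polish ł, Hungarian ő, Czech č, Romanian ș) are chosen here purely for illustration; they are a handful of examples, not an enumeration of the characters counted above.

```python
def outside_latin1(text: str) -> list[str]:
    """Return the characters of `text` that ISO-8859-1 cannot encode."""
    # Latin-1 covers exactly U+0000..U+00FF, so anything above 0xFF is out.
    return [ch for ch in text if ord(ch) > 0xFF]

print(outside_latin1("œłőčșéüñ"))  # ['œ', 'ł', 'ő', 'č', 'ș']
print(outside_latin1("côté"))      # []
```

The same test can be written as `ch.encode("latin-1")` inside a `try`/`except UnicodeEncodeError`; the code-point comparison is equivalent and avoids the exception handling.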

And of course, logically, they are very, very badly handled
by the Flexible String Representation (PEP 393).
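The Flexible String Representation (PEP 393, used by CPython since 3.3) stores each string with 1, 2 or 4 bytes per code point, chosen from the widest character it contains. The hypothetical helper `storage_width` below measures that width through `sys.getsizeof()`, which is a CPython implementation detail (other interpreters, such as PyPy, store strings differently). It shows that French text stays in the 1-byte (Latin-1) form, while a single Polish ł switches the whole string to 2 bytes per character.

```python
import sys

def storage_width(s: str) -> int:
    """Bytes per code point CPython uses to store strings shaped like `s`.

    Two freshly built strings that differ by one ASCII character differ in
    sys.getsizeof() by exactly the per-character width PEP 393 chose.
    """
    return sys.getsizeof(s + "aa") - sys.getsizeof(s + "a")

for word in ["cote", "côté", "łódź"]:
    print(word, storage_width(word))
# cote 1
# côté 1
# łódź 2
```

Building both strings by concatenation keeps any cached UTF-8 copy of the input out of the measurement.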

jmf



