[Python-Dev] PEP 393 Summer of Code Project

Antoine Pitrou solipsis at pitrou.net
Thu Sep 1 18:03:47 CEST 2011


On Thursday, 1 September 2011 at 08:45 -0700, Guido van Rossum wrote:
> This is definitely thought of as a separate
> mark added to the e; ë is not a new letter. I have a feeling it's the
> same way for the French and Germans, but I really don't know.
> (Antoine? Georg?)

Indeed, they are not separate "letters" (they are considered the same in
lexicographic order, and the French alphabet has 26 letters).

But I'm not sure how that's relevant, because removing an accent will
most likely produce a spelling error, or at least change the meaning.
Accents are very much part of the language (whereas ligatures like "ff"
are not; they are a rendering detail). So I would consider "é", "ê",
"ù", etc. atomic characters for the purpose of processing French text.
And I don't see how a decomposed form could help an application.
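
To make the distinction concrete, here is a minimal sketch using Python's
standard unicodedata module. The helper name strip_accents and the sample
word are illustrative assumptions, not something from this thread. It shows
that the composed and decomposed forms of "é" differ only in representation,
that dropping the combining mark turns one French word into another, and
that a ligature code point folds away under compatibility normalization.

```python
import unicodedata

# "é" as a single precomposed code point (U+00E9).
composed = "\u00e9"
# NFD splits it into "e" followed by U+0301 COMBINING ACUTE ACCENT.
decomposed = unicodedata.normalize("NFD", composed)
# NFC recombines the pair into the single code point again.
recomposed = unicodedata.normalize("NFC", decomposed)


def strip_accents(text):
    """Hypothetical helper: decompose, then drop all combining marks."""
    return "".join(
        ch
        for ch in unicodedata.normalize("NFD", text)
        if not unicodedata.combining(ch)
    )


# "côté" (side) loses both accents and becomes "cote" (rating),
# which is a different French word.
cote_with_accents = "c\u00f4t\u00e9"
stripped = strip_accents(cote_with_accents)

# U+FB00 LATIN SMALL LIGATURE FF is only a presentation form:
# NFKC replaces it with two plain "f" characters.
ligature = "\ufb00"
folded = unicodedata.normalize("NFKC", ligature)
```

Under these assumptions, an application that treats "é" as atomic would
normalize its input to NFC once and then work on code points as usual.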

Regards

Antoine.



