How to convert 'ö' to 'oe' or 'o' (or other similar things) in a string?

Steven D'Aprano steve+comp.lang.python at pearwood.info
Sun Sep 18 02:45:36 EDT 2016


On Sunday 18 September 2016 15:59, Thorsten Kampe wrote:

> * Martin Schöön (17 Sep 2016 20:20:12 GMT)
>> 
>> Den 2016-09-17 skrev Kouli <dev at kou.li>:
>> > Hello, try the Unidecode module - https://pypi.python.org/pypi/Unidecode.
>> >
>> > Kouli
>> >
>> > On Sat, Sep 17, 2016 at 6:12 PM, Peng Yu <pengyu.ut at gmail.com> wrote:
>> >> Hi, I want to convert strings in which the characters with accents
>> >> should be converted to the ones without accents. Here is my current
>> >> code.
>> 
>> Side note from Sweden. Å, ä and ö are not accented characters in our
>> language. They are characters of their own.
> 
> I think he meant diacritics.

It doesn't matter whether you call them "accents" like most people do, or 
"diacritics" as linguists do. Either way, in some languages they are an 
integral part of the letter, like the horizontal stroke in English t or the 
vertical bar in English p and b, and in some languages they are modifiers, 
where there are rules that tell you how to write them without the modifier.
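
To make the distinction concrete, here is a rough sketch of the two approaches 
in Python. The mapping table and the function names are my own invention for 
illustration, not part of any library; only unicodedata is from the standard 
library. The first function applies language-specific rules (German-style 
here), the second blindly strips combining marks, which is exactly the lossy 
behaviour that is wrong for Swedish.

```python
import unicodedata

# Hypothetical rule table for illustration: German-style transliteration.
GERMAN_RULES = {
    "ä": "ae", "ö": "oe", "ü": "ue",
    "Ä": "Ae", "Ö": "Oe", "Ü": "Ue",
    "ß": "ss",
}

def transliterate(text, rules):
    """Replace each character that has a rule; leave the rest alone."""
    return "".join(rules.get(ch, ch) for ch in text)

def strip_marks(text):
    """Decompose to NFD, then drop every combining mark (lossy)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

print(transliterate("schön", GERMAN_RULES))  # schoen
print(strip_marks("schön"))                  # schon
```

Which of the two is "right" depends entirely on the language of the text, 
which the string itself does not tell you.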

In English, i is a letter with a dot diacritic, sometimes called the "tittle". 
But unlike Turkish, we don't have a dotless i, ı, and to add to the confusion 
when we capitalise i we get I with no dot instead of İ. Dropping the dot, or 
adding one when you shouldn't, can *literally* get you killed:

http://gizmodo.com/382026/a-cellphones-missing-dot-kills-two-people-puts-three-more-in-jail

http://www.theinquirer.net/inquirer/news/1017243/cellphone-localisation-glitch
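
Python's str.upper() and str.lower() are locale-independent, so they follow 
the default Unicode case mappings rather than the Turkish ones. A small 
sketch of what that looks like in Python 3 (the variable names are mine):

```python
import unicodedata

dotless_i = "\u0131"  # ı  LATIN SMALL LETTER DOTLESS I
dotted_I = "\u0130"   # İ  LATIN CAPITAL LETTER I WITH DOT ABOVE

# Both lowercase letters uppercase to the same plain I ...
upper_plain = "i".upper()
upper_dotless = dotless_i.upper()
print(upper_plain == upper_dotless)  # True

# ... so a round trip through upper() and lower() loses the dotless i.
round_trip = dotless_i.upper().lower()
print(round_trip == dotless_i)  # False

# Lowercasing İ gives two code points: i plus a combining dot above.
lowered = dotted_I.lower()
print([unicodedata.name(ch) for ch in lowered])
```

For Turkish text you would need locale-aware case conversion, which the 
built-in string methods do not provide.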


As far as I know, no natural language has a dotted capital J, but there is a 
dotless ȷ although I'm not sure what language it is from. (Possibly just used 
in mathematics?)
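
If you want to poke at such characters yourself, unicodedata will at least 
tell you what Unicode calls them. A quick sketch, assuming the character 
above is U+0237:

```python
import unicodedata

dotless_j = "\u0237"  # ȷ

# Look up the official Unicode name and general category.
name = unicodedata.name(dotless_j)
category = unicodedata.category(dotless_j)
print(name)      # LATIN SMALL LETTER DOTLESS J
print(category)  # Ll (lowercase letter)

# A dotted capital J can only be spelled with a combining mark;
# NFC normalisation leaves it as two code points.
dotted_J = unicodedata.normalize("NFC", "J\u0307")
print(len(dotted_J))
```

The name says nothing about which language or notation the letter comes from, 
so this doesn't settle the question.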



-- 
Steven
git gets easier once you get the basic idea that branches are homeomorphic 
endofunctors mapping submanifolds of a Hilbert space.



