Not x.islower() has different output than x.isupper() in list output...

DFS nospam at dfs.com
Wed May 4 10:09:11 EDT 2016


On 5/3/2016 11:28 PM, Steven D'Aprano wrote:
> On Wed, 4 May 2016 12:49 am, Jussi Piitulainen wrote:
>
>> DFS writes:
>>
>>> On 5/3/2016 9:13 AM, Chris Angelico wrote:
>>
>>>> It doesn't invert, the way numeric negation does.
>>>
>>> What do you mean by 'case inverted'?
>>>
>>> It looks like it swaps the case correctly between upper and lower.
>>
>> There are letters that do not come in exact pairs of upper and lower case,
>
> Languages with two distinct letter cases, like English, are called bicameral.
> The two cases are technically called majuscule and minuscule, but are
> colloquially known as uppercase and lowercase because printers using movable
> type traditionally kept the majuscule letters in a type case above the one
> holding the minuscule letters.
>
> Many alphabets are unicameral, that is, they have only a single letter case.
> Examples include Hebrew, Arabic, Hangul, and many others. Georgian is an
> interesting example, as it is the only known written alphabet that started
> as a bicameral script and then became unicameral.
>
> Consequently, many letters are neither uppercase nor lowercase; they have
> the Unicode general category "Letter, other" (Lo):
>
> py> import unicodedata
> py> c = u'\N{ARABIC LETTER FEH}'
> py> unicodedata.category(c)
> 'Lo'
> py> c.isalpha()
> True
> py> c.isupper()
> False
> py> c.islower()
> False
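This is exactly why "not x.islower()" and "x.isupper()" give different lists: for an uncased character both islower() and isupper() return False. A minimal sketch (Python 3, standard library only); the helper names case_report and case_mismatches are hypothetical, made up for illustration:

```python
import unicodedata

def case_report(ch):
    """Return (general category, isupper(), not islower()) for one character."""
    return unicodedata.category(ch), ch.isupper(), not ch.islower()

def case_mismatches(text):
    """Characters for which `not ch.islower()` disagrees with `ch.isupper()`."""
    return [ch for ch in text if (not ch.islower()) != ch.isupper()]

for ch in ["A", "a", "\N{ARABIC LETTER FEH}", "\N{HEBREW LETTER ALEF}", "7"]:
    print(unicodedata.name(ch), case_report(ch))

# Cased letters agree; uncased letters (category Lo) and digits are
# "not lowercase" without being uppercase, so they show up here.
print(case_mismatches("Aa\N{ARABIC LETTER FEH}7"))
```

If the intent is "uppercase letters only", testing x.isupper() directly avoids the surprise.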
>
>
> Even among bicameral alphabets, there are a few anomalies. The three most
> obvious ones are Greek sigma, German Eszett (or "sharp S") and Turkish I.
>
> (1) The Greek sigma is usually written as Σ or σ in uppercase and lowercase
> respectively, but at the end of a word, lowercase sigma is written as ς.
>
> (This final sigma is sometimes called "stigma", but should not be confused
> with the archaic Greek letter stigma, which has two cases Ϛ ϛ, at least
> when it is not being written as digamma Ϝϝ -- and if you're confused, so
> are the Greeks :-)
>
> Since version 3.3, Python correctly handles sigma and final sigma when
> upper- and lowercasing:
>
> py> 'ΘΠΣΤΣ'.lower()
> 'θπστς'
>
> py> 'ΘΠΣΤΣ'.lower().upper()
> 'ΘΠΣΤΣ'
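A short sketch checking the sigma behaviour above, assuming Python 3.3 or later. It also uses str.casefold(), which the quoted text does not mention: it folds final ς to σ, so comparing casefolded strings is safer than comparing lowercased ones.

```python
word = "ΘΠΣΤΣ"
lowered = word.lower()
print(lowered)          # medial σ mid-word, final ς at the end
print(lowered.upper())  # both sigmas map back to Σ

# A lone Σ has no preceding cased letter, so it lowercases to medial σ.
print("Σ".lower())

# casefold() is intended for caseless matching and maps ς to σ,
# so both spellings fold to the same string.
print(word.casefold(), lowered.casefold())
```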
>
>
>
> (2) The German Eszett ß traditionally existed only in lowercase form. An
> uppercase form has existed since at least the 19th century, but it was left
> out when the Germans moved away from blackletter to Roman-style letters. In
> recent years, printers in Germany have started to reintroduce an uppercase
> version, and the German government has standardized on its use for place
> names, but not for other words.
>
> (Aside: in Germany, ß is not considered a distinct letter of the alphabet,
> but a ligature of ss; historically it derived from a ligature of ſs, ſz or
> ſʒ. The funny characters you may or may not be able to see are the long-S
> and round-Z.)
>
> Python follows common, but not universal, German practice for eszett:
>
> py> 'ẞ'.lower()
> 'ß'
> py> 'ß'.upper()
> 'SS'
>
> Note that this is lossy: given a name like "STRASSER", it is impossible to
> tell whether it should be title-cased to "Strasser" or "Straßer". It also
> means that uppercasing a string can make it longer.
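A minimal sketch of the lossy round trip and the length change, assuming Python 3.3 or later; "Straßer" is just the example name from above. casefold() is added here as the usual way to compare such strings without caring about case.

```python
name = "Straßer"
upper = name.upper()
print(upper, len(name), len(upper))  # ß became "SS": one character longer

# Lossy: nothing in "STRASSER" records that "SS" came from ß.
print(upper.lower())
print(upper.title())

# The capital ẞ (U+1E9E) lowercases to ß, but ß uppercases to "SS", not ẞ.
print("\N{LATIN CAPITAL LETTER SHARP S}".lower())
print("ß".upper())

# casefold() maps ß to "ss", so both spellings compare equal.
print("Straßer".casefold() == "STRASSER".casefold())
```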
>
>
> For more on the uppercase eszett, see:
>
> https://typography.guru/journal/germanys-new-character/
> https://typography.guru/journal/how-to-draw-a-capital-sharp-s-r18/
>
>
> (3) In most Latin alphabets, the lowercase i and j have a "tittle" diacritic
> on them, but not the uppercase forms I and J. Turkish and a few other
> languages have both I-with-tittle and I-without-tittle.
>
> (As far as I know, there is no language with a dotless J.)
>
> So in Turkish, the correct uppercase to lowercase and back again should go:
>
> Dotless I: I -> ı -> I
>
> Dotted I: İ -> i -> İ
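One way to get the Turkish round trips above is to pre-map the two special letters with str.translate() and let str.upper()/str.lower() do the rest. This is a sketch, not a standard-library feature: turkish_upper and turkish_lower are hypothetical helpers, and title casing and other locale rules are ignored.

```python
# Hypothetical helpers: map the Turkish-specific I/i pairs first, then
# fall back to the default Unicode case mappings for everything else.
_TR_TO_UPPER = str.maketrans({
    "i": "\N{LATIN CAPITAL LETTER I WITH DOT ABOVE}",  # i -> İ
    "\N{LATIN SMALL LETTER DOTLESS I}": "I",           # ı -> I
})
_TR_TO_LOWER = str.maketrans({
    "I": "\N{LATIN SMALL LETTER DOTLESS I}",           # I -> ı
    "\N{LATIN CAPITAL LETTER I WITH DOT ABOVE}": "i",  # İ -> i
})

def turkish_upper(s):
    """Uppercase following the Turkish dotted/dotless I rules."""
    return s.translate(_TR_TO_UPPER).upper()

def turkish_lower(s):
    """Lowercase following the Turkish dotted/dotless I rules."""
    return s.translate(_TR_TO_LOWER).lower()

print(turkish_lower("DİYARBAKIR"))  # dotted and dotless i both preserved
print(turkish_upper("diyarbakır"))
print("diyarbakır".upper())         # default rules: both become plain I
```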
>
> Python does not quite manage to handle this correctly for Turkish
> applications, since it loses the dotted/dotless distinction:
>
> py> 'ı'.upper()
> 'I'
> py> 'İ'.lower()
> 'i̇'
>
> and further case conversions follow the non-Turkish rules.
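A small check of the default (locale-independent) mappings in Python 3.3 or later. One detail worth knowing: under Unicode full case mapping, İ lowercases to two code points, a plain "i" followed by U+0307 COMBINING DOT ABOVE, so the result can look like "i" while having length 2.

```python
import unicodedata

dotted_capital = "\N{LATIN CAPITAL LETTER I WITH DOT ABOVE}"  # İ
dotless_small = "\N{LATIN SMALL LETTER DOTLESS I}"            # ı

# ı -> I -> i: the dotless distinction is lost on the round trip.
print(dotless_small.upper(), dotless_small.upper().lower())

# İ -> "i" + COMBINING DOT ABOVE under the default full case mapping.
lowered = dotted_capital.lower()
print([unicodedata.name(ch) for ch in lowered])
print(len(dotted_capital), len(lowered))
```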
>
> Note that sometimes getting this wrong can have serious consequences:
>
> http://gizmodo.com/382026/a-cellphones-missing-dot-kills-two-people-puts-three-more-in-jail


Linguist much?




