convert Unicode to lower/uppercase?

Peter Otten __peter__ at web.de
Mon Sep 22 03:39:24 EDT 2003


"Martin v. Löwis" wrote:

> jallan wrote:
> 
>> But that really doesn't work properly. According to Unicode specs and
>> German usage the uppercase of "ß" is actually "SS", that is the single
>> character "ß" should uppercase to two characters.
> 
> Can you cite exact chapter and verse of the Unicode specs that say so?
> According to the Unicode database,
> 
> http://www.unicode.org/Public/UNIDATA/UnicodeData.txt
> 
> has neither an uppercase mapping, nor a lowercase mapping.

It seems like UnicodeData.txt does not give the full story. Quoting from
http://www.unicode.org/Public/UNIDATA/SpecialCasing.txt:

[...]
# (For compatibility, the UnicodeData.txt file only contains case mappings for
# characters where they are 1-1, and does not have locale-specific mappings.)
[...]
# <code>; <lower> ; <title> ; <upper> ; (<condition_list> ;)? # <comment>
[...]
# The German es-zed is special--the normal mapping is to SS.
# Note: the titlecase should never occur in practice. It is equal to
# titlecase(uppercase(<es-zed>))

00DF; 00DF; 0053 0073; 0053 0053; # LATIN SMALL LETTER SHARP S
[...]

Thus, to comply with the standard, "ß".upper() --> "SS" is required.
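
To make the record above concrete, here is a rough sketch that decodes the
00DF line using the field layout quoted from SpecialCasing.txt. The helper
name parse_special_casing is made up for illustration, and the final
comparison assumes Python 3.3 or later, where str.upper() is expected to
apply the full (one-to-many) case mappings; older interpreters may behave
differently.

```python
# Illustrative sketch only: parse_special_casing is a hypothetical helper,
# not part of any library. It handles unconditional records such as the
# one quoted above and ignores the optional <condition_list> field.
RECORD = "00DF; 00DF; 0053 0073; 0053 0053; # LATIN SMALL LETTER SHARP S"


def parse_special_casing(line):
    """Split one record into (char, lower, title, upper) strings."""
    data = line.split("#", 1)[0]  # drop the trailing comment
    fields = [field.strip() for field in data.split(";")]
    code, lower, title, upper = fields[:4]

    def to_text(hex_codes):
        # Each field is a space-separated list of hex code points.
        return "".join(chr(int(h, 16)) for h in hex_codes.split())

    return to_text(code), to_text(lower), to_text(title), to_text(upper)


char, lower, title, upper = parse_special_casing(RECORD)
print(repr(char), repr(lower), repr(title), repr(upper))

# Assumption: Python 3.3+ so that str.upper() uses the full mapping.
print(char.upper() == upper)
```

Note that the uppercase result is two characters long, so code that assumes
len(s) == len(s.upper()) cannot rely on that for strings containing "ß".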

> Also, in German, the uppercase mapping of ß is of ongoing debate.

My personal impression is that, even before the orthography reform in 1998,
the SZ variant was seldom used.
For the "official" rule see http://www.ids-mannheim.de/reform/a2-3.html.

Peter



