Py 3.3, unicode / upper()

Wed Dec 19 16:23:15 EST 2012

On Wed, Dec 19, 2012 at 1:55 PM,  <wxjmfauth at gmail.com> wrote:
> Yes, it is correct (or can be considered as correct).
> I do not wish to discuss the typographical problem
> of "Das Grosse Eszett". The web is full of pages on the
> subject. However, I never succeeded in finding an "official
> position" from Unicode. The best information I found seems
> to indicate (to converge on) U+1E9E as the now "supported"
> uppercase form of U+00DF. (see DIN).

Is this link not official?

http://unicode.org/cldr/utility/character.jsp?a=00DF

That page defines a full uppercase mapping to SS and a simple uppercase
mapping to U+00DF itself, not U+1E9E.  My understanding is that a
simple mapping is not allowed to map to multiple characters, whereas a
full mapping is.
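
To make the distinction concrete, here is a small sketch of what str.upper()
does on Python 3.3 or later, where it follows the full mapping.  The
constant names are mine, purely for illustration; as far as I know the
standard library does not expose the simple mapping directly, so only the
full mapping is shown.

```python
import unicodedata

# Illustrative names, not part of any library.
SHARP_S = "\u00df"          # LATIN SMALL LETTER SHARP S
CAPITAL_SHARP_S = "\u1e9e"  # LATIN CAPITAL LETTER SHARP S

# On Python 3.3+, upper() applies the full mapping, which may expand
# one character into several.
full_upper = SHARP_S.upper()
print(repr(full_upper), len(full_upper))

# The full uppercase mapping of U+00DF is not U+1E9E ...
print(full_upper == CAPITAL_SHARP_S)

# ... although U+1E9E does lowercase to U+00DF.
print(CAPITAL_SHARP_S.lower() == SHARP_S)

# Confirm which characters are being discussed.
print(unicodedata.name(SHARP_S))
print(unicodedata.name(CAPITAL_SHARP_S))
```

So the two characters are related in one direction only: lowercasing
U+1E9E yields U+00DF, but uppercasing U+00DF yields SS.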

> What is bothering me is more the implementation. The Unicode
> documentation says roughly this: if something cannot be
> honoured, there is no harm, but do not implement a workaround.
> In that case, I'm not sure Python is doing the best thing.

But this behavior is per the specification, not a workaround.  I think
the worst thing we could do in this regard would be to start diverging
from the specification because we think we know better than the
Unicode Consortium.

> Even if "wrong", this can be considered programmatically correct
> or logically acceptable (Py3.2):
>
> >>> 'Straße'.upper().lower().capitalize() == 'Straße'
> True
>
> while this will *always* be problematic (Py3.3)
>
> >>> 'Straße'.upper().lower().capitalize() == 'Straße'
> False

On the other hand (Py3.2):

>>> 'Straße'.upper().isupper()
False

vs. Py3.3:

>>> 'Straße'.upper().isupper()
True

There is probably no one clearly correct way to handle the problem,
but personally the Py3.2 contradiction, where the result of upper() is
not itself reported as uppercase, bothers me more than the example
that you posted.
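
For anyone who wants to check both properties side by side, here is a
rough sketch, assuming Python 3.3 or later.  The two helper functions
are hypothetical names of my own, not anything from the standard
library.

```python
def upper_is_upper(s):
    """Return True if the uppercased string is itself reported as uppercase."""
    return s.upper().isupper()


def survives_round_trip(s):
    """Return True if upper -> lower -> capitalize gives back the original."""
    return s.upper().lower().capitalize() == s


word = "Stra\u00dfe"  # 'Straße'

# With the full mapping (Py3.3+), the first property holds and the
# second does not, because the sharp s has been expanded to "ss".
print(upper_is_upper(word))
print(survives_round_trip(word))

# A word without the sharp s satisfies both.
print(upper_is_upper("Strasse"))
print(survives_round_trip("Strasse"))
```

The two properties cannot both hold for 'Straße' under either
behavior, so the choice is about which one to give up.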