[issue33108] Unicode char 304 in lowercase has len = 2

Tue Mar 20 10:18:08 EDT 2018

Kiril Dimitrov <kiril.dimitroff at gmail.com> added the comment:

This is roughly my use case:
list(zip("ßx", [0.5, 0.3])) is [('ß', 0.5), ('x', 0.3)]
list(zip("ßx".upper(), [0.5, 0.3])) is [('S', 0.5), ('S', 0.3)]; in the
latter case you never get to see the value for 'x'.

My expectation was that lower() and upper() preserve text
length. At least this seemed to be the case in Python 2.7.
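
For illustration, here is a minimal sketch of one way to keep each value
attached to its character: apply upper() per character rather than to the
whole string. The names text, weights and pairs are made up for this example,
and this is only one possible workaround, not something the tracker
recommends.

```python
text = "ßx"
weights = [0.5, 0.3]

# Case mapping on the whole string may change its length.
assert len(text) == 2 and len(text.upper()) == 3   # "SSX"
assert len(chr(304).lower()) == 2                  # U+0130 -> "i" + U+0307

# Upper-case each character separately so every weight stays attached
# to the character it was originally paired with.
pairs = [(ch.upper(), w) for ch, w in zip(text, weights)]
print(pairs)  # [('SS', 0.5), ('X', 0.3)]
```

The first element of a pair can then be longer than one character, but the
number of pairs still matches the number of weights.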

2018-03-20 15:28 GMT+02:00 INADA Naoki <report at bugs.python.org>:

>
> INADA Naoki <songofacandy at gmail.com> added the comment:
>
> Another example:
>
> >>> s = "ß"
> >>> len(s)
> 1
> >>> len(s.upper())
> 2
> >>> s.upper()
> 'SS'
> >>> ord(s)
> 223
>
>
> > This breaks unicode text matching.
>
> What are you talking about? The re module?
>
> ----------
> nosy: +inada.naoki
>
> _______________________________________
> Python tracker <report at bugs.python.org>
> <https://bugs.python.org/issue33108>
> _______________________________________
>
