'Straße' ('Strasse') and Python 2

Robin Becker robin at reportlab.com
Wed Jan 15 07:00:51 EST 2014


On 12/01/2014 07:50, wxjmfauth at gmail.com wrote:
>>>> sys.version
> 2.7.6 (default, Nov 10 2013, 19:24:18) [MSC v.1500 32 bit (Intel)]
>>>> s = 'Straße'
>>>> assert len(s) == 6
>>>> assert s[5] == 'e'
>>>>
>
> jmf
>

On my UTF-8 based system:


> robin at everest ~:
> $ cat ooo.py
> if __name__=='__main__':
>     import sys
>     s='A̅B'
>     print('version_info=%s\nlen(%s)=%d' % (sys.version_info,s,len(s)))
> robin at everest ~:
> $ python ooo.py
> version_info=sys.version_info(major=3, minor=3, micro=3, releaselevel='final', serial=0)
> len(A̅B)=3
> robin at everest ~:
> $


So two 'characters' are 3 codepoints here (and in general 2 or more). If I want
to isolate so-called graphemes I need an algorithm even for Python's unicode,
i.e. when it really matters, the Python 3 str is just another encoding.
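
To make the point concrete, here is a rough sketch (Python 3 only) of the sort
of algorithm I mean. The function name approx_graphemes is made up for this
post, and it only attaches combining marks to the preceding base character
using unicodedata.combining(). It is an approximation, not the full grapheme
cluster segmentation described by the Unicode standard, so it will get things
like Hangul jamo sequences or emoji joined with ZWJ wrong.

```python
import unicodedata


def approx_graphemes(text):
    """Group each base code point with the combining marks that follow it.

    Approximation only: relies on the canonical combining class and does
    not implement the full Unicode grapheme cluster rules.
    """
    clusters = []
    for ch in text:
        # combining() returns the canonical combining class; a non-zero
        # value means the code point attaches to the preceding character.
        if clusters and unicodedata.combining(ch):
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters


s = 'A\u0305B'  # 'A', U+0305 COMBINING OVERLINE, 'B'
print('codepoints=%d clusters=%d' % (len(s), len(approx_graphemes(s))))
```

With the string from ooo.py this gives 3 codepoints but 2 clusters, whereas
len() on its own only ever counts codepoints.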
-- 
Robin Becker



