'Straße' ('Strasse') and Python 2

Robin Becker robin at reportlab.com
Wed Jan 15 07:50:10 EST 2014


On 15/01/2014 12:13, Ned Batchelder wrote:
........
>> On my utf8 based system
>>
>>
>>> robin at everest ~:
>>> $ cat ooo.py
>>> if __name__=='__main__':
>>>     import sys
>>>     s='A̅B'
>>>     print('version_info=%s\nlen(%s)=%d' % (sys.version_info,s,len(s)))
>>> robin at everest ~:
>>> $ python ooo.py
>>> version_info=sys.version_info(major=3, minor=3, micro=3,
>>> releaselevel='final', serial=0)
>>> len(A̅B)=3
>>> robin at everest ~:
>>> $
>>
>>
........
> You are right that more than one codepoint makes up a grapheme, and that you'll
> need code to deal with the correspondence between them. But let's not muddy
> these already confusing waters by referring to that mapping as an encoding.
>
> In Unicode terms, an encoding is a mapping between codepoints and bytes.  Python
> 3's str is a sequence of codepoints.
>
Semantics is everything. For me graphemes are the endpoint (or should be); to
get a proper rendering of a sequence of graphemes I can use either a sequence of
bytes or a sequence of codepoints. They are both encodings of the graphemes.
What Unicode calls an encoding doesn't define what encodings are in general,
i.e. mappings from some source alphabet to a target alphabet.
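
As a rough illustration of the three levels being discussed (bytes,
codepoints, graphemes), here is a minimal Python 3 sketch. It assumes the
combining mark in the test string above is U+0305 COMBINING OVERLINE, and the
helper rough_graphemes is a hypothetical name that only attaches combining
marks to the preceding base character; it is not a full implementation of
Unicode grapheme cluster segmentation.

```python
import unicodedata

def rough_graphemes(text):
    """Group each base codepoint with the combining marks that follow it.

    Approximation only: real grapheme cluster boundaries follow the
    rules in Unicode Standard Annex #29, which cover many more cases.
    """
    clusters = []
    for ch in text:
        # unicodedata.combining() returns a non-zero class for combining marks
        if clusters and unicodedata.combining(ch):
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters

# Assumed contents of the test string: 'A', U+0305 COMBINING OVERLINE, 'B'
s = 'A\u0305B'
codepoints = len(s)                   # what Python 3's len(str) reports
utf8_bytes = len(s.encode('utf-8'))   # length after encoding to bytes
graphemes = len(rough_graphemes(s))   # approximate user-perceived characters
print('codepoints=%d utf8_bytes=%d graphemes=%d'
      % (codepoints, utf8_bytes, graphemes))
# prints: codepoints=3 utf8_bytes=4 graphemes=2
```

The same two graphemes come out as three codepoints or four UTF-8 bytes,
which is the sense in which both sequences stand for the graphemes.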
-- 
Robin Becker



