[issue12729] Python lib re cannot handle Unicode properly due to narrow/wide bug

Glenn Linderman report at bugs.python.org
Wed Aug 24 09:04:54 CEST 2011


Glenn Linderman <v+python at g.nevcal.com> added the comment:

In msg142098 Ezio said:
> Keep in mind that we should be able to access and use lone surrogates too, therefore:
> s = '\ud800'  # should be valid
> len(s)  # should this raise an error? (or return 0.5 ;)?

I say:
For streams and data types in which lone surrogates are permitted, a lone surrogate should be treated and counted as a single character (code point).
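
A minimal sketch of that counting rule, assuming a CPython 3.3 or later interpreter (where str is indexed by code point, unlike the narrow builds this issue is about). The variable names are only illustrative:

```python
# Assumption: CPython 3.3+, where len() counts code points and
# str is a data type that permits lone surrogates.
s = '\ud800'             # a lone high surrogate
lone_len = len(s)        # counted as one code point

astral = '\U00010000'    # a non-BMP character, one code point
astral_len = len(astral)

mixed = 'a' + s + 'b'    # the lone surrogate sits between two letters
mixed_len = len(mixed)

print(lone_len, astral_len, mixed_len)
```

On such an interpreter len never has to return a fractional count, since the surrogate is simply one element of the string.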

For streams and data types in which lone surrogates are not permitted, the assignment should be invalid and raise an error; len would then never see it, and there is no quandary.
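
As a hedged illustration of the "not permitted" case, the strict UTF-8 codec in Python 3 can stand in for such a stream (that choice is my assumption, not something stated in this thread). The error is raised at the boundary, before any length question arises:

```python
# Assumption: Python 3, using the strict UTF-8 codec as an example of
# a stream type in which lone surrogates are not permitted.
s = '\ud800'

try:
    s.encode('utf-8')            # strict error handling is the default
    encode_rejected = False
except UnicodeEncodeError:
    encode_rejected = True

try:
    b'\xed\xa0\x80'.decode('utf-8')   # the bytes a lone U+D800 would occupy
    decode_rejected = False
except UnicodeDecodeError:
    decode_rejected = True

# Opting in explicitly is still possible via the 'surrogatepass' handler.
raw = s.encode('utf-8', 'surrogatepass')

print(encode_rejected, decode_rejected, raw)
```

The 'surrogatepass' handler shows the other half of the argument: where lone surrogates are deliberately allowed, they round-trip as one code point.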

----------
nosy: +v+python

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue12729>
_______________________________________

