[issue12729] Python lib re cannot handle Unicode properly due to narrow/wide bug
Glenn Linderman
report at bugs.python.org
Wed Aug 24 09:04:54 CEST 2011
Glenn Linderman <v+python at g.nevcal.com> added the comment:
In msg142098 Ezio said:
> Keep in mind that we should be able to access and use lone surrogates too, therefore:
> s = '\ud800' # should be valid
> len(s) # should this raise an error? (or return 0.5 ;)?
I say:
For streams and data types in which lone surrogates are permitted, a lone surrogate should be treated as and counted as a character (codepoint).
For streams and data types in which lone surrogates are not permitted, the assignment should be invalid and raise an error; len would then never see it, and has no quandary.
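
As a rough illustration of the two cases above, here is a minimal sketch. It assumes Python 3.3 or later, where str is indexed by code point regardless of build; the variable names are only for illustration. It treats str as a type that permits lone surrogates, and strict UTF-8 as a stream format that does not.

```python
# Case 1: a data type that permits lone surrogates (str).
# The lone surrogate is stored and counted as one code point.
s = '\ud800'
length = len(s)

# Case 2: a stream format that does not permit them (strict UTF-8).
# The error is raised at the point of conversion, so nothing downstream
# ever has to decide how long the invalid data is.
try:
    s.encode('utf-8')
    strict_error = None
except UnicodeEncodeError as exc:
    strict_error = exc

# Opting in explicitly with the 'surrogatepass' error handler
# lets the lone surrogate through and round-trips it.
passed = s.encode('utf-8', 'surrogatepass')
round_tripped = passed.decode('utf-8', 'surrogatepass')
```

Here len(s) is 1, the strict encode raises UnicodeEncodeError, and the surrogatepass round trip returns the original string.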
----------
nosy: +v+python
_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue12729>
_______________________________________