[issue12729] Python lib re cannot handle Unicode properly due to narrow/wide bug
Antoine Pitrou
report at bugs.python.org
Sat Aug 13 22:26:21 CEST 2011
Antoine Pitrou <pitrou at free.fr> added the comment:
> Here's why I say that Python uses UTF-16 not UCS-2 on its narrow builds.
> Perhaps someone could tell me why the Python documentation says it uses
> UCS-2 on a narrow build.
There's a disagreement on that point between several developers. See an example sub-thread at:
http://mail.python.org/pipermail/python-dev/2010-November/105751.html
> Since you are already using a variable-width encoding, why the
> supercilious attitude toward UTF-8?
I think you are reading too much into these decisions. It's simply that no one took the time to write an alternative implementation and demonstrate its superiority. I also believe the original implementation was UCS-2 and that surrogate support was added progressively over the years. Hence the terminological mess and the ad-hoc semantics.
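To make the UCS-2 versus UTF-16 distinction concrete, here is a minimal sketch of what a non-BMP character looks like as UTF-16 code units, which is roughly what a narrow build exposes through indexing and len(). The helper name utf16_code_units is made up for this illustration and is not part of any Python API.

```python
def utf16_code_units(s):
    """Return the UTF-16 code units of s as a list of integers.

    Hypothetical helper for illustration only; it encodes to
    little-endian UTF-16 and reads the result back two bytes at a time.
    """
    data = s.encode("utf-16-le")
    return [int.from_bytes(data[i:i + 2], "little")
            for i in range(0, len(data), 2)]


# U+1D11E MUSICAL SYMBOL G CLEF lies outside the BMP, so UTF-16 needs
# a surrogate pair (two code units) to represent it.
clef_units = utf16_code_units("\U0001D11E")
print([hex(u) for u in clef_units])   # ['0xd834', '0xdd1e']

# A BMP character such as U+00E9 fits in a single code unit.
e_acute_units = utf16_code_units("\u00e9")
print([hex(u) for u in e_acute_units])  # ['0xe9']
```

A pure UCS-2 implementation would have no way to represent the first character at all; treating the two surrogates as one character in some operations but as two in others is where the ad-hoc semantics come from.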
I agree that going with UTF-8 and a clever indexing scheme would be a better solution.
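As a rough idea of what such an indexing scheme might look like (this is only a sketch under my own assumptions, not an existing or proposed CPython design): store the UTF-8 bytes together with a sparse table of byte offsets, one per `stride` code points, so that indexing jumps to the nearest checkpoint and scans at most stride - 1 characters forward. The class name IndexedUtf8 and the stride parameter are hypothetical.

```python
class IndexedUtf8:
    """Hypothetical UTF-8 backed string with sparse offset checkpoints."""

    def __init__(self, text, stride=4):
        self.data = text.encode("utf-8")
        self.stride = stride
        self.checkpoints = []  # byte offset of every stride-th code point
        count = 0
        for offset, byte in enumerate(self.data):
            # Bytes of the form 10xxxxxx are continuation bytes; any
            # other byte starts a new code point.
            if byte & 0xC0 != 0x80:
                if count % stride == 0:
                    self.checkpoints.append(offset)
                count += 1
        self.length = count

    def __len__(self):
        # Length in code points, not bytes.
        return self.length

    def __getitem__(self, index):
        if not 0 <= index < self.length:
            raise IndexError(index)
        # Jump to the nearest checkpoint at or before the target.
        offset = self.checkpoints[index // self.stride]
        # Walk forward over the remaining code points.
        for _ in range(index % self.stride):
            offset += 1
            while self.data[offset] & 0xC0 == 0x80:
                offset += 1
        # Find the end of the code point that starts at offset.
        end = offset + 1
        while end < len(self.data) and self.data[end] & 0xC0 == 0x80:
            end += 1
        return self.data[offset:end].decode("utf-8")


s = IndexedUtf8("a\U0001D11E\u00e9 b", stride=2)
print(len(s))            # 5 code points, stored in 9 bytes
print(s[1] == "\U0001D11E")  # True
```

The stride trades memory for lookup time: a smaller stride means a larger table but a shorter scan per index operation. Slicing, negative indices and mutation are left out of this sketch.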
----------
nosy: +pitrou
_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue12729>
_______________________________________