[issue12892] UTF-16 and UTF-32 codecs should reject (lone) surrogates
Marc-Andre Lemburg
report at bugs.python.org
Mon Sep 2 17:53:23 CEST 2013
Marc-Andre Lemburg added the comment:
You should be able to squeeze out some extra cycles by
replacing the bit calculations with a simple range check
for ch >= 0xd800:
+# if STRINGLIB_MAX_CHAR >= 0xd800
+ if (((ch1 ^ 0xd800) &
+ (ch1 ^ 0xd800) &
+ (ch1 ^ 0xd800) &
+ (ch1 ^ 0xd800) & 0xf800) == 0)
+ break;
+# endif
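
For illustration only, here is a minimal sketch of the two ways to test a
single 16-bit code unit. It is not the code from the patch: the function
names are hypothetical, and it assumes the code unit has already been read
into a uint16_t. The range form returns after one compare for anything
below 0xd800, which covers most text.

```c
#include <assert.h>
#include <stdint.h>

/* Bitmask form, in the style of the quoted hunk: every surrogate code
   unit 0xD800..0xDFFF has the same top five bits (11011). */
static int is_surrogate_mask(uint16_t ch)
{
    return ((ch ^ 0xd800) & 0xf800) == 0;
}

/* Range-check form: a single compare rejects everything below 0xD800;
   only the remaining values need the upper-bound test. */
static int is_surrogate_range(uint16_t ch)
{
    if (ch < 0xd800)
        return 0;
    return ch <= 0xdfff;
}

/* Hypothetical self-check: both forms agree on every 16-bit code unit. */
static void check_all_code_units(void)
{
    uint32_t u;
    for (u = 0; u <= 0xffff; u++)
        assert(is_surrogate_mask((uint16_t)u) ==
               is_surrogate_range((uint16_t)u));
}
```

Whether the range check is actually faster than the mask depends on the
compiler and the data, so it would need to be measured in the decoder loop.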
----------
_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue12892>
_______________________________________