[issue12892] UTF-16 and UTF-32 codecs should reject (lone) surrogates

Marc-Andre Lemburg report at bugs.python.org
Mon Sep 2 17:53:23 CEST 2013


Marc-Andre Lemburg added the comment:

You should be able to squeeze out some extra cycles by
replacing the bit calculations with a simple range check
for ch >= 0xd800:

+# if STRINGLIB_MAX_CHAR >= 0xd800
+            if (((ch1 ^ 0xd800) &
+                 (ch1 ^ 0xd800) &
+                 (ch1 ^ 0xd800) &
+                 (ch1 ^ 0xd800) & 0xf800) == 0)
+                break;
+# endif
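
To illustrate the trade-off, here is a minimal sketch in C that compares the
masked XOR test from the hunk with a plain range check on a single code unit.
It is not the stringlib code: the helper names (is_surrogate_bits,
is_surrogate_range, maybe_surrogate, count_bmp_mismatches) are made up for
this example, and treating the value as a uint32_t is an assumption.

```c
#include <stdint.h>

/* Bit test as used in the hunk, applied to one code unit.
   It looks only at bits 11..15, so it is exact for 16-bit values
   but also matches values such as 0x1d800 above the BMP. */
int is_surrogate_bits(uint32_t ch)
{
    return ((ch ^ 0xd800) & 0xf800) == 0;
}

/* Exact check: surrogates occupy U+D800..U+DFFF. */
int is_surrogate_range(uint32_t ch)
{
    return ch >= 0xd800 && ch <= 0xdfff;
}

/* Cheap pre-filter: a single comparison. It never misses a surrogate,
   but it also fires for every value above U+DFFF, so a caller would
   still need an exact check on the slow path. */
int maybe_surrogate(uint32_t ch)
{
    return ch >= 0xd800;
}

/* Count 16-bit values on which the bit test and the range check disagree. */
unsigned count_bmp_mismatches(void)
{
    unsigned mismatches = 0;
    for (uint32_t ch = 0; ch <= 0xffff; ch++) {
        if (is_surrogate_bits(ch) != is_surrogate_range(ch))
            mismatches++;
    }
    return mismatches;
}
```

For 16-bit input the two exact tests agree on every value, so the choice
between them is purely about cost. The single comparison is cheaper still,
at the price of sending all values from 0xd800 upward to the slower path.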

----------

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue12892>
_______________________________________

