[Python-checkins] r82468 - python/branches/py3k/Objects/unicodeobject.c
ezio.melotti
python-checkins at python.org
Sat Jul 3 06:52:19 CEST 2010
Author: ezio.melotti
Date: Sat Jul 3 06:52:19 2010
New Revision: 82468
Log:
Update comment about surrogates.
Modified:
python/branches/py3k/Objects/unicodeobject.c
Modified: python/branches/py3k/Objects/unicodeobject.c
==============================================================================
--- python/branches/py3k/Objects/unicodeobject.c (original)
+++ python/branches/py3k/Objects/unicodeobject.c Sat Jul 3 06:52:19 2010
@@ -2450,11 +2450,11 @@
             break;
         case 3:
-            /* XXX: surrogates shouldn't be valid UTF-8!
-               see http://www.unicode.org/versions/Unicode5.2.0/ch03.pdf
-               (table 3-7) and http://www.rfc-editor.org/rfc/rfc3629.txt
-               Uncomment the 2 lines below to make them invalid,
-               codepoints: d800-dfff; UTF-8: \xed\xa0\x80-\xed\xbf\xbf. */
+            /* Decoding UTF-8 sequences in range \xed\xa0\x80-\xed\xbf\xbf
+               will result in surrogates in range d800-dfff. Surrogates are
+               not valid UTF-8 so they are rejected.
+               See http://www.unicode.org/versions/Unicode5.2.0/ch03.pdf
+               (table 3-7) and http://www.rfc-editor.org/rfc/rfc3629.txt */
             if ((s[1] & 0xc0) != 0x80 ||
                 (s[2] & 0xc0) != 0x80 ||
                 ((unsigned char)s[0] == 0xE0 &&
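
For illustration, the rule the updated comment describes can be sketched as a
standalone check on a 3-byte sequence. This is a minimal sketch and not the
code in unicodeobject.c: the function names is_valid_utf8_3byte and
decode_utf8_3byte are hypothetical, and the lead-byte test is an addition so
that the helper can be called on its own (the decoder has presumably
established the sequence length before reaching "case 3").

```c
#include <stddef.h>

/* Hypothetical helper, not part of unicodeobject.c.
   Returns 1 if the three bytes at s form a well-formed 3-byte UTF-8
   sequence in the sense of Unicode table 3-7, and 0 otherwise. */
int
is_valid_utf8_3byte(const unsigned char *s)
{
    /* The lead byte must have the form 1110xxxx. */
    if ((s[0] & 0xf0) != 0xe0)
        return 0;
    /* Both trailing bytes must be continuation bytes (10xxxxxx). */
    if ((s[1] & 0xc0) != 0x80 || (s[2] & 0xc0) != 0x80)
        return 0;
    /* E0 followed by 80..9F is an overlong encoding of a code point
       below U+0800. */
    if (s[0] == 0xe0 && s[1] < 0xa0)
        return 0;
    /* ED followed by A0..BF would decode to a surrogate,
       U+D800..U+DFFF, so it is rejected. */
    if (s[0] == 0xed && s[1] > 0x9f)
        return 0;
    return 1;
}

/* Hypothetical helper: combines the payload bits of a 3-byte sequence
   into a code point without validating it. */
unsigned int
decode_utf8_3byte(const unsigned char *s)
{
    return ((unsigned int)(s[0] & 0x0f) << 12) |
           ((unsigned int)(s[1] & 0x3f) << 6) |
           (unsigned int)(s[2] & 0x3f);
}
```

With these helpers, \xed\xa0\x80 combines to d800 and is reported as invalid,
while \xed\x9f\xbf (d7ff) and \xee\x80\x80 (e000), the code points on either
side of the surrogate range, are accepted.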