[Python-bugs-list] [ python-Bugs-433882 ] UTF-8: unpaired surrogates mishandled
noreply@sourceforge.net
Wed, 06 Feb 2002 10:11:05 -0800
Bugs item #433882, was opened at 2001-06-17 04:27
You can respond by visiting:
http://sourceforge.net/tracker/?func=detail&atid=105470&aid=433882&group_id=5470
Category: Unicode
Group: None
Status: Open
Resolution: None
>Priority: 3
Submitted By: Nobody/Anonymous (nobody)
Assigned to: M.-A. Lemburg (lemburg)
Summary: UTF-8: unpaired surrogates mishandled
Initial Comment:
Two bugs:
1. UTF-8 encoding of an unpaired high surrogate produces
an invalid UTF-8 byte sequence.
2. UTF-8 decoding of any unpaired surrogate raises
an exception ("illegal encoding") instead of returning the
corresponding 16-bit scalar value.
See attached file utf8bugs.py for example plus detailed
remarks.
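
For reference, the byte sequence the report is talking about can be worked out by hand. The sketch below is not taken from the attached utf8bugs.py, which is not reproduced in this message. The function names are hypothetical, and the code only applies the generic three-byte UTF-8 bit pattern to a 16-bit value, without any surrogate check, to show what "the corresponding 16-bit scalar value" would mean for a lone surrogate.

```python
def encode_3byte(value):
    """Apply the three-byte UTF-8 pattern 1110xxxx 10xxxxxx 10xxxxxx.

    Illustrative helper only: it deliberately does NOT reject surrogates.
    """
    if not 0x0800 <= value <= 0xFFFF:
        raise ValueError("three-byte form covers 0x0800..0xFFFF only")
    return bytes([
        0xE0 | (value >> 12),           # top 4 bits
        0x80 | ((value >> 6) & 0x3F),   # middle 6 bits
        0x80 | (value & 0x3F),          # low 6 bits
    ])


def decode_3byte(data):
    """Inverse of encode_3byte: recover the 16-bit value from three bytes."""
    if len(data) != 3 or data[0] & 0xF0 != 0xE0:
        raise ValueError("not a three-byte UTF-8 sequence")
    if any(b & 0xC0 != 0x80 for b in data[1:]):
        raise ValueError("bad continuation byte")
    value = ((data[0] & 0x0F) << 12) | ((data[1] & 0x3F) << 6) | (data[2] & 0x3F)
    if value < 0x0800:
        raise ValueError("overlong encoding")
    return value


# An unpaired high surrogate (U+D800) and the last low surrogate (U+DFFF).
print(encode_3byte(0xD800).hex())            # ed a0 80
print(encode_3byte(0xDFFF).hex())            # ed bf bf
print(hex(decode_3byte(b"\xed\xa0\x80")))    # 0xd800
```

Whether a codec should accept or produce these bytes at all is the question debated in the comments that follow.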
----------------------------------------------------------------------
>Comment By: M.-A. Lemburg (lemburg)
Date: 2002-02-06 10:11
Message:
Logged In: YES
user_id=38388
I've checked in a patch which fixes bug 1 in the report.
I am unsure about "bug 2": I think that raising an exception is better than silently accepting bogus input data.
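
The trade-off between raising and accepting can be tried out in a current interpreter. The sketch below assumes CPython 3.x, which postdates this thread, so it does not show the behaviour of the 2002 code or of the checked-in patch. It relies only on the standard "strict" default and the "surrogatepass" error handler of the built-in UTF-8 codec; the helper names are hypothetical.

```python
LONE_HIGH = "\ud800"        # a str holding one unpaired high surrogate
RAW = b"\xed\xa0\x80"       # the three-byte form of U+D800


def strict_encode_fails(text):
    """Return True if the default (strict) UTF-8 encoder rejects text."""
    try:
        text.encode("utf-8")
    except UnicodeEncodeError:
        return True
    return False


def strict_decode_fails(data):
    """Return True if the default (strict) UTF-8 decoder rejects data."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return True
    return False


# Strict mode raises in both directions.
print(strict_encode_fails(LONE_HIGH))
print(strict_decode_fails(RAW))

# Callers who really want the lone surrogate must opt in explicitly.
print(LONE_HIGH.encode("utf-8", "surrogatepass") == RAW)
print(RAW.decode("utf-8", "surrogatepass") == LONE_HIGH)
```

Making the permissive path opt-in keeps bogus input from being accepted silently while still leaving a way to round-trip such data.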
----------------------------------------------------------------------
Comment By: M.-A. Lemburg (lemburg)
Date: 2001-08-16 03:50
Message:
Logged In: YES
user_id=38388
I'll look into this after I'm back from vacation on September 10.
----------------------------------------------------------------------
Comment By: Martin v. Löwis (loewis)
Date: 2001-06-17 19:03
Message:
Logged In: YES
user_id=21627
I think the codec should reject unpaired surrogates both
when encoding and when decoding. I don't have a copy of
ISO 10646, but Unicode 3.1 points out:
# ISO/IEC 10646 does not allow mapping of unpaired
# surrogates, nor U+FFFE and U+FFFF (but it does allow other
# noncharacters).
So apparently, encoding unpaired surrogates as UTF-8 is
not allowed according to ISO 10646. I think Python should
follow this rule, instead of the Unicode one.
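
As a rough illustration of what "unpaired" means here, the following sketch scans a sequence of 16-bit code units and reports the positions of surrogates that are not part of a high/low pair. It is only a sketch under stated assumptions: the function name and the list-of-integers input are hypothetical, and this is not the codec's actual implementation. A codec following the rule above would refuse to encode any input for which the result is non-empty.

```python
def find_unpaired_surrogates(units):
    """Return indices of surrogate code units that lack a partner.

    `units` is a sequence of 16-bit integers (UTF-16 code units).
    A valid pair is a high surrogate (0xD800..0xDBFF) immediately
    followed by a low surrogate (0xDC00..0xDFFF).
    """
    unpaired = []
    i = 0
    while i < len(units):
        unit = units[i]
        if 0xD800 <= unit <= 0xDBFF:
            # High surrogate: valid only if a low surrogate follows.
            if i + 1 < len(units) and 0xDC00 <= units[i + 1] <= 0xDFFF:
                i += 2
                continue
            unpaired.append(i)
        elif 0xDC00 <= unit <= 0xDFFF:
            # Low surrogate with no preceding high surrogate.
            unpaired.append(i)
        i += 1
    return unpaired


print(find_unpaired_surrogates([0x41, 0xD800, 0xDC00, 0x42]))  # []
print(find_unpaired_surrogates([0xD800, 0x41]))                # [0]
print(find_unpaired_surrogates([0x41, 0xDC00]))                # [1]
```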
----------------------------------------------------------------------