[Python-Dev] utf8 issue

Guido van Rossum guido@python.org
Mon, 26 Aug 2002 10:05:20 -0400


> Guido van Rossum <guido@python.org> writes:
> 
> > This might belong on SF, except it's already been solved in Python
> > 2.3, and I need guidance about what to do for Python 2.2.2.
> > 
> > In 2.2.1, a lone surrogate encoded into utf8 gives a utf8 string that
> > cannot be decoded back.  In 2.3, this is fixed.  Should this be fixed
> > in 2.2.2 as well?
> 
> I think this was discussed really quite a long time ago, like six
> months or so.
> 
> > I'm asking because it caused problems with reading .pyc files: if
> > there's a Unicode literal containing a lone surrogate, reading the
> > .pyc file causes an exception:
> > 
> > UnicodeError: UTF-8 decoding error: unexpected code byte
> > 
> > It looks like revision 2.128 fixed this for 2.3, but that patch
> > doesn't cleanly apply to the 2.2 maintenance branch.  Can someone
> > help?
> 
> I think the reason this didn't get fixed in 2.2.1 is that it
> necessitates bumping MAGIC.
> 
> I can probably dig up more references if you want.

Please do.  Bumping MAGIC is a no-no between dot releases.  But I
don't understand why that is necessary?
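
For anyone trying to reproduce the round-trip problem quoted above, here
is a minimal sketch.  It assumes a modern Python 3 interpreter, not 2.2
or 2.3, so the behaviour is analogous rather than identical: strict
UTF-8 refuses lone surrogates in both directions, and the
"surrogatepass" error handler has to be requested explicitly.  The
variable names are only illustrative.

```python
# Assumption: Python 3 semantics, not the 2.2/2.3 codecs discussed here.
lone = "\ud800"  # a lone (unpaired) high surrogate

# Strict UTF-8 encoding rejects a lone surrogate.
try:
    lone.encode("utf-8")
    strict_encode_ok = True
except UnicodeEncodeError:
    strict_encode_ok = False

# "surrogatepass" encodes it as an ordinary three-byte sequence.
raw = lone.encode("utf-8", "surrogatepass")

# Strict decoding of those bytes fails, much like the .pyc error above.
try:
    raw.decode("utf-8")
    strict_decode_ok = True
except UnicodeDecodeError:
    strict_decode_ok = False

# Decoding with the same handler restores the original string.
round_trip = raw.decode("utf-8", "surrogatepass")

print(strict_encode_ok, raw, strict_decode_ok, round_trip == lone)
```

The point of the sketch is only that encoder and decoder have to agree
on how a lone surrogate is represented; when they disagree, data written
by one cannot be read back by the other.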

--Guido van Rossum (home page: http://www.python.org/~guido/)