[I18n-sig] Re: validity of lone surrogates

Guido van Rossum guido@digicool.com
Wed, 27 Jun 2001 10:16:47 -0400


[Gaute]
> My take on this is that the various UTF codecs should follow the specs
> to the letter and reject anything else in default mode.  There should
> also be a "lenient" or "forgiving" mode in which the codec does its
> best to interpret and repair broken, nonsensical or irregular data.
> Of course, if an application uses this mode then it will have to be
> aware of the dangers involved, including the security aspects.

Python's codec mechanism has a nice API gimmick: you can pass an error
handling option.  Currently, this can be 'strict', 'ignore', or
'replace'.  I wonder if we could add a fourth mode, 'lenient', that
tries its best to encode anything passed in?
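
For illustration, here is a minimal sketch of how the three existing
error handling options behave when a string containing a lone surrogate
is encoded to UTF-8.  It assumes a current Python 3 interpreter; the
sample string and the helper name try_encode are made up for this
example, and the proposed 'lenient' mode is not shown because it is only
an idea at this point.

```python
# A string with a lone (unpaired) high surrogate between two letters.
sample = "a\ud800b"


def try_encode(text, errors):
    """Encode text to UTF-8 with the given error handling option.

    Returns the encoded bytes, or None if the codec rejects the input.
    """
    try:
        return text.encode("utf-8", errors)
    except UnicodeEncodeError:
        return None


# Collect the outcome for each of the three existing options.
results = {mode: try_encode(sample, mode)
           for mode in ("strict", "ignore", "replace")}

for mode, encoded in results.items():
    print(mode, "->", encoded)
```

With 'strict' the codec refuses the input, with 'ignore' the surrogate
is silently dropped, and with 'replace' it is swapped for a question
mark.  None of the three tries to preserve the offending code point,
which is the gap a 'lenient' mode would have to fill.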

--Guido van Rossum (home page: http://www.python.org/~guido/)