[I18n-sig] Re: validity of lone surrogates
Guido van Rossum
guido@digicool.com
Wed, 27 Jun 2001 10:16:47 -0400
[Gaute]
> My take on this is that the various UTF codecs should follow the specs
> to the letter and reject anything else in default mode. There should
> also be a "lenient" or "forgiving" mode in which the codec does its
> best to interpret and repair broken, nonsensical or irregular data.
> Of course, if an application uses this mode then it will have to be
> aware of the dangers involved, including the security aspects.
Python's codec mechanism has a nice API gimmick: you can pass an error
handling option. Currently, this can be 'strict', 'ignore', or
'replace'. I wonder if we could add a fourth mode, 'lenient', that
tries its best to encode anything passed in?
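
For concreteness, here is a small sketch of how the existing three options
treat a lone surrogate when encoding to UTF-8. It assumes a current Python 3
interpreter, whose behaviour may differ from the codecs under discussion
here; the helper name encode_with is made up for illustration, and the
proposed 'lenient' mode is not shown because it does not exist.

```python
# Illustration only: how the `errors` argument changes the result of
# encoding a string that contains a lone surrogate (Python 3 semantics).
text = "a\ud800b"  # U+D800 is a high surrogate with no low surrogate after it


def encode_with(errors):
    """Hypothetical helper: encode `text` as UTF-8 with the given handler."""
    try:
        return text.encode("utf-8", errors)
    except UnicodeEncodeError as exc:
        # 'strict' refuses the lone surrogate instead of emitting bytes.
        return "rejected: " + exc.reason


results = {mode: encode_with(mode) for mode in ("strict", "ignore", "replace")}
for mode, result in results.items():
    print(mode, "->", result)
```

Under that assumption, 'strict' rejects the string, 'ignore' silently drops
the surrogate, and 'replace' substitutes a question mark. None of the three
tries to preserve the offending code point, which is the gap a 'lenient'
mode would have to fill.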
--Guido van Rossum (home page: http://www.python.org/~guido/)