[I18n-sig] Re: validity of lone surrogates

Walter Dörwald walter@livinglogic.de
Wed, 27 Jun 2001 18:56:00 +0200


Guido van Rossum wrote:
> 
> [Gaute]
> > My take on this is that the various UTF codecs should follow the specs
> > to the letter and reject anything else in default mode.  There should
> > also be a "lenient" or "forgiving" mode in which the codec does its
> > best to interpret and repair broken, nonsensical or irregular data.
> > Of course, if an application uses this mode then it will have to be
> > aware of the dangers involved, including the security aspects.
> 
> Python's codec mechanism has a nice API gimmick: you can pass an error
> handling option.  Currently, this can be 'strict', 'ignore', or
> 'replace'.  I wonder if we could add a fourth mode, 'lenient', that
> tries its best to encode anything passed in?

How would this work together with the proposed encode error handling
callback feature (see patch #432401)? Does this patch have any chance of
getting into Python (when it's finished)?
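
To make the question concrete, here is a rough sketch of how a named
callback could sit next to the built-in 'strict', 'ignore' and 'replace'
modes. It uses codecs.register_error as found in current Python 3; that
this matches the interface proposed in the patch is an assumption, and
the handler name "lenient-demo" and the "<U+XXXX>" output format are
made up purely for illustration.

```python
import codecs


def lenient_handler(exc):
    """Hypothetical handler: spell out unencodable characters as <U+XXXX>."""
    if not isinstance(exc, UnicodeEncodeError):
        # This sketch only deals with encoding errors.
        raise exc
    bad = exc.object[exc.start:exc.end]
    replacement = "".join("<U+%04X>" % ord(ch) for ch in bad)
    # Return the replacement text and the position at which to resume.
    return replacement, exc.end


# "lenient-demo" is an illustrative name, not an official error mode.
codecs.register_error("lenient-demo", lenient_handler)

text = "caf\u00e9"

# 'strict' raises on the first character the codec cannot encode.
strict_failed = False
try:
    text.encode("ascii", "strict")
except UnicodeEncodeError:
    strict_failed = True

ignored = text.encode("ascii", "ignore")        # drops the character
replaced = text.encode("ascii", "replace")      # substitutes "?"
lenient = text.encode("ascii", "lenient-demo")  # calls the callback

# The same callback is invoked for a lone surrogate in UTF-8.
surrogate = "a\ud800b".encode("utf-8", "lenient-demo")
```

With such a mechanism a separate 'lenient' mode would be just one more
registered name, so the two ideas need not conflict.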

Bye,
	Walter Dörwald