[Python-Dev] PEP 383 update: utf8b is now the error handler

Thu May 7 07:43:30 CEST 2009

Michael Urman wrote:
> On Wed, May 6, 2009 at 15:42, "Martin v. Löwis" <martin at v.loewis.de> wrote:
>> Despite there being also an error handler called "surrogates".
> 
> Not that I have to be, but I'm not sold on the previous UTF-8 codec
> behavior becoming an error handler of the name "surrogates" for two
> reasons (I do respect the obvious PBP argument for the implementation,
> and have no better name - "lenient"?).

PBP?

> First, unless there's a way to stack error handlers, there's no way to
> access the old behavior combined with the "replace" handler.

Well, there is a way to stack error handlers, although it's not pretty:

_surrogates = codecs.lookup_errors("surrogates")
_replace = codecs.lookup_errors("replace")
def surrogates_then_replace(exc):
    try:
        return _surrogates(exc)
    except UnicodeError:
        return _replace(exc)
codecs.register_error("surrogates_then_replace",
                      surrogates_then_replace)

> The stacking argument also applies to the new utf8b behavior on encode
> (only, as it handles all errors on decode). This may be a YAGNI

Indeed - in particular, as, in the primary application of this error
handler (i.e. file IO operations), there is no way of specifying
an addition error handler anyway.

Regards,
Martin