[Python-Dev] PEP 383 update: utf8b is now the error handler
"Martin v. Löwis"
martin at v.loewis.de
Thu May 7 07:43:30 CEST 2009
Michael Urman wrote:
> On Wed, May 6, 2009 at 15:42, "Martin v. Löwis" <martin at v.loewis.de> wrote:
>> Despite there being also an error handler called "surrogates".
>
> Not that I have to be, but I'm not sold on the previous UTF-8 codec
> behavior becoming an error handler of the name "surrogates" for two
> reasons (I do respect the obvious PBP argument for the implementation,
> and have no better name - "lenient"?).
PBP?
> First, unless there's a way to stack error handlers, there's no way to
> access the old behavior combined with the "replace" handler.
Well, there is a way to stack error handlers, although it's not pretty:
import codecs

# Look up the two existing handlers once.
_surrogates = codecs.lookup_error("surrogates")
_replace = codecs.lookup_error("replace")

def surrogates_then_replace(exc):
    # Try "surrogates" first; fall back to "replace" if it gives up.
    try:
        return _surrogates(exc)
    except UnicodeError:
        return _replace(exc)

codecs.register_error("surrogates_then_replace",
                      surrogates_then_replace)
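For illustration, here is a minimal sketch of the same stacking idea applied to the encode side. It assumes a released Python 3 in which the PEP 383 handler is registered under the name "surrogateescape"; that name is an assumption relative to this thread, which still calls it utf8b. The handler name "escape_then_replace" and the sample string are made up for the example.

```python
import codecs

# Assumption: the PEP 383 handler is available as "surrogateescape".
_escape = codecs.lookup_error("surrogateescape")
_replace = codecs.lookup_error("replace")

def escape_then_replace(exc):
    # Hypothetical stacked handler: try surrogateescape first and fall
    # back to "replace" when it re-raises the error.
    try:
        return _escape(exc)
    except UnicodeError:
        return _replace(exc)

codecs.register_error("escape_then_replace", escape_then_replace)

# "\udc80" is an escaped byte 0x80 and is turned back into that byte.
# "\ud800" is a lone surrogate outside the escape range, so the first
# handler gives up and "replace" substitutes "?".
text = "a\udc80b\ud800c"
encoded = text.encode("utf-8", "escape_then_replace")
print(encoded)
```

The two surrogates are deliberately separated by ordinary characters. The codec may hand a run of adjacent surrogates to the handler as one error, in which case the fallback would apply to the whole run.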
> The stacking argument also applies to the new utf8b behavior on encode
> (only, as it handles all errors on decode). This may be a YAGNI
Indeed - in particular, as, in the primary application of this error
handler (i.e. file IO operations), there is no way of specifying
an additional error handler anyway.
Regards,
Martin