API for custom Unicode error handlers

Steven D'Aprano steve+comp.lang.python at pearwood.info
Fri Oct 4 09:56:14 EDT 2013


I have a module containing custom Unicode error handlers, and I'm 
looking for advice on the right API for registering them. For example:

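# Python 3
import unicodedata

def namereplace_errors(exc):
    # Only encoding errors are handled; re-raise anything else.
    if not isinstance(exc, UnicodeEncodeError):
        raise exc
    c = exc.object[exc.start]
    try:
        name = unicodedata.name(c)
    except (KeyError, ValueError):
        # No name for this code point: fall back to a numeric escape.
        n = ord(c)
        if n <= 0xFFFF:
            replace = "\\u%04x"
        else:
            assert n <= 0x10FFFF
            replace = "\\U%08x"
        replace = replace % n
    else:
        replace = "\\N{%s}" % name
    # Resume encoding at the character after the one just replaced.
    return replace, exc.start + 1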


Before I can use the error handler, I need to register it:


import codecs
codecs.register_error('namereplace', namereplace_errors)

And now:

py> 'abc\u04F1'.encode('ascii', 'namereplace')
b'abc\\N{CYRILLIC SMALL LETTER U WITH DIAERESIS}'


Now, my question:

Should the module holding the error handlers automatically register them? 
In other words, if I do:

import error_handlers

then merely importing it has the side effect of registering the error 
handlers. Normally I dislike imports with side effects of this sort, but 
I'm not sure the alternative is better: putting the responsibility on 
the caller to register some, or all, of the handlers:

import error_handlers
# register a single handler...
error_handlers.register(error_handlers.namereplace_errors)
# ...or all of them at once
error_handlers.register_all()
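
To make the explicit-registration option concrete, here is a rough sketch 
of what the module might contain. The _NAMES table and the bodies of 
register() and register_all() are assumptions made for illustration, not 
an existing API. The handler is registered as 'demo_namereplace' rather 
than 'namereplace' only so the sketch does not shadow the 'namereplace' 
handler that Python 3.5 and later provide built in.

```python
import codecs
import unicodedata

def namereplace_errors(exc):
    # Condensed version of the handler shown earlier.
    if not isinstance(exc, UnicodeEncodeError):
        raise exc
    c = exc.object[exc.start]
    try:
        replace = "\\N{%s}" % unicodedata.name(c)
    except ValueError:
        n = ord(c)
        replace = ("\\u%04x" if n <= 0xFFFF else "\\U%08x") % n
    return replace, exc.start + 1

# Hypothetical table: handler function -> name it is registered under.
_NAMES = {namereplace_errors: "demo_namereplace"}

def register(handler):
    """Register one handler from this module under its public name."""
    codecs.register_error(_NAMES[handler], handler)

def register_all():
    """Register every handler this module knows about."""
    for handler in _NAMES:
        register(handler)

# Nothing is registered at import time; the caller opts in explicitly.
register(namereplace_errors)
print('abc\u04F1'.encode('ascii', 'demo_namereplace'))
# b'abc\\N{CYRILLIC SMALL LETTER U WITH DIAERESIS}'
```

With this layout a plain "import error_handlers" stays free of side 
effects, at the cost of one extra call in every program that uses it.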


As far as I know, there is no way to find out what error handlers are 
registered, and no way to deregister one after it has been registered.
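
A partial workaround for the first limitation: codecs.lookup_error(name) 
returns the handler registered under a given name and raises LookupError 
if there is none, so a single name can be probed even though the full 
set cannot be listed. The sketch below relies on that; the helper names 
is_registered and register_if_absent, and the handler name 
'demo_question_mark', are made up for illustration.

```python
import codecs

def is_registered(name):
    """Return True if some error handler is registered under `name`."""
    try:
        codecs.lookup_error(name)
    except LookupError:
        return False
    return True

def register_if_absent(name, handler):
    """Register `handler` unless `name` is already taken.

    Returns True if the handler was registered, False otherwise.
    """
    if is_registered(name):
        return False
    codecs.register_error(name, handler)
    return True

def question_mark_errors(exc):
    # Trivial illustrative handler: replace the whole failing run with "?".
    return "?", exc.end

print(is_registered("strict"))   # True: 'strict' is a built-in handler
print(register_if_absent("demo_question_mark", question_mark_errors))
print(register_if_absent("demo_question_mark", question_mark_errors))
```

This only guards against clobbering a name that is already in use; it 
does nothing about deregistration.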

Which API would you prefer if you were using this module?


-- 
Steven


