[Python-Dev] registering unicode codecs

Thu Nov 24 20:34:37 CET 2005

While running regrtest with -R to find reference leaks I found a usage
issue.  When a codec search function is registered it is stored in the
interpreter state and cannot be removed.  Since the search path is a
list, if you repeatedly add the same search function, you will get
duplicates in the list and they can't be removed.  This shows up as a
reference leak (which it really isn't) in test_unicode with this code
modified from test_codecs_errors:

import codecs
def search_function(encoding):
    def encode1(input, errors="strict"):
        return 42
    return (encode1, None, None, None)

codecs.register(search_function)

###

Should the search function be added to the search path if it is
already in there?  I don't understand a benefit of having duplicate
search functions.
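
To make the first question concrete, here is a minimal sketch of what
"don't add it if it is already there" could look like.  It uses a plain
Python list as a stand-in for the search path kept in the interpreter
state; search_path and register_once are hypothetical names for
illustration, not part of the codecs module.

```python
# Hypothetical stand-in for the codec search path.  The real list lives
# in the interpreter state and is not exposed to Python code.
search_path = []

def register_once(search_function):
    """Append search_function only if it is not already in search_path."""
    if not callable(search_function):
        raise TypeError("argument must be callable")
    # Functions compare by identity, so this skips exact duplicates.
    if search_function not in search_path:
        search_path.append(search_function)

def search_function(encoding):
    # A real search function would return codec information or None.
    return None

# Registering the same function repeatedly leaves a single entry.
for _ in range(3):
    register_once(search_function)

print(len(search_path))
```

With a check like this, running the same registration repeatedly (as
regrtest -R does) would not grow the list.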

Should users have access to the search path (through a
codecs.unregister())?  If so, should it search from the end of the
list to the beginning to remove an item?  That way the last entry
would be removed rather than the first.
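
As a rough illustration of the "search from the end" idea, the sketch
below removes the last matching entry from a plain list.  The list and
the unregister_last helper are assumptions made for the example; they
only model the behaviour being asked about.

```python
def unregister_last(path, search_function):
    """Remove the last occurrence of search_function from path."""
    # Walk the list from the end so the most recent registration goes first.
    for index in range(len(path) - 1, -1, -1):
        if path[index] is search_function:
            del path[index]
            return
    raise ValueError("search function is not registered")

def first(encoding):
    return None

def second(encoding):
    return None

# Hypothetical search path containing a duplicate registration of first.
path = [first, second, first]
unregister_last(path, first)

print([f.__name__ for f in path])
```

Removing from the end keeps the earliest registration in place, so the
lookup order for the remaining entries does not change.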

n