general problem when subclassing a built-in class

Wed Dec 22 20:22:36 EST 2010

Suppose that you want to implement a subclass of built-in class, to
meet some specific design requirements.

Where in the Python documentation can one find the information
required to determine the minimal[1] set of methods that one would
need to override to achieve this goal?

In my experience, educated guesswork doesn't get one very far with
this question.

Here's a *toy example*, just to illustrate this last point.

Suppose that one wants to implement a subclass of dict, call it
TSDict, to meet these two design requirements:

1. for each one of its keys, an instance of TSDict should keep a
timestamp (as given by time.time, and accessible via the new method
get_timestamp(key)) of the last time that the key had a value
assigned to it;

2. other than the added capability described in (1), an instance
of TSDict should behave *exactly* like a built-in dictionary.

In particular, we should be able to observe behavior like this:

>>> d = TSDict((('uno', 1), ('dos', 2)), tres=3, cuatro=4)
>>> d['cinco'] = 5
>>> d
{'cuatro': 4, 'dos': 2, 'tres': 3, 'cinco': 5, 'uno': 1}
>>> d.get_timestamp('uno')
1293046586.644436

OK, here's one strategy, right out of OOP 101:

#---------------------------------------------------------------------------
from time import time

class TSDict(dict):
    def __setitem__(self, key, value):
        # save the value and timestamp for key as a tuple;
        # see footnote [2]
        dict.__setitem__(self, key, (value, time()))

    def __getitem__(self, key):
        # extract the value from the value-timestamp pair and return it
        return dict.__getitem__(self, key)[0]

    def get_timestamp(self, key):
        # extract the timestamp from the value-timestamp pair and return it
        return dict.__getitem__(self, key)[1]
#---------------------------------------------------------------------------

This implementation *should* work (again, at least according to
OOP 101), but, in fact, it doesn't come *even close*:

>>> d = TSDict((('uno', 1), ('dos', 2)), tres=3, cuatro=4)
>>> d['cinco'] = 5
>>> d
{'cuatro': 4, 'dos': 2, 'tres': 3, 'cinco': (5, 1293059516.942985), 'uno': 1}
>>> d.get_timestamp('uno')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/tmp/tsdict.py", line 23, in get_timestamp
    return dict.__getitem__(self, key)[1]
TypeError: 'int' object is not subscriptable

>From the above you can see that TSDict fails at *both* of the design
requirements listed above: it fails to add a timestamp to all keys in
the dictionary (e.g. 'uno', ..., 'cuatro' didn't get a timestamp), and
get_timestamp bombs; and it also fails to behave in every other
respect exactly like a built-in dict (e.g., repr(d) reveals the
timestamps and how they are kept).

So back to the general problem: to implement a subclass of a built-in
class to meet a given set of design specifications.  Where is the
documentation needed to do this without guesswork?

Given results like the one illustrated above, I can think of only two
approaches (other than scrapping the whole idea of subclassing a
built-in class in the first place):

1) Study the Python C source code for the built-in class in the hopes
   of somehow figuring out what API methods need to be overridden;

2) Through blind trial-and-error, keep trying different implementation
   strategies and/or keep overriding additional built-in class methods
   until the behavior of the resulting subclass approximates
   sufficiently the design specs.

IMO, both of these approaches suck.  Approach (1) would take *me*
forever, since I don't know the first thing about Python's internals,
and even if I did, going behind the documented API like that would
make whatever I implement very likely to break with future releases of
Python.  Approach (2) could also take a very long time (probably much
longer than the implementation would take if no guesswork was
involved), but worse than that, one would have little assurance that
one's experimentation has truly uncovered all the necessary details;
IME, programming-by-guesswork leads to numerous and often nasty bugs.

Is there any other way?

TIA!

~kj

[1] The "minimal" bit in the question statement is just another way of
    specifying a "maximal" reuse of the built-in's class code.

[2] For this example, I've accessed the parent's methods directly
    through dict rather than through super(TSDict, self), just to keep
    the code as uncluttered as possible, but the results are the same
    if one uses super.