isinstance() considered harmful

Wed Jan 23 17:11:22 EST 2002

I had previously posted this to the kragen-tol mailing list; that copy
is at
http://lists.canonical.org/pipermail/kragen-tol/2002-January/000659.html

There is an HTML version at
http://pobox.com/~kragen/isinstance/

My apologies for the somewhat idiosyncratic markup.

isinstance() considered harmful
===============================

This is mostly in reference to \(href
http://www.python.org Python) programs, and it may
be more or less true with reference to programs in
other languages.

Python is an object-oriented programming language,
which is a buzzword, and therefore can mean a wide
variety of different things.  What it actually
means, in Python's case, is that you can implement
your objects any way you please; it's the
interface they support that determines whether or
not they'll work with existing code --- that is,
the methods they implement and the semantics of
those methods.

For example, you can write the first version of a
data type in pure Python; you can then rewrite it
as a Python extension that creates CObjects, and
expect that it will work everywhere the Python
version did, as long as it's implemented
correctly.  This is a useful property; it often
results in being able to apply old code to
completely new problems.

For this promise to hold, the code your object
works with must refrain from peeking behind the
interface the object supposedly supports.  There
are probably some times that this is not
desirable; transparent persistence and object
migration systems can probably work better most of
the time by not respecting published interfaces.

Some languages, like Java and C++, offer explicit
interface support.  Python is not among them.  It
offers \(href
http://www.shindich.com/sources/patterns/implied.html
implied interfaces) in places where other
languages would use explicit interfaces.  This has
a variety of effects, good and bad.

In Python, what classes your object is derived
from is not a part of your object's interface.

\(b Every use of isinstance is a violation of this
promise), large or small.  Whenever isinstance is
used, control flow forks; one type of object goes
down one code path, and other types of object go
down the other --- even if they implement the same
interface!

Bjarne Stroustrup often cited concerns like these
when defending C++'s decision not to provide
isinstance.  (Now, of course, with RTTI in the C++
standard, C++ does provide isinstance.)

Sometimes, of course, violating this promise is
worth the payoffs --- isinstance, like goto, is
not pure evil.  But it is a trap for new
programmers.  Beware!  Don't use isinstance unless
you know what you're doing.  It can make your code
non-extensible and break it in strange ways down
the line.

Reasons for using isinstance
----------------------------

Isinstance is used for a variety of reasons:
\(ul
- to determine whether an object supports a
  particular interface
- to keep people from writing possibly incorrect
  programs, or to indicate possible bugs
  (type-checking)
- to test the implementation of an object to see
  if certain optimizations can be applied
- to write regression tests with knowledge of the 
  implementation
- to define things that should really be methods
  outside of the class they operate on
- for no apparent reason whatsoever
)

To determine whether an object supports an interface
----------------------------------------------------

Using isinstance() to determine whether an object
supports a particular interface is always a bad
idea.  You are, in essence, including inheritance
from a particular class in the interface; if
anyone wants to implement that interface, they
must derive their class from the class you
specify, which means they'll inherit its bugs,
they'll probably have to understand its
invariants, and, in Python, they'll have to be
careful not to collide with names it defines.
(This is especially bad if one of those names is
__getattr__ or __setattr__, but that's another
problem.)

It's not just overly conservative; it's also
overly liberal.  Someone can override methods in
the interface with broken methods --- in Python,
they can even override them with non-callable
objects.  An object's derivation from a particular
class doesn't guarantee that it implements all the
protocol that class does.  (Still, breaking
protocols your base classes implement is almost
always a bad idea.)

Type-checking to find bugs
--------------------------

Using isinstance() for type-checking to find bugs
is a special case of the above.  The bug you might
catch is that the object being checked doesn't
implement the interface the rest of your code
expects.

Here's a perfect example from
distutils.cmd.Command.__init__:
        if not isinstance(dist, Distribution):
            raise TypeError, "dist must be a Distribution instance"

In fact, all Command requires of 'dist' is that it
have attributes verbose, dry_run, get_command_obj,
reinitialize_command, and run_command, with the
appropriate semantics.  There's no reason that it
should have to be derived from Distribution, which
is a monster class nearly a thousand lines long.

Determining whether optimizations are applicable
------------------------------------------------

Using isinstance() to determine whether or not a
particular abstraction-violating optimization is
applicable is a reasonable thing to do, but it is
often better to package that optimization into a
method and test for its existence:

    try: optmeth = obj.optmeth
    except AttributeError:
         do_it_the_slow_way(obj, stuff)
         return
    optmeth(stuff)

You can also use hasattr() for this test.

This allows your abstraction-violating
optimization to work without violating 
abstraction --- simply by providing a more
intimate interface into your objects.

There's an example of this in the standard library
in test_coercion.py:

    class CoerceNumber:
        def __init__(self, arg):
            self.arg = arg
        def __coerce__(self, other):
            if isinstance(other, CoerceNumber):
                return self.arg, other.arg
            else:
                return (self.arg, other)

The __coerce__ method would be better written as:
    def __coerce__(self, other):
        num = self.arg
        try: return num, other.arg
        except AttributeError: return num, other

As it's written, if you had instances of two
textual copies of the CoerceNumber class, you
wouldn't even be able to add them together in
Python 1.5, and in Python 2, the interpreter would
have to call both of their __coerce__ methods to
add them together.  (Something like this is
actually not uncommon; see the comments below
about reload.)

Probably the best way to write this in Python 2.x
--- although it's marginally slower --- is as
follows:

     def __coerce__(self, other): return self.arg, other

Of course, this probably doesn't matter; it isn't
very likely that someone will try to inherit from
CoerceNumber or add a CoerceNumber instance to an
instance of some other similar type, since it is,
after all, only a test case.

(__coerce__ is kind of a tough routine to write,
in general, though, and seems like one place where
dispatching on the type of other arguments might
be the least of all possible evils.)

More instances of this are scattered through the
UserDict, UserList, and UserString modules.

Adding methods to someone else's classes
----------------------------------------

Sometimes one person wants to write a piece of
code that accesses some data in someone else's
class.  So they check to see if the argument
they're being applied to is of the correct type,
and then proceed to mess with it in ways its
public interface doesn't allow.

Generally, this is asking for trouble down the
road, but it can be useful for short-term hacks,
or for code that is unlikely to change.

Sometimes people even do it to their own classes,
and that's just bad code.  Here's an example from
distutils.command.build_ext:

        # The python library is always needed on Windows.  For MSVC, this
        # is redundant, since the library is mentioned in a pragma in
        # config.h that MSVC groks.  The other Windows compilers all seem
        # to need it mentioned explicitly, though, so that's what we do.
        # Append '_d' to the python import library on debug builds.
        from distutils.msvccompiler import MSVCCompiler
        if sys.platform == "win32" and \
           not isinstance(self.compiler, MSVCCompiler):
            template = "python%d%d"
            if self.debug:
                template = template + '_d'
            pythonlib = (template %
                   (sys.hexversion >> 24, (sys.hexversion >> 16) & 0xff))
            # don't extend ext.libraries, it may be shared with other
            # extensions, it is a reference to the original list
            return ext.libraries + [pythonlib]

This stuff should probably live in a method of the
compiler object, so you can say
    return self.compiler.get_libraries(ext.libraries)
but possibly the best solution is to say
    if sys.platform == "win32" and not self.compiler.ismsvc():
or even
    if sys.platform == "win32" and not hasattr(self.compiler, 'msvc'):

Writing regression tests
------------------------

If you implement an __add__ method that returns a
new instance of the same class, and you decide to
change it to return an instance of a different
class, you should probably modify your regression
tests to cover the new class.  So it's perfectly
reasonable for them to fail in this case.
However, they should probably compare __class__
rather than using isinstance().

For no apparent reason
----------------------

I've seen isinstance used to check whether
something was None or a real object, and I've seen
it used in places where it apparently had no real
effect.

reload
------

Python's reload() function reloads modules of code
into the running interpreter, re-executing their
contents.  This generally results in functions and
classes in those modules being replaced with fresh
versions, often identical fresh versions.
References to those functions and classes
elsewhere are not replaced, however, so old
objects retain their old classes.

The result is that the following code prints 0
twice, assuming that everything works perfectly:
    import somemod
    x = somemod.someclass()
    reload(somemod)
    print isinstance(x, somemod.someclass)
    y = somemod.someclass()
    print isinstance(y, x.__class__)

This means that any code whose correctness depends
on being able to tell that x is an instance of
somemod.someclass will fail in the presence of
reloading, and any code whose performance depends
on it will perform poorly in the presence of
reloading.

Prevalence of isinstance in the standard library
------------------------------------------------

It appears to be recognized that isinstance is
usually a bad idea; rather like Euclid's parallel
postulate, the proof of this is in the frequency
of its use.

In Python 2.1.1, there are 98586 lines in 447 .py
files in the standard library; 68 of those lines
mention 'isinstance'.

19 of those 68 are in the test suite, which
contains 19364 of the total lines; seven of these
(10% of the total standard-library use!) are
verifying that isinstance() itself works.

24 of these are in User*.py, which uses isinstance
to see if it's safe to bypass public interfaces
and use other objects' data members directly.
They should use try-except or hasattr instead.