[Tutor] Restricting the type of passed-in objects

Tim Peters tim.one@home.com
Mon, 14 May 2001 04:12:21 -0400


[Roeland Rengelink]
> Being a data structure junkie myself I can sympathize. And there is
> definitely something elegant about trees, especially about the way you
> can implement various operations by recursive decent.
>
> Trees are also very frustrating to implement in Python. Not because it
> is difficult, but because it's so easy to do the same thing more
> efficiently with lists or dictionaries.

Testify, brother!  You're right, of course.  Python dicts are very hard to
beat speedwise, for any problem where they suffice and the data fits in
memory.  Tree approaches usually don't win until the data is so large it
needs to spill to disk, and then trees *designed* for out-of-RAM storage are
the way to go (like B-tree variants, not binary trees).

Historical tidbit:  Guido worked on the implementation of the ABC language
before Python.  ABC used balanced (AVL) binary trees under the covers for
almost everything, with provable log-time worst-case behavior.  In actual
practice, ABC ran very slowly, mostly because the overhead of balanced tree
operations swamps all other considerations until the trees grow quite large.
So Python uses contiguous vectors to implement lists, and simple hash tables
to implement dicts, and so on:  very low overhead even for tiny sizes.  dicts
are faster than trees for huges RAM-like sizes too, but you can definitely
get into real speed trouble using a plain list if it grows large and you
insert or delete "in the middle".

> For example. In a related thread Tim showed a simple BST tree with two
> operations insert() and traverse().

Also contains(), but it's a trivial variation on insert().

> A far more efficient implementation of that tree is:
>
> import bisect
> class BST_tree:

Oops!  Stop right there.  In Python *practice*, I confess I usually use
bisect directly, not bothering to wrap it in a class.  Note that bisect does
take O(len(x)) time for an insert into x, and when len(x) is greater than a
few hundred that can be a real problem.  But it can't be beat for small
len(x).

Similarly there's mounds and mounds of Python code out there that uses dicts
directly to implement sets:

small_primes = {2:1, 3:1, 5:1, 7:1, 11:1}
small_evens = {2:1, 4:1, 6:1, 8:1, 10:1}

union = small_primes.copy()
union.update(small_evens)

and so forth.

For all the rest, "do the simplest thing that could possibly work" is
especially good advice in Python:  once you get the hang of it, it's so easy
to refactor Python code that there's no *need* to try to out-guess every
possibility in advance.  The last thing the world needs is yet another
prefectly general framework *so* perfectly general nobody can figure out how
to apply it to a specific problem <wink>.

Python's os.path.walk() is a minor but good example of that:  newbies have
been known to scratch their heads over the walk() docs for hours before
figuring out how to use it.  It's much simpler to just write the obvious
loop, using list.append() and list.pop() to maintain an explicit stack of
directories to visit.

most-small-abstractions-aren't-worth-the-bother-of-abstracting-ly
    y'rs  - tim