Memoization and encapsulation

drewp at bigasterisk.com drewp at bigasterisk.com
Thu Jan 5 00:05:44 EST 2006


> Faster is:
>
>class FastHashableList(list):
>   def __hash__(self):
>      return id(self)

But that's wrong. Two equal objects must hash to the same value, and
you're not guaranteeing that at all. My degenerate __hash__ does
guarantee that, and I still claim the performance impact from
collisions will actually be very low if each unhashable argument is not
too diverse, which was the common scenario for me. That is, my Foo
objects were unhashable, but there were only like two Foo objects in
the program that might be passed to the memoized function (but with
many possible combinations of additional float-type arguments).

> There isn't a straightforward way to make memoised work because -
> well, it's not a straightforward thing to make work.

Seems straightforward to me, as long as equality is defined for the
argument types. A hash that spreads the argument values across a hash
table is a fine optimization, but I don't think a memoizer needs to
have optimal performance in that area to be useful. In many of the
cases I have come across, even the dumbest list search (i.e. what you
get if all args always hash to 0) will be much faster than actually
rerunning the original function.

> x = HL([1, 2, 3])       # Or FHL
> print broken(x)
> x.append(4)
> print broken(x)
>
> Will your memoised handle this case correctly?

For that one to work my fancy memoizer would have to make a copy of the
arguments. The comparisons will work fine, though. [1,2,3] !=
[1,2,3,4].

> Which is also why lists aren't hashable - there's no good definition
> of the hash of a list that isn't broken for some common use case.

Only if broken means 'slow'. My hunch is that lists weren't made
hashable because people would naively use them heavily as dict keys and
suffer poorer performance than if they used tuples.

I would still like to see a well-done memoizer that doesn't suffer from
the unhashable bug and maybe has a way to avoid the mutable object bug
that you presented in your last post.

The version I keep mentioning as "my fancy memoizer" is actually just a
hacked one that adds a __hash__ to the first argument, since that's all
I needed at the time.




More information about the Python-list mailing list