proposal for slice hashing

Will Bradshaw defenastrator at gmail.com
Mon May 11 15:58:47 EDT 2020


I recently ran into a situation where I needed to hash a slice and found this to be unsupported. This seems silly, as there is little benefit to slices not supporting hashing, and real downsides. Coupled with the fact that one cannot subclass slice, this is a significant and needless annoyance.

It seems to me that the hash of a slice would simply be (in Python):
    def __hash__(self):
        return hash((self.start, self.stop, self.step))
I can see no reason why this is not the case.
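
For illustration, here is today's behaviour next to what the proposal would give (slices are unhashable as of current CPython):

    hash(slice(1, 10, 2))  # TypeError: unhashable type: 'slice'

    # Under the proposal this would instead be equivalent to:
    hash((1, 10, 2))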


Context:

I have an object that presents itself as a multidimensional array in the numpy style but computes its values on __getitem__ calls, as it represents an infinite field of numbers. This operation is expensive, and in many cases the same slice of the field will be needed repeatedly. That lends itself to using an lru_cache on __getitem__(); however, this does not work, as slices cannot be part of the dictionary key that lru_cache builds from the arguments.*
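
A minimal reproduction of the failure (InfiniteField is a hypothetical stand-in for the object described above):

    from functools import lru_cache

    class InfiniteField:
        @lru_cache(maxsize=None)
        def __getitem__(self, item):
            return 0  # stand-in for the expensive field computation

    field = InfiniteField()
    field[3]       # fine: the integer key is hashable
    field[1:10:2]  # TypeError: unhashable type: 'slice'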


The only options as of now are:
    1. Use three layers of wrappers to pack the slices into a custom type that supports hashing, pass this mangled version of the arguments through the lru_cache wrapper into a function that reverses the packing, and pass the result through to the underlying implementation (see "Code for Workaround" below).
       - This is kinda janky and arguably slow, though in my case the cost of the calculation dwarfs this overhead by several orders of magnitude.
       - The mapping may be unreliable and is a rather long and impenetrable mess.
    2. Implement my own custom caching for this situation, which does not scale well and is a lot of work.
    3. Implement a special case for slices in the lru_cache function. However, this just moves the problem into the functools library.


* While I do realize that further efficiency gains could be had by caching results sparsely and tracking which cells of the field are and are not filled in (essentially a per-cell cache, sketched below), that would be significantly more work, is unlikely to yield much better results in my case, and still doesn't solve the general use case.
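
For completeness, a rough sketch of what that per-cell approach might look like (all names here are hypothetical, and fully specified slices are assumed for brevity):

    class CellCachedField:
        def __init__(self):
            self._cells = {}  # sparse per-cell cache: index -> value

        def _cell(self, i):
            # compute and remember a single cell on first access
            if i not in self._cells:
                self._cells[i] = i * i  # stand-in for the expensive computation
            return self._cells[i]

        def __getitem__(self, item):
            if isinstance(item, slice):
                return [self._cell(i)
                        for i in range(item.start, item.stop, item.step or 1)]
            return self._cell(item)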


Code for Workaround:

from functools import lru_cache


class _hashable_slice(object):
    """Wrap a slice so it can be used as a dictionary key."""
    __slots__ = ['slice']

    def __init__(self, s: slice):
        self.slice = s

    def __hash__(self):
        # Hash exactly as the proposal above suggests slice itself should.
        return hash((self.slice.start, self.slice.stop, self.slice.step))

    def __eq__(self, other):
        # Compare by the underlying slice, accepting both wrapped and
        # bare slices (slice objects compare by value).
        if isinstance(other, _hashable_slice):
            other = other.slice
        return other == self.slice


def slice_lru_cache(*lru_args, **lru_kwargs):
    """A stand-in for lru_cache that also accepts slice arguments."""
    lru_wrapper = lru_cache(*lru_args, **lru_kwargs)

    def decorator(f):
        @lru_wrapper
        def inner(*args, **kwargs):
            # Unwrap _hashable_slice arguments back into real slices
            # before calling the underlying function.
            def unpack(x):
                if isinstance(x, _hashable_slice):
                    return x.slice
                if isinstance(x, (tuple, list)):
                    return type(x)(unpack(v) for v in x)
                else:
                    return x

            return f(*(unpack(v) for v in args),
                     **{k: unpack(v) for k, v in kwargs.items()})
        
        def wrapper(*args, **kwargs):
            # Wrap every slice in a _hashable_slice so that lru_cache
            # can hash the argument tuple.
            def pack(x):
                if isinstance(x, slice):
                    return _hashable_slice(x)
                if isinstance(x, (tuple, list)):
                    return type(x)(pack(v) for v in x)
                else:
                    return x
            return inner(*(pack(v) for v in args),
                         **{k: pack(v) for k, v in kwargs.items()})
        # Assigning __getattr__ on a function object has no effect
        # (special methods are looked up on the type), so forward
        # lru_cache's helpers explicitly instead.
        wrapper.cache_info = inner.cache_info
        wrapper.cache_clear = inner.cache_clear
        return wrapper
    return decorator
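
A hypothetical usage of the decorator above (ExpensiveField is only an illustrative stand-in):

    class ExpensiveField:
        @slice_lru_cache(maxsize=128)
        def __getitem__(self, item):
            return item  # stand-in for the expensive computation

    field = ExpensiveField()
    field[1:10:2]  # computed on the first call
    field[1:10:2]  # served from the cache on repeat calls
    print(ExpensiveField.__getitem__.cache_info())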

