[Python-Dev] Hashable memoryviews

Sun Nov 13 12:56:23 CET 2011

Antoine Pitrou <solipsis at pitrou.net> wrote:
> > > I would propose the following algorithm:
> > > 1) try to calculate the original object's hash; if it fails, consider
> > >    the memoryview unhashable (the buffer is probably mutable)
> > 
> > With slices or the new casts (See: http://bugs.python.org/issue5231,
> > implemented in http://hg.python.org/features/pep-3118#memoryview ),
> > it is possible to have different hashes for equal objects:
> > 
> > >>> b1 = bytes([1,2,3,4])
> > >>> b2 = bytes([4,3,2,1])
> > >>> m1 = memoryview(b1)
> > >>> m2 = memoryview(b2)[::-1]
> 
> I don't understand this feature. How do you represent a reversed buffer
> using the buffer API, and how do you ensure that consumers (especially
> those written in C) see the buffer reversed?

In this case, view->buf points to the last element of the underlying memory
block and view->strides[0] is -1. In general, a PEP-3118-compliant consumer
must access the elements of a buffer only via PyBuffer_GetPointer() or an
equivalent computation.

Basically, this means that you start at view->buf (which may be *any*
location in the memory block) and follow the strides until you reach
the desired element.
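To make the reversed case concrete, here is a small Python sketch. It assumes Python 3.3 or later, where memoryview supports negative-step slicing and exposes .strides. The variable start is an illustrative stand-in for the position of view->buf, which Python does not expose directly.

```python
# Sketch: how a reversed 1-D view locates its elements.
# Assumption: itemsize is 1, so "start" is the byte offset of logical
# element 0 (where view->buf would point) within the original bytes object.
b2 = bytes([4, 3, 2, 1])
m2 = memoryview(b2)[::-1]

stride = m2.strides[0]     # -1: each logical step moves one byte backwards
start = len(b2) - 1        # buf points at the last byte of b2

# Follow the strides from start to reach each logical element.
elements = [b2[start + i * stride] for i in range(len(m2))]
print(m2.strides, elements, m2.tolist())
```

Walking from the last byte with stride -1 yields the same element sequence that m2.tolist() reports.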

Objects/abstract.c:
===================

void*
PyBuffer_GetPointer(Py_buffer *view, Py_ssize_t *indices)
{
    char* pointer;
    int i;
    pointer = (char *)view->buf;
    for (i = 0; i < view->ndim; i++) {
        pointer += view->strides[i]*indices[i];
        if ((view->suboffsets != NULL) && (view->suboffsets[i] >= 0)) {
            pointer = *((char**)pointer) + view->suboffsets[i];
        }
    }
    return (void*)pointer;
}
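A rough Python transliteration of the pointer arithmetic above may help when reasoning about layouts. get_offset() is a hypothetical helper. It works with byte offsets instead of pointers and deliberately ignores suboffsets (PIL-style indirect arrays), so it covers only the first statement of the loop body.

```python
def get_offset(buf, strides, indices):
    # Hypothetical analogue of PyBuffer_GetPointer() without suboffsets:
    # buf is the byte offset of element (0, ..., 0) in the underlying memory,
    # and index i along dimension d moves strides[d] * i bytes.
    offset = buf
    for stride, index in zip(strides, indices):
        offset += stride * index
    return offset

raw = bytes(range(12))
m = memoryview(raw).cast('B', [3, 4])   # 3x4 C-contiguous view, strides (4, 1)
off = get_offset(0, m.strides, (2, 1))
print(m.strides, off, raw[off], m.tolist()[2][1])
```

The same helper describes the reversed view from the example above: with buf at offset 3 and strides (-1,), the indices 0..3 map to offsets 3, 2, 1, 0.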

> Regardless, it's simply a matter of getting the hash algorithm right
> (i.e. iterate in logical order rather than memory order).

This would work if the memoryview knew how the original object computes
its hash. Beyond bytes objects, though, it is not obvious to me how to
achieve that.
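For the bytes case, iterating in logical order is easy to sketch in Python, because memoryview.tobytes() copies the elements in logical order rather than memory order. logical_hash() is a hypothetical helper, and restricting it to read-only views of format 'B' is an assumption of this sketch. For comparison, CPython 3.3 and later define the hash of a read-only memoryview with format 'B', 'b' or 'c' as hash(m.tobytes()).

```python
def logical_hash(m: memoryview) -> int:
    # Hypothetical helper: hash a read-only 'B' view by its elements in
    # logical order, independent of strides or memory layout.
    if not m.readonly:
        raise ValueError("cannot hash a writable memoryview")
    if m.format != 'B':
        raise ValueError("this sketch only handles format 'B'")
    # tobytes() follows the strides, so a reversed view yields its
    # logical contents, not the raw memory it points into.
    return hash(m.tobytes())

m1 = memoryview(bytes([1, 2, 3, 4]))
m2 = memoryview(bytes([4, 3, 2, 1]))[::-1]
print(m1 == m2, logical_hash(m1) == logical_hash(m2))
```

With this definition the two equal views from the quoted example hash equally, and both agree with hash(bytes([1, 2, 3, 4])).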

> > >>> a = array.array('L', [0])
> > >>> b = b'\x00\x00\x00\x00\x00\x00\x00\x00'
> > >>> m_array = memoryview(a)
> > >>> m_bytes = memoryview(b)
> > >>> m_cast = m_array.cast('B')
> > >>> m_bytes == m_cast
> > True
> > >>> hash(b) == hash(a)
> > Traceback (most recent call last):
> >   File "<stdin>", line 1, in <module>
> > TypeError: unhashable type: 'array.array'
> 
> In this case, the memoryview wouldn't be hashable either.

Hmm, the point was that one could take the hash of m_bytes but not
of m_cast, even though they are equal. Perhaps I misunderstood your
proposal. I assumed that hash requests would be redirected to the
original exporting object.

As above, it would be possible to write a custom hash function for
memoryviews with format 'B'.
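The two approaches can be contrasted in a short Python sketch. hash_via_exporter() models the redirect-to-the-exporter reading of the proposal (via memoryview.obj), and hash_format_B() is a hypothetical format-based alternative; both names are made up for illustration. The sketch sizes b from a.itemsize because the width of array type code 'L' is platform-dependent. Note that it ignores mutability: m_cast is backed by a writable array, so a real implementation would still have to decide whether such a view may be hashed at all.

```python
from array import array

a = array('L', [0])
# Size b to match the array: the width of type code 'L' varies by platform.
b = bytes(a.itemsize)
m_bytes = memoryview(b)
m_cast = memoryview(a).cast('B')

def hash_via_exporter(m):
    # Redirect the hash request to the exporting object (m.obj).
    return hash(m.obj)

def hash_format_B(m):
    # Hypothetical format-based hash: hash the elements of a 'B' view in
    # logical order, as bytes would. Mutability is NOT checked here.
    if m.format != 'B':
        raise TypeError("only format 'B' is handled in this sketch")
    return hash(m.tobytes())

print(m_bytes == m_cast)                                # the views are equal
print(hash_via_exporter(m_bytes) == hash(b))            # bytes exporter: hashable
try:
    hash_via_exporter(m_cast)                           # array exporter: unhashable
except TypeError as exc:
    print("exporter hash failed:", exc)
print(hash_format_B(m_bytes) == hash_format_B(m_cast))  # format-based: consistent
```

Delegating to the exporter makes equal views disagree about hashability, while the format-based hash keeps hash() consistent with ==.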

Stefan Krah