[issue15573] Support unknown formats in memoryview comparisons

Sat Aug 11 20:58:07 CEST 2012

Nick Coghlan added the comment:

OK, I think I finally understand what Martin is getting at from a semantic point of view, and I think I can better explain the background of the issue and why Stefan's proposed solution is both necessary and correct.

The ideal definition of equivalence for memory view objects would actually be:

memoryview(x) == memoryview(y)

if (and only if)

memoryview(x).tolist() == memoryview(y).tolist()

Now, in practice, this approach cannot be implemented, because there are too many format definitions (whether valid or invalid) that memoryview doesn't understand (and perhaps will never understand) and because it would be completely infeasible on large arrays with complex format definitions.

Thus, we are forced to accept a *constraint* on memoryview's definition of equality: individual values are always compared via raw memory comparison, thus values stored using different *sizes* or *layouts* in memory will always compare as unequal, even if they would compare as equal in Python

This is an *acceptable* constraint as, in practice, you don't perform mixed format arithmetic and it's not a problem if there's no automatic coercion between sizes and layouts.

The Python 3.2 memoryview effectively uses memcmp() directly treating everything as a 1D array of bytes data, completely ignoring both shape *and* format data. Thus:

>>> ab = array('b', [1, 2, 3])
>>> ai = array('i', [1, 2, 3])
>>> aL = array('L', [1, 2, 3])
>>> ab == ai
True
>>> ab == ai == aL
True
>>> memoryview(ab) == memoryview(ai)
False
>>> memoryview(ab) == memoryview(aL)
False
>>> memoryview(ai) == memoryview(aL)
False

This approach leads to some major false positives, such as a floating point value comparing equal to an integer that happens to share the same binary representation:

>>> af = array('f', [1.1])
>>> ai = array('i', [1066192077])
>>> af == ai
False
>>> memoryview(af) == memoryview(ai)
True

The changes in 3.3 are aimed primarily at *eliminating those false positives* by taking into account the shape of the array and the format of the contained values. It is *not* about changing the fundamental constraint that memoryview operates at the level of raw memory, rather than Python objects, and thus cares about memory layout details that are irrelevant after passing through the Python abstraction layer.

This contrasts with the more limited scope of the array.array module, which *does* take into account the Python level abstractions. Thus, there will always be a discrepancy between the two definitions of equality, as memoryview cares about memory layout details, where array.array does not.

The problem at the moment is that Python 3.3 currently has *spurious* false negatives that aren't caused by that fundamental constraint that comparisons must occur based directly on memory contents. Instead, they're being caused by memoryview returning False for any equality comparison for a format it doesn't understand. That's unacceptable, and is what Stefan's patch is intended to fix.

----------

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue15573>
_______________________________________