[Python-Dev] Bad interaction of __index__ and sequence repeat

Travis Oliphant oliphant.travis at ieee.org
Fri Jul 28 18:15:51 CEST 2006


Nick Coghlan wrote:
> David Hopwood wrote:
>> Armin Rigo wrote:
>>> Hi,
>>>
>>> There is an oversight in the design of __index__() that only just
>>> surfaced :-(  It is responsible for the following behavior, on a 32-bit
>>> machine with >= 2GB of RAM:
>>>
>>>     >>> s = 'x' * (2**100)       # works!
>>>     >>> len(s)
>>>     2147483647
>>>
>>> This is because PySequence_Repeat(v, w) works by applying 
>>> w.__index__ in
>>> order to call v->sq_repeat.  However, __index__ is defined to clip the
>>> result to fit in a Py_ssize_t.
>>
>> Clipping the result sounds like it would *never* be a good idea. What 
>> was
>> the rationale for that? It should throw an exception.
>
> A simple demonstration of the clipping behaviour that works on 
> machines with limited memory:
>
> >>> (2**100).__index__()
> 2147483647
> >>> (-2**100).__index__()
> -2147483648
>
> PEP 357 doesn't even mention the issue, and the comment on long_index 
> in the code doesn't give a rationale - it just notes that the function 
> clips the result.
I can't think of the rationale so it was probably an unclear one and 
should be thought of as a bug.  The fact that it isn't discussed in the 
PEP means it wasn't thought about clearly.  I think I had the vague idea 
that .__index_() should always succeed.  But, this shows a problem with 
that notion.
>
> I'm inclined to call it a bug, too, but I've cc'ed Travis to see if he 
> can shed some light on the question - the implementation of long_index 
> explicitly suppresses the overflow error generated by 
> _long_as_ssize_t, so the current behaviour appears to be deliberate.

If it was deliberate, it was a hurried decision and one that should be 
re-thought and probably changed.  I think the idea came from the fact 
that out-of-bounds slicing returns empty lists and since __index__ was 
primarily developed to allow integer-like objects to be used in slicing 
it adopted that behavior.  In fact it looks like the comment above 
_long_index contains words from the comment above _PyEval_SliceIndex 
showing the direct borrowing of the idea.   But, _long_index is clearly 
the wrong place to handle the situation since it is used by more than 
just the slicing code.  An error return is already handled by the 
_Eval_SliceIndex code anyway.

I say it's a bug that should be fixed.  Don't clear the error, raise it.


-Travis



More information about the Python-Dev mailing list