[Python-Dev] Bad interaction of __index__ and sequence repeat

Travis Oliphant oliphant.travis at ieee.org
Mon Jul 31 20:28:09 CEST 2006


Nick Coghlan wrote:
> Nick Coghlan wrote:
>> Armin Rigo wrote:
>>> Hi,
>>>
>>> There is an oversight in the design of __index__() that only just
>>> surfaced :-(  It is responsible for the following behavior, on a 32-bit
>>> machine with >= 2GB of RAM:
>>>
>>>     >>> s = 'x' * (2**100)       # works!
>>>     >>> len(s)
>>>     2147483647
>>>
>>> This is because PySequence_Repeat(v, w) works by applying w.__index__ in
>>> order to call v->sq_repeat.  However, __index__ is defined to clip the
>>> result to fit in a Py_ssize_t.  This means that the above problem exists
>>> with all sequences, not just strings, given enough RAM to create such
>>> sequences with 2147483647 items.
>>>
>>> For reference, in 2.4 we correctly get an OverflowError.
>>>
>>> Argh!  What should be done about it?
>>
>> I've now got a patch on SF that aims to fix this properly [1].
>
> I revised this patch to further reduce the code duplication associated 
> with the indexing code in the standard library.
>
> The patch now has three new functions in the abstract C API:
>
>   PyNumber_Index (used in a dozen or so places)
>     - raises IndexError on overflow
>   PyNumber_AsSsize_t (used in 3 places)
>     - raises OverflowError on overflow
>   PyNumber_AsClippedSsize_t() (used once, by _PyEval_SliceIndex)
>     - clips to PY_SSIZE_T_MIN/MAX on overflow
>
> All 3 have an int * output argument allowing type errors to be flagged 
> directly to the caller rather than through PyErr_Occurred().
>
> Of the 3, only PyNumber_Index is exposed through the operator module.
>
> Probably the most interesting thing now would be for Travis to review 
> it, and see whether it makes things easier to handle for the Numeric 
> scalar types (given the amount of code the patch deleted from the 
> builtin and standard library data types, hopefully the benefits to 
> Numeric will be comparable).


I noticed that most of the checks for PyInt were removed in the patch.  If I
remember correctly, I left these in for "optimization."  Other than
that, I think the patch is great.
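
To make the three overflow policies listed above concrete, here is a rough
Python sketch.  The function names are hypothetical stand-ins, not the C
functions from the patch, and the int * output argument is left out.  It
assumes a Python where operator.index() returns the unclipped integer and
sys.maxsize matches the largest Py_ssize_t.

```python
import operator
import sys

# Assumed to mirror PY_SSIZE_T_MAX / PY_SSIZE_T_MIN on this build.
SSIZE_MAX = sys.maxsize
SSIZE_MIN = -sys.maxsize - 1


def as_index(obj):
    """Policy 1: out-of-range values raise IndexError."""
    value = operator.index(obj)  # TypeError if obj has no __index__
    if not SSIZE_MIN <= value <= SSIZE_MAX:
        raise IndexError("index does not fit in Py_ssize_t")
    return value


def as_ssize_t(obj):
    """Policy 2: out-of-range values raise OverflowError."""
    value = operator.index(obj)
    if not SSIZE_MIN <= value <= SSIZE_MAX:
        raise OverflowError("value does not fit in Py_ssize_t")
    return value


def as_clipped_ssize_t(obj):
    """Policy 3: out-of-range values are silently clipped (slicing only)."""
    value = operator.index(obj)
    return max(SSIZE_MIN, min(SSIZE_MAX, value))
```

Under this reading, sequence repeat would go through something like
as_ssize_t(), so 'x' * (2**100) fails with OverflowError instead of quietly
building a clipped string, while only slice bounds use the clipping variant.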

As for helping NumPy, I think it will help to be able to remove the
special-case checks for all the different integer types, but this has not
yet been done in the NumPy code.
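
As an illustration of why the special-case checks become unnecessary, here
is a minimal sketch.  Int32Scalar is a made-up class standing in for a
NumPy integer scalar, not NumPy's actual implementation.  Any object that
defines __index__ is accepted wherever the interpreter needs an integer
position or count, with no per-type check in the sequence code.

```python
class Int32Scalar(object):
    """Hypothetical stand-in for a NumPy integer scalar type."""

    def __init__(self, value):
        self._value = int(value)

    def __index__(self):
        # The interpreter calls this for indexing, slicing and repeat.
        return self._value


n = Int32Scalar(3)
repeated = 'ab' * n                               # sequence repeat
item = [10, 20, 30][Int32Scalar(1)]               # item access
sliced = 'abcdef'[Int32Scalar(1):Int32Scalar(4)]  # slice bounds
```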

-Travis



