[Python-Dev] __index__ clipping

Guido van Rossum guido at python.org
Thu Aug 10 16:26:54 CEST 2006


On 8/10/06, Nick Coghlan <ncoghlan at gmail.com> wrote:
> Guido van Rossum wrote:
> >> It seems like Nick's recent patches solved the problems that were
> >> identified.
> >
> > Nick, can you summarize how your patches differ from my proposal?
>
> nb_index and __index__ are essentially exactly as you propose.

Then I don't understand why Travis is objecting to my proposal!

I'll review the rest later (right now I'm just doing email triage :-).

--Guido

> To make an
> object implemented in C usable as an index you would take either the nb_int
> slot or the nb_long slot and put the same function pointer into the nb_index
> slot. For a Python object, you would write either '__index__ = __int__' or
> '__index__ = __long__' as part of the class definition.
>
> operator.index is provided to support writing __getitem__, __setitem__ and
> __delitem__ methods - it raises IndexError on overflow so you don't have to
> catch and reraise to convert an OverflowError to an IndexError.
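
The aliasing described above can be sketched in present-day Python 3. This is only an illustration under stated assumptions: `Offset` is a hypothetical class, Python 3 has no `__long__`, and the modern `operator.index` returns the integer as-is rather than raising IndexError on overflow as the patch version is described as doing.

```python
import operator


class Offset:
    """Hypothetical integer-like wrapper, used only for illustration."""

    def __init__(self, value):
        self._value = int(value)

    def __int__(self):
        return self._value

    # Reuse the integer conversion as the index conversion.
    __index__ = __int__


items = ["a", "b", "c", "d"]
second = items[Offset(1)]            # list indexing consults __index__
tail = items[Offset(2):]             # slice endpoints consult it as well
as_int = operator.index(Offset(3))   # explicit conversion, e.g. in __getitem__
```

Objects that define only `__int__` without the alias are rejected as indices, which is the point of having a separate slot.
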
>
> On the C API side, the 3 functions you suggest are all present (although the
> version returning a Python object is accessed via PyObject_CallMethod), and
> there's a 4th variant that raises IndexError instead of OverflowError (this
> version is convenient when writing mp_subscript and mp_ass_subscript functions).
>
> Avoiding Py_ssize_t -> PyInt -> Py_ssize_t conversions for all integer types
> implemented in C would be nice, but I don't think it's practical (the latest
> version of the patch does at least avoid it for the builtin integer types).
>
> Cheers,
> Nick.
>
>
>
> P.S. Here's the detailed rationale for the form the patch has evolved to [1]:
>
> In addition to allowing (2**100).__index__() == 2**100, having nb_index return
> a Python object resulted in a decent reduction in code duplication -
> previously the coercion logic to get a Python integer or long value down to a
> Py_ssize_t was present in 3 places (long_index, instance_index,
> slot_nb_index), and would also have needed to be duplicated by any other C
> implemented index type whose value could exceed the range of a Py_ssize_t.
> With the patch, that logic appears only inside abstract.c and extension types
> can just return a PyLong value and let the interpreter figure out how to
> handle overflow. The biggest benefit of this approach is that a single slot
> (nb_index) can be used to implement four different overflow behaviours in the
> core (return PyLong, raise OverflowError, raise IndexError, clip to
> Py_ssize_t), as well as providing a hook to allow extension module authors to
> define their own overflow handling.
>
> If the nb_index slot does not return a true Python integer or long, TypeError
> gets raised. Subclasses are not accepted in order to rule out Armin's
> favourite set of recursion problems :)
>
> The C level API is based on the use cases in the standard library, with one of
> the functions generalised a bit to allow extension modules to easily handle
> type errors and overflow differently if they want to.
>
> The three different use cases for nb_index in the standard library are:
>    - concrete sequence indices (want IndexError on overflow)
>    - 'true integer' retrieval (want OverflowError on overflow)
>    - slice endpoints (want to clip to Py_ssize_t max/min values)
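
The three use cases listed above are still visible from Python code. The sketch below assumes a current CPython 3 interpreter rather than the 2006 patch, and the helper `outcome` is a hypothetical name introduced only for this example.

```python
big = 2 ** 100  # far outside the Py_ssize_t range on current platforms
seq = [10, 20, 30]


def outcome(action):
    """Run action() and return its result, or the name of the exception."""
    try:
        return action()
    except (IndexError, OverflowError) as exc:
        return type(exc).__name__


index_result = outcome(lambda: seq[big])     # concrete sequence index
repeat_result = outcome(lambda: seq * big)   # 'true integer' retrieval
slice_result = outcome(lambda: seq[:big])    # slice endpoint, clipped
```

On current CPython these come out as "IndexError", "OverflowError" and the whole list respectively, one result per overflow behaviour.
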
>
> The proposed fix (Travis & Neal provided some useful comments on earlier
> versions) includes a C API function for each of these different use cases:
>
>    Py_ssize_t PyNumber_Index(PyObject *obj, int *type_err)
>    Py_ssize_t PyNumber_AsSsize_t(PyObject *obj, int *type_err)
>    Py_ssize_t PyNumber_AsClippedSsize_t(PyObject *obj, int *type_err, int *clipped)
>
> type_err is an output variable to say "obj does not provide nb_index" in order
> to get rid of boilerplate dealing with PyErr_Occurred() in mp_subscript and
> mp_ass_subscript implementations (those methods generally didn't want a
> TypeError raised at this point - they wanted to go on and check if the object
> was a slice object instead). It's also useful if you want to provide a
> specific error message for TypeErrors (sequence repetition takes advantage of
> this). You can also leave the pointer as NULL and the functions will raise a
> fairly generic TypeError for you. PyObject_GetItem and friends use the
> functions that way.
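
The "not an index, so check for a slice before complaining" flow that type_err supports in C can be approximated at the Python level. This is a sketch, not the patch's code: `Track` is a hypothetical class, and catching TypeError from `operator.index` stands in for inspecting the type_err output variable.

```python
import operator


class Track:
    """Hypothetical read-only sequence, used only for illustration."""

    def __init__(self, data):
        self._data = list(data)

    def __getitem__(self, key):
        try:
            i = operator.index(key)   # does key provide __index__?
        except TypeError:
            i = None                  # no, but that is not an error yet
        if i is not None:
            return self._data[i]
        if isinstance(key, slice):    # fall back to the slice check
            return self._data[key]
        # Only now report a type error, with a container-specific message.
        raise TypeError(
            "Track indices must be integers or slices, not "
            + type(key).__name__
        )
```

One caveat of this approximation: a TypeError raised inside a faulty `__index__` method is treated the same as a missing `__index__`, which a dedicated output flag would not confuse.
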
>
> Avoiding repeated code is also why there are two non-clipping variants, one
> raising IndexError and one raising OverflowError. Raising OverflowError in
> PyNumber_Index broke half a dozen unit tests, while raising IndexError for
> things like sequence repetition turned out to break different unit tests.
>
> The clipping variant is for slice indices. The interpreter core doesn't
> actually care whether or not the result gets clipped in this case (it sets the
> last parameter to NULL), but I kept the output variable in the signature for
> the benefit of extension authors.
>
> All 3 of the C API methods return Py_ssize_t. The "give me a Python object"
> case isn't actually needed anywhere in the core, but is available to extension
> modules via:
>    PyObject_CallMethod(obj, "__index__", NULL)
>
> As Travis notes, indexing with something other than a builtin integer will be
> slightly slower due to the temporary object created by calling the nb_index
> slot (version 4 of the patch avoids this overhead for ints, version 5 avoids
> it for longs as well). I don't think this is avoidable - a non-PyObject return
> value really doesn't provide the necessary flexibility to detect and handle
> overflow correctly.
>
> [1] http://www.python.org/sf/1530738
>
> --
> Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
> ---------------------------------------------------------------
>              http://www.boredomandlaziness.org
>


-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

