[issue36551] Optimize list comprehensions with preallocate size and protect against overflow

anthony shaw report at bugs.python.org
Tue Apr 9 03:04:14 EDT 2019


anthony shaw <anthony.p.shaw at gmail.com> added the comment:

> I am -1 for this optimization because it affects only one particular case (neither other kinds of comprehensions, nor generator expressions, nor list comprehensions with conditions) and even in this case it is small. It is possible to add a lot of other optimizations for other cases which would speed them up by 50% or 100%, but we do not do this, because every such optimization has a cost. It increases the amount of code which should be maintained and covered by tests, it adds small overhead in common cases to speed up an uncommon case, and increasing the code base can negatively affect surrounding code (just because the CPU cache and registers are used inappropriately and the compiler optimizes less important paths).

Understood; I had hoped this change would have a broader impact. The additional opcode is not ideal either.

> In addition, while this change speeds up list comprehensions for long lists, it slows them down for short lists. Short lists are more common.

I've been profiling this today. Basically, this implementation receives the `list_iter`, `range_iter`, etc. iterator objects.

There is no standard object model for an iterator's length; _PyObject_HasLen returns false because an iterator implements neither tp_as_sequence nor tp_as_mapping (rightly so).
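
For context, _PyObject_HasLen is (roughly, paraphrasing from memory rather than quoting the source) just a check of the type's length slots:

    /* Roughly what _PyObject_HasLen checks: iterators define neither slot,
       so the call is false for them. */
    int
    _PyObject_HasLen(PyObject *o)
    {
        return (Py_TYPE(o)->tp_as_sequence && Py_TYPE(o)->tp_as_sequence->sq_length) ||
               (Py_TYPE(o)->tp_as_mapping && Py_TYPE(o)->tp_as_mapping->mp_length);
    }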

What this has uncovered (so hopefully there's some value from this whole experience!) is that __length_hint__ for iterators is _really_ inefficient. Take a list_iterator for example:

PyObject_LengthHint will:

1. call _PyObject_HasLen, which returns false, so it then
2. calls _PyObject_LookupSpecial to find __length_hint__, then
3. calls _PyObject_CallNoArg on it, which calls
4. listiter_len, which calls
5. PyList_GET_SIZE, which returns a Py_ssize_t, which is then converted to a PyLong via
6. PyLong_FromSsize_t, which is then returned back to PyObject_LengthHint, where
7. PyLong_AsSsize_t is run to convert the PyLong back into a Py_ssize_t.

The Py_ssize_t is then finally returned to the caller!
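
To make that round trip concrete, here is a rough sketch in C of the path above (simplified and paraphrased, not the actual CPython source; names follow the current private API):

    /* Sketch only: the iterator's __length_hint__ boxes a Py_ssize_t into a
       PyLong, and PyObject_LengthHint immediately unboxes it again. */

    typedef struct {
        PyObject_HEAD
        Py_ssize_t it_index;
        PyListObject *it_seq;       /* set to NULL when the iterator is exhausted */
    } listiterobject;

    static PyObject *
    listiter_len(listiterobject *it)            /* list_iterator.__length_hint__ */
    {
        if (it->it_seq) {
            Py_ssize_t len = PyList_GET_SIZE(it->it_seq) - it->it_index;
            if (len >= 0)
                return PyLong_FromSsize_t(len); /* Py_ssize_t -> PyLong */
        }
        return PyLong_FromLong(0);
    }

    _Py_IDENTIFIER(__length_hint__);

    Py_ssize_t
    PyObject_LengthHint(PyObject *o, Py_ssize_t defaultvalue)
    {
        if (_PyObject_HasLen(o)) {
            /* not taken for iterators: no sq_length/mp_length */
        }
        PyObject *hint = _PyObject_LookupSpecial(o, &PyId___length_hint__);
        if (hint == NULL) {
            return PyErr_Occurred() ? -1 : defaultvalue;
        }
        PyObject *result = _PyObject_CallNoArg(hint);
        Py_DECREF(hint);
        if (result == NULL || result == Py_NotImplemented) {
            Py_XDECREF(result);
            return defaultvalue;                /* error handling elided */
        }
        Py_ssize_t res = PyLong_AsSsize_t(result);  /* PyLong -> Py_ssize_t */
        Py_DECREF(result);
        return res;
    }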

My conclusion was that the list comprehension should be initialized to the length of the target, before GET_ITER is run. This would remove the overhead for range objects, because you could simply call _PyObject_HasLen, which would return true for dict, list, tuple and set, but false for range objects (which is what you want).
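
As a very rough illustration of the idea (a hypothetical sketch only, not a proposed patch; _PyList_Preallocate is an invented helper, not an existing API):

    /* Hypothetical: size the comprehension's result list from the *iterable*,
       before GET_ITER, falling back to an ordinary empty list otherwise. */
    static PyObject *
    make_comprehension_result(PyObject *iterable)
    {
        Py_ssize_t hint = 0;
        if (_PyObject_HasLen(iterable)) {   /* cheap: just checks the length slots */
            hint = PyObject_Size(iterable);
            if (hint < 0) {
                PyErr_Clear();              /* e.g. len() failed; just skip the hint */
                hint = 0;
            }
        }
        PyObject *result = PyList_New(0);
        if (result != NULL && hint > 0) {
            _PyList_Preallocate(result, hint);  /* invented helper: grow ob_item only */
        }
        return result;
    }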

The issue is that GET_ITER is called outside the code object for the comprehension, so you'd have to pass an additional argument to the comprehension generator.

This is way outside of my expertise, but it's the only way I can see to get a sizeable benefit with minimal code and no edge cases that are slower.

Thanks for your time.

----------

_______________________________________
Python tracker <report at bugs.python.org>
<https://bugs.python.org/issue36551>
_______________________________________

