[Pandas-dev] Index Constructor Performance

Sat Mar 27 11:44:32 EDT 2021

While optimizing the non-cython groupby.apply code (
https://github.com/pandas-dev/pandas/issues/40263,
https://github.com/pandas-dev/pandas/pull/40171#issuecomment-789116039),
I'm finding that an awful lot of overhead is coming from
Index._simple_new*.  This email is about what it would take to get rid of
that overhead.

* Note that the particular code snippet being profiled is chosen to be
worst-case for the non-cython path.  It ends up creating a _lot_ of very
small Index objects.  We don't particularly care about this case, but I'm
thinking about this as micro-optimization of code that affects just about
every use case under the sun.

All of the options I have in mind involve moving some of the constructors
to cython.  There is a tradeoff in how invasive that is vs how much perf
benefit we gain from it.

For a baseline, we can trim 10-13% off the benchmark linked above by
implementing the following class in cython and mixing it into NumericIndex
(abbreviated here; the full implementation is 65 lines of cython):

```
@cython.freelist(32)
cdef class NumpyIndex:
    cdef:
        public ndarray _data

    @classmethod
    def _simple_new(cls, values, name=None): ...

    cpdef NumpyIndex _getitem_slice(self, slice slobj): ...
```

10-13% is pretty good, but this only affects Int64Index, UInt64Index, and
Float64Index.  See Appendix 1 for discussion of what it would take to
extend this to other subclasses.

To get much further than this would require using __cinit__, which (absent
some gymnastics) would require the FooIndex.__new__ methods to behave a lot
more like the existing FooIndex._simple_new methods.  TL;DR: this really
isn't feasible absent a) refactoring RangeIndex to not subclass Int64Index
(easy) and b) breaking API changes on the constructors for affected Index
subclasses (hard).
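
For readers less familiar with the pattern, here is a minimal pure-Python
sketch of the difference between a permissive __new__ and a _simple_new
fastpath.  ToyIndex is a hypothetical stand-in, not pandas code, and the
int() coercion just represents whatever validation the real constructors do:

```python
class ToyIndex:
    """Hypothetical stand-in for an Index subclass; not pandas code."""

    def __new__(cls, data=None, name=None):
        # Permissive public constructor: accepts loosely-typed input and
        # pays for coercion/validation on every call.
        if data is None:
            data = []
        values = [int(x) for x in data]
        return cls._simple_new(values, name=name)

    @classmethod
    def _simple_new(cls, values, name=None):
        # Fastpath: trust the caller, skip validation, attach attributes.
        result = object.__new__(cls)
        result._data = values
        result._name = name
        return result


slow = ToyIndex(["1", 2, 3.0], name="a")      # coerced to [1, 2, 3]
already_valid = [4, 5, 6]
fast = ToyIndex._simple_new(already_valid)    # no coercion, no copy
```

Making __new__ "behave like _simple_new" means the first path would have to
become as strict as the second, which is where the API breakage comes from.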

Appendix 1: Extending to Other Subclasses
a) mixing libindex.NumpyIndex into pd.Index doesn't work because
ExtensionIndex._data is not an ndarray.  AFAICT to get the performance
benefit for object-dtype would require implementing a separate subclass
e.g. ObjectIndex.

b) RangeIndex would not benefit, but something similar could be done for it
following https://github.com/cython/cython/issues/4040 (or if we basically
re-implement range ourselves in cython).

c) MultiIndex could be made to benefit from this by changing ._codes to be
a 2D ndarray instead of a FrozenList of ndarrays.  This actually would
allow for some nice cleanups in MultiIndex.  The downside is that the
memory footprint may be bigger with mismatched level sizes.
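
A back-of-the-envelope sketch of that memory downside, assuming each level's
codes currently use the smallest signed integer width that fits the level,
while a 2D ndarray would have to use the widest of those for every level.
The width rule in code_itemsize is my simplification, not the exact pandas
rule, and the level sizes are made up:

```python
def code_itemsize(n_categories):
    # Smallest signed integer width (in bytes) that can hold the codes
    # 0..n_categories-1.  Simplified rule, assumed for illustration.
    for nbytes in (1, 2, 4):
        if n_categories - 1 <= 2 ** (8 * nbytes - 1) - 1:
            return nbytes
    return 8


def codes_footprint(level_sizes, n_rows):
    # One array per level, each with its own width (roughly today's layout).
    per_level = sum(code_itemsize(n) * n_rows for n in level_sizes)
    # One 2D array: every level is stored at the widest level's width.
    widest = max(code_itemsize(n) for n in level_sizes)
    as_2d = widest * n_rows * len(level_sizes)
    return per_level, as_2d


# Hypothetical MultiIndex: one tiny level, one large level, 1M rows.
per_level, as_2d = codes_footprint([3, 100_000], n_rows=1_000_000)
print(per_level, as_2d)  # 5000000 8000000
```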

d) With modest additional effort, this can be extended to
DTI/TDI/PI/CategoricalIndex.

Appendix 2: __cinit__
__cinit__ gets called implicitly before __init__ or __new__, and with
whatever arguments are passed to init/new, i.e. we can't do validation
before passing arguments like we could with an explicit
super().__init__(...) call.

For NumpyIndex we _could_ define __cinit__ without breaking the world, but
we wouldn't get much use out of it unless we also tightened what we accept
in the constructor.

Appendix 3: Notes on cython-related constraints
- We cannot mix a cython cdef class into pd.Index because that will break
3rd party subclasses that use object.__new__(cls) (in particular I'm
thinking of xarray's CFDatetimeIndex).
- A python class cannot inherit from two separate cython cdef classes,
i.e. if we mix something into NumericIndex, that precludes mixing something
else into Int64Index.
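
Both constraints can be reproduced in pure Python, since as far as I can
tell they come from CPython rather than from cython itself.  In this sketch
a __slots__ class and an int subclass stand in for cdef classes (my
assumption: a cdef class with C-level attributes behaves the same way here);
all class names are hypothetical:

```python
def raises_type_error(func):
    # Helper: report whether calling func() raises TypeError.
    try:
        func()
    except TypeError:
        return True
    return False


class HasData:
    # Stand-in for a cdef class with its own C-level attribute.
    __slots__ = ("_data",)


class HasCache:
    # A second, unrelated "cdef-like" class with its own attribute.
    __slots__ = ("_cache",)


def inherit_from_both():
    # Two bases with incompatible instance layouts cannot be combined.
    class Both(HasData, HasCache):
        pass


class MyInt(int):
    # Stand-in for a python subclass of a type with its own C-level tp_new.
    pass


layout_conflict = raises_type_error(inherit_from_both)
unsafe_new = raises_type_error(lambda: object.__new__(MyInt))
print(layout_conflict, unsafe_new)  # True True
```

The second case is the 3rd-party-subclass problem: once a base class brings
its own C-level constructor, object.__new__(cls) is rejected and the subclass
has to call the base's __new__ instead.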