[Pandas-dev] Pandas Sprint Recap

Tom Augspurger tom.augspurger88 at gmail.com
Fri Jul 13 13:45:36 EDT 2018


Thanks Pietro,

We didn't discuss indexing much, beyond agreeing that there's work to be
done, and that fixing it was too large a task for 1.0.

As for whether an individual issue is a bug or feature, we'll have to
continue using our judgement. I think we'll inevitably break users' code in
a 1.x release as we fix bugs.

We'll need to discuss workflows for these large changes (e.g. ripping out
the block manager) that will be API breaking, but may take some time to
land. Keeping a separate branch in sync is a pain, but may be the least
painful alternative.

One thing I want to reiterate: it's not going to take another 11 years to
reach pandas 2.0 :) Just because we don't solve indexing for 1.0 doesn't
mean we won't ever be able to fix it.

Tom

On Fri, Jul 13, 2018 at 12:12 PM, Pietro Battiston <me at pietrobattiston.it>
wrote:

> Hi Tom,
>
> first, thanks to all those who participated in the sprint, and for the
> recap.
>
> On Sun, 8 Jul 2018 at 16:26 -0500, Tom Augspurger wrote:
> > [...]
> > I've posted a document on our wiki with a summary of the topics
> > discussed.
> > https://github.com/pandas-dev/pandas/wiki/Pandas-Sprint-(July,-2018)
> >
> > If people have questions or comments, feel free to post here and
> > we'll clarify that document.
>
> Something that scares me - but maybe because I'm missing something
> obvious - is what exactly qualifies as "deprecation". Is it something
> which was once presented as a distinct feature and is then disabled, or
> any general change to what any API call performs (that is, anything
> requiring a deprecation cycle)?
>
> There are many bugs - in particular, in indexing code - which might
> potentially break existing code when fixed. Some of them will have non-
> trivial deprecation paths/detection strategies. The first ones that
> come to my mind are #18631, #12827, #9519. The last one, in particular,
> implies changing the result of potentially tons of calls to .loc on a
> non-unique index.
>
> My view is that those (and many more, including several that will be
> found) will be best fixed through a total rewrite of indexing code
> (i.e., all code in indexing.py, and some code in internals.py), which I
> assumed would happen before 1.0, and which I certainly won't be able to
> do before 0.24.0 (September 2018).
> I'm clearly not claiming that nobody else can do it (nor that the bugs
> can necessarily only be fixed through a complete rewrite)... but since
> I did not get any feedback on
> https://github.com/pandas-dev/pandas/wiki/(Tentative)-rules-for-restructuring-indexing-code
> ... I assume that nobody is focusing/planning to focus on this in the
> near future (or was it somehow discussed in the sprint?).
>
> I perfectly understand the desire to stop postponing 1.0 to a vague
> future, if it's just a matter of recognizing that pandas is worth
> using.
> But if it's a statement/commitment about code robustness/quality, and
> relatedly API stability... then I think it is risky to leave the
> indexing API, and more in general the core codebase (as opposed to
> important but more lateral features such as new dtypes) out of the
> picture (e.g. out of #21894).
>
> Cheers,
>
> Pietro
> _______________________________________________
> Pandas-dev mailing list
> Pandas-dev at python.org
> https://mail.python.org/mailman/listinfo/pandas-dev
>