[Pandas-dev] Pandas Sprint Recap

Wed Jul 18 15:05:15 EDT 2018

On Wed, 18/07/2018 at 14:45 -0400, Wes McKinney wrote:
> 
> 
> On Wed, Jul 18, 2018, 2:08 PM Pietro Battiston <me at pietrobattiston.it>
> wrote:
> > On Wed, 18/07/2018 at 13:56 -0400, Wes McKinney wrote:
> > > > You mean Arrow-based R data frames, right? Or are you thinking
> > > > about a sort of cross-language dplyr?
> > > 
> > > Precisely a cross-language computational system (I've been talking
> > > about this publicly for well over 3 years now, e.g. here in April
> > > 2015:
> > > https://www.slideshare.net/wesm/dataframes-the-good-bad-and-ugly).
> > > Same implementation (in C/C++/LLVM), different front end. dplyr
> > > already has interfaces to SQL, for example.
> > 
> > _Through R_, right?
> 
> Replying all this time
> 
> Well, dplyr is an R package. My point was that it was not designed
> around R-specific semantics per se. This is explained in the slide
> deck I linked. 

I think clearly distinguishing API problems/solutions from
implementation problems/solutions can only help this discussion (and I
don't just mean this thread).

Your slides describe a nice plan as far as solutions are concerned. But
my limited understanding is that dplyr's _syntax_ is more innovative
than its _semantics_, and pandas doesn't have that much to learn from
the latter. Then, sure, a shared codebase is cool.

Pietro