[Python-ideas] The pipe protocol, a convention for extensible method chaining

Wed May 27 07:56:14 CEST 2015

Hi Steve,

On Mon, May 25, 2015 at 7:54 PM, Steven D'Aprano <steve at pearwood.info>
wrote:

> Are you sure this actually works in practice?
>
> Since pipe() returns the result of calling the passed in function, not
> the dataframe, it seems to me that you can't actually chain this unless
> it's the last call in the chain.

This is a good point. We're pretty sure it will work in practice, because
many functions that take dataframes return other dataframes -- or other
objects that will implement a .pipe() method. The prototypical use case is
actually closer to:

df.pipe(reformat_my_data)

Plotting and saving data with method chaining is convenient, but usually as
the terminal step in a data analysis flow. None of the existing pandas
methods for plotting or exporting return a dataframe, and it doesn't seem
to be much of an impediment to method chaining.
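
To make the chaining argument concrete, here is a minimal sketch. The toy `Frame` class and the helper functions `drop_negative`, `scale` and `total` are made up for illustration; this is not the pandas implementation. It only shows that the chain keeps going as long as each function returns something with a .pipe() method, and that a non-dataframe result is fine as the last step:

```python
class Frame:
    """Toy stand-in for a dataframe (hypothetical, for illustration only)."""

    def __init__(self, rows):
        self.rows = list(rows)

    def pipe(self, func, *args, **kwargs):
        # Return whatever func returns; chaining continues only while
        # each step hands back another object that has .pipe().
        return func(self, *args, **kwargs)


def drop_negative(frame):
    # Returns a new Frame, so the chain can continue.
    return Frame(r for r in frame.rows if r >= 0)


def scale(frame, factor):
    # Extra positional arguments are forwarded by pipe().
    return Frame(r * factor for r in frame.rows)


def total(frame):
    # Terminal step: returns a plain number, which ends the chain.
    return sum(frame.rows)


result = Frame([1, -2, 3]).pipe(drop_negative).pipe(scale, 10).pipe(total)
```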

That said, we've also thought about adding a .tee() method for exactly this
use case -- it's like pipe, but returns the original object instead of
the function's result.
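
A rough sketch of what such a .tee() might look like, again on a toy `Frame` class rather than a real dataframe. The names `Frame`, `save` and `double` are hypothetical, and nothing here is settled API:

```python
class Frame:
    """Toy stand-in for a dataframe (hypothetical, for illustration only)."""

    def __init__(self, rows):
        self.rows = list(rows)

    def pipe(self, func, *args, **kwargs):
        # Result of func replaces the object in the chain.
        return func(self, *args, **kwargs)

    def tee(self, func, *args, **kwargs):
        # Call func for its side effect, discard its result,
        # and hand back the original object so the chain continues.
        func(self, *args, **kwargs)
        return self


saved = []


def save(frame, store):
    # Side effect only; implicitly returns None.
    store.append(list(frame.rows))


def double(frame):
    return Frame(r * 2 for r in frame.rows)


out = Frame([1, 2]).tee(save, saved).pipe(double)
```

With .pipe(save, saved) in place of .tee(save, saved), the next call would fail, because save returns None.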

> What's the point of the redirection to __pipe_func__? Under what
> circumstances would somebody use __pipe_func__ instead of just passing a
> callable (a function or other object with __call__ method)? If you don't
> have a good use case for it, then "You Ain't Gonna Need It" applies.
>

Our main use case was for APIs that can't accept a DataFrame as their first
argument, but that naturally can be understood as modifying dataframes.

Here's an example based on the Seaborn plotting library:

def scatterplot(x, y, data=None):
    """Make a 2D plot of x vs y."""
    ...

If `x` or `y` are strings, Seaborn looks them up as columns in the provided
dataframe `data`. But `x` and `y` can also be directly provided as columns.
This API is in unfortunate conflict with passing in `data` as the first,
required argument.
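
Here is a sketch of how the redirection could bridge that gap. It assumes pipe() looks up `__pipe_func__` with getattr and falls back to the callable itself. The toy `Frame` class, the stand-in `scatterplot` (which returns points instead of drawing anything) and the adapter `_scatterplot_pipe` are all hypothetical, not Seaborn or pandas code:

```python
class Frame:
    """Toy stand-in for a dataframe (hypothetical, for illustration only)."""

    def __init__(self, columns):
        self.columns = dict(columns)

    def pipe(self, func, *args, **kwargs):
        # Assumed lookup: prefer func.__pipe_func__ if it exists,
        # otherwise call func itself with the frame as first argument.
        pipe_func = getattr(func, "__pipe_func__", func)
        return pipe_func(self, *args, **kwargs)


def scatterplot(x, y, data=None):
    # Stand-in for a plotting function: strings are looked up as
    # columns of `data`; the "plot" is just a list of (x, y) points.
    if isinstance(x, str):
        x = data.columns[x]
    if isinstance(y, str):
        y = data.columns[y]
    return list(zip(x, y))


def _scatterplot_pipe(data, x, y):
    # Adapter that accepts the frame first and forwards it as `data`.
    return scatterplot(x, y, data=data)


# Attach the adapter so .pipe(scatterplot, ...) is redirected to it.
scatterplot.__pipe_func__ = _scatterplot_pipe

points = Frame({"a": [1, 2], "b": [3, 4]}).pipe(scatterplot, "a", "b")
```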

> I think that is completely unnecessary. (It also abuses a reserved
> namespace, but you've already said you don't care about that.) Instead
> of passing:
>
>     .pipe(myobject, args)  # myobject has a __pipe_func__ method
>
> just make it explicit and write:
>
>     .pipe(myobject.some_method, args)
>

This is a fair point. Writing something like:

.pipe(seaborn.scatterplot.df, 'x', 'y')

is not so much worse than omitting the .df.

> Yes. I love chaining in, say, bash, and it works well in Ruby, but it's
> less useful in Python. My attempt to help bring chaining to Python is
> here
>
> http://code.activestate.com/recipes/578770-method-chaining/
>
> but it relies on methods operating by side-effect, not returning a new
> result. But generally speaking, I don't like methods that operate by
> side-effect, so I don't use chaining much in practice. I'm always on the
> look-out for opportunities where it makes sense though.
>

I think this is where we have an advantage in the PyData world. We tend to
work less with built-in data structures and prefer to make our methods pure
functions, which together make chaining much more feasible.

Cheers,
Stephan