[Python-ideas] The pipe protocol, a convention for extensible method chaining

Tue May 26 11:46:36 CEST 2015

On May 25, 2015 6:45 PM, "Stephan Hoyer" <shoyer at gmail.com> wrote:
>
> In the PyData community, we really like method chaining for data analysis
pipelines:
>
> (iris.query('SepalLength > 5')
>  .assign(SepalRatio = lambda x: x.SepalWidth / x.SepalLength,
>          PetalRatio = lambda x: x.PetalWidth / x.PetalLength)
>  .plot(kind='scatter', x='SepalRatio', y='PetalRatio'))
>
>
> Unfortunately, method chaining isn't very extensible -- short of monkey
patching, every method we want to use has to exist on the original object. If
a user wants to supply their own plotting function, they can't use method
chaining anymore.

>
> You may recall that we brought this up a few months ago on python-ideas
as an example of why we would like macros.
>
> To get around this issue, we are contemplating adding a pipe method to
pandas DataFrames. It looks like this:
>
> def pipe(self, func, *args, **kwargs):
>     pipe_func = getattr(func, '__pipe_func__', func)
>     return pipe_func(self, *args, **kwargs)
>
>
> We would encourage third party libraries with objects on which method
chaining is useful to define a pipe method in the same way.
>
> The main idea here is to create an easy way for users to do method
chaining with their own functions and with functions from third party
libraries.
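
To make the chaining concrete, here is a minimal sketch of the idea. Table is a hypothetical stand-in for a DataFrame (not a pandas class), and add_ratio is an invented user function; only the body of pipe follows the definition quoted above.

```python
class Table:
    """Hypothetical stand-in for a DataFrame; not a pandas class."""

    def __init__(self, rows):
        self.rows = list(rows)

    def query(self, predicate):
        # A "built-in" chainable method: keep rows matching the predicate.
        return Table(r for r in self.rows if predicate(r))

    def pipe(self, func, *args, **kwargs):
        # Same shape as the proposed method: honor __pipe_func__ if present,
        # otherwise call func with the object as the first argument.
        pipe_func = getattr(func, '__pipe_func__', func)
        return pipe_func(self, *args, **kwargs)


def add_ratio(table, num, den, name):
    # User-supplied function that Table knows nothing about.
    return Table(dict(r, **{name: r[num] / r[den]}) for r in table.rows)


result = (Table([{'w': 2.0, 'l': 4.0},
                 {'w': 3.0, 'l': 6.0},
                 {'w': 1.0, 'l': 8.0}])
          .query(lambda r: r['l'] > 5)
          .pipe(add_ratio, 'w', 'l', name='ratio'))

print([r['ratio'] for r in result.rows])
```

The chain stays intact even though add_ratio is not a method of Table, which is the extensibility the proposal is after.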
>
> The business with __pipe_func__ is more magical, and frankly we aren't
sure it's worth the complexity. The idea is to create a "pipe protocol"
that allows functions to decide how they are called when piped. This is
useful in some cases, because it doesn't always make sense for functions
that act on piped data to accept that data as their first argument.
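
A sketch of how a function might opt in to the protocol, assuming the pipe definition quoted above. Table, describe and _describe_piped are hypothetical names for illustration; describe takes its data as the second argument, so it attaches an adapter as __pipe_func__.

```python
class Table:
    """Hypothetical stand-in for a DataFrame; not a pandas class."""

    def __init__(self, rows):
        self.rows = list(rows)

    def pipe(self, func, *args, **kwargs):
        # Look up the override; fall back to calling func directly.
        pipe_func = getattr(func, '__pipe_func__', func)
        return pipe_func(self, *args, **kwargs)


def describe(label, data):
    # Ordinary call signature: the data is the *second* argument.
    return '%s: %d rows' % (label, len(data.rows))


def _describe_piped(data, label):
    # Adapter used only when piped: move the piped object into the data slot.
    return describe(label, data)


# Functions are objects, so the override can be attached as an attribute.
describe.__pipe_func__ = _describe_piped

summary = Table([1, 2, 3]).pipe(describe, 'iris')
print(summary)
```

Without the attribute, pipe would call describe(table, 'iris') and the arguments would land in the wrong slots; that mismatch is the case the protocol is meant to handle.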
>
> For more motivation and examples, please read the opening post in this
GitHub issue: https://github.com/pydata/pandas/issues/10129
>
> Obviously, this sort of protocol would not be an official part of the
Python language. But because we are considering creating a de-facto
standard, we would love to get feedback from other Python communities that
use method chaining:
> 1. Have you encountered or addressed the problem of extensible method
chaining?

* https://pythonhosted.org/pyquery/api.html
* SQLAlchemy

> 2. Would this pipe protocol be useful to you?

What are the advantages over just returning 'self'? (Which use cases are
not possible with current syntax?)

In terms of documenting functional composition, I find it easier to test
and add comment strings to multiple statements.

Months ago, when I looked at creating pandasrdf (pandas #3402), there was a
need for a (...).meta.columns with columnar URIs, units, and other metadata
(who, what, when, how). Such metadata cannot be stored in e.g. CSV, but it
can be with JSON-LD, RDF, RDFa, or CSVW.

It would be neat to be able to track provenance metadata through [chained]
transformations.

> 3. Is it worth allowing piped functions to override how they are called
by defining something like __pipe_func__?

"There should be one-- and preferably only one --obvious way to do it."

> Note that I'm not particularly interested in feedback about how we
shouldn't be defining double underscore methods. There are other ways we
could spell __pipe_func__, but double underscores seem to be pretty
standard for ad-hoc protocols.
> Thanks for your attention.
> Best,
> Stephan
>
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/