[Python-ideas] The pipe protocol, a convention for extensible method chaining

Stephan Hoyer shoyer at gmail.com
Tue May 26 01:38:20 CEST 2015


In the PyData community, we really like method chaining for data analysis
pipelines:

(iris.query('SepalLength > 5')
 .assign(SepalRatio = lambda x: x.SepalWidth / x.SepalLength,
         PetalRatio = lambda x: x.PetalWidth / x.PetalLength)
 .plot(kind='scatter', x='SepalRatio', y='PetalRatio'))


Unfortunately, method chaining isn't very extensible -- short of monkey
patching, every method we want to use has to exist on the original object. If
a user wants to supply their own plotting function, they can't use method
chaining anymore.

You may recall that we brought this up a few months ago on python-ideas as
an example of why we would like macros.

To get around this issue, we are contemplating adding a pipe method to
pandas DataFrames. It looks like this:

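def pipe(self, func, *args, **kwargs):
    # Let func override how it is called when piped (the "pipe protocol");
    # otherwise call func itself with this object as the first argument.
    pipe_func = getattr(func, '__pipe_func__', func)
    return pipe_func(self, *args, **kwargs)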


We would encourage third-party libraries whose objects benefit from method
chaining to define a pipe method in the same way.

The main idea here is to create an easy way for users to do method chaining
with their own functions and with functions from third-party libraries.
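A minimal sketch of how a third-party library might adopt the same pipe method, assuming a hypothetical `PipeMixin` and a toy `Vector` class (neither is part of pandas). The user-defined `clip` function joins the chain without any monkey patching, and ordinary method calls can continue after it:

```python
class PipeMixin:
    """Hypothetical mixin providing the pipe method proposed for pandas."""

    def pipe(self, func, *args, **kwargs):
        # Honor __pipe_func__ if func defines it; otherwise call func directly.
        pipe_func = getattr(func, '__pipe_func__', func)
        return pipe_func(self, *args, **kwargs)


class Vector(PipeMixin):
    """Toy third-party container (hypothetical, for illustration only)."""

    def __init__(self, values):
        self.values = list(values)

    def scale(self, factor):
        return Vector(v * factor for v in self.values)


def clip(vec, upper):
    """User-defined function: the piped data arrives as the first argument."""
    return Vector(min(v, upper) for v in vec.values)


result = Vector([1, 2, 3]).scale(10).pipe(clip, upper=25).scale(2)
print(result.values)  # [20, 40, 50]
```

Without pipe, the same computation would have to be written inside out, as `clip(Vector([1, 2, 3]).scale(10), upper=25).scale(2)`.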

The business with __pipe_func__ is more magical, and frankly we aren't sure
it's worth the complexity. The idea is to create a "pipe protocol" that
allows functions to decide how they are called when piped. This is useful
in some cases, because it doesn't always make sense for functions that act
on piped data to accept that data as their first argument.
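One way a function could use the protocol, sketched under assumptions: the hypothetical `total` function takes its data as the second argument, and the hypothetical `pipe_as` decorator attaches a `__pipe_func__` that forwards the piped object by keyword. `Table` is a toy class with the same pipe method as above:

```python
def pipe_as(argname):
    """Hypothetical helper: when piped, pass the data as keyword `argname`."""
    def decorator(func):
        def pipe_func(data, *args, **kwargs):
            kwargs[argname] = data
            return func(*args, **kwargs)
        func.__pipe_func__ = pipe_func  # opt in to the pipe protocol
        return func  # direct calls to func are unchanged
    return decorator


class Table:
    """Toy container with the proposed pipe method."""

    def __init__(self, rows):
        self.rows = rows

    def pipe(self, func, *args, **kwargs):
        pipe_func = getattr(func, '__pipe_func__', func)
        return pipe_func(self, *args, **kwargs)


@pipe_as('table')
def total(column, table):
    """Hypothetical function that expects the data as its *second* argument."""
    return sum(row[column] for row in table.rows)


t = Table([{'a': 1}, {'a': 2}, {'a': 3}])
print(t.pipe(total, 'a'))  # 6
print(total('a', t))       # 6 (ordinary call, protocol not involved)
```

Without `__pipe_func__`, `t.pipe(total, 'a')` would call `total(t, 'a')` and fail; the alternative is a lambda at every call site, e.g. `t.pipe(lambda tbl: total('a', tbl))`.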

For more motivation and examples, please read the opening post in this
GitHub issue: https://github.com/pydata/pandas/issues/10129

Obviously, this sort of protocol would not be an official part of the
Python language. But because we are considering creating a de facto
standard, we would love to get feedback from other Python communities that
use method chaining:

1. Have you encountered or addressed the problem of extensible method
   chaining?
2. Would this pipe protocol be useful to you?
3. Is it worth allowing piped functions to override how they are called by
   defining something like __pipe_func__?

Note that I'm not particularly interested in feedback about how we
shouldn't be defining double underscore methods. There are other ways we
could spell __pipe_func__, but double underscores seem to be pretty
standard for ad-hoc protocols.

Thanks for your attention.
Best,
Stephan