[IPython-dev] Integrating pandas into pylab

Wed Oct 26 09:25:02 EDT 2011

I'm in agreement with most of the ideas here.  As the author of pylab,
I've caught a lot of flak for dumping everything into a single
namespace as it is unpythonic, and in classes Fernando and I have been
pretty careful of late to use namespaces, and mpl facilitates this by
factoring the plotting part of pylab into pyplot.  It's a little more
confusing for students at first, but ultimately it helps the students
to know where things come from when they move from the interactive
environment to scripting.

That said, the major problem is that the current organization of the
major packages is not logical or intuitive.  numpy has arrays,
algorithms and IO, scipy has algorithms and IO, matplotlib has
plotting and algorithms and IO, pandas has datastructures, IO,
algorithms and plotting (albeit all organized around the dataframe).
And so on.  I think there is room for a namespace package that
integrates across these and makes it more intuitive.  The proper top
level namespaces are something like: array (or datastructures more
generally), algo, plot, io.  In this model, you would pull the
relevant components from numpy, scipy, mpl, pandas, scikits, ETS, etc.
into the relevant namespaces.  Making everything work together would
be exceedingly tricky...  Note this is pretty much what scipy used to
be: at some point they jettisoned plotting and focused on the algo
part.  It's also a hard and big project and may not attract much
usage.  If someone pursues this, I would not call it pylab, as this
will just foster confusion.
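
To make the integrating-namespace idea concrete, here is a rough sketch,
not a proposal for a real package.  The LAYOUT table and the
build_namespaces helper are hypothetical names, and stdlib modules stand
in for numpy, scipy, mpl and pandas so the snippet runs anywhere.

```python
import importlib
from types import SimpleNamespace

# Hypothetical layout: top-level namespace -> list of (module, attribute).
# Stdlib modules stand in for numpy, scipy, matplotlib and pandas here.
LAYOUT = {
    "algo": [("math", "sqrt"), ("statistics", "mean")],
    "io": [("json", "loads"), ("json", "dumps")],
}


def build_namespaces(layout):
    """Collect attributes from several modules under shared namespaces."""
    top = SimpleNamespace()
    for ns_name, entries in layout.items():
        ns = SimpleNamespace()
        for module_name, attr in entries:
            try:
                module = importlib.import_module(module_name)
            except ImportError:
                continue  # skip providers that are not installed
            setattr(ns, attr, getattr(module, attr))
        setattr(top, ns_name, ns)
    return top


sci = build_namespaces(LAYOUT)
```

With this, sci.algo.sqrt and sci.io.loads resolve to the underlying
functions no matter which package supplies them.  Making the pieces
actually work together is the hard part, and this sketch does nothing
about that.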

Forgetting about the big problem, and focusing on Thomas's original and
much more limited question of getting pandas into pylab, there are
three easy solutions:

* matplotlib.pylab can conditionally try/import * from pandas.  This
is the path of least resistance, though several of our developers may
object because they want less, not more, namespace dumping.
Nonetheless, it can probably be done.

* ipython can have its own configurable "pylab" import which does an
import * from matplotlib.pylab and anything else you or the users want
by default.  The downside of a configurable import is that it makes it
harder to share histories, etc.

* pandas could be incorporated into numpy.  This is my favorite
solution since it would get pandas onto as many desktops as soon as
possible and we could all write code that relies on it being there.
Obviously pandas is on a much faster release schedule than numpy right
now, and should live on its own, but in six months' time or so when Wes
is ready to take a breather, it would be great to see pandas
incorporated.  Then matplotlib.pylab would get it by default.
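
The first two options both come down to a guarded star import.  Below
is a minimal sketch of that idea, not the actual matplotlib or ipython
code.  try_star_import and EXTRA_MODULES are hypothetical names, and
"math" stands in for pandas so the snippet runs without it installed.

```python
import importlib


def try_star_import(module_name, namespace):
    """Emulate a guarded `from module_name import *` into a dict.

    Returns True if the module was importable, False otherwise.
    """
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return False
    # Honor __all__ when defined; otherwise take every public name.
    names = getattr(module, "__all__", None)
    if names is None:
        names = [n for n in vars(module) if not n.startswith("_")]
    for name in names:
        namespace[name] = getattr(module, name)
    return True


# Hypothetical user-configurable list of extra modules (option 2);
# a missing package is skipped rather than raising.
EXTRA_MODULES = ["math", "no_such_package_xyz"]

ns = {}
loaded = [m for m in EXTRA_MODULES if try_star_import(m, ns)]
```

Here loaded records which modules actually made it into the namespace.
That matters when sharing a history, because two users with different
lists will not have the same names defined.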

>> * It should be removed from matplotlib.

This is highly unlikely.  We are loath to break backwards
compatibility, and this would be *major* breakage.  It's easier and
less confusing to simply use a different name.

JDH