[IPython-dev] Integrating pandas into pylab

Thu Oct 27 10:45:53 EDT 2011

On Wed, Oct 26, 2011 at 12:03 PM, Brian Granger <ellisonbg at gmail.com> wrote:

>>  Making everything work together would
>> be exceedingly tricky...  Note this is pretty much what scipy used to
>> be: at some point they jettisoned plotting and focused on the algo
>> part.  It's also a hard and big project and may not attract much
>> usage.
>
> What are the main areas that would make it tricky?  I don't have
> enough experience with numpy/scipy/matplotlib to know this.

What I was thinking about is that you have these different data
structures (arrays, structured arrays, record arrays, dataframes),
potentially all located in "datastructures" or "io".  Users then try
to call "algo.something" functions on them, or pass them to "plot"
functions, and those functions may be specialized for one type of
structure but not another.  E.g., matplotlib.mlab.csv2rec,
pandas.read_csv, and np.loadtxt can all parse a CSV file and are
candidates for inclusion in an IO module.  But each returns a
different data
structure.  The naive user is back where they started, needing to
understand numpy, matplotlib, scipy and pandas just to understand what
they got back from the io function.  Passing it on to a stats function
or plotting routine then requires further understanding of which data
structures that interface supports.  Unless you wrap and encapsulate
the underlying APIs like SAGE does, which is a huge task, reorganizing
the namespaces doesn't get you around the problem of many packages
with overlapping functionality.
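
To make the CSV point concrete, here is a rough sketch rather than a
proposal for any real "io" module.  It parses the same text with
np.loadtxt, np.genfromtxt and pandas.read_csv.  The sample data and
the load_all helper are made up for illustration, and np.genfromtxt
stands in for csv2rec as the structured-array loader, since csv2rec
is not available in recent matplotlib releases.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical sample data, invented for this illustration.
CSV_TEXT = "x,y\n1.0,2.0\n3.0,4.0\n"


def load_all(text):
    """Parse the same CSV text three ways and return the results by name."""
    # Plain 2-D float array; the header row has to be skipped by hand.
    plain = np.loadtxt(io.StringIO(text), delimiter=",", skiprows=1)
    # 1-D structured array whose field names come from the header row.
    structured = np.genfromtxt(io.StringIO(text), delimiter=",", names=True)
    # DataFrame whose column labels come from the header row.
    frame = pd.read_csv(io.StringIO(text))
    return {
        "np.loadtxt": plain,
        "np.genfromtxt": structured,
        "pd.read_csv": frame,
    }


results = load_all(CSV_TEXT)
for name, obj in results.items():
    print(name, type(obj).__name__, obj.shape)

# Getting "the first column" is spelled differently for the plain array
# than for the two name-aware structures.
first_plain = results["np.loadtxt"][:, 0]
first_structured = results["np.genfromtxt"]["x"]
first_frame = results["pd.read_csv"]["x"]
```

Even in this tiny case the caller has to know which loader produced
the object before indexing it: the plain array is addressed by
position, the other two by field or column name, and only two of the
three report a two-dimensional shape.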

JDH