[IPython-dev] Project idea: Automatic lab notebook for iPython

Brian Granger ellisonbg at gmail.com
Mon Apr 22 18:16:01 EDT 2013


Peter,

Thanks for getting in touch.

> Here is a new project idea: automatic lab notebook for iPython and
> iPython Notebook, which would keep track of how each of your output
> files was produced, linking this "history" (or a "lineage") of an object
> across different iPython sessions and different iPython notebooks, and
> storing it persistently. This is frequently referred to in the Computer
> Science literature as "provenance."

We have talked to a few other groups about this.  Overall, I think the
IPython notebook would be a great environment for provenance
capabilities.

> It will enable you to ask questions like "what did I do to produce this
> plot?" - and for example, it will tell you that you downloaded the input
> data set on Monday from such and such website, you ran all these
> commands to process the data on Tuesday, and then produced this plot on
> Thursday from a different iPython session. Note that this goes beyond
> (and is complementary in purpose to) iPython Notebook, since the history
> of a file is tracked across different sessions and Notebooks, and when
> you ask a question, you will get only the relevant information,
> suppressing any additional things that you did that are unrelated to the
> file in which you are interested.
>
> We are in touch with computational scientists all the way from
> bioinformatics to physics that are very interested in this feature! We
> met their needs partially by developing a cross-platform, multi-lingual
> library (https://code.google.com/p/core-provenance-library/) that they
> can use to annotate their Python (and non-Python) scripts in order to
> track the lineage of their objects.

There are a few questions to answer about these capabilities:

* Should someone implement provenance capabilities for the IPython
notebook?  Who?

For this, I think the answer is yes, but it shouldn't be the core
IPython developers.  We have a huge number of things we are already
working on and are trying to remain extremely focused.  For details
about what we are working on for the next 2 years see our roadmap:

https://github.com/ipython/ipython/wiki/Roadmap:-IPython

Even beyond this two year horizon, we have a huge amount of work at
the core of the notebook.  Because of this, when people approach us
with additional ideas, we are basically saying "sounds really cool, you
should do it on your own and let us know if there are ways we can
structure the codebase better to enable your work."  The important
thing here is that we want this type of thing to be possible and we
realize that will require us to make changes to IPython to make it
easier.  But we are unable to work on everything ourselves.

* Should these efforts be part of the standard IPython notebook or a
separate code base/project?

My personal feeling is that the provenance capabilities should start
out as a third party project that is layered on top of IPython.  It
would add a lot of complexity that most of our users don't need, so I
don't think it makes sense for this to be part of the standard
notebook.
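
To make the "layered on top" idea a bit more concrete, here is a
minimal sketch of the kind of bookkeeping such a third-party layer
might do.  It is purely illustrative: ProvenanceLog, record() and
lineage() are hypothetical names, nothing here is an existing IPython
or Core Provenance Library API, and it assumes that something else
(for example, hooks in the notebook) tells the log which files each
command read and wrote.

```python
class ProvenanceLog:
    """Hypothetical in-memory log of which command read/wrote which files."""

    def __init__(self):
        self.records = []

    def record(self, session, command, inputs=(), outputs=()):
        # One entry per executed command; inputs/outputs are file names.
        self.records.append({
            "session": session,
            "command": command,
            "inputs": list(inputs),
            "outputs": list(outputs),
        })

    def lineage(self, path):
        """Return only the records that contributed to `path`, oldest first."""
        pending = [path]
        seen_files = set()
        hits = set()
        while pending:
            target = pending.pop()
            if target in seen_files:
                continue
            seen_files.add(target)
            for index, rec in enumerate(self.records):
                if target in rec["outputs"]:
                    hits.add(index)
                    # Walk backwards through whatever this command read.
                    pending.extend(rec["inputs"])
        return [self.records[i] for i in sorted(hits)]


log = ProvenanceLog()
log.record("monday", "download data.csv", outputs=["data.csv"])
log.record("tuesday", "clean data.csv",
           inputs=["data.csv"], outputs=["clean.csv"])
log.record("tuesday", "write notes.txt", outputs=["notes.txt"])
log.record("thursday", "plot clean.csv",
           inputs=["clean.csv"], outputs=["plot.png"])

for rec in log.lineage("plot.png"):
    print(rec["session"], rec["command"])
```

Asking for the lineage of plot.png walks back through clean.csv and
data.csv across sessions and skips the unrelated notes.txt step.  A
real tool would also need persistent storage and automatic capture of
inputs and outputs, which is where most of the work would be.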

> Our vision is that this will be all done fully automatically, without
> requiring the users to manually annotate their scripts. But
> unfortunately neither of us who are involved in this project has the
> resources or the knowledge of the iPython code-base to tackle this
> challenge. We need your help to make this happen! We have some ideas
> about how we might go about this, but we need someone who knows more
> about iPython to talk them over and to spearhead the actual development.
> Please let us know if you can help!

I hope I don't sound like too much of a kill-joy, but I think you are
going to have to find the resources to do this yourself.  I don't think
it will be a massive amount of work, but you are going to need someone
who can really dive into the notebook architecture and IPython
codebase and work on this full time.  Don't get me wrong though, I
*love* the idea.

Cheers,

Brian

> Thank you,
>
> Peter Macko
>
> Harvard School of Engineering and Applied Sciences
> 33 Oxford St.
> Cambridge, MA 02138
>
> _______________________________________________
> IPython-dev mailing list
> IPython-dev at scipy.org
> http://mail.scipy.org/mailman/listinfo/ipython-dev



--
Brian E. Granger
Cal Poly State University, San Luis Obispo
bgranger at calpoly.edu and ellisonbg at gmail.com


