[IPython-dev] Extending the Notebook File Format

Brian Granger ellisonbg at gmail.com
Fri Nov 23 19:17:28 EST 2012


On Mon, Nov 19, 2012 at 4:03 PM, Carl Smith <carl.input at gmail.com> wrote:
> Firstly, I'd like to say sorry to everyone on this list. I've been
> pretty vocal lately, mostly suggesting features that I'm only loosely
> committed to working on, or criticising features other people worked
> hard on. I really don't mean to be difficult. That said, I did want to
> bring up one more issue :)

No problem, we don't mind lots of ideas flying around.  It doesn't
mean we will implement everything, but it is good to discuss these
things.

> The current file format, .ipynb, works well for actual notebooks, but
> doesn't include the ancillary files, so it's hard to share your work.
> Obviously, users can put the notebook in the root directory of a
> project tree, ensure everything they want to share is in there
> somewhere, then zip it, but doing this by hand is not exactly
> convenient, and doesn't deal with stuff like config files well.
>
> I'd like to suggest creating a second file format, for packaged
> notebooks, that is just the current file format, but with the file and
> whatever else you'd like to include inside a zipped directory. IPython
> should recognise these package files and handle them, unzipping them
> behind the scenes.

When we originally designed the notebook format, we thought about
doing something like this.  We even wrote an NSF proposal that had
details on a notebook+data package format.  But I think our thinking
has evolved on this issue somewhat.  The big change in our thinking
has been driven by git/github.  At this point, git/github have become
a nearly universal way of collaborating and sharing code+data with
others.  Last Spring at PyCon, Fernando and I had many long
discussions about how we wanted to encourage sharing in the notebook.
Our conclusion was something like this:

* Everyone should already be putting their code+data on git/github or
other version control systems.
* These systems implement a nearly universal approach to sharing sets
of files that scales from person-to-person to very large teams.
* The default and recommended way of sharing in the notebook should be
through git repos (or other VCSs).
* We should build tools to encourage and support those usage patterns.
An example of this type of integration would be a button to publish a
notebook to a gist.
* We should not try to reinvent the wheel in this area.
* This is consistent with ipython's notion of a notebook project,
namely that it is just a regular directory on your file system.

With these things in mind, we have mostly moved away from the idea of
a special ipython package format.

The other *massive* complete showstopper problem with having a package
format that consists of zipping up files is this:

It can't be version controlled.

In my mind, the future of open reproducible computing sits on top of
the foundation of version control.  Without that, all of our best
practices fall apart.
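
To illustrate the point about version control, here is a minimal
Python sketch under stated assumptions: the notebook is modeled as a
toy JSON document (not the real .ipynb schema), and the helper names
make_notebook and zip_bytes are made up for this example.  It shows
that a one-line change gives a readable line diff on the plain-text
file, while the zipped versions differ only as opaque bytes that a
line-based tool cannot meaningfully diff or merge.

```python
import difflib
import io
import json
import zipfile


def make_notebook(source):
    """Return a toy notebook-like JSON document as text (not the real schema)."""
    doc = {"cells": [{"cell_type": "code", "source": source}]}
    return json.dumps(doc, indent=1, sort_keys=True)


def zip_bytes(text):
    """Pack the text into an in-memory zip archive and return the raw bytes."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("notebook.ipynb", text)
    return buf.getvalue()


old = make_notebook("x = 1")
new = make_notebook("x = 2")

# Plain text: a line-based diff pinpoints the single changed line.
text_diff = [
    line
    for line in difflib.unified_diff(old.splitlines(), new.splitlines(), lineterm="")
    if line.startswith(("+", "-")) and not line.startswith(("+++", "---"))
]
print("\n".join(text_diff))

# Zipped: the two archives differ, but only as opaque bytes.
print(zip_bytes(old) != zip_bytes(new))
```

A version control system can still store the zip file, but each
revision is a new binary blob, so the history cannot show what changed
inside it.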

> A user could still create a regular notebook, and work loosely, so
> they wouldn't need to worry about project directories. They could save
> this as a regular notebook, just as they can now. Nothing should
> change on that front, but, if the user was interested in sharing or
> publishing their notebook along with their data, they would be
> encouraged to create a notebook project. Only a project could be
> reliably packaged.

Are there any use cases that a git repo can't cover?

> It'd be nice to have a projectify tool that could take a notebook and
> check any paths that it uses, any extensions, any customisations or
> config files and raise warnings with dialogues if they're not where
> they need to be, so you could usually turn a sketchy notebook into a
> tidy project with a few clicks of the OK button.

I think there are multiple issues in this paragraph that really need
to be considered separately:

* What constitutes an ipython project
* Installation/libraries/environment
* Incorporating config into the notebook format.

> The Notebook interface would include New File and New Project options,
> so users will normally start with a regular file if they're just
> hacking stuff, and would start with a project if they intend to share
> it, but being able to turn a notebook into a project would be nice.
> The same tool would be able to check a project before it was packaged
> to ensure everything was still all present and correct.
>
> When working on a project, hitting Save would just update the notebook
> file, but hitting Package would run the validator and go through the
> motions, before zipping the project, ready for sharing. This process
> would copy any needed files from the user's current notebook profile
> too, so all that stuff would also be preserved in the package in a way
> that IPython knows how to handle.
>
> Longer term, this format could be extended to include an optional
> config file. If it was present, it'd be used for things like creating
> packages with many notebooks, that could be read like a book, or could
> include information on mounting big datasets or installing
> dependencies and so on, which would be especially handy for consistent
> environments like NotebookCloud or Wakari, where these types of things
> can be done pretty reliably with a script.

I think that part of the success of the ipython notebook is how we
don't try to layer anything on top of your file system.  Notebook
files are just plain files on your file system; there is no database
that adds additional layers of info on top of that, and you get
immediate integration with git, etc.  While I think there are nice ideas in what
you are proposing, it would pull us away from this beautiful and
simple model that has served us so well.
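
As a rough sketch of what "just plain files" means in practice: a
.ipynb file is JSON text, so the Python standard library alone can
write, list and read one.  The document below is a toy stand-in whose
keys are illustrative only (not the real notebook schema), and the
file name demo.ipynb is made up for the example.

```python
import json
import os
import tempfile

# Toy stand-in for a notebook; the keys are illustrative, not the real schema.
toy = {
    "metadata": {"name": "demo"},
    "cells": [{"cell_type": "code", "source": "print(1)"}],
}

with tempfile.TemporaryDirectory() as project_dir:
    path = os.path.join(project_dir, "demo.ipynb")

    # Saving is an ordinary file write.
    with open(path, "w") as f:
        json.dump(toy, f, indent=1)

    # A "project" is just a directory, so listing it needs no database.
    notebooks = sorted(n for n in os.listdir(project_dir) if n.endswith(".ipynb"))

    # Loading is an ordinary file read.
    with open(path) as f:
        loaded = json.load(f)

print(notebooks)
print(loaded["metadata"]["name"])
```

Because nothing beyond the directory and its files is involved, the
same directory can be put under git without any extra steps.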

Cheers,

Brian

> Again, sorry to be so noisy just lately, and thanks for being so good
> about it. I'm just sharing ideas in the hope that there's enough value
> in some of them to justify the bandwidth I've consumed.
>
> Cheers all
>
> Carl
> _______________________________________________
> IPython-dev mailing list
> IPython-dev at scipy.org
> http://mail.scipy.org/mailman/listinfo/ipython-dev



--
Brian E. Granger
Cal Poly State University, San Luis Obispo
bgranger at calpoly.edu and ellisonbg at gmail.com


