[IPython-dev] magics and metadata

Tue Jun 19 22:25:00 EDT 2012

On Tue, Jun 19, 2012 at 5:01 PM, MinRK <benjaminrk at gmail.com> wrote:
>
>
> On Tue, Jun 19, 2012 at 4:20 PM, Brian Granger <ellisonbg at gmail.com> wrote:
>>
>> When the metadata PR came up, I was originally going to vote -1 on it
>> because of this issue.  I sat on it for a while and in the end decided
>> that it was OK because I think the need for metadata is already upon
>> us even though we don't have an actual usage case in our own code base
>> (for example, we don't have a metadata UI in the notebook web app).
>>
>> There is a fine line to walk here.  On one hand, I completely agree
>> with you that we should try to future-proof the notebook format to
>> minimize disruptive format changes.  On the other hand, adding things
>> too soon leads to even more potential disruption for the following
>> reason.  As I developed the notebook format and notebook UI last
>> summer, there were multiple situations where I added something to the
>> notebook format before I actually used it in the UI.  In many of these
>> cases, when I did get around to developing the UI for it, I realized
>> that my original thoughts on that element were incomplete.  It wasn't
>> until I wrote the UI that used the data that I realized exactly what
>> the format of that data needed to be.  As a result, I had to go back
>> and modify the notebook format.  After a few iterations of this, I
>> realized that this approach was broken and started to enforce the
>> following simple rule on myself: don't add it to the notebook format
>> until I am ready to write the UI code that uses it.  That rule served
>> me very well last summer.
>>
>> This is why for example the notebook and cells do not currently have
>> any timestamp information (even though I think we will eventually want
>> it).  The one notebook feature (which I regret adding to the format)
>> that doesn't have a UI is the multiple worksheets.  We absolutely want
>> that as a feature, I just wish I had waited to add it to the notebook
>> format.  When we do implement the multiple worksheet UI, it is likely
>> we will want to go back and make changes to the notebook format to
>> better reflect the UI (for example, we will probably want to persist
>> which worksheet is active/open).
>
>
> I couldn't agree less.  There is simply no reason that adding support for
> multiple worksheets in future versions of IPython should render single-sheet
> notebooks unreadable in 0.13, just like adding new metadata should not make
> the notebook artificially unreadable.

I am not sure I am following you on this.  Are you suggesting that
0.14 notebooks (let's say we bump to a v4 nbformat with expanded
worksheet support) should be readable in 0.13?

>>
>>
>> For the cell and worksheet metadata, I knew we would eventually need
>> it and I didn't want to hold up the beta release any longer.  But
>> there are still unanswered questions related to it:
>>
>> * What types of things go in the metadata?
>>
>> * Is this an area for us to write data to, or for advanced users to
>> write data to?
>>
>> * Is it entirely unstructured, or will we require a discussion for
>> each new key/value entry into it?
>>
>>
>> It is not at all clear that the current metadata design will hold up
>> to our answers to these questions.  But in the end, I sort of wanted
>> to add the metadata as it is now, so we could begin to see how we and
>> others start to use it.  But adding the metadata to the notebook
>> format definitely doesn't mean that this part of the notebook format
>> is now future-proof.
>>
>>
>> Hope this clarifies things a bit.
>
>
> Sure, while it is extremely clear that we need cell metadata, we cannot be
> 100% certain that
> a simple dict will solve 100% of the cases we encounter.  But adding it now
> means that we have at least a *chance*
> of making a release that is not backwards-incompatible.

Yes, I agree with this.

>>
>>
>> Back to the question of output-level metadata.  When a bit of code
>> remains unused for almost a year, I start to question whether we
>> really need it.  I am not convinced we don't need it; I am just not
>> sure.  In light of this, I don't think that adding it to the notebook
>> format makes sense.  When one of us finds a good purpose for this
>> metadata, let's add it to the nbformat then.
>
>
> I believe the only current use is in the parallel display republishing,
> where the engine ID is added to the display data
> so that frontends could theoretically draw display data differently based on
> which engine it came from.

Yes, we have discussed this.  The only other situation where I
remember thinking about this is if we wanted to use metadata to help a
frontend interpret JSON display data.  There are numerous reasons code
might display JSON data, and that code would have to help the frontend
to know what to do with that data.

Do you think the engine ID idea makes sense to implement or should
that information just be passed in the formatted display data itself?
We could also handle this by creating a custom JS widget that knows how to
intelligently display data from multiple engines.
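
A minimal sketch of the engine ID option, assuming each display output
carries a `metadata` dict next to its `data` dict.  The `engine_id` key
and the `group_by_engine` helper are hypothetical names used only for
illustration; they are not an existing IPython API.

```python
from collections import defaultdict


def group_by_engine(outputs):
    """Group display data by a hypothetical 'engine_id' metadata key."""
    groups = defaultdict(list)
    for out in outputs:
        # Outputs with no metadata, or without the key, are grouped under None.
        engine = out.get("metadata", {}).get("engine_id")
        groups[engine].append(out["data"])
    return dict(groups)


# Made-up outputs from two engines plus one output with no metadata at all.
outputs = [
    {"data": {"text/plain": "a"}, "metadata": {"engine_id": 0}},
    {"data": {"text/plain": "b"}, "metadata": {"engine_id": 1}},
    {"data": {"text/plain": "c"}},
]
grouped = group_by_engine(outputs)
```

A frontend could then decide how to lay out each group, while a frontend
that knows nothing about the key can simply ignore the metadata.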

>>
>>
>> The other philosophical line of reasoning that I am being guided by
>> here is simplicity.  It would be very easy to over design the notebook
>> format and add all sorts of features that we might need.  I think this
>> is the wrong direction to go.  We want a notebook format that is as
>> compact and minimal as possible, where each and every bit of data is
>> there for a well-defined and justified reason.
>
>
> I think it's simple: We have had ideas over and over and over again for
> features requiring metadata attached to cells (hashes, links, timestamps,
> etc.), so this is clearly a feature we have a need for right now.

Yes - maybe I wasn't completely clear.  I do think that having cell
and worksheet metadata right now does make sense.

>  It would
> be totally silly for adding timestamps to require updating the nbformat in a
> backward-incompatible way.

And I am definitely not suggesting that it would or should.

>  And the biggest advantage of using json is that
> adding keys has no effect on backwards *readability*.  It's only adding
> values/types that can cause problems, and should force new versions (e.g.
> changing worksheet to worksheets, or adding new cell types).

Yes, JSON indeed turned out to be much nicer than XML for this type of
thing exactly because of this.
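
To illustrate the point about added keys, here is a small sketch of a
reader that only looks up the keys it knows about.  The cell layout is
deliberately simplified and the key names (`source`, `extra_key`,
`timestamp`) are assumptions for the example, not the actual nbformat
schema.

```python
import json


def read_cell(cell):
    """Read only the keys this (older) reader knows; ignore everything else."""
    return {
        "cell_type": cell["cell_type"],
        "source": cell.get("source", ""),
        # Metadata is treated as an opaque dict and passed through untouched.
        "metadata": cell.get("metadata", {}),
    }


# A cell written before metadata existed, and one written by a newer version.
old_cell = json.loads('{"cell_type": "code", "source": "1+1"}')
new_cell = json.loads(
    '{"cell_type": "code", "source": "1+1",'
    ' "metadata": {"timestamp": "2012-06-19T22:25:00"}, "extra_key": 42}'
)

old_view = read_cell(old_cell)
new_view = read_cell(new_cell)
```

Both cells load with the same reader: the unknown key is skipped and a
missing `metadata` falls back to an empty dict.  A renamed key or a new
cell type would not be handled this gracefully.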

But I am wondering what your thoughts are about newer notebook versions
being readable by older IPython versions.  I have always thought that
we would promise that older nbformats would *always* be readable by
newer IPython versions, but that we would make no promises about newer
nbformats being readable by older IPython versions.  I just want to
clarify what other people are thinking in this respect.
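
The policy described above could be sketched roughly as follows.  This is
only an illustration under stated assumptions: `CURRENT_NBFORMAT`,
`NewerFormatError`, `read_notebook` and the toy upgrader are hypothetical
names, not the real IPython.nbformat code.

```python
CURRENT_NBFORMAT = 3  # assumed: the major version this reader understands


class NewerFormatError(ValueError):
    """Raised when a notebook was written by a newer nbformat."""


def upgrade_v2_to_v3(nb):
    # Toy upgrader: a real one would also convert the notebook contents.
    upgraded = dict(nb)
    upgraded["nbformat"] = 3
    return upgraded


# Maps an old major version to the function that upgrades it by one step.
UPGRADERS = {2: upgrade_v2_to_v3}


def read_notebook(nb, upgraders=UPGRADERS):
    """Always accept older formats; make no promise about newer ones."""
    version = nb.get("nbformat", CURRENT_NBFORMAT)
    if version > CURRENT_NBFORMAT:
        raise NewerFormatError(
            "notebook is v%d, this reader only understands up to v%d"
            % (version, CURRENT_NBFORMAT)
        )
    # Walk older notebooks forward one major version at a time.
    while version < CURRENT_NBFORMAT:
        nb = upgraders[version](nb)
        version = nb["nbformat"]
    return nb
```

With this shape, a v2 notebook is upgraded on read, while a v4 notebook
is rejected with a clear error rather than being half-read.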

Cheers,

Brian

> -MinRK
>
>>
>> Cheers,
>>
>> Brian
>>
>>
>>
>> On Tue, Jun 19, 2012 at 3:25 PM, MinRK <benjaminrk at gmail.com> wrote:
>> >
>> >
>> > On Tue, Jun 19, 2012 at 3:23 PM, Brian Granger <ellisonbg at gmail.com>
>> > wrote:
>> >>
>> >> On Tue, Jun 19, 2012 at 3:19 PM, MinRK <benjaminrk at gmail.com> wrote:
>> >> >
>> >> >
>> >> > On Tue, Jun 19, 2012 at 3:18 PM, Brian Granger <ellisonbg at gmail.com>
>> >> > wrote:
>> >> >>
>> >> >> On Tue, Jun 19, 2012 at 2:59 PM, Fernando Perez
>> >> >> <fperez.net at gmail.com>
>> >> >> wrote:
>> >> >> > On Tue, Jun 19, 2012 at 1:17 PM, MinRK <benjaminrk at gmail.com>
>> >> >> > wrote:
>> >> >> >> Yes - we put metadata on outputs for a reason, presumably.  If
>> >> >> >> this
>> >> >> >> shouldn't be saved, it should probably be removed from the API.
>> >> >> >
>> >> >> > I can't recall precisely what we had in mind when we put it in,
>> >> >> > but
>> >> >> > something that springs to mind as potentially useful, for example,
>> >> >> > would be to specify a desired priority order for the various types
>> >> >> > of
>> >> >> > outputs. Right now when a client can display several kinds of
>> >> >> > output
>> >> >> > it just makes a choice, but we could let objects provide a hint of
>> >> >> > the
>> >> >> > preferred order, based on what they know about the relative
>> >> >> > quality
>> >> >> > of
>> >> >> > each.
>> >> >>
>> >> >> I originally put it there to allow objects to provide hints to the
>> >> >> frontend on how it should display a representation.  This is similar
>> >> >> to how the payloads can indicate where they came from.
>> >> >>
>> >> >> > So I'd vote for not removing this, as it may prove useful...
>> >> >>
>> >> >> I also think it could be useful, although it seems a bit excessive
>> >> >> to
>> >> >> store metadata for each output.  Here is what I propose.  We simply
>> >> >> leave it alone until we have an actual use case that will help us
>> >> >> figure out exactly what this should look like.  Without a concrete
>> >> >> usage case, it is difficult to know what is needed.
>> >> >
>> >> >
>> >> > But this doesn't answer the immediate question: Should this metadata
>> >> > dict be
>> >> > included in the nbformat?
>> >>
>> >> I would vote no - not until we have a real usage case.  I don't like
>> >> to add things to the notebook format until we are actually using them.
>> >
>> >
>> > Then should we remove all of the metadata stuff we just added?  The
>> > whole
>> > point was to prepare the nbformat for future changes so we don't have to
>> > update the nbformat, which is incredibly painful and should be done as
>> > rarely as possible.
>> >
>> > -MinRK
>> >
>> >>
>> >>
>> >> >>
>> >> >>
>> >> >> > f
>> >> >> > _______________________________________________
>> >> >> > IPython-dev mailing list
>> >> >> > IPython-dev at scipy.org
>> >> >> > http://mail.scipy.org/mailman/listinfo/ipython-dev
>> >> >>
>> >> >>
>> >> >>
>> >> >> --
>> >> >> Brian E. Granger
>> >> >> Cal Poly State University, San Luis Obispo
>> >> >> bgranger at calpoly.edu and ellisonbg at gmail.com

-- 
Brian E. Granger
Cal Poly State University, San Luis Obispo
bgranger at calpoly.edu and ellisonbg at gmail.com