[IPython-dev] magics and metadata

Wed Jun 20 19:28:53 EDT 2012

On Wed, Jun 20, 2012 at 4:11 PM, Aaron Meurer <asmeurer at gmail.com> wrote:

> On Jun 20, 2012, at 5:04 PM, MinRK <benjaminrk at gmail.com> wrote:
>
>
>
> On Wed, Jun 20, 2012 at 3:09 PM, Aaron Meurer <asmeurer at gmail.com> wrote:
>
>> On Jun 20, 2012, at 11:06 AM, Brian Granger <ellisonbg at gmail.com> wrote:
>>
>> > On Tue, Jun 19, 2012 at 7:49 PM, MinRK <benjaminrk at gmail.com> wrote:
>> >>
>> >>
>> >> On Tue, Jun 19, 2012 at 7:25 PM, Brian Granger <ellisonbg at gmail.com>
>> wrote:
>> >>>
>> >>> On Tue, Jun 19, 2012 at 5:01 PM, MinRK <benjaminrk at gmail.com> wrote:
>> >>>>
>> >>>>
>> >>>> On Tue, Jun 19, 2012 at 4:20 PM, Brian Granger <ellisonbg at gmail.com>
>> >>>> wrote:
>> >>>>>
>> >>>>> When the metadata PR came up, I was originally going to vote -1 on
>> it
>> >>>>> because of this issue.  I sat on it for a while and in the end
>> decided
>> >>>>> that it was OK because I think the need for metadata is already upon
>> >>>>> us even though we don't have an actual usage case in our own code
>> base
>> >>>>> (for example, we don't have a metadata UI in the notebook web app).
>> >>>>>
>> >>>>> There is a fine line to walk here.  On one hand, I completely agree
>> >>>>> with you that we should try to future-proof the notebook format to
>> >>>>> minimize disruptive format changes.  On the other hand, adding
>> things
>> >>>>> too soon leads to even more potential disruption for the following
>> >>>>> reason.  As I developed the notebook format and notebook UI last
>> >>>>> summer, there were multiple situations where I added something to
>> the
>> >>>>> notebook format before I actually used it in the UI.  In many of
>> these
>> >>>>> cases, when I did get around to developing the UI for it, I realized
>> >>>>> that my original thoughts on that element were incomplete.  It
>> wasn't
>> >>>>> until I wrote the UI that used the data that I realized exactly what
>> >>>>> the format of that data needed to be.  As a result, I had to go back
>> >>>>> and modify the notebook format.  After a few iterations of this, I
>> >>>>> realized that this approach was broken and started to enforce the
>> >>>>> following simple rule on myself: don't add it to the notebook format
>> >>>>> until I am ready to write the UI code that uses it.  That rule
>> served
>> >>>>> me very well last summer.
>> >>>>>
>> >>>>> This is why for example the notebook and cells do not currently have
>> >>>>> any timestamp information (even though I think we will eventually
>> want
>> >>>>> it).  The one notebook feature (which I regret adding to the format)
>> >>>>> that doesn't have a UI is the multiple worksheets.  We absolutely
>> want
>> >>>>> that as a feature, I just wish I had waited to add it to the
>> notebook
>> >>>>> format.  When we do implement the multiple worksheet UI, it is
>> likely
>> >>>>> we will want to go back and make changes to the notebook format to
>> >>>>> better reflect the UI (for example, we will probably want to persist
>> >>>>> which worksheet is active/open).
>> >>>>
>> >>>>
>> >>>> I couldn't agree less.  There is simply no reason that adding support
>> >>>> for
>> >>>> multiple worksheets in future versions of IPython should render
>> >>>> single-sheet
>> >>>> notebooks unreadable in 0.13, just like adding new metadata should
>> not
>> >>>> make
>> >>>> the notebook artificially unreadable.
>> >>>
>> >>> I am not sure I am following you on this.  Are you suggesting that
>> >>> 0.14 notebooks (let's say we bump to a v4 nbformat with expanded
>> >>> worksheet support) should be readable in 0.13?
>> >>
>> >>
>> >> I think I am saying the opposite - with the current state of 0.13,
>> adding
>> >> multi-worksheet support to the *javascript* should not result in
>> >> incrementing the notebook version.
>> >
>> > With the current state of the notebook format, I think we can probably
>> > pull this off.  So far, the only changes to the notebook format I can
>> > imagine will be minor version incrementing ones.
>> >
>> >>>
>> >>>
>> >>>>>
>> >>>>>
>> >>>>> For the cell and worksheet metadata, I knew we would eventually need
>> >>>>> it and I didn't want to hold up the beta release any longer.  But
>> >>>>> there are still unanswered questions related to it:
>> >>>>>
>> >>>>> * What types of things go in the metadata?
>> >>>>>
>> >>>>> * Is this an area for us to write data to, or for advanced users to
>> >>>>> write data to?
>> >>>>> * Is it entirely unstructured, or will we require a discussion for
>> >>>>> each new key/value entry into it?
>> >>>>>
>> >>>>>
>> >>>>> It is not at all clear that the current metadata design will hold up
>> >>>>> to our answers of these questions.  But in the end, I sort of wanted
>> >>>>> to add the metadata as it is now, so we could begin to see how we
>> and
>> >>>>> others start to use it.  But just because we added the metadata to
>> the
>> >>>>> notebook format definitely doesn't mean that future-proofs this part
>> >>>>> of the notebook format.
>> >>>>>
>> >>>>>
>> >>>>> Hope this clarifies things a bit.
>> >>>>
>> >>>>
>> >>>> Sure, while it is extremely clear that we need cell metadata, we
>> cannot
>> >>>> be
>> >>>> 100% certain that
>> >>>> a simple dict will solve 100% of the cases we encounter.  But adding
>> it
>> >>>> now
>> >>>> means that we have at least a *chance*
>> >>>> of making a release that is not backwards-incompatible.
>> >>>
>> >>> Yes, I agree with this.
>> >>>
>> >>>>>
>> >>>>>
>> >>>>> Back to the question of output-level metadata.  When a bit of code
>> >>>>> remains unused for almost a year, I start to question whether we
>> >>>>> really need it.  I am not convinced we don't need it; I am just not sure.
>>  In
>> >>>>> light of this, I don't think that adding it to the notebook format
>> >>>>> makes sense.  When one of us finds a good purpose for this metadata,
>> >>>>> let's add it to the nbformat then.
>> >>>>
>> >>>>
>> >>>> I believe the only current use is in the parallel display
>> republishing,
>> >>>> where the engine ID is added to the display data
>> >>>> so that frontends could theoretically draw display data differently
>> >>>> based on
>> >>>> which engine it came from.
>> >>>
>> >>> Yes, we have discussed this.  The only other situation where I
>> >>> remember thinking about this is if we wanted to use metadata to help a
>> >>> frontend interpret JSON display data.  There are numerous reasons code
>> >>> might display JSON data, and that code would have to help the frontend
>> >>> to know what to do with that data.
>> >>>
>> >>> Do you think the engine ID idea makes sense to implement or should
>> >>> that information just be passed in the formatted display data itself?
>> >>> We could also handle this by creating a custom JS widget that knows how to
>> >>> intelligently display data from multiple engines.
>> >>
>> >>
>> >> Right now I do both since the metadata is totally ignored, but I think
>> it's
>> >> better to have less markup in the output itself.  It is precisely the
>> same
>> >> reason we don't embed the rendered prompt in the output of execute
>> replies -
>> >> frontends have their own way of rendering them (in the prompt column,
>> etc.).
>> >>  The metadata could be used to do that for parallel results, rather
>> than the
>> >> current behavior of having fake prompts in the general output area.
>> >
>> > OK if you think we want to go this route for displaying the engine
>> > IDs, then we should i) keep the display data metadata in the message
>> > itself and ii) move towards persisting that information in the
>> > nbformat.
>> >
>> >>>
>> >>>>>
>> >>>>>
>> >>>>> The other philosophical line of reasoning that I am being guided by
>> >>>>> here is simplicity.  It would be very easy to over design the
>> notebook
>> >>>>> format and add all sorts of features that we might need.  I think
>> this
>> >>>>> is a wrong direction to go.  We want a notebook format that is as
>> >>>>> compact and minimal as possible, where each and every bit of data is
>> >>>>> there for a well-defined and justified reason.
>> >>>>
>> >>>>
>> >>>> I think it's simple: We have had ideas over and over and over again
>> for
>> >>>> features requiring metadata attached to cells (hashes, links,
>> >>>> timestamps,
>> >>>> etc.), so this is clearly a feature we have a need for right now.
>> >>>
>> >>> Yes - maybe I wasn't completely clear.  I do think that having cell
>> >>> and worksheet metadata right now does make sense.
>> >>>
>> >>>>  It would
>> >>>> be totally silly for adding timestamps to require updating the
>> nbformat
>> >>>> in a
>> >>>> backward-incompatible way.
>> >>>
>> >>> And I am definitely not suggesting that it would or should.
>> >>>
>> >>>>  And the biggest advantage of using json is that
>> >>>> adding keys has no effect on backwards *readability*.  It's only
>> adding
>> >>>> values/types that can cause problems, and should force new versions
>> >>>> (e.g.
>> >>>> changing worksheet to worksheets, or adding new cell types).
>> >>>
>> >>> Yes, JSON indeed turned out to be much nicer than XML for this type of
>> >>> thing exactly because of this.
>> >>>
>> >>> But I am wondering what your thoughts are about newer notebook versions
>> >>> being readable by older IPython versions.  I have always thought that
>> >>> we would promise that older nbformats would *always* be readable by
>> >>> newer IPython versions, but that we would make no promises about newer
>> >>> nbformats being readable by older IPython versions.  I just want to
>> >>> clarify what other people are thinking in this respect.
>> >>
>> >>
>> >> Incrementing the nbformat means making notebooks unreadable in old
>> versions,
>> >> yes.
>> >> This is very painful if we are doing it every six months.  I am only
>> trying
>> >> to make
>> >> reasonable efforts that the current nbformat is prepared for changes we
>> >> *know* we intend to make soon,
>> >> so that incrementing the nbformat is reserved for changes we don't
>> already
>> >> have planned, and aren't
>> >> already prepared for.
>> >> Obviously, if we have a change that we cannot fit into the current
>> format,
>> >> then we increment.
>> >
>> > I honestly can't think of any upcoming changes to the notebook format
>> > that we have thought about which would require a major version
>> > increment like you are talking about.  I think there are lots of minor
>> > ones that we can do using minor version increments.  I like the minor
>> > versioning scheme we have now as it clarifies our policies on this.
>> > So I think overall, the notebook format is pretty future safe for the
>> > time being.  I hope we can stick with the 3.x nbformats for a few
>> > IPython releases.
>>
>> I'm curious what the effective difference between a minor version and
>> a major version would be to me, the user. Would you try to make minor
>> versions backward compatible if possible, either by not putting in new
>> keys if they don't need to be there or by somehow trying to future
>> proof the notebook to new unexpected notebook format changes?
>>
>
> Major version: totally unreadable, don't even try
> Minor revision: newer features are obviously unavailable, but the format
> is fundamentally readable
>
> The minor version stuff is not meant to make it impossible, or even any
> harder, to update the nbformat.  Only to give us a mechanism for expressing
>  "this notebook is newer, and may use features you don't have, but at least
> you can still read it", which we did not have before - there was no
> distinction between "created by exactly this version" and "totally
> unreadable".
>
>
> So you are going to attempt to keep minor versions backwards compatible?
>  Or maybe I'm misunderstanding what you mean by "readable".
>

Backward-compatible only in that the general file format remains readable.
 Obviously, if you make use of features that depend on the changes in the
minor-revision, that part of your notebook will not work.  But if the
fundamental format of the notebook does not change, users of 0.13 can open
the new notebooks, and will get a warning that it was created by newer
IPython.
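
As a rough illustration of that policy (a sketch only, not the actual
IPython reader): assume the notebook JSON carries its version in top-level
"nbformat" and "nbformat_minor" keys, and that the reader knows the newest
major/minor pair it understands.  The names SUPPORTED_MAJOR,
SUPPORTED_MINOR, UnreadableNotebook and read_notebook are made up for this
example.

```python
import json
import warnings

# Hypothetical: the newest format this reader understands.
SUPPORTED_MAJOR = 3
SUPPORTED_MINOR = 0


class UnreadableNotebook(Exception):
    """Raised when the notebook's major version is newer than supported."""


def read_notebook(text):
    """Parse notebook JSON, applying the major/minor policy sketched above."""
    nb = json.loads(text)
    # Assumed key names; missing keys fall back to the oldest values.
    major = nb.get("nbformat", 1)
    minor = nb.get("nbformat_minor", 0)

    if major > SUPPORTED_MAJOR:
        # Major bump: totally unreadable, don't even try.
        raise UnreadableNotebook(
            "notebook format v%d is newer than supported v%d"
            % (major, SUPPORTED_MAJOR)
        )
    if major == SUPPORTED_MAJOR and minor > SUPPORTED_MINOR:
        # Minor bump: still readable, but newer features may be missing.
        warnings.warn(
            "notebook was created by a newer version (v%d.%d); "
            "some features may be unavailable" % (major, minor)
        )
    # Keys the reader does not know about are simply carried along.
    return nb
```

With this, a v3.1 notebook containing extra metadata keys still loads (with
a warning) in a reader that only knows v3.0, while a v4 notebook is refused
outright.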

>
> Aaron Meurer
>
>
>
>> Because as far as I, the user, am concerned, if a newer notebook
>> format version doesn't work at all in older versions of IPython (such
>> as is the case with notebook format v3 and IPython 0.12), then it
>> hardly matters how "major" or "minor" the changes were. Or maybe you
>> are thinking more for the benefit of people like Sage who are building
>> on top of the notebook API?
>>
>
>> By the way, I completely agree with Brian that future proofing is
>> usually a waste of time. But also be careful against overly "past
>> proofing". I would much rather see new features added to the notebook,
>> even every release, than to have them held back simply for the
>> purposes of keeping things backwards compatible. Also, if jumping the
>> gun on future proofing is a waste of time, so is spending a lot of
>> effort on making sure that new notebook versions work correctly in
>> older, unsupported releases.
>>
>
> I totally agree that we should not spend significant effort on future (or
> past) proofing, and we haven't.  Nor is there any reason this would cause
> resistance to new features that do require updating the nbformat.  If a
> hoop must be leapt through to keep the nbformat, then the nbformat should
> be updated.  We have a hoop threshold of zero.  This only aims to prevent
> *known, planned, imminent features* from necessarily forcing that
> unpleasantness (they still may, since they haven't actually been
> implemented).
>
> -MinRK
>
>
>>
>> Aaron Meurer
>>
>> >
>> >> But where we are right now, adding to the metadata on cells or adding
>> >> multiple worksheets will *not* require
>> >> bumping the nbformat.
>> >
>> > Right.
>> >
>> > Cheers,
>> >
>> > Brian
>> >
>> >>>
>> >>>
>> >>> Cheers,
>> >>>
>> >>> Brian
>> >>>
>> >>>> -MinRK
>> >>>>
>> >>>>>
>> >>>>> Cheers,
>> >>>>>
>> >>>>> Brian
>> >>>>>
>> >>>>>
>> >>>>>
>> >>>>> On Tue, Jun 19, 2012 at 3:25 PM, MinRK <benjaminrk at gmail.com>
>> wrote:
>> >>>>>>
>> >>>>>>
>> >>>>>> On Tue, Jun 19, 2012 at 3:23 PM, Brian Granger <
>> ellisonbg at gmail.com>
>> >>>>>> wrote:
>> >>>>>>>
>> >>>>>>> On Tue, Jun 19, 2012 at 3:19 PM, MinRK <benjaminrk at gmail.com>
>> wrote:
>> >>>>>>>>
>> >>>>>>>>
>> >>>>>>>> On Tue, Jun 19, 2012 at 3:18 PM, Brian Granger
>> >>>>>>>> <ellisonbg at gmail.com>
>> >>>>>>>> wrote:
>> >>>>>>>>>
>> >>>>>>>>> On Tue, Jun 19, 2012 at 2:59 PM, Fernando Perez
>> >>>>>>>>> <fperez.net at gmail.com>
>> >>>>>>>>> wrote:
>> >>>>>>>>>> On Tue, Jun 19, 2012 at 1:17 PM, MinRK <benjaminrk at gmail.com>
>> >>>>>>>>>> wrote:
>> >>>>>>>>>>> Yes - we put metadata on outputs for a reason, presumably.  If
>> >>>>>>>>>>> this
>> >>>>>>>>>>> shouldn't be saved, it should probably be removed from the
>> >>>>>>>>>>> API.
>> >>>>>>>>>>
>> >>>>>>>>>> I can't recall precisely what we had in mind when we put it in,
>> >>>>>>>>>> but
>> >>>>>>>>>> something that springs to mind as potentially useful, for
>> >>>>>>>>>> example,
>> >>>>>>>>>> would be to specify a desired priority order for the various
>> >>>>>>>>>> types
>> >>>>>>>>>> of
>> >>>>>>>>>> outputs. Right now when a client can display several kinds of
>> >>>>>>>>>> output
>> >>>>>>>>>> it just makes a choice, but we could let objects provide a hint
>> >>>>>>>>>> of
>> >>>>>>>>>> the
>> >>>>>>>>>> preferred order, based on what they know about the relative
>> >>>>>>>>>> quality
>> >>>>>>>>>> of
>> >>>>>>>>>> each.
>> >>>>>>>>>
>> >>>>>>>>> I originally put it there to allow objects to provide hints to
>> >>>>>>>>> the
>> >>>>>>>>> frontend on how it should display a representation.  This is
>> >>>>>>>>> similar
>> >>>>>>>>> to how the payloads can indicate where it came from.
>> >>>>>>>>>
>> >>>>>>>>>> So I'd vote for not removing this, as it may prove useful...
>> >>>>>>>>>
>> >>>>>>>>> I also think it could be useful, although it seems a bit
>> >>>>>>>>> excessive
>> >>>>>>>>> to
>> >>>>>>>>> store metadata for each output.  Here is what I propose.  We
>> >>>>>>>>> simply
>> >>>>>>>>> leave it alone until we have an actual use case that will help
>> us
>> >>>>>>>>> figure out exactly what this should look like.  Without a
>> >>>>>>>>> concrete
>> >>>>>>>>> usage case, it is difficult to know what is needed.
>> >>>>>>>>
>> >>>>>>>>
>> >>>>>>>> But this doesn't answer the immediate question: Should this
>> >>>>>>>> metadata
>> >>>>>>>> dict be
>> >>>>>>>> included in the nbformat?
>> >>>>>>>
>> >>>>>>> I would vote no - not until we have a real usage case.  I don't
>> like
>> >>>>>>> to add things to the notebook format until we are actually using
>> >>>>>>> them.
>> >>>>>>
>> >>>>>>
>> >>>>>> Then should we remove all of the metadata stuff we just added?  The
>> >>>>>> whole
>> >>>>>> point was to prepare the nbformat for future changes so we don't
>> have
>> >>>>>> to
>> >>>>>> update the nbformat, which is incredibly painful and should be done
>> >>>>>> as
>> >>>>>> rarely as possible.
>> >>>>>>
>> >>>>>> -MinRK
>> >>>>>>
>> >>>>>>>
>> >>>>>>>
>> >>>>>>>>>
>> >>>>>>>>>
>> >>>>>>>>>> f
>> >>>>>>>>>> _______________________________________________
>> >>>>>>>>>> IPython-dev mailing list
>> >>>>>>>>>> IPython-dev at scipy.org
>> >>>>>>>>>> http://mail.scipy.org/mailman/listinfo/ipython-dev
>> >>>>>>>>>
>> >>>>>>>>>
>> >>>>>>>>>
>> >>>>>>>>> --
>> >>>>>>>>> Brian E. Granger
>> >>>>>>>>> Cal Poly State University, San Luis Obispo
>> >>>>>>>>> bgranger at calpoly.edu and ellisonbg at gmail.com