[IPython-dev] [sympy] Re: using reST for representing the notebook cells+text

Mikhail Terekhov termim at gmail.com
Wed Feb 24 18:35:50 EST 2010


On Wed, Feb 24, 2010 at 4:04 PM, Robert Kern <robert.kern at gmail.com> wrote:
>
> I am almost certain that their use cases and workloads are much
> different than the notebook's would be. Python's parser isn't exactly
> a speed demon, either. A general statement like "XML is slow" followed
> by an unrelated anecdote is not terribly convincing. Show me
> experiments. I've attached mine. Python ends up being about 3 times
> slower than the equivalent XML for a variety of file sizes.
>

Believe it or not, I can't find any example of a Python project that started
implementing a scientific notebook as an XML document and then switched to
something else :) I've used a "scientific analogy" principle. Seriously,
Subversion is a real project, they really suffered from the decision to use
XML as storage for the workspace metadata, and they really switched away
from XML. No anecdotes.

IMHO the relation is quite simple: things like Mathematica's notebooks tend to
multiply and form libraries or collections. In that case XML parsing could
become a problem.

Your example is not quite correct but it is a good start :} It actually
illustrates two important points. First, writing a serializer that produces
an XML representation is easy. The second and more important point is that
after parsing XML you have nothing but an internal XML representation, and
the only thing you can do with it is write it back to a file. You still need
to implement all the functionality as Python objects and convert the XML tree
into a tree of Python objects. Only after that will there be any point in
benchmarking. On the contrary, the Python version is complete: if you had a
real Notebook implementation, you would be ready to use it right away.
Also note that there is no need to write, debug and support _any_
parser/reader; Python provides that for free.
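
A minimal sketch of the difference described above. The Notebook and Cell
classes are hypothetical stand-ins (they are not Brian's proposed classes or
any real IPython API), and the XML layout is invented for illustration. The
repr form is rebuilt into real objects by the Python compiler, while the
parsed XML is only a generic element tree that still needs a hand-written
conversion step.

```python
import xml.etree.ElementTree as ET


class Cell:
    """Hypothetical notebook cell, for illustration only."""

    def __init__(self, kind, source):
        self.kind = kind
        self.source = source

    def __repr__(self):
        return "Cell(kind=%r, source=%r)" % (self.kind, self.source)

    def __eq__(self, other):
        return (isinstance(other, Cell)
                and (self.kind, self.source) == (other.kind, other.source))


class Notebook:
    """Hypothetical notebook: an ordered list of cells."""

    def __init__(self, cells):
        self.cells = list(cells)

    def __repr__(self):
        return "Notebook(cells=%r)" % (self.cells,)

    def __eq__(self, other):
        return isinstance(other, Notebook) and self.cells == other.cells


nb = Notebook([Cell("text", "Intro"), Cell("code", "print(1 + 1)")])

# repr -> nb: evaluating the repr yields real Notebook/Cell objects directly.
# eval() runs arbitrary code, so this is only reasonable for trusted files.
namespace = {"Notebook": Notebook, "Cell": Cell}
nb_from_repr = eval(repr(nb), namespace)

# XML -> nb: parsing yields generic Element nodes, not notebook objects...
xml_text = ('<notebook><cell kind="text">Intro</cell>'
            '<cell kind="code">print(1 + 1)</cell></notebook>')
root = ET.fromstring(xml_text)


# ...so a loader has to be written and kept in sync with the classes.
def notebook_from_xml(element):
    return Notebook(Cell(e.get("kind"), e.text or "")
                    for e in element.findall("cell"))


nb_from_xml = notebook_from_xml(root)
```

Both paths end up with an equal Notebook here; the difference is only in
who has to write and maintain the loading step.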

>
> I'm not talking about other projects adopting anything. I'm talking
> about basic capabilities of other languages, like JavaScript's builtin
> support for parsing XML. That enables *us* to build things in
> JavaScript.
>
>> BTW the fact that
>> everyone can parse XML doesn't mean that every one can _use_ the
>> data right away.
>
> Nor am I saying that. I am saying that it is enormously easier to
> build the JavaScript parser for the XML representation rather than the
> Python one.
>

That is the real question: why does JavaScript need to read the _internal_
representation of the nb if it is not going to implement all the
functionality needed to use it?

>> One have to have an internal logic/library/API specific
>> to the data represented by some particular XML document. If you take
>> this into account then the value of the exchange document format
>> somewhat reduces. It is still not zero though and IMHO it is easy to
>> teach classes proposed by Brian to produce XML representation just
>> for the mythical interchange with something :)
>
> The need for interchange is not at all mythical. Web front ends are
> exactly what we are talking about in this thread.
>

Sure, and it looks like in his very interesting approach the JavaScript part
is a client that queries the Python server for the information about the nb
that it needs, and there is no need for JS to read the nb or even know how it
is stored on disk.

More generally: the internal representation does not have to be tightly
coupled to the interfaces to external systems. The simplicity and reliability
of the internal representation (in this case, just the regular Python compiler
versus a custom XML parser) outweighs the need to write relatively simple
export/interface functions that give a view on the nb. As Ondrej's work shows,
they are needed anyway, and of course they can use XML if that is easy for the
client.
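
As a rough illustration of such an export function, here is a sketch that
produces an XML view for a client. The cells_to_xml name, the (kind, source)
tuples and the element names are all assumptions made up for this example;
they are not taken from Ondrej's or Brian's code.

```python
import xml.etree.ElementTree as ET


def cells_to_xml(cells):
    """Return an XML view of the notebook for an external client.

    `cells` is an iterable of (kind, source) tuples, a hypothetical
    stand-in for whatever the internal representation really is.
    """
    root = ET.Element("notebook")
    for kind, source in cells:
        cell = ET.SubElement(root, "cell", kind=kind)
        cell.text = source
    # encoding="unicode" returns a str rather than bytes.
    return ET.tostring(root, encoding="unicode")


xml_view = cells_to_xml([("text", "Intro"), ("code", "x = 1")])
```

The export side needs no parser at all, which is why it stays simple
regardless of how the nb is stored on disk.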

>>> JavaScript being the hugely important player here. Certainly, you are
>>
>> Again, it is important to define to what degree the interoperability with
>> something like JavaScript is needed. If you plan to work on/modify/execute
>> the same nbs in Python and in JavaScript then you have to implement
>> compatible engine/API in Python _and_ in JavaScript. Are you sure you
>> want to do that? If only the representation or "computed" notebook is
>> needed for display purposes by JavaScript, then it is something different
>> and could be implemented through specialized repr methods.
>
> Or you could use the same mechanism for both instead of duplicating efforts.
>

Unfortunately one has to duplicate something in either case. nb->XML would
duplicate nb->repr, but as your example shows, nb->XML is quite
straightforward. In the XML->nb case one has to duplicate the Python compiler,
which is unnecessary in the repr->nb case.

>>> going to have a Python API that will represent that tree of text nodes
>>> as Python objects, but I just don't see the point of making the repr()
>>> of that be the lingua franca format of the notebook file. It's just a
>>> wasted opportunity.
>>
>> The point is that nb became a first class python object - just a module,
>> no need for specialized parser and you can work with it as with regular
>> Python module - just import and use it. The only difference is that nb is
>> mutable - if you modified it then you have to save it.
>
> I really don't see why having the file format be Python code makes it
> any more of a first class object. The objects are the first class

You are right: not first class, just a native Python object.

> objects. As long as loading to those objects is easy, the format just

In a sense I agree. The only difference is that from the programming POV the
loading cost for repr->nb is zero (it is all done by the regular Python
compiler), while XML->nb requires a special loader that has to be maintained
and updated when the application changes.

> doesn't matter. Loading an object by importing is actually a very
> inflexible and difficult to work with method compared to a function
> call.
>

If one prefers functions, one can always use the __import__() or
imp.load_module() functions instead of the import statement.
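
A sketch of loading a notebook module through a function call. It uses
importlib.util, which is the current replacement for the imp module (imp was
removed in Python 3.12); that substitution, the load_notebook helper and the
demo file contents are assumptions for illustration, not something from this
thread.

```python
import importlib.util
import os
import tempfile


def load_notebook(path, name="nb_module"):
    """Load a notebook stored as Python source and return the module."""
    spec = importlib.util.spec_from_file_location(name, path)
    module = importlib.util.module_from_spec(spec)
    # Runs the file's code, so only use this on trusted notebooks.
    spec.loader.exec_module(module)
    return module


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo_nb.py")
    with open(path, "w") as f:
        # A made-up notebook file: plain Python data at module level.
        f.write("cells = [('text', 'Intro'), ('code', 'x = 1')]\n")
    nb = load_notebook(path)
```

After the call, nb.cells holds the list defined in the file, and the path
can be chosen at run time like any other function argument.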


Regards,
--
Mikhail Terekhov


