[IPython-dev] Kernel-client communication

Fernando Perez fperez.net at gmail.com
Wed Sep 8 03:02:30 EDT 2010


Hi Almar,

sorry for the delayed reply, I hunkered down on the code and dropped
everything else for the weekend :)

On Fri, Sep 3, 2010 at 1:30 AM, Almar Klein <almar.klein at gmail.com> wrote:
> I've been thinking about your protocol last night. (arg! This whole thing is
> getting to my head too much!)

I think I know the feeling :)

> I see some advantages of loose messages such as you're using in favor of my
> channels approach. Specifically, the order of stdout and stderr messages is
> not preserved, as they are sent over a different channel (same socket
> though). On a 1to1 system, this is not much of a problem, but for multiple
> users it definitely will be, specifically if you also want to show the other
> users' input (pyin).

Yes, and those notions of collaborative, distributed work are very
much central to our ideas.

> Anyway, I took another good look at your approach and I have a few
> remarks/ideas. Might be useful, might be total BS.

Hans already made some very good comments pretty much 100% in line
with what I would have said, so I won't repeat his.

> * You say the prompt is a thing the client decides what it looks like. I
> disagree. The kernel gives a prompt to indicate that the user can give
> input, and maybe also what kind of input. A debugger might produce a
> different prompt to indicate that the user is in debug mode. See also the
> doc on sys.ps1 and sys.ps2 (http://docs.python.org/library/sys.html): the
> user (or interpreter) can put an object on it that evaluates to something
> nice when str()'nged.

Following up on Hans' comments, yes: a history synchronization
mechanism is available (or at least the basic info is there).  The
client keeps its history local for fast things like 'up arrow', but
when you type '%history', that's executed in the kernel.  A client who
joins a kernel can always request a full history to bring its local
copy up to date.
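
To make the history split concrete, here is a minimal sketch in plain
Python. The Kernel and Client classes, and method names such as
history_request and sync_history, are hypothetical stand-ins invented
for illustration; they are not the actual IPython API, and the real
exchange would travel over the sockets rather than a direct call.

```python
class Kernel:
    """Toy stand-in for a kernel that owns the authoritative history."""

    def __init__(self):
        self.history = []

    def execute(self, code):
        # Only record the input here; a real kernel would also run it.
        self.history.append(code)
        return len(self.history)

    def history_request(self):
        # Hand out a copy so clients cannot mutate kernel state.
        return list(self.history)


class Client:
    """Toy client with a local history cache for fast 'up arrow' recall."""

    def __init__(self, kernel):
        self.kernel = kernel
        self.local_history = []

    def sync_history(self):
        # One request brings the local copy up to date.
        self.local_history = self.kernel.history_request()

    def run(self, code):
        self.kernel.execute(code)
        self.local_history.append(code)

    def up_arrow(self):
        # Served from the local cache, no round trip to the kernel.
        return self.local_history[-1] if self.local_history else None


kernel = Kernel()
alice = Client(kernel)
alice.run("a = 1")
alice.run("b = a + 1")

bob = Client(kernel)   # joins late, so his local cache starts empty
bob.sync_history()     # requests the full history from the kernel
```

After the sync, bob's up_arrow() returns the last line alice typed,
even though he never saw it being entered.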

> * You plan on keeping history in the kernel. In this case I think this is
> the task of the client. Otherwise you'd get your own history mixed with that
> of someone else using that kernel? History is, I think, a feature of the
> client to help the programmer. I see no use for storing it at the kernel.
>
> * I really think you can do with less sockets. I believe that the (black)
> req/rep pair is not needed. You only seem to use it for when raw_input is
> used. But why? When raw_input is used, you can just block and wait for some
> stdin (I think that'll be the execute_request message). This should not be
> too hard by replacing sys.stdin with an object that has a readline method
> that does this. If two users are present, and one calls raw_input, they can
> both provide input (whoever's first). To indicate this to the *other* user,
> however, his prompt should be replaced with an empty string, so his cursor
> is positioned right after the <text> in raw_input('<text>').

Keep in mind that the direction of those sockets (the normal xreq/xrep
pair for client input and the req/rep for kernel stdin) is opposite,
and that's because they represent fundamentally different operations.

I'm not worried about 'too many sockets'; I would worry about having
more sockets, or *fewer* sockets, than we have separate, independent
concepts and streams of information.  It seems that now, we do have
one socket pair for each type of information flow, and this way we
only multiplex when the data may vary but the communication semantics
are constant (such as having multiple streams in the pub/sub).  I
think this actually buys us a lot of simplicity, because each
connection has precisely the socket and semantics of the type of
information flow it needs to transfer.

> * I think you can do with even less sockets :)  But this is more of a wild
> idea. Say that John set up an experiment at work and wants to check the
> results in the bar on his Android (sorry I stole your example here,
> Fernando). Now his experiment crashed, producing a traceback in the client
> at his work PC. But now he cannot see the traceback as he just logged in!
> -----  So what about storing all stdout, stderr and pyin (basically all
> "terminal-output") at the kernel? And rather than pub/sub, use the existing
> req/rep to obtain this stuff. Maybe you can even pass new terminal-output
> along with other replies. The client should indicate in the request a
> sequence number to indicate to the kernel what messages were already
> received. This does mean, however, that the client would have to
> periodically query the kernel. But maybe this can also be done automatically
> by writing a thin layer on top of the zmq interface. Oh, and you'd need to
> encapsulate multiple terminal-messages in a single reply.

Rather than forcing the kernel to store all that info and play back
multiple data streams, I'd separate this (valid) idea into its own
entity.  Someone could easily write a client whose job is simply to
listen to a kernel and store all streams of information, nicely
organized for later replay as needed.  Because this client would only
be storing the string data of the messages, there's no danger of
making the kernel leak memory by asking it to hold on to every object
it has ever produced (a very dangerous proposition).  And such a
logger could then save that info, replay it for you, make it available
over the web, selectively SMS you if a message on a certain stream
matches something, etc.
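
As a rough sketch of what such a logging client might keep, assuming
it has already received messages from the kernel by some transport
(not shown here): the StreamLogger class and its record/replay methods
are hypothetical names made up for this example. The 'since' argument
plays the role of the sequence number Almar suggested.

```python
class StreamLogger:
    """Stores the string form of every message, tagged by stream."""

    def __init__(self):
        self._log = []  # list of (seq, stream, text) tuples

    def record(self, stream, text):
        # Keep only strings, so no kernel objects are held alive.
        seq = len(self._log)
        self._log.append((seq, stream, str(text)))
        return seq

    def replay(self, since=0, stream=None):
        # Return entries from sequence number 'since' onward,
        # optionally restricted to a single stream.
        return [
            entry for entry in self._log
            if entry[0] >= since and (stream is None or entry[1] == stream)
        ]


logger = StreamLogger()
logger.record("pyin", "1/0")
logger.record("pyerr", "ZeroDivisionError: division by zero")
logger.record("stdout", "done")

# A client that last saw message 0 asks only for what it missed.
missed = logger.replay(since=1)
```

Since the logger lives outside the kernel, it can keep as much or as
little as it likes without affecting the kernel's memory use.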

For example, with this design it becomes *trivial* to write a little
program that subscribes only to the pyerr channel, monitors
exceptions, and sends you an SMS if an exception of a particular type
you care about arrives.  Something like that could probably be written
in 2 hours.  And that is thanks to the clear separation of information
flow semantics across the various channels, which makes it very easy to
grab/focus only on the data you need.
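
A minimal sketch of the filtering part of such a monitor follows. It
assumes messages arrive as dictionaries with a 'msg_type' field and a
'content' dictionary holding the exception name under 'ename'; those
field names are assumptions for this example, as is the
watch_exceptions function itself. The transport (a subscription to the
kernel's publish socket) and the SMS gateway are left out; 'notify' is
a placeholder callback.

```python
def watch_exceptions(messages, wanted, notify):
    """Call notify(msg) for each pyerr message whose exception is wanted.

    Returns the number of messages that matched.
    """
    matched = 0
    for msg in messages:
        if msg.get("msg_type") != "pyerr":
            continue  # ignore stdout, pyin and everything else
        if msg.get("content", {}).get("ename") in wanted:
            notify(msg)
            matched += 1
    return matched


# Hand-written sample messages standing in for the kernel's output.
sample = [
    {"msg_type": "stream", "content": {"name": "stdout", "data": "hi\n"}},
    {"msg_type": "pyerr", "content": {"ename": "MemoryError", "evalue": ""}},
    {"msg_type": "pyerr", "content": {"ename": "KeyError", "evalue": "'x'"}},
]

alerts = []  # stands in for "send an SMS"
matched = watch_exceptions(sample, {"MemoryError"}, alerts.append)
```

In a real monitor, 'messages' would be a generator that yields each
message as it is received, so the loop would run for as long as the
kernel is alive.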

It's to some extent trying to carry the ideas from Unix software:
rather than making one single object (a super socket) that does 20
different things, create a few simple ones that each do one job well,
with relative decoupling and isolation from each other, and then use
the parts you need to build what you want.

So far I have to say it's working fantastically well, though we could
always be proven wrong by some horrible bottleneck we haven't yet
foreseen :)

But thanks for your feedback and ideas: only if we can explain and
clarify our thoughts sufficiently to justify them, can we be sure that
we actually understand what we're doing.

Regards,

f


