[IPython-dev] History

Fernando Perez fperez.net at gmail.com
Fri Feb 18 10:48:56 EST 2011


Hi all,

sorry that this will be brief and not very thought-through, but I can
only sneak in short periods while at the conference...

Many thanks to Thomas for getting this work going!  But I think, since
we now have a bit more manpower and good momentum going, it's worth
thinking a little about the key points we want to hit so we end up
with something really solid.  Comments below...

On Wed, Feb 16, 2011 at 5:10 PM, Thomas Kluyver <takowl at gmail.com> wrote:

> - Each command is stored instantly, so we do away with the need for an
> autosave timer thread. A crash at any stage should leave your entire history
> intact up to the last command completed.

Instant saving has one problem: frequent disk usage prevents hard
drives from spinning down when on battery.  The idea of an auto-save
thread on a timer with a user-controllable delay has the advantage
that the user can control their power consumption profile to fit their
needs.

On an international flight when you're trying to squeeze every last
bit out of your battery, this matters a lot.  We don't want to turn
ipython into the thing that drains your battery just because very
simple interactive commands, which in principle are purely CPU/memory
resident, generate lots of sideband disk activity.

So we should keep this consideration in mind in the design.  If we
don't think about it now, it will be much harder to retrofit a decent
power profile later on.
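
To make the timer idea concrete, here is a rough sketch of a buffered
writer, not a proposal for the actual implementation.  The class name
BufferedHistoryWriter, the one-JSON-string-per-line file format and the
60 second default are all assumptions made up for illustration.
Commands are appended to an in-memory list and only written out when
the timer fires, and a flush with nothing pending never touches the
disk.

```python
import json
import threading


class BufferedHistoryWriter:
    """Hypothetical helper: buffer commands in memory, flush on a timer."""

    def __init__(self, path, interval=60.0):
        self.path = path
        self.interval = interval      # seconds between flushes; user-tunable
        self._pending = []            # commands not yet written to disk
        self._lock = threading.Lock()
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self):
        self._thread.start()

    def add(self, command):
        # Memory only: no disk access on the interactive path.
        with self._lock:
            self._pending.append(command)

    def flush(self):
        with self._lock:
            batch, self._pending = self._pending, []
        if not batch:
            return 0                  # nothing new, so the disk is left alone
        with open(self.path, "a", encoding="utf-8") as f:
            for command in batch:
                f.write(json.dumps(command) + "\n")
        return len(batch)

    def _run(self):
        # Event.wait returns False on timeout, True once stop() sets the event.
        while not self._stop.wait(self.interval):
            self.flush()

    def stop(self):
        self._stop.set()
        if self._thread.is_alive():
            self._thread.join()
        self.flush()                  # final flush on clean shutdown
```

The tradeoff is the one Thomas points at: a crash can lose up to
`interval` seconds of input.  A small interval approaches instant
saving, a large one favors battery life, and the user picks.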

> - We store only raw history on disk (I think raw history is what we're
> looking for 90% of the time). If we want translated history, we redo the
> translation on the fly (this should require minimal computation, unless I've
> missed something).
> I've been having a think about how we store history:
>
> At present: Commands entered are stored in two lists (raw and "translated" -
> i.e. turning magic commands into function calls). These are persisted to
> disk at the next input every time 60 seconds elapse, with storage in a JSON
> file, which is reloaded into the same lists (and into readline history) when
> starting IPython. Each command entered is also immediately persisted in the
> "shadow history", a collection of files in .ipython managed by the
> pickleshare DB. The output objects are also stored in a dictionary by prompt
> number, but I'm less concerned with that here.
>
> Uses:
> - Readline history (getting previous commands via up arrow)
> - Various magic commands (save, macro, hist) can access ranges of input,
> using the prompt numbers from the current session.
> - %hist -g allows searching shadow and current history with glob syntax.
> - %rep can access ranges of this session's history, or single lines from
> shadow history.
> (These are all I've found so far - please let me know if there are others)

No, we must store the translated history for two reasons:

1. Some translations are dynamic and context-dependent, so they
cannot be recomputed later (though these are the minority).

2. More importantly, the translation process is relatively
CPU-intensive, while disk space is the absolutely cheapest resource in
existence (at least at the data storage volumes we're talking about
here).  So it makes sense to store these results on disk once we have
computed them, rather than recomputing them all over again on reload.
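
As a minimal sketch of what storing both forms could look like,
assuming a JSON file holding a list of records: the record layout, the
function names and especially toy_translate are hypothetical stand-ins
and not IPython's real translation machinery.  The point is only that
the translation runs once, at input time, and reload reads the stored
result.

```python
import json


def toy_translate(raw):
    """Stand-in for the real translation step (hypothetical, much simplified)."""
    if raw.startswith("%"):
        return "run_magic({!r})".format(raw[1:])
    return raw


def make_record(raw, translate=toy_translate):
    # Translate once, at input time, and keep both forms side by side.
    return {"raw": raw, "translated": translate(raw)}


def save_history(records, path):
    with open(path, "w", encoding="utf-8") as f:
        json.dump(records, f)


def load_history(path):
    # Reloading reads the stored translation; nothing is recomputed.
    with open(path, encoding="utf-8") as f:
        return json.load(f)
```

This also covers point 1: a context-dependent translation is captured
as it was when the command actually ran.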

> - History is indexed by session number and prompt number. This provides a
> sensible behaviour if we have two IPython shells open together - the second
> one to be opened will be the latter session (and will be able to access
> commands entered in the other session as soon as they are completed).
> - For magic commands, accessing a line from a previous session could look
> like "-1#9" (9th line of immediately previous session).
> - On starting IPython, we load the last (~40 lines/~2 sessions) from the
> database into readline history.
>
> Thoughts? Have I overlooked some key reason we use the existing system? Is
> there a better alternative to SQLite? Would you design it differently? I've
> not written any code for this yet, so I'm open to ideas. But if people think
> that makes sense, I'm volunteering to make it happen.

Finally, but importantly, I'm somewhat reluctant to go to SQLite until
we've fully shown that accomplishing our design goals is
hard/impossible with a simple JSON persisted data structure.  While
SQLite is indeed lightweight and available to us, it's also a more
complex API to program against than simply storing a list/dict on
disk.  I'm OK accepting that complexity price *if we need it*, but I'm
not convinced we do yet.  Specifically, we need to answer: what
precisely is the functionality that we want, that is hard/impossible to
implement on JSON and easy/possible with SQLite?

Cheers,

f


