[IPython-dev] Some Thoughts on Notebook Security

Fri Dec 14 01:34:07 EST 2012

Hi Brian

I think we're discussing different things altogether. The reason it
makes sense to distinguish between static views and everything else is
that static notebooks are not notebooks, they're just webpages. I know
you disagree with treating them differently when dealing with
security, but I know you're happy to hear other people's views too :)

> Most people running notebooks are just going to throw the notebook
> server up on a port and point people to it.  Many will not even create
> an actual domain, just use a raw IP address.  This will be true even
> when the notebook server gains multiuser capabilities.  I don't want
> to have to tell people - to run the notebook server, you have to run
> servers on two separate domains with an infrastructure that allows
> those servers to work together.  We want the multiuser support to be
> as simple as:
>
> ipython notebook --multiuser ...
>
> on a single machine.

There wouldn't be two servers in that scenario. When a user wanted to
create a static view, IPython would save the notebook, run nbconvert
on it to produce HTML output, and drop that into a template to create
a single HTML document with all the images embedded as base64 or SVG
literals. That webpage can be passed around like any file and viewed
in any browser.
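
To make the "single HTML document" idea concrete, here's a rough
sketch of the templating step. It is not nbconvert's API: the
`{{name}}` placeholder convention and the `build_static_view` and
`png_data_uri` helpers are made up for illustration, and it assumes
the notebook body has already been converted to HTML.

```python
import base64
from html import escape
from string import Template

# Hypothetical page template; a real one would also carry CSS.
PAGE = Template("""<!DOCTYPE html>
<html>
<head><meta charset="utf-8"><title>$title</title></head>
<body>
$body
</body>
</html>
""")


def png_data_uri(png_bytes):
    """Encode raw PNG bytes as a data: URI usable in an img src."""
    encoded = base64.b64encode(png_bytes).decode("ascii")
    return "data:image/png;base64," + encoded


def build_static_view(title, body_html, images):
    """Return one self-contained HTML page.

    `images` maps placeholder names to raw PNG bytes; every
    "{{name}}" in body_html is replaced by the matching data URI.
    """
    for name, png_bytes in images.items():
        body_html = body_html.replace("{{" + name + "}}",
                                      png_data_uri(png_bytes))
    return PAGE.substitute(title=escape(title), body=body_html)
```

The result is an ordinary string that can be written to a .html file
and mailed around, with no server involved.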

An (optional) hosting service, which would operate externally to any
of this, would be a simple server that could just host these static
views for people to look at, and people could easily embed them in
other webpages.

An (optional) web service with user accounts and the ability to share
actual notebooks and stuff like that would never host user-created
static views, or any user-generated HTML, to prevent XSS attacks on
the main web service. It would instead use a hosting service to serve
notebooks from a different domain, and embed them wherever it wanted
them to appear, inside iframes.

> * When people want to create or view notebooks containing Javascript
> code, they have to go to a separate insecure domain.  This domain has
> to be completely sandboxed.

No one would create anything at the hosting domain, just post things
to it and view stuff from it. There's no sandbox at all. It's more or
less a file server.

The alternative, ripping out all the dangerous HTML, will likely mean
removing all onmouseover/onload-type event handlers, any img tags, and
anything else that makes requests to URLs on the user's behalf. It'll
be hard to get right.
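
To show what I mean by hard, here's a deliberately naive sanitiser
built on the standard library's html.parser. It's only a sketch: the
tag blocklist and the `NaiveSanitizer` / `sanitize` names are my own
assumptions, not anything IPython ships, and it is not safe to use as
is.

```python
from html import escape
from html.parser import HTMLParser

# Assumed blocklist for illustration; a real one would be far longer.
DROPPED_TAGS = {"script", "img", "iframe", "object", "embed"}
VOID_TAGS = {"img", "embed"}  # dropped tags that never have a close tag


class NaiveSanitizer(HTMLParser):
    """Drop blocklisted tags (and their contents) and on* attributes."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_depth = 0  # > 0 while inside a dropped element

    def handle_starttag(self, tag, attrs):
        if tag in DROPPED_TAGS:
            if tag not in VOID_TAGS:
                self.skip_depth += 1
            return
        if self.skip_depth:
            return
        # Remove event handlers such as onload and onmouseover.
        kept = [(k, v) for k, v in attrs if not k.startswith("on")]
        rendered = "".join(
            ' %s="%s"' % (k, escape(v or "", quote=True)) for k, v in kept
        )
        self.out.append("<%s%s>" % (tag, rendered))

    def handle_endtag(self, tag):
        if tag in DROPPED_TAGS:
            if tag not in VOID_TAGS and self.skip_depth:
                self.skip_depth -= 1
            return
        if not self.skip_depth:
            self.out.append("</%s>" % tag)

    def handle_data(self, data):
        if not self.skip_depth:
            self.out.append(escape(data, quote=False))


def sanitize(html_text):
    parser = NaiveSanitizer()
    parser.feed(html_text)
    parser.close()
    return "".join(parser.out)
```

It handles the obvious cases, but `sanitize('<a
href="javascript:steal()">x</a>')` comes back untouched, and that's
before we get to CSS, SVG, form actions and so on. Every one of those
is another rule to write and another rule to get wrong.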

Any plugin system, as I see it, can only provide generic code, so each
notebook that uses JS will contain some JS specific to that notebook
which will have to be checked by hand before it's trusted. You can't
just use common libraries, you need some unique code to put those
libraries to use, often fairly large and complex code. Maybe I don't
understand how this will work?

If we deal with static views of notebooks as just HTML documents, we
can deal with this side of things as a relatively simple, commonplace,
web security issue, and not have to take a baseball bat to users'
code.

Actual notebook files are a radically different thing, but these files
are not dangerous because they contain JS, they're dangerous because
they allow JS, Python and other code to bypass the browser and execute
straight on the system. There's little point looking to web/browser
security on that front.

If the user knows a notebook file's JS is safe, by whatever means,
they still have to be very careful about the code cells, which can use
a range of languages, and IPython syntax, to make one thing look like
another. You only need to edit one character to make totally honest
code become deadly. I could register a GitHub username that's one
character different to a trusted one, then include a line in a
notebook that appears to grab a bit of handy code from the friendly
guys at IPython, but actually clones my repo. The obvious next two
lines are `cd newrepo` and `sudo python setup.py install`, where
setup.py is whatever I want it to be. The user will happily provide
their root password and it's all over.
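
A tool could at least flag the one-character trick. Here's a small
sketch using plain edit distance; the `lookalikes` helper and the
example usernames are hypothetical, and it only catches near-misses
in spelling, nothing cleverer.

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def lookalikes(name, trusted, max_distance=1):
    """Trusted names that `name` nearly matches but does not equal."""
    name = name.lower()
    return [t for t in trusted
            if 0 < edit_distance(name, t.lower()) <= max_distance]
```

For example, `lookalikes("ipyth0n", ["ipython", "numpy"])` picks out
"ipython", while the genuine name produces an empty list. It does
nothing about the code cells themselves, which is rather my point.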

Even if all notebooks have safe JS, it doesn't make them safe by a
long shot, and we can't sanitise code cells.

In any case, I think trust should be primarily based on people. If you
sent me some executable and told me to run it, I'd run it. At the end
of the day, you're Brian Granger; you don't go around sending people
malicious code (or at least, if you did, it'd be funny).

If I whitelist IPython's GitHub account as trustworthy, I should be
able to run any code from IPython's GitHub repos without any further
confirmation. It's from people that I trust. Whitelisting sources,
based on people and organisations, is more natural than whitelisting
plugins, which could always be hosted by trustworthy people anyway.
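
Something like the following is what I have in mind for whitelisting
by source. It's a sketch under my own assumptions: the
`TRUSTED_OWNERS` set and both function names are invented, and it
only understands https URLs on github.com.

```python
from urllib.parse import urlsplit

# Hypothetical whitelist of GitHub accounts the user has chosen to trust.
TRUSTED_OWNERS = {"ipython"}


def repo_owner(url):
    """Return the lower-cased GitHub account a URL points at, or None."""
    parts = urlsplit(url)
    # Compare the full hostname so "github.com.evil.example" fails.
    if parts.scheme != "https" or parts.hostname != "github.com":
        return None
    segments = [s for s in parts.path.split("/") if s]
    return segments[0].lower() if segments else None


def is_trusted_source(url, trusted=TRUSTED_OWNERS):
    """True only if the URL belongs to a whitelisted account."""
    owner = repo_owner(url)
    return owner is not None and owner in trusted
```

With that, `https://github.com/ipython/ipython.git` passes and
`https://github.com/ipyth0n/ipython.git` doesn't, because the check
is on the exact account name rather than on how the URL looks to a
tired human.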

Security is not the same thing as trust, I know, but trust is another
key element in sharing code safely.

It's an open problem, and one we should be willing to clash over a
bit, so I hope you don't mind the rant. It's just after 6am here (up
with the kids) so excuse me if I'm a bit blunt. I need to go get
another cuppa.

P.S. All the best, and congratulations on securing funding. IPython
has a really exciting future, with or without JS in static views :)
I'm sure you'll have an awesome new year.

Carl