[IPython-dev] Some Thoughts on Notebook Security

Thu Dec 13 22:55:45 EST 2012

Matthias,

On Wed, Dec 12, 2012 at 11:37 AM, Matthias BUSSONNIER
<bussonniermatthias at gmail.com> wrote:
>
> On 12 Dec 2012 at 18:46, Brian Granger wrote:
>
>> OK trying to catch up with this thread...
>>
>> * Matthias brings up a great point that we need to consider
>> forged/hostile notebooks that were not created with our notebook
>> server.  The solution to this is to make sure that our notebook server
>> can "clean" notebooks before displaying them.  The solutions that I
>> posted above will address this completely and allow people to simply
>> open notebooks from the web without worrying about them.
>>
>> * I still feel like I don't have a straight answer to my question:
>> will the solutions I proposed solve all of the security problems while
>> allowing us to serve authenticated notebooks on single domains?  If
>> not, what security problems would remain?
>>
>>>
>>> To fix this, we need to enable the HTML sanitizer that comes with the
>>> JS Markdown renderer that we are using.  This is what StackOverflow
>>> uses to sanitize their markdown and should completely remove any
>>> security risks coming from within markdown cells.
>>
>>
>>> * In CodeCell output, the Javascript repr is dynamically passed
>>> into eval.  This only happens when code is run, not when the notebook
>>> is loaded, so it is less critical, but still needs to be fixed.
>>
>>> Oh, yes forgot about that, we will have to clean that HTML as well.
>>
> Which can be summarized as: clean HTML/JS in all output.
> (as rendered md cells are actually "output")
>
> Then yes.
>
>
>>
>> * I am confused about the multiple domain solution that is being
>> proposed.  The idea is that notebooks containing arbitrary Javascript
>> would be served from a separate domain that does not offer any
>> authentication.  The authenticated notebooks would live on a second
>> domain that doesn't allow Javascript.  But then how is a user supposed
>> to author a notebook with Javascript and have that notebook not be
>> anonymous?  What if a user wants to author a notebook with Javascript
>> that needs to be private?
>
> Let's see.
> Multiple domains just allow fine-grained privileges in cookies.
>
> You can log in to WordPress with your Google account, right?
> But then it only has access to your identity, not your mail, right?
>
> Then you do the same under the hood with subdomains.
> (For the sake of simplicity I'll use domain names that have meanings, and a simplified logic.)
>
> The user logs in at https://iamroot.ipython.org.
> He/she has access to his/her dashboard.
>
> He/she gets the notification "Other User wants you to see a notebook QUX [open]".
>
> Clicking on Open opens the notebook in another tab.
>
> https://comment-only.ipython.org/view/QUX.ipynb
> The user is still "logged in", but any request to https://iamroot.ipython.org will fail because of cross-origin policies.
> And obviously the only thing malicious JS could do is post comments on the current notebook.
>
> The point is to restrict the number of actions available from a certain domain name.
> As you control the authorization of a domain name server side, nothing prevents a user from granting more rights to a specific "domain" or notebook.

I understand that multiple domains allow a fine-grained approach to
authentication.

Most people running notebooks are just going to throw the notebook
server up on a port and point people to it.  Many will not even create
an actual domain, just use a raw IP address.  This will be true even
when the notebook server gains multiuser capabilities.  I don't want
to have to tell people: to run the notebook server, you have to run
servers on two separate domains with an infrastructure that allows
those servers to work together.  We want the multiuser support to be
as simple as:

ipython notebook --multiuser ...

on a single machine.

But even if we forced everyone to use a multiple domain notebook
server I don't understand how it would work.  Here is my
understanding:

* There would be a secure domain that would not allow any dynamic
Javascript in notebooks.  We would simply clean it out every time in
the notebook app.
* Because this domain is "safe" this is where people would want to
author and keep all of their notebooks.
* Because dynamic Javascript is cleaned out every time, this domain
could not be used to create Javascript containing notebooks.
* In my mind, life on this domain is exactly as I propose above: we
simply don't allow JS because it could do bad things.
* Notebooks on this domain can safely be given access to the user's
regular kernels - IOW, these notebooks are completely equivalent to
having shell access to the user's files.
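
As a rough illustration of what "cleaning" a notebook on the secure
domain might look like, here is a minimal sketch.  It assumes a
notebook is a plain dict with a "cells" list whose code cells carry
"outputs" holding a "data" dict keyed by MIME type; that layout, the
function name clean_notebook and the set UNSAFE_MIME_TYPES are
assumptions made for the example, not the notebook server's actual
code.  It only drops output representations that can carry script and
does not sanitize HTML inside markdown cells.

```python
import copy

# Assumption: output representations that may carry executable Javascript.
UNSAFE_MIME_TYPES = {"application/javascript", "text/html"}


def clean_notebook(notebook):
    """Return a copy of `notebook` with unsafe output representations removed."""
    cleaned = copy.deepcopy(notebook)  # never mutate the caller's notebook
    for cell in cleaned.get("cells", []):
        if cell.get("cell_type") != "code":
            continue  # markdown cells need a separate HTML sanitizer
        kept = []
        for output in cell.get("outputs", []):
            data = output.get("data")
            if data is None:
                kept.append(output)  # e.g. stream output, no MIME bundle
                continue
            safe = {mime: value for mime, value in data.items()
                    if mime not in UNSAFE_MIME_TYPES}
            if safe:  # keep the output only if something safe remains
                output["data"] = safe
                kept.append(output)
        cell["outputs"] = kept
    return cleaned
```

Running this on every load, rather than once at save time, is what
would also cover notebooks that were not created by our own server.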

* When people want to create or view notebooks containing Javascript
code, they have to go to a separate insecure domain.  This domain has
to be completely sandboxed.
* The insecure domain can't be allowed to edit/remove/etc. notebooks
from the secure domain, otherwise hostile Javascript code could
completely destroy the safe notebooks.
* Notebooks authored on this domain have to stay on this domain, so
users will have to manage two sets of notebooks.
* Kernels on this domain have to be completely sandboxed as
potentially hostile Javascript code could use Kernel.execute to do
anything on the kernel.  These kernels could not even be allowed
network connectivity as users could inject JS code that writes Python
code that launches DoS attacks on the server or other hosts on the
net.
* Notebooks that users share with each other have to remain on this
domain as they might contain hostile JS code.
* Hostile JS code on this domain would have the ability to completely
screw up notebooks on this domain.  So I could write a notebook that
will delete all of your other JS containing notebooks if you open it?
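
To make the summary above concrete, the split could be pictured as a
server-side table of the actions each domain is allowed to perform.
This is only a sketch of my understanding: the host names, the action
names and the helper is_allowed are all hypothetical, and nothing like
this exists in the notebook server today.

```python
# Hypothetical host names and action names, for illustration only.
ALLOWED_ACTIONS = {
    # Secure domain: no dynamic Javascript, so full rights are acceptable.
    "secure.example.org": {"view", "edit", "delete", "execute"},
    # Insecure domain: may run Javascript, so it gets minimal rights.
    "insecure.example.org": {"view", "comment"},
}


def is_allowed(host, action):
    """Return True if requests arriving on `host` may perform `action`."""
    # Unknown hosts get no rights at all.
    return action in ALLOWED_ACTIONS.get(host, set())
```

With such a table, is_allowed("insecure.example.org", "delete") is
False, so hostile Javascript running there could not remove notebooks
through the server.  It does nothing about what the same Javascript
can do through a kernel, which is why the kernels would still need
sandboxing.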

Is this a correct summary of how the multiple domains would work?  Am
I missing or misunderstanding things?  Would we give notebooks on the
insecure domain more capabilities or access to the safe domain side of
things?  Is dynamic Javascript less risky than I am understanding so
we wouldn't have to isolate kernels?

Cheers,

Brian

> It does not prevent collaborating in any way,
> but if you share a kernel you share the filesystem, so there is no point in really using subdomains at this stage.
> Still, it gives another layer of security: if there is a way to inject Javascript, this Javascript will not find any
> malicious things to do.
>
> Or it allows you to "test" untrusted plugins...
>
> I'll let you imagine other stuff.
> --
> Matthias
>
>
>
>> * I don't see any fundamental difference between "static notebooks" and
>> "notebooks with kernels" - both can have Javascript and the new
>> Javascript plugins will work on both.  Both have the same overall
>> security issues and I don't think it makes sense to try and handle
>> them separately.
>>
>> Cheers,
>>
>> Brian
>>
>>
>>
>>
>> On Tue, Dec 11, 2012 at 9:15 AM, Matthias BUSSONNIER
>> <bussonniermatthias at gmail.com> wrote:
>>>
>>> On 11 Dec 2012 at 16:10, Carl Smith wrote:
>>>
>>>> Hi Brian
>>>>>
>>>>> The idea is that the extra Javascript cool-stuff will be installed by
>>>>> the person who runs the notebook server once and for all notebooks on
>>>>> that server.  Similar to how python packages are installed = you do
>>>>> this before you start python.  To get data from python to the
>>>>> Javascript plugins we will use JSON objects and trigger the callbacks
>>>>> to handle them.
>>>>
>>>> This seems to be dependent on a kernel, which static notebooks don't
>>>> have? If I generate a static notebook, which is just a web page, then
>>>> post that page to a hosting service, or email it to someone, how would
>>>> the plugins work? Maybe we're looking at two slightly different
>>>> scenarios. I'm focussed on static views only. The host should not have
>>>> to allow anything more than posting and getting HTML documents.
>>>
>>> IIRC, you can embed several reprs in the ipynb file.
>>> So you could provide a plugin that can "render" objects in a static view.
>>> (like a d3.js graph; you don't need the kernel to do that)
>>>
>>>> ================
>>>>
>>>> Hi Matthias
>>>>
>>>>> Static notebooks, served from a different domain, could be rendered
>>>>> inside iframes, enabling us to embed them inside other webpages and
>>>>> applications. These notebooks would still be superficially served by
>>>>> our own servers, so the UX wouldn't be affected.
>>>>>
>>>>> Keep in mind that iframes are not sandboxed, and you can inject JS into the parent
>>>>> frame.
>>>>> Unless you use the sandbox attribute, which is part of HTML5 but not
>>>>> implemented in every
>>>>> browser… and not yet infallible; it is more a "we'll help you embed other
>>>>> pages by providing a separate
>>>>> JS namespace but we don't guarantee yet that the VM is unbreachable".
>>>>
>>>
>>>> I'm pretty sure iframes are sandboxed in the sense that a parent page
>>>> and an iframe can not communicate unless they have the same origin,
>>>> and this is an old feature. The new sandbox attribute in HTML5 is for
>>>> a different purpose.
>>>
>>> But this is still kind of a problem, as usually the static view will be served from the "same origin"
>>> as the rest of the website.
>>>
>>> If you don't want to make a notebook fully public, you have to have some kind of authentication
>>> that allows you to load it.
>>>
>>> I still have some doubts about what frames are supposed to do and what they actually do.
>>> I'm not an expert on that, but it is still worth digging.
>>>
>>>>> For the sake of responsible disclosure I don't want to say much more, but having a
>>>>> statically displayed
>>>>> notebook is often linked to having a "sharing/import" button, which is
>>>>> dangerous.
>>>>> It could lead to a self-propagating notebook that spreads through accounts, infects
>>>>> other
>>>>> notebooks, or shares itself on Twitter...
>>>>
>>>> Any buttons, like for importing a notebook, would live in the parent
>>>> page and would have no access to, nor allow access from, the iframe.
>>>> The parent page would know which static notebook it embeds in the
>>>> iframe though, so it could provide buttons that connect to the actual
>>>> notebook in question, which is a totally different file to the static
>>>> notebook being rendered in the iframe anyway.
>>>
>>> I understand what you want to do, I guess the definition of "static" is blurry.
>>> If you want a perfectly static version (does that make sense in HTML?) you can go with an iframe.
>>> If you want the ability to comment on a particular cell, then you have to build an iframe for
>>> every cell,
>>> and you lose the ability to comment "inline" as GitHub does.
>>>
>>>
>>>>
>>>>> Multi domain is a really good idea. I have a clear view in my head of how we
>>>>> could use that in a way close to OAuth to allow Javascript while still having
>>>>> "logged-in" users.
>>>>> It wouldn't be as seamless as something like GitHub, but close.
>>>>
>>>> I think we're looking at things differently: You seem to be
>>>> considering static views as something generated on the fly and on
>>>> demand, nbviewer style. I'm thinking about running nbconvert on a
>>>> notebook, then keeping the output as a webpage to be copied and passed
>>>> around freely. Once the static notebook exists, it's a done deal.
>>>> There's no chance of any changes to IPython breaking it. It's an
>>>> independent webpage. Updating it would amount to deleting it and
>>>> replacing it with a new version.
>>>
>>> I don't think those are quite different.
>>> You can have a "perfectly static" version that embeds bad JS and requires some kind of authentication to be seen.
>>> The line between "on the fly" and static is thin.
>>>
>>>>
>>>>> The **big** question is:
>>>>> Are viewers logged in (in any way) to the given server, and if so do they
>>>>> have the right to do anything else with those credentials?
>>>>> If it is just a public notebook viewer, then it's fine.
>>>>>
>>>>> If you want something more "interactive" (sharing, permissions… etc., and the
>>>>> display of any JS) you won't have much choice.
>>>>> Or you will have a painful multi-login.
>>>>
>>>> I'm very much against hosting user submitted notebooks on any domain
>>>> with cookie based authentication. It needs to be divided into 'trusted
>>>> domain', where no user's JS will ever be served, and 'hosting domain'
>>>> that has no account system of its own. The trusted domain would
>>>> control the hosting domain, as a kind of slave.
>>>
>>> Yep, kind of what I have in mind.
>>> The hosting domain can have "tokens":
>>> "Publish this comment on this notebook on behalf of ..."
>>> Then you have to "validate" those actions on the "trusted domain".
>>> --
>>> Matthias
>>>
>>>
>>>
>>>>
>>>> That's just my take on all this.
>>>>
>>>> Cheers
>>>>
>>>> Carl
>>>> _______________________________________________
>>>> IPython-dev mailing list
>>>> IPython-dev at scipy.org
>>>> http://mail.scipy.org/mailman/listinfo/ipython-dev
>>>
>>
>>
>>
>> --
>> Brian E. Granger
>> Cal Poly State University, San Luis Obispo
>> bgranger at calpoly.edu and ellisonbg at gmail.com
>

--
Brian E. Granger
Cal Poly State University, San Luis Obispo
bgranger at calpoly.edu and ellisonbg at gmail.com