[pypy-dev] Running untrusted code in pypy

Mon Feb 19 21:27:02 CET 2007

Vinj Vinj wrote:
 >> How is the data shared? Using files or somehow
 >> differently?
 >
 > Slices of numeric python arrays

Then the foremost problem will probably be that PyPy is not close to
supporting numeric arrays :-). I guess that will change at some point.

 >> custom mark-and-sweep garbage collector. This
 >> collector collects quite a
 >> bit information while it is running, especially how
 >> much non-dead memory
 >> is used currently. This would make it possible to
 >> impose a hard limit there.
 >
 > Ok. But this limit would be for the entire app and not
 > per user model. This should be fine, I would just take
 > the penalty of the OS/interpreter then releasing back
 > all the unused memory.

There can be more advanced solutions: The GC has "memory pools" and you
could have a solution where the main app has its own (unlimited) pool
and the user models have limited pools.
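
To make the accounting idea concrete, here is a toy model in plain
Python. It is only a sketch of the bookkeeping, not the PyPy GC
interface: the names MemoryPool, PoolLimitExceeded, allocate and release
are made up for illustration, and a real collector would do this
accounting internally rather than through explicit calls.

```python
class PoolLimitExceeded(MemoryError):
    """Raised when an allocation would push a pool past its hard limit."""


class MemoryPool:
    """Hypothetical per-pool accounting of live (non-dead) bytes."""

    def __init__(self, name, limit_bytes=None):
        self.name = name
        self.limit_bytes = limit_bytes  # None means "unlimited"
        self.live_bytes = 0

    def allocate(self, nbytes):
        # Refuse the allocation before it happens, so the limit is hard.
        if (self.limit_bytes is not None
                and self.live_bytes + nbytes > self.limit_bytes):
            raise PoolLimitExceeded(
                "pool %r: %d live + %d requested > limit %d"
                % (self.name, self.live_bytes, nbytes, self.limit_bytes))
        self.live_bytes += nbytes

    def release(self, nbytes):
        # Called when the collector finds nbytes of this pool to be dead.
        self.live_bytes = max(0, self.live_bytes - nbytes)


# The main app gets an unlimited pool, each user model a limited one.
main_pool = MemoryPool("main")
user_pool = MemoryPool("user-model", limit_bytes=1000)
```

With this split, a user model that exceeds its limit fails on its own,
while the main app's pool is unaffected.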

 > I think again os based timeout interrupts would work
 > fine? Do you see
 > any downside of using os level interrupts? Any way
 > that the application
 > would not be able to catch them?

There is a downside to OS-level interrupts (both in CPython and in
PyPy): only the interpreter main loop checks for interrupts, so a
pending interrupt is acted on only between bytecodes. That means a
timeout does not work against a single long-running operation such as
2**(something big), if I see it correctly. I cannot think of a good way
to fix this, I fear.
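
For reference, a minimal sketch of such a signal-based timeout, written
for a current CPython 3 on a Unix system (signal.setitimer is not
available on Windows, and handlers can only be installed from the main
thread). The names ModelTimeout, run_with_timeout and busy_loop are
invented for this example. It interrupts a pure-Python loop, but it has
the limitation described above, and code that catches the exception
itself can swallow the timeout.

```python
import signal


class ModelTimeout(Exception):
    """Raised inside the running code when the timer fires."""


def _on_alarm(signum, frame):
    # Runs only when the interpreter main loop checks for signals.
    raise ModelTimeout()


def run_with_timeout(func, seconds):
    """Return (True, result), or (False, None) if the timer fired first."""
    old_handler = signal.signal(signal.SIGALRM, _on_alarm)
    signal.setitimer(signal.ITIMER_REAL, seconds)
    try:
        return True, func()
    except ModelTimeout:
        return False, None
    finally:
        # Cancel the timer and restore whatever handler was there before.
        signal.setitimer(signal.ITIMER_REAL, 0)
        signal.signal(signal.SIGALRM, old_handler)


def busy_loop():
    # Pure-Python loop: the main loop runs constantly, so the handler
    # gets its chance. A loop wrapped in "try: ... except Exception:"
    # would catch ModelTimeout and keep going.
    while True:
        pass
```

Calling run_with_timeout(busy_loop, 0.2) comes back after roughly the
requested time; the same wrapper gives no such guarantee for one huge
operation that never returns to the main loop.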

 >> You could fix the recursion limit.
 >
 > Again this would be for the entire application and not
 > per user model.
 >
 >> Another thing I can see there is accessing the file
 >> system in malicious
 >> ways. Can be fixed on the OS level, I guess. You
 >> could not include
 >> things like socket into your PyPy interpreter
 >> executable.
 >
 > This is the tricky part. The main python application
 > used a lot of
 > CPython libraries, so not including them in the
 > interpreter was not an
 > option. I was hoping that there would be some other
 > way which could tell
 > the pypy interpreter, before it executes a certain
 > piece of code, that
 > access to the following list of modules ([x, y, z...])
 > is allowed.

This is something which is quite hard to enforce, given Python's very
introspective nature. There are some ideas to support a rather strict
distinction between two different sorts of code within the same process
with PyPy. You would have two interpreters in the same executable, one
for trusted, one for sandboxed code. The sandboxed interpreter would
only get access to a very limited set of modules and builtins. The
trusted interpreter could somehow "control" what sort of operations the
untrusted part would be allowed to do.

This is quite messy to implement correctly (though easier than in
CPython, I suppose), but it might give a general solution to this set of
problems.
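
To illustrate why a per-call module whitelist is hard to enforce inside
one interpreter, here is a naive attempt for a current CPython 3. The
helper run_restricted and its allowed_modules parameter are made up for
this sketch; it is not a sandbox and not how PyPy would do it. It blocks
plain import statements, yet the last snippet reaches every loaded class
through introspection without importing anything.

```python
def run_restricted(source, allowed_modules):
    """Naively exec source with a filtered import hook. NOT a sandbox."""

    def guarded_import(name, globals=None, locals=None,
                       fromlist=(), level=0):
        # Only the top-level package name is checked against the list.
        if name.split(".")[0] not in allowed_modules:
            raise ImportError("module %r is not allowed" % name)
        return __import__(name, globals, locals, fromlist, level)

    # The executed code sees only these builtins.
    env = {"__builtins__": {"__import__": guarded_import, "len": len}}
    exec(source, env)
    return env


# An allowed import works as intended.
ALLOWED = "import math\nx = math.sqrt(16)"

# A forbidden import is rejected with ImportError.
FORBIDDEN = "import socket"

# No import at all: walk from a tuple to object, then to all subclasses.
ESCAPE = "classes = ().__class__.__base__.__subclasses__()"
```

Running ESCAPE with an empty whitelist still hands the code a list of
classes defined by whatever modules the host application has loaded,
which is the kind of hole a separate sandboxed interpreter is meant to
close.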

Cheers,

Carl Friedrich