[pypy-dev] Rollback interpreter state to fork for unittesting

Wed Mar 19 11:54:02 CET 2014

On Wed, Mar 19, 2014 at 1:21 PM, Armin Rigo <arigo at tunes.org> wrote:
> Hi Anatoly,
>
> On 19 March 2014 10:42, anatoly techtonik <techtonik at gmail.com> wrote:
>>> It's possible to do something like that in RPython, if you ignore all
>>> the additional complications like tracking raw-memory too; it looks
>>> like an infinite amount of painful work to me, but well, it's not my
>>> time :-)
>>
>> Fair point. =) I am thinking about bytecode machine. Virtualization
>> software like virtualbox allow to save state at run-time and restore it
>> later at the exact point - continue to run the system from the moment
>> it was saved. And they do this in incremental way - keeping track of
>> what memory and disk have been touched.
>>
>> So, can interpreter, while playing bytecode, do keep track of these
>> things and save/restore the state the same way? Is that possible
>> currently? If not, then why and what can be done?
>
> It's not fundamentally easier or harder to do than it would be doing
> the same thing on CPython or any custom C program.
>
> While I can imagine coming up with a proof of concept very quickly,
> that would save and restore only the GC-managed objects; the real pain
> starts when needing to track changes done to general low-level memory,
> which is not possible in general.  You would instead need some gross
> hack that copies the entire content of the memory of a process to
> emulate a fork(), which could also be done for CPython or any custom C
> program.  How to do it concretely on a specific OS like Windows is
> left as an exercise to the reader, but as a starting point, look at
> how Cygwin implements fork().

I don't know C well enough to read that code. Is it possible to describe this
in a C-independent manner? If I understand correctly, the problem starts
when you interact with specific OS API calls, or with .dll and .so modules
that use the low-level memory API when they are dynamically imported?

Are there other reasons? I'd like to get an idea of the exact scope
within which rollback is still possible.

> The only advantage of PyPy, if you want, is that we can *add* an extra
> small complication on top of that, which is the aforementioned custom
> way to track the content of the GC objects.  Given that this is
> hopefully the biggest part of the memory, doing so would give a boost
> to the performance of the fork() emulation written as described above.

I don't feel confident that this is enough. Tracking GC memory is a cool
thing, and it would help to understand the problem better, but it would also
be helpful to get notifications when something is done outside of the
interpreter sandbox.

The goal is to track whether the Python bytecode executed so far is safe to
roll back to a forking point (after unittest initialization is finished, for
example).

The next step would be to annotate the exact system calls, to reassure the
interpreter (and developers) by telling them what the nature of these
calls is and how to deal with them on rollback.
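
One possible shape for such notifications and annotations, sketched under
the assumption of an interpreter that provides sys.addaudithook (CPython 3.8
and later): a hook records a few audit events that reach outside the
process, and a table annotates each one with a note about rollback. The
table ROLLBACK_NOTES and the helpers track_external_effects and
safe_to_rollback are hypothetical names for this example, and the notes are
only illustrative. The hook sees just the events the runtime chooses to
raise, so an extension module that touches raw memory directly stays
invisible to it.

```python
import os
import sys
import tempfile
from contextlib import contextmanager

# Hypothetical annotations: audit event name -> note for rollback.
ROLLBACK_NOTES = {
    "open": "a file was opened; it may have been created or modified",
    "os.remove": "a file was deleted; memory rollback cannot restore it",
    "socket.connect": "a remote peer saw the connection; not reversible",
    "subprocess.Popen": "an external process was started; not reversible",
}

# Stack of lists that are currently collecting events.
_active = []


def _audit(event, args):
    # Record only annotated events, and only while tracking is active.
    if _active and event in ROLLBACK_NOTES:
        _active[-1].append(event)


# Audit hooks cannot be removed once installed, so install it once.
sys.addaudithook(_audit)


@contextmanager
def track_external_effects():
    """Collect annotated audit events raised inside the with-block."""
    events = []
    _active.append(events)
    try:
        yield events
    finally:
        _active.pop()


def safe_to_rollback(events):
    """True when no annotated external effect was observed."""
    return not events


with track_external_effects() as pure:
    total = sum(range(10))

with track_external_effects() as impure:
    fd, path = tempfile.mkstemp()
    os.close(fd)
    os.remove(path)

for name in sorted(set(impure)):
    print(name, "->", ROLLBACK_NOTES[name])
```

The first block touches nothing outside the interpreter, so it would count
as safe to roll back; the second one creates and deletes a file, so it is
flagged together with the note attached to each event.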