[pypy-dev] Object pinning

Kunshan Wang kunshan.wang at anu.edu.au
Wed Dec 21 23:56:17 EST 2016


Hi folks,

I have a question regarding object pinning in RPython.

Consider the following snippet from rpython/rtyper/lltypesystem/rstr.py

    @jit.oopspec('stroruni.copy_contents(src, dst, srcstart, dststart, length)')
    @signature(types.any(), types.any(), types.int(), types.int(), types.int(), returns=types.none())
    def copy_string_contents(src, dst, srcstart, dststart, length):
        """Copies 'length' characters from the 'src' string to the 'dst'
        string, starting at position 'srcstart' and 'dststart'."""
        # xxx Warning: don't try to do this at home.  It relies on a lot
        # of details to be sure that it works correctly in all cases.
        # Notably: no GC operation at all from the first cast_ptr_to_adr()
        # because it might move the strings.  The keepalive_until_here()
        # are obscurely essential to make sure that the strings stay alive
        # longer than the raw_memcopy().
        assert length >= 0
        ll_assert(srcstart >= 0, "copystrc: negative srcstart")
        ll_assert(srcstart + length <= len(src.chars), "copystrc: src ovf")
        ll_assert(dststart >= 0, "copystrc: negative dststart")
        ll_assert(dststart + length <= len(dst.chars), "copystrc: dst ovf")
        # from here, no GC operations can happen
        asrc = _get_raw_buf(SRC_TP, src, srcstart)
        adst = _get_raw_buf(DST_TP, dst, dststart)
        llmemory.raw_memcopy(asrc, adst, llmemory.sizeof(CHAR_TP) * length)
        # end of "no GC" section
        keepalive_until_here(src)
        keepalive_until_here(dst)
    copy_string_contents._always_inline_ = True
    copy_string_contents = func_with_new_name(copy_string_contents,
                                              'copy_%s_contents' % name)

There is a region where objects in the RPython heap are accessed
externally by native programs.  I understand that GC must neither
recycle the object nor move it in the memory.  But I have two questions
about how object pinning is done in RPython:

(1) From the perspective of the RPython user (e.g. high-level language
implementer, interpreter writer, library writer, ...), what is the
"protocol" to follow when interacting with native programs (such as
passing a buffer to the `read` syscall)?  I have seen idiomatic use of
`cast_ptr_to_adr` followed by `keepalive_until_here`.  But there are also
`pin` and `unpin` functions in the rpython/rlib/rgc.py module.  What is
the expected way for *the user* to pin objects for native access?

(2) From the perspective of the RPython developer (those who develop the
translation from RTyped CFGs to C source code, assembly code and machine
code), how does the C backend actually enforce the no-GC policy between
"from here, no GC operations can happen" and "end of 'no GC' section"?
As I observed, keepalive_until_here essentially generates a no-op inline
assembly that "uses" the variable so that the C compiler keeps that
variable alive.  But what is preventing GC from happening?


Some background: We are building a new backend for RPython on the Mu
micro virtual machine (https://gitlab.anu.edu.au/mu/mu-client-pypy).
This VM has built-in GC and exception handling, so they don't need to be
injected after RTyper.  But the micro VM also keeps the representation
of object references opaque.  The only way to "cast" references to
addresses is using the "object pinning" operation which returns its
address.  The idiom in Mu when passing a buffer to native programs is
that you "pin" the object, give the pointer to the native functions
(such as `memcpy`, `read` and `write`), and then "unpin" it (nested
pinning is allowed, and it needs to be unpinned as many times as it was
pinned).  The GC will neither reclaim the object nor move it when it is
pinned, but object pinning does not prevent GC from happening: GC can
still move other objects, but not the pinned ones.
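
To make the semantics I have in mind concrete, here is a minimal Python
sketch of nested pinning.  It is a toy model and not Mu's actual API: the
class `ToyHeap` and its methods `pin`, `unpin` and `is_movable` are
hypothetical names, and `id()` merely stands in for a raw address.

```python
class ToyHeap(object):
    """Toy model of nested pinning; not the real Mu API."""

    def __init__(self):
        # Maps id(obj) -> number of outstanding pins on that object.
        self._pin_counts = {}

    def pin(self, obj):
        """Pin obj (nesting allowed) and return a stand-in for its address."""
        key = id(obj)
        self._pin_counts[key] = self._pin_counts.get(key, 0) + 1
        return key  # assumption: id() plays the role of the address

    def unpin(self, obj):
        """Undo one earlier pin; obj stays pinned until the count hits 0."""
        key = id(obj)
        count = self._pin_counts.get(key, 0)
        if count == 0:
            raise ValueError("unpin without a matching pin")
        if count == 1:
            del self._pin_counts[key]
        else:
            self._pin_counts[key] = count - 1

    def is_movable(self, obj):
        """The GC may move or reclaim obj only while it has no pins."""
        return id(obj) not in self._pin_counts


heap = ToyHeap()
buf = bytearray(16)
other = bytearray(16)

heap.pin(buf)
heap.pin(buf)                            # nested pin
heap.unpin(buf)
still_pinned = not heap.is_movable(buf)  # one pin is still outstanding
heap.unpin(buf)                          # now buf may move again
```

Note that `other` is movable the whole time: pinning `buf` says nothing
about any other object.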

So the crux is how to translate the RPython primitives into the Mu
counterparts.  If `cast_ptr_to_adr` and `keepalive_until_here` are a
well-obeyed idiom, we can simply translate them to `pin` and `unpin`,
respectively.
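
As a rough sketch of that mapping, assuming a much simplified operation
list in place of the real flow graphs (the `(opname, argument)` tuples,
`PIN_MAP` and `translate_ops` are all hypothetical and exist only for
illustration):

```python
# Hypothetical mapping from the two RPython primitives to pin/unpin.
PIN_MAP = {
    "cast_ptr_to_adr": "pin",
    "keepalive_until_here": "unpin",
}


def translate_ops(ops):
    """Rename the two primitives; leave every other operation unchanged."""
    return [(PIN_MAP.get(opname, opname), arg) for opname, arg in ops]


# Simplified shape of the copy_string_contents body shown earlier.
ops = [
    ("cast_ptr_to_adr", "src"),
    ("cast_ptr_to_adr", "dst"),
    ("raw_memcopy", "asrc, adst"),
    ("keepalive_until_here", "src"),
    ("keepalive_until_here", "dst"),
]
translated = translate_ops(ops)
```

This is only sound if every `cast_ptr_to_adr` is matched by exactly one
`keepalive_until_here` on the same object, which is why it matters whether
the idiom is followed consistently.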

The problem is, it only works for non-GC types.  Mu cannot copy
references in this way.  There are several problems: (1) Mu does not
want to expose the byte-by-byte representation of references.  So
references may not really be addresses, and may not be copied naively as
a word (it could contain bit flags, too).  (2) Mu does not prevent the
movement of other objects.  So if an array of references is pinned, the
objects its elements point *to* may still be moved.  I guess the reason
the snippet above prevents *all* kinds of GC activity is that it
may be used to copy GC-managed object references, too.  If this is the
case, we have to find alternative solutions.
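
Point (2) can be illustrated with a small Python simulation of a moving
collector.  This is a toy under stated assumptions, not Mu or the RPython
GC: `ToyMovingHeap` is a hypothetical class, addresses are plain integers,
and `collect` simply relocates every unpinned object.

```python
class ToyMovingHeap(object):
    """Toy moving collector: addresses are just integers in a dict."""

    def __init__(self):
        self._addr_of = {}       # object name -> current address
        self._pinned = set()     # names of pinned objects
        self._next_addr = 0x1000

    def alloc(self, name):
        self._addr_of[name] = self._next_addr
        self._next_addr += 0x10
        return name

    def pin(self, name):
        """Pin the object and hand out its (now stable) address."""
        self._pinned.add(name)
        return self._addr_of[name]

    def address(self, name):
        return self._addr_of[name]

    def collect(self):
        """Move every unpinned object to a fresh address."""
        for name in sorted(self._addr_of):
            if name not in self._pinned:
                self._addr_of[name] = self._next_addr
                self._next_addr += 0x10


heap = ToyMovingHeap()
array = heap.alloc("array")   # an array of references
elem = heap.alloc("elem")     # an object one of its elements points to

array_addr = heap.pin(array)           # only the array itself is pinned
raw_copy_of_elem = heap.address(elem)  # what a word-by-word copy would capture

heap.collect()                # GC runs while the array is still pinned
```

After `collect()`, the array is still at `array_addr`, but
`raw_copy_of_elem` no longer matches the element's current address, so a
raw copy of the reference would be stale.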

Regards,
Kunshan Wang
School of Computer Science
Australian National University
