[pypy-issue] Issue #2617: pypy binary is linked to too much stuff, which breaks manylinux wheels (pypy/pypy)

Nathaniel Smith issues-reply at bitbucket.org
Tue Jul 25 01:05:35 EDT 2017


New issue 2617: pypy binary is linked to too much stuff, which breaks manylinux wheels
https://bitbucket.org/pypy/pypy/issues/2617/pypy-binary-is-linked-to-too-much-stuff

Nathaniel Smith:

With pypy 5.8, `ldd pypy` shows that it's linked to `libbz2`, `libcrypto`, `libffi`, `libncurses`, ... Likewise for pypy3 5.8 and `ldd pypy3.5`.

You would not think so, but it turns out that this is a big problem for distributing wheels.

The issue is that, the way ELF works, any libraries that show up in `ldd $TOPLEVELBINARY` effectively get LD_PRELOADed into any extension modules that you load later. So, for example, if some wheel distributes its own version of openssl, then any symbols that show up in both their copy of openssl and pypy's copy of openssl will get shadowed by pypy's copy, and hello segfaults.
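
One rough way to spot candidates for this kind of clash is to compare the `ldd` output of the interpreter against the libraries a wheel bundles. The sketch below is only a heuristic: `parse_ldd`, `stem`, and `risky_overlap` are hypothetical helper names, the sample `ldd` text and the bundled file names are made up for illustration, and matching on library name stems is a simplification (the real conflict depends on which symbols overlap, not on file names).

```python
import re

def parse_ldd(output):
    """Return the library file names listed in `ldd` output."""
    names = []
    for line in output.splitlines():
        tokens = line.split()
        # The first token is a soname or an absolute path; skip other lines.
        if tokens and ".so" in tokens[0]:
            names.append(tokens[0].rsplit("/", 1)[-1])
    return names

def stem(filename):
    """'libcrypto.so.1.0.0' -> 'libcrypto'; also drops a trailing '-<hex>' tag."""
    base = filename.split(".so", 1)[0]
    return re.sub(r"-[0-9a-f]{6,}$", "", base)

def risky_overlap(ldd_output, bundled):
    """Name stems present both in the interpreter's ldd output and in the wheel."""
    interp = {stem(n) for n in parse_ldd(ldd_output)}
    return sorted(interp & {stem(n) for n in bundled})

# Made-up sample input, shaped like typical `ldd` output.
SAMPLE = """\
    linux-vdso.so.1 (0x00007ffd5a1f2000)
    libbz2.so.1.0 => /lib/x86_64-linux-gnu/libbz2.so.1.0 (0x00007f2a1c000000)
    libcrypto.so.1.0.0 => /usr/lib/x86_64-linux-gnu/libcrypto.so.1.0.0 (0x00007f2a1bc00000)
    libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f2a1b800000)
"""

print(risky_overlap(SAMPLE, ["libcrypto-1a2b3c4d.so.1.1", "libssl-1a2b3c4d.so.1.1"]))
# -> ['libcrypto']
```

In practice the first argument would come from running `ldd` on the `pypy` binary, and the second from listing the shared objects inside the wheel.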

The cryptography project recently ran into this with uwsgi: https://github.com/pyca/cryptography/issues/3804#issuecomment-317401627

Fortunately this has not been a big deal so far because, uh... nobody distributes pypy wheels. But in the future maybe this is something that should be supported :-). And while in theory it would be nice if this could be fixed on the wheel side, [this is not trivial](https://github.com/pypa/auditwheel/issues/79).

The obvious solution would be to switch things around so that the top-level pypy executable does `dlopen("libpypy-c.so", RTLD_LOCAL)` to start the interpreter, instead of linking against it with `-lpypy-c`. Then the symbols from `libpypy-c.so` and everything it links to would be confined to an ELF local namespace, and would stop polluting the namespace of random extension modules.
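
The difference between linking with `-lpypy-c` and opening the library at runtime can be illustrated with Python's `ctypes`, which wraps `dlopen`. This is only a sketch of the flag, not how a pypy launcher would be written (that would presumably be C calling `dlopen` and then `dlsym` for an entry point). `load_confined` is a hypothetical name, and passing `None` asks for the already-running program so the snippet does not assume any particular library path.

```python
import ctypes

def load_confined(path):
    # `mode` is handed to dlopen(). RTLD_LOCAL keeps the library's symbols
    # out of the global scope that is used to resolve references in
    # objects loaded later.
    return ctypes.CDLL(path, mode=ctypes.RTLD_LOCAL)

# None means "the running process" (like dlopen(NULL, ...)); a real launcher
# would pass the path to libpypy-c.so here instead.
handle = load_confined(None)

# Symbols are then fetched explicitly through the handle, like dlsym().
strlen = handle.strlen
strlen.argtypes = [ctypes.c_char_p]
strlen.restype = ctypes.c_size_t
print(strlen(b"pypy"))
```

With `None` the flag has no visible effect, because the main program and its dependencies are already in the global scope. With a real library path, the symbols would only be reachable through the returned handle, which is the confinement described above.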

However... there is a problem, which is that cpyext extension modules need *some* way to get at the C API symbols, and I assume cffi extension modules need access to some pypy symbols as well.

This is... tricky, given how rpython wants to mush everything together into one giant .so, and ELF makes it difficult to only expose *some* symbols from a binary like this. Some options:

* when using libcrypto or whatever from rpython, use `dlopen("libcrypto", RTLD_LOCAL)` instead of `-lcrypto`. I guess this could be done systematically in rffi?
* provide a special `libcpyext` that uses `dlopen` to fetch the symbols from `libpypy-c.so` and then manually re-exports them?



