[pypy-issue] Issue #2842: Running pyarrow on pypy segfaults (pypy/pypy)

Thu May 31 08:40:47 EDT 2018

New issue 2842: Running pyarrow on pypy segfaults
https://bitbucket.org/pypy/pypy/issues/2842/running-pyarrow-on-pypy-segfaults

bivald:

Hi,

I'm experimenting with running pyarrow on pypy. Arrow is "a cross-language development platform for in-memory data" and, on a more practical level, allows you to read and write parquet files for pandas. It's a great format for number crunching and is used heavily instead of csv with Apache Spark etc.

PyArrow is not trivial to install since there are no pre-built wheels, but in theory it should work since it relies mainly on numpy and pandas (which PyPy supports). It's built on the Arrow C++ library and the Parquet C++ library.

Once built, a lot of things work, but there are several tests that segfault. I'm not sure if this is a problem in PyArrow or in PyPy, so I'm opening a ticket here as well.
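
One way to narrow down which tests crash is to run each test file in its own child process and check how it exited. The following is only a minimal sketch, assuming a POSIX system where a process killed by a signal reports a negative return code; the helper names run_isolated and describe are made up for illustration and are not part of pyarrow or pypy.

```python
import signal
import subprocess
import sys


def run_isolated(cmd):
    """Run cmd in a child process; return (returncode, killed_by_signal)."""
    proc = subprocess.Popen(cmd)
    proc.communicate()  # wait for the child to finish
    code = proc.returncode
    # On POSIX, a negative return code means the child was killed by a signal.
    crashed = code is not None and code < 0
    return code, crashed


def describe(code):
    """Turn a return code into a short human-readable description."""
    if code == -signal.SIGSEGV:
        return "segfault (SIGSEGV)"
    if code < 0:
        return "killed by signal %d" % -code
    return "exit code %d" % code


if __name__ == "__main__":
    # Each argument is a test file; the path is whatever you pass in.
    for test_file in sys.argv[1:]:
        code, crashed = run_isolated([sys.executable, "-m", "pytest", test_file])
        print("%s: %s" % (test_file, describe(code)))
```

Running it with the pypy binary and a list of test files prints one line per file, so a crashing file shows up as a segfault instead of taking the whole test run down with it.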

Ticket on PyArrow: https://github.com/apache/arrow/issues/2089

Since it's quite tricky to build, I've created a Dockerfile which builds arrow and runs the tests. It can be found at https://github.com/bivald/pyarrow-docker-test and relies on the pypy2 docker image.