[pypy-issue] Issue #2842: Running pyarrow on pypy segfaults (pypy/pypy)
bivald
issues-reply at bitbucket.org
Thu May 31 08:40:47 EDT 2018
New issue 2842: Running pyarrow on pypy segfaults
https://bitbucket.org/pypy/pypy/issues/2842/running-pyarrow-on-pypy-segfaults
bivald:
Hi,
I'm experimenting with running pyarrow on PyPy. Arrow is "a cross-language development platform for in-memory data" and, on a more practical level, lets you read and write Parquet files for pandas. It's a great format for number crunching and is heavily used instead of CSV with Apache Spark etc.
PyArrow is not trivial to install since there are no pre-built wheels, but in theory it should work since it relies mainly on numpy and pandas (which PyPy supports). It's built on the Arrow C++ library and the Parquet C++ library.
Once built, a lot of things work, but several tests segfault. I'm not sure whether this is a problem in PyArrow or in PyPy, so I'm opening a ticket here as well.
Ticket on PyArrow: https://github.com/apache/arrow/issues/2089
Since it's quite tricky to build, I've created a Dockerfile which builds Arrow and runs the tests. It can be found at https://github.com/bivald/pyarrow-docker-test and relies on the pypy2 Docker image.
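To narrow down which calls actually crash, one option is to run each suspect snippet in its own interpreter process so that a segfault kills only the child. The following is a minimal sketch of that idea, not part of the Dockerfile above; the helper name `run_isolated` is hypothetical, and it assumes a POSIX system, where a negative return code means the child was killed by a signal.

```python
import signal
import subprocess
import sys


def run_isolated(snippet):
    """Run a Python snippet in a fresh interpreter and report how it exited.

    Returns (killed_by, stderr): killed_by is the signal number that
    terminated the child, or None if it exited on its own.
    """
    proc = subprocess.Popen(
        [sys.executable, "-c", snippet],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )
    _out, err = proc.communicate()
    # Assumption: POSIX semantics, where returncode == -N means "killed by signal N".
    killed_by = -proc.returncode if proc.returncode < 0 else None
    return killed_by, err


def segfaulted(snippet):
    """True if the snippet's interpreter was terminated by SIGSEGV."""
    killed_by, _err = run_isolated(snippet)
    return killed_by == signal.SIGSEGV
```

For example, `segfaulted("import pyarrow")` would report whether simply importing the module brings down the interpreter; the snippet string can be replaced with whichever call a failing test exercises.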