[Numpy-discussion] Improving Python+MPI import performance

Fri Jan 13 15:19:11 EST 2012

On 01/13/2012 02:13 AM, Asher Langton wrote:
> Hi all,
>
> (I originally posted this to the BayPIGgies list, where Fernando Perez
> suggested I send it to the NumPy list as well. My apologies if you're
> receiving this email twice.)
>
> I work on a Python/C++ scientific code that runs as a number of
> independent Python processes communicating via MPI. Unfortunately, as
> some of you may have experienced, module importing does not scale well
> in Python/MPI applications. For 32k processes on BlueGene/P, importing
> 100 trivial C-extension modules takes 5.5 hours, compared to 35
> minutes for all other interpreter loading and initialization. We
> developed a simple pure-Python module (based on knee.py, a
> hierarchical import example) that cuts the import time from 5.5 hours
> to 6 minutes.
>
> The code is available here:
>
> https://github.com/langton/MPI_Import
>
> Usage, implementation details, and limitations are described in a
> docstring at the beginning of the file (just after the mandatory
> legalese).
>
> I've talked with a few people who've faced the same problem and heard
> about a variety of approaches, which range from putting all necessary
> files in one directory to hacking the interpreter itself so it
> distributes the module-loading over MPI. Last summer, I had a student
> intern try a few of these approaches. It turned out that the problem
> wasn't so much the simultaneous module loads, but rather the huge
> number of failed open() calls (ENOENT) as the interpreter tries to
> find the module files. In the MPI_Import module, we have rank 0
> perform the module lookups and then broadcast the locations to the
> rest of the processes. For our real-world scientific applications
> written in Python and C++, this has meant that we can start a problem
> and actually make computational progress before the batch allocation
> ends.

This is great news! I've forwarded it to the mpi4py mailing list, which
despairs over this regularly.
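
For readers coming from the mpi4py side, the "rank 0 looks up, everyone
else receives" idea quoted above can be sketched roughly as follows. This
is only an illustration written for Python 3 (it uses
importlib.util.find_spec), not the MPI_Import implementation. The names
locate_modules and SingleProcessComm are made up for this example; with
mpi4py one would presumably pass MPI.COMM_WORLD as comm, since only
Get_rank() and bcast() are used.

```python
import importlib.util


class SingleProcessComm:
    """Hypothetical stand-in for an MPI communicator, so the sketch runs without MPI."""

    def Get_rank(self):
        return 0

    def bcast(self, obj, root=0):
        # With a single process, broadcasting just hands the object back.
        return obj


def locate_modules(comm, names):
    """Rank 0 searches the import path; every rank receives the resulting locations.

    Only top-level module names are handled in this sketch.
    """
    locations = None
    if comm.Get_rank() == 0:
        locations = {}
        for name in names:
            # find_spec returns None when a top-level module cannot be found.
            spec = importlib.util.find_spec(name)
            locations[name] = spec.origin if spec is not None else None
    # All ranks call bcast; only rank 0 touched the file system above.
    return comm.bcast(locations, root=0)


if __name__ == "__main__":
    print(locate_modules(SingleProcessComm(), ["json", "no_such_module_xyz"]))
```

The point is that the failed lookups happen once, on one process, instead
of once per process.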

Another idea: Given your diagnostics, wouldn't dumping the output of
"find" for every path in sys.path to a single text file work well? Then
each node would download that file once and consult it when looking for
modules, instead of querying file metadata over the network.

(In fact I think "texhash" does the same for LaTeX?)

The disadvantage is that one would need to run "update-python-paths"
every time a package is installed to update the text file. But I'm not
sure that this disadvantage is larger than having to remember to avoid
diverging import paths between nodes; hopefully one could put a reminder
to run update-python-paths in the ImportError string.
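
To make the idea concrete, here is a rough sketch of what I have in mind,
under several assumptions: the function names (write_index, load_index,
find_module_file) are invented for this example, only ".py" and ".so"
suffixes are considered, and nothing here is hooked into the real import
machinery. write_index plays the role of "update-python-paths"; the lookup
then consults only the in-memory index, so a miss costs no file system
call.

```python
import os


def write_index(paths, index_file):
    """Write one file path per line for every file under each directory (like `find`)."""
    with open(index_file, "w") as out:
        for root_dir in paths:
            if not os.path.isdir(root_dir):
                continue  # sys.path may contain zip files or missing entries
            for dirpath, _dirnames, filenames in os.walk(root_dir):
                for fname in filenames:
                    out.write(os.path.join(dirpath, fname) + "\n")


def load_index(index_file):
    """Read the index once and return it as a set for fast membership tests."""
    with open(index_file) as f:
        return {line.rstrip("\n") for line in f}


def find_module_file(name, paths, index, suffixes=(".py", ".so")):
    """Return the indexed file for module `name`, checking `paths` in order."""
    rel = name.replace(".", os.sep)
    for root_dir in paths:
        for suffix in suffixes:
            candidates = (
                os.path.join(root_dir, rel + suffix),              # plain module
                os.path.join(root_dir, rel, "__init__" + suffix),  # package
            )
            for candidate in candidates:
                if candidate in index:
                    return candidate
    raise ImportError(
        "No module named %r in the path index; if it was installed "
        "recently, run update-python-paths" % name
    )
```

Each node would call load_index once at startup and pass the resulting set
to every lookup.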

> If you try out the code, I'd appreciate any feedback you have:
> performance results, bugfixes/feature-additions, or alternate
> approaches to solving this problem. Thanks!

I didn't try it myself, but forwarding this from the mpi4py mailing list:

"""
I'm testing it now and actually
running into some funny errors with unittest on Python 2.7 causing
infinite recursion.  If anyone is able to get this going, and could
report successes back to the group, that would be very helpful.
"""

Dag Sverre