[Numpy-discussion] Improving Python+MPI import performance

Travis Oliphant travis at continuum.io
Fri Jan 13 16:48:51 EST 2012


It is straightforward to implement a "registry mechanism" for Python that bypasses imp.find_module (i.e. using sys.meta_path). You could imagine creating a registry file for a package or distribution (much like Dag described) and pushing it to every node during distribution.

The registry file would hold the mapping

package_name : file_location

which would avoid all the failed open calls that the normal sys.path search makes. You would need to keep the registry updated as Dag describes, but this seems like a fairly simple approach that should help.
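
To make the idea concrete, here is a rough sketch of such a finder. It is only an illustration under a few assumptions of mine: the registry is stored as JSON, the names RegistryFinder and install_registry are made up, and it uses the importlib interfaces of current Python 3 rather than imp.

```python
import importlib.abc
import importlib.util
import json
import sys


class RegistryFinder(importlib.abc.MetaPathFinder):
    """Resolve imports from a prebuilt {module_name: file_location} map.

    Hypothetical helper: the class name and the JSON layout are assumptions
    made for this sketch.
    """

    def __init__(self, registry):
        self.registry = dict(registry)

    def find_spec(self, fullname, path=None, target=None):
        location = self.registry.get(fullname)
        if location is None:
            # Not registered: let the regular finders search sys.path.
            return None
        # Build the spec directly from the known location, so no
        # directory-by-directory probing is needed for this module.
        return importlib.util.spec_from_file_location(fullname, location)


def install_registry(registry_path):
    """Load a JSON registry file and place the finder ahead of the defaults."""
    with open(registry_path) as f:
        finder = RegistryFinder(json.load(f))
    sys.meta_path.insert(0, finder)
    return finder
```

Calling install_registry("registry.json") early at startup (the file name is just an example) would make every registered name resolve straight to its file, while anything missing from the registry still falls back to the usual search. A stale entry would point at the wrong file or at nothing, which is why the registry has to be kept in sync.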

-Travis





On Jan 13, 2012, at 2:38 PM, Sturla Molden wrote:

> Den 13.01.2012 21:21, skrev Dag Sverre Seljebotn:
>> Another idea: Given your diagnostics, wouldn't dumping the output of
>> "find" of every path in sys.path to a single text file work well?
> 
> It probably would, and would also be less prone to synchronization 
> problems than using an MPI broadcast. Another possibility would be to 
> use a bsddb (or sqlite?) file as a persistent dict for caching the 
> output of imp.find_module.
> 
> Sturla



