Building an HPC data assimilation system using Python?

Matthew Francis mattjamesfrancis at gmail.com
Thu May 30 00:27:54 EDT 2013


I have a prototype data assimilation code (an ionospheric nowcast/forecast model driven by GPS data) written in IDL (Interactive Data Language). IDL is a horrible choice for scaling the application up to large datasets, as it is serial and slow (interpreted).

I am embarking on a project to convert this prototype into an operational parallel HPC code. In the past I've used C++ for this kind of project and am comfortable using MPI. On the other hand, I've recently started using Python and appreciate its flexibility and speed of development compared with C++. I have read that there is a trend to use Python as the high-level 'glue' for these kinds of large number-crunching projects, so it would seem appropriate to go down that path. There are a number of C++ and FORTRAN(!) libraries I'd need to incorporate that handle things such as the processing of raw GPS data and computing ionospheric models, so I'd need to be able to build the appropriate interfaces for these in Python.

If anyone uses Python in this way, I'd appreciate any tips, hints, things to be careful about and, in general, any war stories you can relate that you wish you'd heard before making some mistake.

Here are the things I have investigated and would probably need to use:

* scipy/numpy/matplotlib
* Cython (or Pyrex?) for speeding up any bottlenecks that occur in Python code (as opposed to the C++/FORTRAN libraries)
* MPI for Python (mpi4py). Does this play nice with Cython?
* Something to interface Python with libraries written in other languages. ctypes, SWIG, Boost? Which would be best for this application?
* Profiling. profile/cProfile are straightforward to use, but how do they cope with a parallel (mpi4py) code?
* If a C++ library call makes its own MPI calls, does that work smoothly with mpi4py operating in the Python part of the code?
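
To make the profiling question concrete, here is a minimal sketch of one possible approach: run cProfile independently in each process and write one stats file per rank. This is an assumption about how it could be done, not a tested recipe. The names `assimilation_step` and `profile_this_rank` are hypothetical placeholders, and the rank is passed in as a plain integer so the sketch runs without MPI.

```python
import cProfile
import io
import pstats


def assimilation_step(n):
    """Hypothetical stand-in for one unit of number crunching."""
    return sum(i * i for i in range(n))


def profile_this_rank(func, rank, *args):
    """Run func under cProfile; return (result, report text, dump filename).

    Each rank writes its own file, so processes never share a profile.
    """
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)

    # One binary stats file per rank, readable later with pstats.Stats(filename).
    filename = "profile_rank%04d.prof" % rank
    profiler.dump_stats(filename)

    # Also build a short text report of the five most expensive entries.
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    stats.sort_stats("cumulative").print_stats(5)
    return result, stream.getvalue(), filename


# Assumption: rank 0 is hard-coded here so the sketch runs serially.
result, report, filename = profile_this_rank(assimilation_step, 0, 1000)
print(report)
```

Under mpi4py the rank would presumably come from `MPI.COMM_WORLD.Get_rank()` rather than being hard-coded. Whether per-rank files like this are the best way to profile a parallel code is exactly what I'm unsure about.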

Sorry if some of this is a little basic; I'm trying to get up to speed on this as quickly as I can.

Thanks in advance!


