Building a HPC data assimilation system using Python?

Carlos Nepomuceno carlosnepomuceno at outlook.com
Thu May 30 18:34:26 EDT 2013


Hi Matthew! I'm on a similar quest!

I'm still learning the basics of Python, so I may not be a good source of information.

I've been reading a lot about how to use Python to parallelize code and data, and found BSP (Bulk Synchronous Parallel)[1] very interesting and perhaps worth the time to learn! ;)
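Roughly, a BSP program runs in supersteps. In each one, every process computes on its local data, queues messages for other processes, and then waits at a barrier; messages only become visible after that barrier. Below is a toy sketch of that pattern, with Python threads standing in for BSP processes. `ToyBSP` and `bsp_sum` are hypothetical names, and this is not the MulticoreBSP API. Because of CPython's GIL, threads give no speedup for pure-Python work, so this only shows the structure.

```python
import threading
from typing import Dict, List, Sequence


class ToyBSP:
    """Toy in-process BSP runtime (hypothetical; not the MulticoreBSP API).

    Threads stand in for BSP processes. Messages sent during a superstep are
    only handed to their destination after every process has reached sync().
    """

    def __init__(self, nprocs: int) -> None:
        self.nprocs = nprocs
        self._barrier = threading.Barrier(nprocs)
        self._lock = threading.Lock()
        # Messages queued during the current superstep, keyed by destination.
        self._outgoing: Dict[int, List[float]] = {p: [] for p in range(nprocs)}

    def send(self, dest: int, msg: float) -> None:
        """Queue a message for process `dest`; visible only after the next sync."""
        with self._lock:
            self._outgoing[dest].append(msg)

    def sync(self, pid: int) -> List[float]:
        """End the superstep for `pid` and return the messages addressed to it."""
        self._barrier.wait()  # every process has finished computing and sending
        with self._lock:
            inbox, self._outgoing[pid] = self._outgoing[pid], []
        self._barrier.wait()  # every inbox is drained before new sends start
        return inbox


def bsp_sum(data: Sequence[float], nprocs: int = 4) -> List[float]:
    """Global sum in two supersteps; every process ends up with the total."""
    bsp = ToyBSP(nprocs)
    results: List[float] = [0.0] * nprocs

    def worker(pid: int) -> None:
        # Superstep 0: compute on this process's share of the data
        # (cyclic distribution: elements pid, pid + nprocs, ...).
        local = sum(data[pid::nprocs])
        # Communicate: send the partial sum to every process, including itself.
        for dest in range(nprocs):
            bsp.send(dest, local)
        inbox = bsp.sync(pid)
        # Superstep 1: all partial sums are now available locally.
        results[pid] = sum(inbox)

    threads = [threading.Thread(target=worker, args=(p,)) for p in range(nprocs)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    print(bsp_sum(list(range(100)), nprocs=4))
```

With `list(range(100))` and four workers, each worker should finish the second superstep holding the same total, 4950. The double barrier in `sync` stops a fast worker from sending next-superstep messages before a slow one has collected its inbox.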


[1] http://www.multicorebsp.com/



----------------------------------------
> Date: Wed, 29 May 2013 21:27:54 -0700
> Subject: Building a HPC data assimilation system using Python?
> From: mattjamesfrancis at gmail.com
> To: python-list at python.org
>
> I have a prototype data assimilation code (an ionospheric nowcast/forecast model driven by GPS data) written in IDL (Interactive Data Language). IDL is a horrible choice for scaling the application up to large datasets, because it is serial and slow (interpreted).
>
> I am embarking on a project to convert this prototype into an operational parallel HPC code. In the past I've used C++ for this kind of project and am comfortable using MPI. On the other hand, I've recently started using Python and appreciate its flexibility and speed of development compared with C++. I have read that there is a trend to use Python as the high-level 'glue' for these kinds of large number-crunching projects, so it would seem appropriate to go down that path. There are a number of C++ and FORTRAN(!) libraries I'd need to incorporate that handle things such as processing raw GPS data and computing ionospheric models, so I'd need to be able to write the appropriate interfaces to them from Python.
>
> If anyone uses Python in this way, I'd appreciate any tips, hints, things to be careful about and, in general, any war stories you can relate that you wish you'd heard before making some mistake.
>
> Here are the things I've investigated that it looks like I'll probably need:
>
> * SciPy/NumPy/matplotlib
> * Cython (or Pyrex?) for speeding up any bottlenecks in the Python code (as opposed to the C++/FORTRAN libraries)
> * MPI for Python (mpi4py). Does this play nicely with Cython?
> * Something to interface Python with libraries written in other languages: ctypes, SWIG, Boost? Which would be best for this application?
> * Profiling. profile/cProfile are straightforward to use, but how do they cope with parallel (mpi4py) code?
> * If a C++ library call makes its own MPI calls, does that work smoothly with mpi4py operating in the Python part of the code?
>
> Sorry if some of this is a little basic; I'm trying to get up to speed on this as quickly as I can.
>
> Thanks in advance!
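On the quoted question about interfacing Python with compiled libraries: ctypes ships with the standard library and can call functions in an existing shared library without any generated wrapper code, provided the library exposes a C ABI. Here is a minimal sketch that assumes a POSIX system. It uses the C math library's `cos` as a stand-in for one of the GPS/ionosphere routines, and `c_cos` is a hypothetical helper name.

```python
import ctypes
import ctypes.util
import math

# Locate the C math library (typically "libm.so.6" on Linux). If find_library
# returns None, fall back to symbols already loaded into this process
# (dlopen(NULL) on POSIX), where the interpreter's own libm is visible.
_libm_path = ctypes.util.find_library("m")
libm = ctypes.CDLL(_libm_path) if _libm_path else ctypes.CDLL(None)

# Declare the C signature, double cos(double), so ctypes converts arguments
# and the return value correctly instead of assuming int.
libm.cos.argtypes = [ctypes.c_double]
libm.cos.restype = ctypes.c_double


def c_cos(x: float) -> float:
    """Call the C library's cos() through ctypes."""
    return libm.cos(x)


if __name__ == "__main__":
    for x in (0.0, math.pi / 3, math.pi):
        # The C result should agree with Python's math.cos.
        print(x, c_cos(x), math.cos(x))
```

The same approach applies to your own libraries, with two caveats. C++ functions must be exported with `extern "C"` so that their names are not mangled. Fortran routines compiled with gfortran usually appear as lowercase symbols with a trailing underscore and take their arguments by reference (`ctypes.byref`), unless they are declared with `bind(C)` from ISO_C_BINDING. For large C++ class hierarchies, a generated wrapper (SWIG, Boost.Python, or Cython) is usually less work than hand-written ctypes declarations.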

