[SciPy-user] handling of huge files for post-processing

David Huard david.huard at gmail.com
Mon Feb 25 09:53:31 EST 2008


Hi Christoph,

I am not sure exactly what causes your method to fail, but it might be that
you are trying to hold all the arrays in memory at once. Can you do your
calculation using iterators or generators? The idea is to load into memory
only the part of the array that you need for a given calculation, store the
result, and continue iterating. I used to process ~2 GB files using iterators
over PyTables tables and it worked smoothly.
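
Here is a minimal sketch of the chunked approach using plain NumPy rather
than PyTables. It makes several assumptions that are not confirmed by your
description: each record is two 32-bit little-endian integers followed by
one 32-bit float, the file contains no Fortran record markers, and the time
step is a zero-based index smaller than n_steps. The names RECORD,
iter_chunks and sum_over_cpus are hypothetical, chosen only for this
illustration.

```python
import numpy as np

# Assumed record layout (not confirmed by the original post): two 32-bit
# integers and one 32-bit float, packed with no Fortran record markers.
RECORD = np.dtype([("step", "<i4"), ("debug", "<i4"), ("pressure", "<f4")])


def iter_chunks(path, chunk_records=100_000):
    """Yield structured arrays holding at most chunk_records records each."""
    with open(path, "rb") as f:
        while True:
            chunk = np.fromfile(f, dtype=RECORD, count=chunk_records)
            if chunk.size == 0:  # end of file reached
                break
            yield chunk


def sum_over_cpus(paths, n_steps, chunk_records=100_000):
    """Sum the pressure per time step over all files, one chunk at a time."""
    total = np.zeros(n_steps, dtype=np.float64)
    for path in paths:
        for chunk in iter_chunks(path, chunk_records):
            # np.add.at accumulates correctly even when a step index
            # occurs more than once within the same chunk.
            np.add.at(total, chunk["step"], chunk["pressure"])
    return total
```

Only one chunk per file is in memory at any time, so peak memory depends on
chunk_records and n_steps rather than on the file size. If your files also
distinguish several points, the accumulator would need a second index for
the point.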

David


2008/2/25, Christoph Scheit <Christoph.Scheit at lstm.uni-erlangen.de>:
>
> Hello everybody,
>
> I get from a Fortran-Code (CFD) binary files containing
> the acoustic pressure at some distinct points.
> Each file has N "lines" which look like this:
>
> TimeStep(int) DebugInfo (int) AcousticPressure(float)
>
> and is binary. My problem is now, that the file can be
> huge (> 100 MB) and that after several runs on a cluster
> indeed not only one but 20 - 50 files of that size are
> to be post-processed.
>
> Since the CFD code runs in parallel, I have to sum up
> the results from the different CPUs (CPU 1 calculates only
> a fraction of the acoustic pressure of point p and time step
> t, so that I have to sum over all CPUs).
>
> Currently I'm reading all the data into an SQLite table, then
> I group the data, summing over the processors, and
> finally I write out files containing the data of the single
> points. This approach works for smaller files somehow,
> but does not seem to work for big files like those described
> above.
>
> Do you have some ideas on this problem? Thank you very
> much in advance,
>
> Christoph

