[SciPy-user] handling of huge files for post-processing

Christoph Scheit Christoph.Scheit at lstm.uni-erlangen.de
Tue Feb 26 04:01:05 EST 2008


Hello David,

I guess that everything is kept in memory... but I don't
know how to handle this problem using iterators. Can
you give me some more detail? Do you read your files
all at once?

One problem is this: assume I have three files
a, b and c. Then
b depends on data from a, and
c depends on data from b (and maybe from a, but
in 99% of cases it does not).
This is due to differences in signal runtime...

christoph

------------------------------

Message: 4
Date: Mon, 25 Feb 2008 09:53:31 -0500
From: "David Huard" <david.huard at gmail.com>
Subject: Re: [SciPy-user] handling of huge files for post-processing
To: "SciPy Users List" <scipy-user at scipy.org>
Message-ID:
	<91cf711d0802250653g652df1f9mdd9aaa5adf869bc5 at mail.gmail.com>
Content-Type: text/plain; charset="iso-8859-1"

Hi Christoph,

I am not sure exactly what causes your method to fail, but it might be that
you are trying to hold all the arrays in memory at once. Can you do your
calculation using iterators/generators? The idea is to load into memory
only the part of the array that you need for a given calculation, store the
result and continue iterating. I used to process ~2GB files using iterators
from PyTables tables and it worked smoothly.
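
A minimal sketch of the idea, using only the standard library rather than
PyTables. It assumes each record is two 4-byte integers followed by one
8-byte float, little-endian, with no Fortran record markers in between;
the real layout of the files may differ. The names iter_records and
sum_over_cpus are made up for illustration, and the sum is keyed on the
time step only, since the record format as described carries no point
identifier.

```python
import struct
from collections import defaultdict

# Assumed record layout (not confirmed by the original post):
# TimeStep (int32), DebugInfo (int32), AcousticPressure (float64).
RECORD = struct.Struct("<iid")


def iter_records(path, chunk_records=65536):
    """Yield (time_step, debug_info, pressure) tuples, reading in chunks."""
    chunk_bytes = RECORD.size * chunk_records
    with open(path, "rb") as fh:
        while True:
            buf = fh.read(chunk_bytes)
            if not buf:
                break
            # Drop a trailing partial record, if the file has one.
            usable = len(buf) - len(buf) % RECORD.size
            for rec in RECORD.iter_unpack(buf[:usable]):
                yield rec


def sum_over_cpus(paths):
    """Add up the pressure contributions from all files, per time step."""
    totals = defaultdict(float)
    for path in paths:
        for time_step, _debug, pressure in iter_records(path):
            totals[time_step] += pressure
    return dict(totals)
```

Only one chunk of raw data is in memory at a time; what stays resident is
the dictionary of running sums, which has one entry per time step.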

David


2008/2/25, Christoph Scheit <Christoph.Scheit at lstm.uni-erlangen.de>:
>
> Hello everybody,
>
> I get binary files from a Fortran CFD code containing
> the acoustic pressure at some distinct points.
> Each file has N "lines" (records) which look like this:
>
> TimeStep(int) DebugInfo(int) AcousticPressure(float)
>
> My problem is that a file can be
> huge (> 100 MB) and that, after several runs on a cluster,
> not just one but 20 - 50 files of that size have
> to be post-processed.
>
> Since the CFD code runs in parallel, I have to sum up
> the results from the different CPUs (CPU 1 calculates only
> a fraction of the acoustic pressure at point p and time step
> t, so I have to sum over all CPUs).
>
> Currently I'm reading all the data into an sqlite table, then
> I group the data, summing over the processors, and
> then I write out files containing the data for the individual
> points. This approach more or less works for smaller files,
> but does not seem to work for big files like those described
> above.
>
> Do you have some ideas on this problem? Thank you very
> much in advance,
>
> Christoph
> _______________________________________________
> SciPy-user mailing list
> SciPy-user at scipy.org
> http://projects.scipy.org/mailman/listinfo/scipy-user
>

------------------------------

Message: 5
Date: Mon, 25 Feb 2008 15:58:13 +0100
From: Johann Cohen-Tanugi <cohen at slac.stanford.edu>
Subject: Re: [SciPy-user] order in profiles and packages
To: SciPy Users List <scipy-user at scipy.org>
Message-ID: <47C2D785.9090405 at slac.stanford.edu>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed

My apologies, this was the wrong list. I have submitted it to the IPython list.
Johann


------------------------------

Message: 6
Date: Mon, 25 Feb 2008 17:14:27 +0100
From: "Shane Legg" <shane at vetta.org>
Subject: [SciPy-user] Bug in matplotlib plot_wireframe?
To: scipy-user at scipy.org
Message-ID:
	<d13d7ef40802250814v77ec0acbtfbf54f7e7e5c20db at mail.gmail.com>
Content-Type: text/plain; charset="iso-8859-1"

Hi,

I'm new here so if this isn't the right place to ask just let
me know where I should head.  Thanks.

I think there is a significant bug in plot_wireframe in matplotlib
where it incorrectly displays the Z axis values.  The code below
demonstrates the problem:


import scipy
import pylab as p
import matplotlib.axes3d as p3
from numpy import *

"""
# If you do a wire frame of the following, the graph is correct:
Z = scipy.array(
[[ 0.52,  0.00020],
 [ 0.45,  0.00018],
 [ 0.34,  0.00016]] )
"""

# but if you put negative signs in:
Z = scipy.array(
[[ -0.52,  -0.00020],
 [ -0.45,  -0.00018],
 [ -0.34,  -0.00016]] )

"""
 the graph displays:
[[ -0.62, -0.10020 ],
 [ -0.55, -0.10018 ],
 [ -0.44, -0.10016 ]]
"""

# X and Y must have the same shape as Z (3 rows, 2 columns)
X, Y = meshgrid(arange(0, 2, 1.0), arange(0, 3, 1.0))

fig = p.figure()
ax = p3.Axes3D(fig)
ax.plot_wireframe(X, Y, Z)

ax.set_xlabel('X')
ax.set_ylabel('Y')
ax.set_zlabel('Z')

p.show()


I'm running Ubuntu 7.10 x64 with python 2.5.1-1ubuntu2 and
python-scipy 0.5.2-9ubuntu4, both installed from the .deb files.
I sent the above code to somebody with a 32-bit Linux system
and they had the same problem.

Any help appreciated!

Cheers
Shane

------------------------------

Message: 7
Date: Mon, 25 Feb 2008 10:53:22 -0600
From: "Robert Kern" <robert.kern at gmail.com>
Subject: Re: [SciPy-user] Bug in matplotlib plot_wireframe?
To: shane at vetta.org, "SciPy Users List" <scipy-user at scipy.org>
Message-ID:
	<3d375d730802250853j112bb67ah84847faef07b1255 at mail.gmail.com>
Content-Type: text/plain; charset=UTF-8

On Mon, Feb 25, 2008 at 10:14 AM, Shane Legg <shane at vetta.org> wrote:
> Hi,
>
> I'm new here so if this isn't the right place to ask just let
> me know where I should head.  Thanks.

The appropriate matplotlib list is here:

  https://lists.sourceforge.net/lists/listinfo/matplotlib-users

-- 
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth."
  -- Umberto Eco


------------------------------

End of SciPy-user Digest, Vol 54, Issue 48
******************************************



