large arrays in python (scientific)

Jason Orendorff jason at jorendorff.com
Mon Jan 7 20:22:58 EST 2002


> When I try to create this array, I create about a million of the
> elements, and then the script slows down and
> eventually stops.  I'm not sure why this is happening. 

Building the list this way requires a lot of memory.
128*128*128 is about 2.1 million points, and each
((x, y, z), value) entry costs a few hundred bytes of tuple
and float object overhead, so I would guess about 1 gigabyte
of RAM in total.  How much do you have?

You might try allocating the list in advance:

    data = [None] * (128*128*128)    # allocate a large list
    i = 0
    for line in input_file:
        ...
        data[i] = ((x, y, z), value)   # populate it
        i += 1

This will save some memory (not much) and might run faster,
since the list never has to be repeatedly reallocated and
copied as it grows, the way it does with append().
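
If you want to measure the difference on your machine, here is
a rough timing sketch (just a throwaway benchmark, not part of
your script; the numbers will vary):

    import time

    N = 128*128*128   # about 2.1 million slots

    t0 = time.time()
    grown = []
    for i in xrange(N):
        grown.append(None)        # list grows as we go
    print "append:      %.2f s" % (time.time() - t0)

    t0 = time.time()
    filled = [None] * N           # allocate up front
    for i in xrange(N):
        filled[i] = None          # just fill in the slots
    print "preallocate: %.2f s" % (time.time() - t0)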

Also, use a tuple for each data point, as the above code does,
not a list (as in your message).  The leastSquaresFit
documentation specifies tuples, and tuples save a little
memory too (but not much).
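
If you're curious how much, sys.getsizeof will show the
per-object difference -- though it was added in Python 2.6, so
this is just a side note, not something a 2.2 script can run:

    import sys

    as_tuple = ((1.0, 2.0, 3.0), 4.0)
    as_list  = [[1.0, 2.0, 3.0], 4.0]

    # Shallow sizes only; the floats inside are counted separately.
    print sys.getsizeof(as_tuple), sys.getsizeof(as_tuple[0])
    print sys.getsizeof(as_list),  sys.getsizeof(as_list[0])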

You should seriously consider discarding 98% of the data points
at random, during the input phase.  It might be enough data for
an equally good result -- least-squares fits aren't perfect --
and it will certainly use less memory and take less time
to calculate.
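
A minimal sketch of that sampling, assuming one
whitespace-separated "x y z value" record per line (adapt the
parsing to your actual format):

    import random

    keep_fraction = 0.02        # keep roughly 2% of the points
    data = []
    input_file = open("myfile.txt", "r")
    for line in input_file:
        if random.random() < keep_fraction:
            fields = line.split()
            x, y, z = map(float, fields[:3])
            value = float(fields[3])
            data.append(((x, y, z), value))
    input_file.close()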

Good luck!  Once you get the list into memory, you may
have problems with the algorithm running rather slowly.
If so, send another e-mail; the algorithm can be optimized
for your case.



Hmmmm...

If you're using Python 2.2, and you simply *must* have all the
data points, but you just don't have nearly enough RAM, and you
have plenty of time to sit around and wait, then try this:

from __future__ import generators
from Scientific.Functions.LeastSquares import leastSquaresFit

class MyData:
    def __init__(self, filename):
        self.filename = filename
    def __iter__(self):
        # XXX Put your code for loading the data set here.
        # For example, assuming each line of the file holds
        # four whitespace-separated fields, "x y z value":
        f = open(self.filename, 'r')
        for line in f:
            fields = line.split()
            x, y, z = map(float, fields[:3])
            value = float(fields[3])
            yield ((x, y, z), value)
        f.close()

data = MyData("myfile.txt")
leastSquaresFit(model, parameters, data)

This rereads the data file once per iteration
(leastSquaresFit is an iterative algorithm), which is slow;
but it never holds the whole data set in memory at once, which
might be just what you need.
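
Incidentally, that's why the sketch uses a wrapper class
instead of passing a bare generator: every pass of the fit
iterates over the data object afresh, and each call to
MyData.__iter__ starts a new read of the file, whereas a
single generator is exhausted after one pass:

    data = MyData("myfile.txt")
    print len(list(data))   # one full pass over the file
    print len(list(data))   # works again: __iter__ reopens the file

    gen = iter(data)
    list(gen)               # consumes this particular generator
    print len(list(gen))    # 0 -- a lone generator can't be rewound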

## Jason Orendorff    http://www.jorendorff.com/



