large arrays in python (scientific)

Mark Grant markrgrant at yahoo.com
Wed Jan 9 23:19:16 EST 2002


Thanks, everyone, for the excellent suggestions.

Sorry about not providing more information about the problem.  I
didn't want to bore anyone with details, since the problem was simply
creating a large list (and I wasn't worried yet about how to improve
the fitting).

I was able to run the script on a machine with more memory, as most of
you suggested (1 GB, instead of the 200 MB on my machine).  This
allowed me to create the entire list of 2 million tuples.  However,
dropping data points is even better; I'll just have to monitor the
effect on the constants of the fitting functions.  The curve of the
data is quite smooth, so it shouldn't be an issue.  I'm still curious
how Python stores the data in memory, since knowing that would help me
predict memory requirements for constructs like this.
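
A rough back-of-envelope (my own figures, approximately right for a
32-bit CPython build, but they vary with build and malloc overhead):

    # Approximate cost of one (x, y) point stored as a tuple of two
    # floats inside a list, on a 32-bit CPython build:
    FLOAT_OBJ  = 16   # refcount + type pointer + 8-byte double
    TUPLE2_OBJ = 20   # header + size field + two item pointers
    LIST_SLOT  = 4    # one pointer in the list's item array
    per_point  = LIST_SLOT + TUPLE2_OBJ + 2 * FLOAT_OBJ  # ~56 bytes
    print 2000000 * per_point / (1024 * 1024), 'MB'      # ~106 MB

Add malloc overhead, the list's over-allocation, and any transient
copies, and 200 MB leaves very little headroom.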

Jason, thanks for the tip on generators, and the example code. 

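For the archive, here's a minimal sketch of what that generator-based
approach might look like.  The file name, column layout, and quadratic
model are my own assumptions, and it relies on leastSquaresFit only
ever walking the data with a for loop, as Jason's post below suggests:

    from __future__ import generators
    from Scientific.Functions.LeastSquares import leastSquaresFit

    class FileData:
        """Re-reads the data file on every pass, so the iterative
        fit never holds all 2 million points in memory at once."""
        def __init__(self, filename):
            self.filename = filename
        def __iter__(self):
            # A fresh generator for each pass of the fitting loop.
            for line in open(self.filename):
                x, y = map(float, line.split())
                yield (x, y)

    def model(params, x):
        # Hypothetical quadratic model; plain arithmetic keeps it
        # compatible with ScientificPython's derivative objects.
        a, b, c = params
        return a + b*x + c*x*x

    params, chisq = leastSquaresFit(model, (1.0, 1.0, 1.0),
                                    FileData('points.dat'))
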

ameoba <ahmebah at hotmail.com> wrote in message news:<Xns91912706322A4ahmebahhotmailcom at 198.99.146.10>...
> "Jason Orendorff" <jason at jorendorff.com> wrote in 
> news:mailman.1010453290.26070.python-list at python.org:
> 
> > If you're using Python 2.2, and you simply *must* have all the
> > data points, but you just don't have near enough RAM, and you
> > have plenty of time to sit around and wait, then try this:
> > 
> > from __future__ import generators
> > from Scientific.Functions.LeastSquares import leastSquaresFit
> > 
> > This reads the data from the file once per iteration
> > (leastSquaresFit is an iterative algorithm), which is slow;
> > but it does not store the whole data set in memory, which
> > might be helpful.
> 
> Assuming that your data points are evenly distributed in a grid, you could
> probably trim that down so that you don't have to explicitly store the
> (x,y,z).  If that's the case, even working with a list of lists of lists
> of data (defining matrices was one of the first struggles I had with
> Python, before I found the module that did it for me) would probably be
> more efficient than actually storing each point.
> 
> Of course, calculating offsets into a single array (err... list... this is
> Python) would be quite efficient, and any reasonable computer should have
> no problem with a 2M-element list, but I think calculating
> multi-dimensional array offsets into a 1D array kinda breaks the paradigm
> of Python programming.
> 
> 
> 
> BTW- I keep reading interesting examples of generators... they seem
> interesting, and occasionally even useful.  Where can I find some good
> info on 'em?
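
To make the offset arithmetic mentioned above concrete, here is a
small sketch, assuming a regular nx-by-ny-by-nz grid; all the names
are made up:

    # One flat list standing in for a 3D grid, indexed by hand.
    nx, ny, nz = 100, 100, 200
    values = [0.0] * (nx * ny * nz)

    def offset(i, j, k):
        # Row-major offset of grid cell (i, j, k) in the flat list.
        return (i * ny + j) * nz + k

    values[offset(3, 5, 7)] = 42.0
    print values[offset(3, 5, 7)]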


