Not sure why this is filling my sys memory

Jonathan Gardner jgardner at jonathangardner.net
Sun Feb 21 00:34:41 EST 2010


On Sat, Feb 20, 2010 at 5:53 PM, Vincent Davis <vincent at vincentdavis.net> wrote:
>> On Sat, Feb 20, 2010 at 6:44 PM, Jonathan Gardner <jgardner at jonathangardner.net> wrote:
>>
>> With this kind of data set, you should start looking at BDBs or
>> PostgreSQL to hold your data. While processing files this large is
>> possible, it isn't easy. Your time is better spent letting the DB
>> figure out how to arrange your data for you.
>
> I really do need all of it in at one time. It is DNA microarray data. Sure, there are 230,00 rows, but only 4 columns of small numbers. Would it help to make them float()? I need to at some point. I know in numpy there is a way to set the type for the whole array, astype() I think.
> What I don't get is that it shows the size of the dict with all the data as only 6424 bytes. What is using up all the memory?
>
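
On the 6424 bytes: assuming that figure came from sys.getsizeof(), it
counts only the dict's own table, not the keys and values the dict
points to. A rough sketch of the difference follows. The data is made
up, and deep_size is a hypothetical helper, not a library function.

```python
import sys

def deep_size(obj, seen=None):
    """Roughly sum sys.getsizeof over obj and the objects it contains."""
    if seen is None:
        seen = set()
    if id(obj) in seen:          # count shared objects only once
        return 0
    seen.add(id(obj))
    size = sys.getsizeof(obj)
    if isinstance(obj, dict):
        for key, value in obj.items():
            size += deep_size(key, seen) + deep_size(value, seen)
    elif isinstance(obj, (list, tuple, set, frozenset)):
        for item in obj:
            size += deep_size(item, seen)
    return size

# Made-up stand-in: 4 columns, each a list of 1000 numbers kept as strings
data = {name: [str(i * 0.5) for i in range(1000)] for name in "abcd"}

shallow = sys.getsizeof(data)    # the dict container only
deep = deep_size(data)           # container plus lists plus strings
print(shallow, deep)
```

The shallow number stays small no matter how long the lists get, because
a dict with four keys is a small object. The memory goes to the lists
and to one Python object per value.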

Look into getting PostgreSQL to organize the data for you. It's much
easier to do processing properly with a database handle than a file
handle. You may also discover that writing functions in Python inside
of PostgreSQL can scale very well for whatever data needs you have.
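
A minimal sketch of working through a database handle instead of a file
handle. It uses the standard library's sqlite3 module as a stand-in for
PostgreSQL only so that it runs without a server. The table name, the
column names, and the generated rows are all made up for illustration.

```python
import sqlite3

# In-memory SQLite database standing in for a PostgreSQL connection
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE probe (id INTEGER PRIMARY KEY, a REAL, b REAL, c REAL, d REAL)"
)

# Made-up rows; real code would read them from the data file line by line
rows = ((i, i * 0.1, i * 0.2, i * 0.3, i * 0.4) for i in range(1000))
conn.executemany("INSERT INTO probe VALUES (?, ?, ?, ?, ?)", rows)
conn.commit()

# Let the database do the aggregation instead of holding rows in Python
count, mean_a = conn.execute("SELECT COUNT(*), AVG(a) FROM probe").fetchone()
print(count, mean_a)
```

The rows are fed from a generator, so the whole data set never has to
sit in a Python list or dict at once.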

--
Jonathan Gardner
jgardner at jonathangardner.net


