Memory ?

Alex Martelli aleax at aleax.it
Mon Jul 8 18:09:44 EDT 2002


Shagshag13 wrote:
        ...
> I have a data structure like this :
> 
> key_0 -> value_0_1 -> value_0_2 -> ... value_0_m
> key_i -> value_i_1 -> value_i_2
> ...
> key_n -> value_n_1 -> value_n_2 -> value_n_3 -> value_n_4
> 
> where :
> 
> - keys are characters,

You presumably mean strings -- you can't have 2,500,000 different
characters (character == string of length 1, in Python).

> - values are floats
> - n > 2,500,000
> - a value "list" can have length from 1 to m = ~1,000,000 (values from key_i
> could be 1 long, while key_i+1 could be 1,000,000)
> - i don't know the exact value for n (it depends on input data)
> - i can guess m
> 
> for now i use a dict indexed by key, which gives me access to a queue (list)
> containing my values. that's really too slow !!!

"queue"...?

> with your help, i think i should use a Numeric.array for my values list,
> but i wonder if i shouldn't use a matrix (2,500,000 * 1,000,000) even

Where are you going to find the 20+ terabytes to keep that mostly-empty
matrix in...?  (2,500,000 rows * 1,000,000 columns * 8 bytes per
double-precision float is about 20 terabytes.)

Builtin module array may be suitable for your highly specialized need,
and it does have the advantage of implementing an .append method on
array.array objects, simplifying your code a bit.  You probably need to
benchmark both array.array and Numeric.array and check, but you're
apparently not using Numeric's strengths here.  Oh, and, be sure to
use 4-byte floats (both array and Numeric support them) if that precision
is enough for you -- that would halve the memory compared to the
usual "double precision" (8-byte) floats.


Alex



