[Tutor] memory consumption

Andre' Walker-Loud walksloud at gmail.com
Thu Jul 4 00:37:24 CEST 2013


Hi Oscar,

[snip]

> The error is for creating an mmap object. This is not something that
> numpy does unless you tell it to.

I have not explicitly told it to.

[snip]

> So you're using pytables or h5py, or something else? It really would
> help if you would specify this instead of trying to be generic.

All the third party software I am using:
numpy
pytables 
hdf5
pyminuit [python interface to Minuit]
Minuit	[Minuit - a C++ code base developed at CERN for numerical minimization]

I was hoping I was just making some obvious mistake, and I assumed most people would not be familiar with the specific code I am using.  But I guess it is either not so obvious, or my explanations are too poor.

> My guess is that the hdf5 library loads the array as an mmap'ed memory
> block and you're not actually working with an ordinary numpy array
> (even if it has a similar interface).

I specifically load the data once, as

my_file = tables.openFile('my_data_file.h5')
my_data = my_file.getNode(path_to_data).read()

After this, "my_data" seems to have all the features of a numpy array.  For example,

> Have you checked the actual memory size of the array? If it's a real
> numpy array you can use the nbytes attribute:
> >>> a = numpy.zeros([300, 256, 1, 2], float)
> >>> a.nbytes
> 1228800

This works on my_data.

[snip]

>> class do_stuff:
>> # I am aware this doesn't follow the class naming convention, just sticking with my previous post name
>>    def __call__(data,other_vars):
>>        self.fit = third_party.function_set_up(data,other_vars)
>> 
>>    def minimize(self):
>>        try:
>>            self.fit.minimize()
>>            self.have_fit = True
>>        except third_party.Error:
>>            self.have_fit = False
>> ##########################
> 
> If you write code like the above then you really cannot expect other
> people to just understand what you mean if you don't show them the
> code. Specifically the use of __call__ is confusing. Really, though,
> this class is just a distraction from your problem and should have
> been simplified away.

Off my main topic, but could you explain more?
I am also not very experienced writing classes, and was learning from an example.  So I am not sure why __call__ is confusing.  I thought that was correct.
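
For context, a minimal sketch of how __call__ behaves. Defining __call__ makes an instance callable, so my_func(tmp_data, other_vars) is really my_func.__call__(tmp_data, other_vars). Like any other method it needs self as its first parameter, which the quoted version leaves out. The class name FitRunner and the placeholder tuple stored in self.fit are hypothetical; they stand in for do_stuff and the third-party set-up call.

```python
class FitRunner:
    """Hypothetical stand-in for the do_stuff class quoted above."""

    def __init__(self):
        self.fit = None
        self.have_fit = False

    def __call__(self, data, other_vars):
        # self must come first; without it, "self.fit" has nothing to refer to.
        # The tuple below is only a placeholder for the real fit object.
        self.fit = ("configured", len(data), other_vars)


runner = FitRunner()
# Calling the instance invokes FitRunner.__call__(runner, ...)
runner([1.0, 2.0, 3.0], {"n_params": 2})
print(runner.fit)
```

A reader who sees runner(...) expects a function that returns something, not a set-up step that silently stores state, which may be part of why the usage reads as confusing. A named method such as set_up(data, other_vars) would arguably make the intent plainer.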

[snip]
> The above function can be vectorised to something like:
> 
> def chop_data(data, N, M):
>    return data[np.random.randint(0, data.shape[0], (M, N))].mean(axis=0)

Thanks.
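
As a rough illustration of what the vectorised version does to memory, here is a sketch run on random data of the shape used earlier. The values of N and M are made up. The point to notice is that the fancy indexing builds an intermediate array of shape (M, N) + data.shape[1:] before the mean is taken.

```python
import numpy as np


def chop_data(data, N, M):
    # Draw an (M, N) grid of random indices along the first axis (with replacement)
    idx = np.random.randint(0, data.shape[0], (M, N))
    # data[idx] has shape (M, N) + data.shape[1:]; averaging over axis 0 removes M
    return data[idx].mean(axis=0)


data = np.random.rand(300, 256, 1, 2)
N, M = 10, 5  # hypothetical sizes, chosen only for illustration

chopped = chop_data(data, N, M)
# Size in bytes of the temporary array created by data[idx]
intermediate_bytes = M * N * data[0].nbytes

print(chopped.shape, intermediate_bytes)
```

For large N and M that temporary can be much bigger than the original array, so it is worth estimating before calling the function inside a loop.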

[snip]
>>            tmp_data = my_class.chop_data(data,n,m)
> 
> Where did data come from? Is that the mmap'ed array from the hdf5 library?

As above, data is loaded once with pytables, using the .read() function.
My "chop_data" function then returns what I believe is a numpy array. I have not explicitly asked for anything involving "mmap", so I am not sure.  How would you check?
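
One possible check, sketched under the assumption that a memory-mapped array would show up either as a numpy.memmap instance or as a view whose .base chain ends in an mmap object. The helper name is_memory_mapped is made up for this example, and the scratch file exists only to produce a known mmap-backed array to compare against.

```python
import mmap
import os
import tempfile

import numpy as np


def is_memory_mapped(arr):
    """Return True if arr, or anything it is a view of, is backed by an mmap."""
    obj = arr
    while obj is not None:
        if isinstance(obj, (np.memmap, mmap.mmap)):
            return True
        # Views keep a reference to the object that owns their memory
        obj = getattr(obj, "base", None)
    return False


plain = np.zeros([300, 256, 1, 2], float)
print(type(plain), plain.nbytes, is_memory_mapped(plain))

with tempfile.TemporaryDirectory() as tmp_dir:
    scratch_path = os.path.join(tmp_dir, "scratch.dat")
    mapped = np.memmap(scratch_path, dtype=float, mode="w+", shape=(4, 4))
    mapped_result = is_memory_mapped(mapped)
    # np.array() makes an ordinary in-memory copy
    copied_result = is_memory_mapped(np.array(mapped))
    del mapped  # release the mapping before the directory is removed

print(mapped_result, copied_result)
```

Running is_memory_mapped(my_data) right after the .read() call, and again on the output of chop_data, would show whether either array is mmap-backed under these assumptions.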

>>            my_func(tmp_data,other_vars)
>>            my_func.minimize()
> 
> I now know that the above two lines call thirdparty.function_setup()
> and my_func.fit.minimize(). I still have no idea what they do though.

I was trying to avoid that, since I suspect(ed) the problem is with me and not the third party.
Without going into specifics, the first function constructs a chi^2 function which looks like

chisq = sum_i [ ( (data[i] - fit_func(i, fit_params)) / data_error[i] )**2 ]

The second function numerically minimizes chisq with respect to "fit_params", which is a 1d array.
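
To make the chi^2 construction concrete, here is a small numpy sketch. The single-exponential fit_func and all the numbers are assumptions invented for the example; the real model and the pyminuit set-up are not shown in this thread.

```python
import numpy as np


def fit_func(i, fit_params):
    # Hypothetical model: amplitude * exp(-energy * i)
    amplitude, energy = fit_params
    return amplitude * np.exp(-energy * i)


def chisq(fit_params, data, data_error):
    # Sum over i of the squared, error-weighted residuals
    i = np.arange(len(data))
    residuals = (data - fit_func(i, fit_params)) / data_error
    return np.sum(residuals ** 2)


true_params = np.array([2.0, 0.3])
i = np.arange(20)
data = fit_func(i, true_params)   # noiseless synthetic data
data_error = np.full(20, 0.1)

chisq_at_truth = chisq(true_params, data, data_error)
chisq_off = chisq(np.array([2.1, 0.3]), data, data_error)
print(chisq_at_truth, chisq_off)
```

With noiseless data the chi^2 vanishes at the true parameters and grows as the parameters move away, which is the quantity a minimizer would be asked to drive down.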


[snip]

>> Hopefully, this is more clear.
> 
> Only slightly. The details that you choose to include are not the ones
> that are needed to understand your problem. Instead of paraphrasing
> simplify this into a short but *complete* example and post that.

If you can help me understand the issue of mmap (whether somehow I am creating this unwittingly), that would be great.  I.e., what tests can I perform to check?


Otherwise, it seems perhaps the best thing for me to do now is take eryksun's advice and learn how to use a memory profiler.
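
Short of a full profiler, a crude first check is to record the peak memory of the process on each pass of the loop and see whether it keeps climbing. The sketch below assumes a Unix-like system, since the standard-library resource module is not available on Windows. The loop body is only a stand-in for the real chop_data and fit calls.

```python
import resource

import numpy as np


def peak_rss():
    """Peak resident set size so far (kilobytes on Linux, bytes on macOS)."""
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss


readings = []
for step in range(5):
    tmp_data = np.random.rand(300, 256, 1, 2)  # stand-in for chop_data output
    tmp_data.mean()                            # stand-in for the fit
    readings.append(peak_rss())

print(readings)
```

If the readings level off after the first few iterations, memory is being reused; if they grow on every pass, something in the loop is holding on to its allocations.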


Thanks,

Andre