[Tutor] memory consumption
Andre' Walker-Loud
walksloud at gmail.com
Thu Jul 4 00:37:24 CEST 2013
Hi Oscar,
[snip]
> The error is for creating an mmap object. This is not something that
> numpy does unless you tell it to.
I have not explicitly told it to.
[snip]
> So you're using pytables or h5py, or something else? It really would
> help if you would specify this instead of trying to be generic.
All the third party software I am using:
numpy
pytables
hdf5
pyminuit [python interface to Minuit]
Minuit [Minuit - a C++ code base developed at CERN for numerical minimization]
I was hoping I was just making some obvious mistake, since I assumed most people would not be familiar with the specific code I am using. But I guess it is either not so obvious, or my explanations are too poor.
> My guess is that the hdf5 library loads the array as an mmap'ed memory
> block and you're not actually working with an ordinary numpy array
> (even if it has a similar interface).
I specifically load the data once, as
import tables
my_file = tables.openFile('my_data_file.h5')
my_data = my_file.getNode(path_to_data).read()
After this, "my_data" seems to have all the features of a numpy array. For example,
> Have you checked the actual memory size of the array? If it's a real
> numpy array you can use the nbytes attribute:
>>>> a = numpy.zeros([300, 256, 1, 2], float)
>>>> a.nbytes
> 1228800
This works on my_data.
[snip]
>> class do_stuff:
>>     # I am aware this doesn't follow the class naming convention, just sticking with my previous post name
>>     def __call__(self, data, other_vars):
>>         self.fit = third_party.function_set_up(data, other_vars)
>>
>>     def minimize(self):
>>         try:
>>             self.fit.minimize()
>>             self.have_fit = True
>>         except third_party.Error:
>>             self.have_fit = False
>> ##########################
>
> If you write code like the above then you really cannot expect other
> people to just understand what you mean if you don't show them the
> code. Specifically the use of __call__ is confusing. Really, though,
> this class is just a distraction from your problem and should have
> been simplified away.
Off my main topic, but could you explain more?
I am also not very experienced at writing classes, and was learning from an example, so I am not sure why __call__ is confusing. I thought that was correct.
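For reference, a minimal sketch of when the two special methods run (the class and attribute names are hypothetical): __init__ runs once, when an instance is created, while __call__ runs each time an already-created instance is called like a function. In the do_stuff class above the set-up lives in __call__, so an instance has to be created and then called before minimize() has a self.fit to use; that ordering may be part of why it reads as confusing, since set-up more commonly goes in __init__.

```python
class Fitter:
    """Hypothetical class showing when __init__ and __call__ run."""

    def __init__(self, data):
        # Runs once, when the object is created: f = Fitter(data)
        self.data = list(data)
        self.n_calls = 0

    def __call__(self, scale):
        # Runs each time an existing object is called like a function: f(2.0)
        self.n_calls += 1
        return [scale * x for x in self.data]


f = Fitter([1.0, 2.0])      # __init__ runs here
print(f.n_calls)            # 0: __call__ has not run yet
result = f(2.0)             # __call__ runs here
print(result, f.n_calls)    # [2.0, 4.0] 1
```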
[snip]
> The above function can be vectorised to something like:
>
> def chop_data(data, N, M):
>     return data[np.random.randint(0, data.shape[0], (M, N))].mean(axis=0)
Thanks.
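A minimal sketch comparing the vectorised version with a loop: as I read it, it returns N averages, each taken over M rows drawn with replacement along axis 0. The loop version is a hypothetical reconstruction for comparison, not my snipped original, and both use NumPy's newer default_rng().integers in place of np.random.randint so they can share a seed.

```python
import numpy as np


def chop_data(data, N, M, rng):
    # Vectorised form from the quoted reply: draw an (M, N) array of random
    # row indices, gather those rows, and average over the M draws.
    idx = rng.integers(0, data.shape[0], size=(M, N))
    return data[idx].mean(axis=0)


def chop_data_loop(data, N, M, rng):
    # Hypothetical loop equivalent, drawing the same (M, N) indices.
    idx = rng.integers(0, data.shape[0], size=(M, N))
    out = np.empty((N,) + data.shape[1:])
    for j in range(N):
        # Average of the M rows selected for output sample j.
        out[j] = data[idx[:, j]].mean(axis=0)
    return out


data = np.arange(20.0).reshape(10, 2)
fast = chop_data(data, N=5, M=3, rng=np.random.default_rng(0))
slow = chop_data_loop(data, N=5, M=3, rng=np.random.default_rng(0))
print(fast.shape, np.allclose(fast, slow))  # (5, 2) True
```

One point relevant to memory use: data[idx] materialises an intermediate array of shape (M, N) + data.shape[1:] before the mean is taken, i.e. M*N rows' worth of data, so for large M and N that temporary can be much larger than data itself.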
[snip]
>> tmp_data = my_class.chop_data(data,n,m)
>
> Where did data come from? Is that the mmap'ed array from the hdf5 library?
As above, data is loaded once with pytables, using the .read() method.
My chop_data function then, I believe, returns a numpy array, and I have not explicitly asked for an mmap anywhere, so I am not sure. How would you check?
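One way to check, sketched with a hypothetical helper name: np.memmap is a subclass of ndarray, so a memory-mapped array passes most "is this a numpy array" tests (nbytes included). Looking at type() or isinstance(a, np.memmap) tells them apart, and a.flags.owndata shows whether the array owns its buffer or is a view onto someone else's memory.

```python
import os
import tempfile

import numpy as np


def describe_array(a):
    # Hypothetical helper: summarise what kind of array `a` really is.
    return {
        "type": type(a).__name__,               # 'ndarray' vs 'memmap'
        "is_memmap": isinstance(a, np.memmap),  # True only for memory-mapped arrays
        "owns_data": bool(a.flags.owndata),     # False for views onto another buffer
        "nbytes": a.nbytes,                     # size of the data buffer in bytes
    }


plain = np.zeros((300, 256, 1, 2))
path = os.path.join(tempfile.mkdtemp(), "example.dat")
mapped = np.memmap(path, dtype=float, mode="w+", shape=(4, 2))

print(describe_array(plain))
print(describe_array(mapped))
```

Applied to my_data straight after .read(), and to the output of chop_data, this should show whether either one is memory-mapped.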
>> my_func(tmp_data,other_vars)
>> my_func.minimize()
>
> I now know that the above two lines call thirdparty.function_setup()
> and my_func.fit.minimize(). I still have no idea what they do though.
I was trying to avoid that, since I suspect(ed) the problem is with me and not the third-party code.
Without going into specifics, the first function constructs a chi^2 function which looks like
chisq = sum_i ( (data[i] - fit_func(i,fit_params)) / data_error[i] )**2
The second function numerically minimizes chisq with respect to "fit_params", which is a 1-D array.
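A minimal sketch of that chi^2, assuming for illustration only that fit_func is a straight line in i (the real fit function is not shown here). The third-party code minimizes numerically with Minuit; because this toy model is linear in fit_params, the sketch swaps in the closed-form weighted least-squares solution from np.linalg.lstsq, which minimizes the same chisq.

```python
import numpy as np


def fit_func(i, fit_params):
    # Hypothetical model: a straight line in the index i.
    a, b = fit_params
    return a + b * i


def chisq(fit_params, data, data_error):
    # chisq = sum_i ((data[i] - fit_func(i, fit_params)) / data_error[i])**2
    i = np.arange(len(data))
    resid = (data - fit_func(i, fit_params)) / data_error
    return np.sum(resid ** 2)


def linear_chisq_fit(data, data_error):
    # For a model linear in fit_params, minimizing chisq is weighted least
    # squares: scale each row of the design matrix and each datum by 1/error.
    i = np.arange(len(data))
    w = 1.0 / data_error
    design = np.column_stack([np.ones(len(data)), i]) * w[:, None]
    params, *_ = np.linalg.lstsq(design, data * w, rcond=None)
    return params


toy_data = 2.0 + 0.5 * np.arange(10)
toy_err = np.full(10, 0.1)
params = linear_chisq_fit(toy_data, toy_err)
print(params, chisq(params, toy_data, toy_err))
```

For a nonlinear fit_func there is no closed form, which is where an iterative minimizer such as Minuit comes in.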
[snip]
>> Hopefully, this is more clear.
>
> Only slightly. The details that you choose to include are not the ones
> that are needed to understand your problem. Instead of paraphrasing
> simplify this into a short but *complete* example and post that.
If you can help me understand the mmap issue (whether I am somehow creating one unwittingly), that would be great, i.e., what tests can I perform to check?
Otherwise, perhaps the best thing for me to do now is to take eryksun's advice and learn how to use a memory profiler.
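For the profiling route, a minimal sketch using the standard library's tracemalloc module (Python 3.4 and later). I don't know which profiler eryksun had in mind, so this is only a stand-in, and peak_memory is a hypothetical helper name. Recent NumPy versions report their array buffers to tracemalloc, so the peak should include temporary arrays created while something like chop_data runs; memory allocated inside the C++ Minuit code will not show up.

```python
import tracemalloc


def peak_memory(func, *args, **kwargs):
    # Hypothetical helper: run func(*args, **kwargs) and return its result
    # together with the peak number of bytes traced while it ran.
    tracemalloc.start()
    try:
        result = func(*args, **kwargs)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, peak


buf, peak = peak_memory(bytearray, 10**6)
print(len(buf), peak >= 10**6)
```

In the script above this would be used as, e.g., tmp_data, peak = peak_memory(my_class.chop_data, data, n, m).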
Thanks,
Andre