Pickle caching objects?

Richard Damon Richard at Damon-Family.org
Sat Nov 30 20:31:35 EST 2019


On 11/30/19 5:05 PM, José María Mateos wrote:
> Hi,
>
> I just asked this question on the IRC channel but didn't manage to get
> a response, though some people replied with suggestions that expanded
> this question a bit.
>
> I have a program that has to read some pickle files, perform some
> operations on them, and then return. The pickle objects I am reading
> all have the same structure, which consists of a single list with two
> elements: the first one is a long list, the second one is a numpy object.
>
> I found out that, after calling that function, the memory taken by the
> Python executable (monitored using htop -- the entire thing runs on
> Python 3.6 on Ubuntu 16.04, a pretty standard conda installation with
> a few packages installed directly using `conda install`) increases in
> proportion to the size of the pickle object being read. My intuition
> is that that memory should be freed upon exiting the function.
>
> Does pickle keep a cache of objects in memory after they have been
> returned? I thought that could be the answer, but then someone
> suggested measuring the time it takes to load the objects. This is a
> script I wrote to test this; nothing(filepath) just loads the pickle
> file, does nothing with the output, and returns how long the load
> operation took.
>
<snip>
> Notice that memory usage increases noticeably, especially on files 4
> and 5, the biggest ones, and doesn't come down as I would expect it to.
> But the loading time is constant, so I think I can disregard any
> pickle caching mechanisms.
>
> So I guess now my question is: can anyone give me any pointers as to
> why this is happening? Any help is appreciated.
>
> Thanks,
>
Python likely doesn't return the memory it has gotten from the OS back
to the OS just because it isn't using it at the moment. This is actually
very common behavior, as getting new memory from the OS is somewhat
expensive, and memory that has just been released is often needed again
shortly.
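
As a quick sketch of what I mean (not your script; the file name here is
made up), on Linux you can watch the resident set size the OS reports
before and after dropping the unpickled data. It typically stays high:

    import gc
    import pickle

    def rss_kib():
        # Current resident set size from /proc (Linux-only, matching the
        # Ubuntu setup described above); the value is reported in kB.
        with open("/proc/self/status") as f:
            for line in f:
                if line.startswith("VmRSS:"):
                    return int(line.split()[1])

    print("before load:", rss_kib(), "kB")
    with open("big.pkl", "rb") as f:   # hypothetical large pickle file
        data = pickle.load(f)
    print("after load: ", rss_kib(), "kB")

    del data
    gc.collect()
    # The RSS usually does not drop back here: the freed memory stays in
    # Python's internal pools for reuse rather than going back to the OS.
    print("after del:  ", rss_kib(), "kB")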

There is also the fact that, to return memory to the OS, an entire
block needs to be completely unused, and it isn't hard for a few small
pieces of it to still be in use.

You are measuring how much memory Python has gotten from the OS, which
is different from how much it is actively using: when objects go away,
their memory is returned to the free pool INSIDE Python, to be used for
other requests before asking the OS for more.
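
If you want to see the memory Python is actually using, as opposed to
what htop shows, tracemalloc can report it (again just a sketch with a
made-up file name; tracemalloc only counts allocations made through
Python's allocator and adds some overhead):

    import gc
    import pickle
    import tracemalloc

    tracemalloc.start()

    with open("big.pkl", "rb") as f:   # same hypothetical file
        data = pickle.load(f)
    current, peak = tracemalloc.get_traced_memory()
    print("loaded:  current=%.1f MB, peak=%.1f MB" % (current / 1e6, peak / 1e6))

    del data
    gc.collect()
    current, peak = tracemalloc.get_traced_memory()
    # 'current' falls back toward zero: the objects' memory is free inside
    # Python and available for the next request, even though the process
    # size shown by htop has not shrunk.
    print("deleted: current=%.1f MB, peak=%.1f MB" % (current / 1e6, peak / 1e6))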

-- 
Richard Damon


