Pickle caching objects?

Inada Naoki songofacandy at gmail.com
Mon Dec 16 03:11:01 EST 2019


Would you try the pull request in this issue?
https://bugs.python.org/issue36694

I'm not sure this issue applies to your case, because I don't know what your data looks like.

Regards,

On Sun, Dec 1, 2019 at 10:14 AM José María Mateos <chema at rinzewind.org> wrote:
>
> Hi,
>
> I just asked this question on the IRC channel but didn't manage to get a
> response, though some people replied with suggestions that expanded this
> question a bit.
>
> I have a program that has to read some pickle files, perform some
> operations on them, and then return. The pickle objects I am reading all
> have the same structure, which consists of a single list with two
> elements: the first one is a long list, the second one is a numpy
> object.
>
> I found out that, after calling that function, the memory taken by the
> Python executable (monitored using htop -- the entire thing runs on
> Python 3.6 on an Ubuntu 16.04, pretty standard conda installation with a
> few packages installed directly using `conda install`) increases in
> proportion to the size of the pickle object being read. My intuition is
> that this memory should be freed once the function returns.
>
> Does pickle keep a cache of objects in memory after they have been
> returned? I thought that could be the answer, but then someone suggested
> measuring the time it takes to load the objects. This is a script I
> wrote to test this; nothing(filepath) just loads the pickle file,
> doesn't do anything with the output and returns how long it took to
> perform the load operation.
>
> ---
> import glob
> import pickle
> import timeit
> import os
> import psutil
>
> def nothing(filepath):
>     start = timeit.default_timer()
>     with open(filepath, 'rb') as f:
>         _ = pickle.load(f)
>     return timeit.default_timer() - start
>
> if __name__ == "__main__":
>
>     filelist = glob.glob('/tmp/test/*.pk')
>
>     for i, filepath in enumerate(filelist):
>         print("Size of file {}: {}".format(i, os.path.getsize(filepath)))
>         print("First call:", nothing(filepath))
>         print("Second call:", nothing(filepath))
>         print("Memory usage:", psutil.Process(os.getpid()).memory_info().rss)
>         print()
> ---
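One way to tell a pickle-level cache apart from memory that was freed but not handed back to the operating system is to ask `tracemalloc` how much Python-allocated memory is still live after the loaded object is dropped. htop's RSS counts pages the process still holds; `tracemalloc` only counts blocks that are currently allocated. The sketch below is a hypothetical variant of `nothing()`: the helper names `_load` and `traced_load` are made up for illustration, and the demo writes a throwaway pickle with the same two-element shape, using a plain list in place of the numpy object so it runs without numpy or the original files.

```python
import os
import pickle
import tempfile
import tracemalloc


def _load(filepath):
    # Same work as nothing() in the script above, minus the timing.
    with open(filepath, 'rb') as f:
        return pickle.load(f)


def traced_load(filepath):
    """Return (peak, residual) bytes of Python-level memory for one load.

    peak:     most memory traced by tracemalloc while the pickle was loaded
    residual: memory still traced after the loaded object has been dropped
    """
    tracemalloc.start()
    try:
        baseline, _ = tracemalloc.get_traced_memory()
        obj = _load(filepath)
        _, peak = tracemalloc.get_traced_memory()
        del obj  # drop the only reference to the unpickled data
        current, _ = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak - baseline, current - baseline


if __name__ == "__main__":
    # Stand-in with the same shape as the real files: [long list, array-like].
    # A plain list replaces the numpy object (assumption, to avoid the dependency).
    data = [list(range(200000)), [0.0] * 1000]
    with tempfile.NamedTemporaryFile(suffix='.pk', delete=False) as tmp:
        pickle.dump(data, tmp)
        path = tmp.name
    try:
        peak, residual = traced_load(path)
        print("Peak traced during load:", peak)
        print("Still traced after del: ", residual)
    finally:
        os.remove(path)
```

If the residual figure stays close to zero while the RSS reported by the original script remains high, the unpickled objects were released, which would point at the process's memory allocator holding on to the freed pages rather than at pickle caching anything.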
>
> This is the output of the second time the script was run, to avoid any
> effects of potential IO caches:
>
> ---
> Size of file 0: 11280531
> First call: 0.1466723980847746
> Second call: 0.10044755204580724
> Memory usage: 49418240
>
> Size of file 1: 8955825
> First call: 0.07904054620303214
> Second call: 0.07996074995025992
> Memory usage: 49831936
>
> Size of file 2: 43727266
> First call: 0.37741047400049865
> Second call: 0.38176894187927246
> Memory usage: 49758208
>
> Size of file 3: 31122090
> First call: 0.271301960805431
> Second call: 0.27462846506386995
> Memory usage: 49991680
>
> Size of file 4: 634456686
> First call: 5.526095286011696
> Second call: 5.558765463065356
> Memory usage: 539324416
>
> Size of file 5: 3349952658
> First call: 29.50982437795028
> Second call: 29.461691531119868
> Memory usage: 3443597312
>
> Size of file 6: 9384929
> First call: 0.0826977719552815
> Second call: 0.08362263604067266
> Memory usage: 3443597312
>
> Size of file 7: 422137
> First call: 0.0057482069823890924
> Second call: 0.005949910031631589
> Memory usage: 3443597312
>
> Size of file 8: 409458799
> First call: 3.562588643981144
> Second call: 3.6001368327997625
> Memory usage: 3441451008
>
> Size of file 9: 44843816
> First call: 0.39132978999987245
> Second call: 0.398518088972196
> Memory usage: 3441451008
> ---
>
> Notice that memory usage increases noticeably, especially on files 4 and
> 5, the biggest ones, and doesn't come down as I would expect it to. But
> the loading time is constant, so I think I can disregard any pickle
> caching mechanisms.
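The constant load times already argue against a cache, and the same point can be checked on the objects themselves: `pickle.loads` (like `pickle.load`) creates a fresh unpickler, with its own memo table, on every call, so two loads of the same bytes should return equal but entirely separate objects. A minimal check, with a small in-memory payload standing in for one of the files (a plain list replaces the numpy object, as an assumption to keep it dependency-free):

```python
import pickle

# Stand-in for one file: [long list, array-like]; a plain list replaces
# the numpy object (assumption, so the example has no dependencies).
payload = [list(range(1000, 2000)), [0.5, 1.5, 2.5]]
blob = pickle.dumps(payload)

first = pickle.loads(blob)
second = pickle.loads(blob)

print(first == second)              # True: same contents
print(first is second)              # False: a new outer list each time
print(first[0] is second[0])        # False: inner list rebuilt too
print(first[0][0] is second[0][0])  # False in CPython: 1000 is outside the small-int cache
```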
>
> So I guess now my question is: can anyone give me any pointers as to why
> this is happening? Any help is appreciated.
>
> Thanks,
>
> --
> José María (Chema) Mateos || https://rinzewind.org/
> --
> https://mail.python.org/mailman/listinfo/python-list



-- 
Inada Naoki  <songofacandy at gmail.com>

