[pypy-issue] Issue #2982: Continual Increase In Memory Utilization Using PyPy 6.0 (python 2.7) and Pymongo (pypy/pypy)

Robert Whitcher issues-reply at bitbucket.org
Wed Mar 27 20:07:44 EDT 2019


New issue 2982: Continual Increase In Memory Utilization Using PyPy 6.0 (python 2.7) and Pymongo
https://bitbucket.org/pypy/pypy/issues/2982/continual-increase-in-memory-utilization

Robert Whitcher:

So I have a process running PyPy that holds a pymongo connection and loops.
It does basically the same thing every loop: it queries a table via pymongo, does a few non-saved calculations, then waits and loops again.
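For reference, the loop has roughly this shape (a minimal sketch only; the database, collection, and interval names are placeholders, not the real ones):

import time
import pymongo

def run_worker_loop(mongo_uri, interval_seconds=60):
    # One pymongo connection held for the lifetime of the process.
    client = pymongo.MongoClient(mongo_uri)
    collection = client["some_db"]["some_collection"]  # placeholder names
    while True:
        # Query via pymongo and do a few throwaway calculations;
        # nothing from the iteration is kept alive afterwards.
        total = 0
        for doc in collection.find():
            total += len(doc)
        # Wait, then loop again.
        time.sleep(interval_seconds)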

The RSS of the process continually increased (PYPY_GC_MAX is set pretty high).
So I hooked in the GC stats output per: http://doc.pypy.org/en/latest/gc_info.html
I also ensured that gc.collect() was called at least every 3 minutes.
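The logging is roughly this (a sketch of the hookup described above; it assumes the DBG0 blocks below come from printing gc.get_stats(), and the logger is a placeholder):

import gc
import os
import psutil

def log_memory_stats(logger):
    # RSS as the operating system sees it.
    rss = psutil.Process(os.getpid()).memory_info().rss
    logger.info("INFO_FLUSH: RSS: %d" % rss)
    # PyPy-only: gc.get_stats() renders the "Total memory consumed" /
    # "Total memory allocated" block shown in the output below.
    if hasattr(gc, "get_stats"):
        logger.info("DBG0: %s" % gc.get_stats())

This gets called about once a minute from the main loop, and gc.collect() is forced at least every 3 minutes.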

What I see is that the memory, while high, is fairly constant for a long time (output about every minute):
2019-03-27 00:04:10.033-0600 [-] main_thread(29621)log (async_worker_process.py:304): INFO_FLUSH: RSS: 144244736
...
2019-03-27 01:01:46.841-0600 [-] main_thread(29621)log (async_worker_process.py:304): INFO_FLUSH: RSS: 144420864
2019-03-27 01:02:36.943-0600 [-] main_thread(29621)log (async_worker_process.py:304): INFO_FLUSH: RSS: 144269312

Then it decides (even though the exact per-loop behavior is the same each time) to chew up much more memory:

2019-03-27 01:04:17.184-0600 [-] main_thread(29621)log (async_worker_process.py:304): INFO_FLUSH: RSS: 145469440
2019-03-27 01:05:07.305-0600 [-] main_thread(29621)log (async_worker_process.py:304): INFO_FLUSH: RSS: 158175232
2019-03-27 01:05:57.401-0600 [-] main_thread(29621)log (async_worker_process.py:304): INFO_FLUSH: RSS: 173191168
2019-03-27 01:06:47.490-0600 [-] main_thread(29621)log (async_worker_process.py:304): INFO_FLUSH: RSS: 196943872
2019-03-27 01:07:37.575-0600 [-] main_thread(29621)log (async_worker_process.py:304): INFO_FLUSH: RSS: 205406208
2019-03-27 01:08:27.659-0600 [-] main_thread(29621)log (async_worker_process.py:304): INFO_FLUSH: RSS: 254562304
2019-03-27 01:09:17.770-0600 [-] main_thread(29621)log (async_worker_process.py:304): INFO_FLUSH: RSS: 256020480
2019-03-27 01:10:07.866-0600 [-] main_thread(29621)log (async_worker_process.py:304): INFO_FLUSH: RSS: 289779712

That's about 140 MB of growth. Where is all that memory going?
What's more, the PyPy GC stats do not show anything different:

Here are the GC stats from GC-Complete when we were at 144MB:

2019-03-26 23:55:49.127-0600 [-] main_thread(29621)log (async_worker_process.py:304): INFO_FLUSH: RSS: 140632064
2019-03-26 23:55:49.133-0600 [-] main_thread(29621)log (async_worker_process.py:308): DBG0: Total memory consumed:
            GC used:            56.8MB (peak: 69.6MB)
               in arenas:            39.3MB
               rawmalloced:          14.5MB
               nursery:              3.0MB
            raw assembler used: 521.6kB
            -----------------------------
            Total:              57.4MB

            Total memory allocated:
            GC allocated:            63.0MB (peak: 71.2MB)
               in arenas:            43.9MB
               rawmalloced:          22.7MB
               nursery:              3.0MB
            raw assembler allocated: 1.0MB
            -----------------------------
            Total:                   64.0MB

Here are the GC stats from GC-Complete when we are at 285MB:

2019-03-27 01:42:41.751-0600 [-] main_thread(29621)log (async_worker_process.py:304): INFO_FLUSH: RSS: 285147136
2019-03-27 01:42:41.751-0600 [-] main_thread(29621)log (async_worker_process.py:308): DBG0: Total memory consumed:
            GC used:            57.5MB (peak: 69.6MB)
               in arenas:            39.9MB
               rawmalloced:          14.6MB
               nursery:              3.0MB
            raw assembler used: 1.5MB
            -----------------------------
            Total:              58.9MB

            Total memory allocated:
            GC allocated:            63.1MB (peak: 71.2MB)
               in arenas:            43.9MB
               rawmalloced:          22.7MB
               nursery:              3.0MB
            raw assembler allocated: 2.0MB
            -----------------------------
            Total:                   65.1MB
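Putting the two snapshots side by side (a quick arithmetic check, treating PyPy's "MB" figures as plain megabytes):

# RSS vs. what the PyPy GC accounts for, using the two snapshots above.
rss_before = 140632064            # first snapshot, ~140 MB
rss_after = 285147136             # second snapshot, ~285 MB
gc_total_before = 64.0 * 1e6      # "Total: 64.0MB" allocated, first snapshot
gc_total_after = 65.1 * 1e6       # "Total: 65.1MB" allocated, second snapshot

print("RSS grew by      %.1f MB" % ((rss_after - rss_before) / 1e6))            # ~144.5 MB
print("GC total grew by %.1f MB" % ((gc_total_after - gc_total_before) / 1e6))  # ~1.1 MB

So essentially all of the growth is in memory the GC does not account for.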

How is this possible?

I am measuring RSS with:

import os
import psutil

def get_rss_mem_usage():
    '''
    Get the RSS memory usage in bytes.
    @return: memory size in bytes; -1 if an error occurs
    '''
    try:
        process = psutil.Process(os.getpid())
        # memory_info() is the current psutil API; older psutil versions
        # spelled it get_memory_info().
        return process.memory_info().rss
    except psutil.Error:
        return -1

And cross-referencing with "ps -orss -p <pid>" confirms that the RSS values reported are correct.
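For completeness, the cross-check amounts to something like this (a sketch that just shells out to ps the way I checked by hand):

import os
import subprocess

def get_rss_from_ps(pid=None):
    # `ps -orss= -p <pid>` prints RSS in kilobytes, without a header line.
    pid = os.getpid() if pid is None else pid
    out = subprocess.check_output(["ps", "-orss=", "-p", str(pid)])
    return int(out.strip()) * 1024  # kB -> bytes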

I cannot figure out where to go from here, as it appears that PyPy is leaking this memory somehow.
I end up having memory problems and getting Memory Warnings for a process that just loops and queries via pymongo.
The pymongo version is 3.7.1.

This is:

Python 2.7.13 (ab0b9caf307db6592905a80b8faffd69b39005b8, Apr 30 2018, 08:21:35)
[PyPy 6.0.0 with GCC 7.2.0]



