[pypy-dev] Continual Increase In Memory Utilization Using PyPy 6.0 (python 2.7) -- Help!?

Antonio Cuni anto.cuni at gmail.com
Thu Mar 28 11:55:49 EDT 2019


Hi Robert,
are you using any package which relies on cpyext? I.e., modules written in
C and/or with Cython (cffi is fine).
IIRC, at the moment PyPy doesn't detect GC cycles which involve cpyext
objects.
So if you have a cycle which does e.g.
    Py_foo -> C_bar -> Py_foo
(where Py_foo is a pure-Python object and C_bar a cpyext object), the objects
will never be collected unless you break the cycle manually.
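
For illustration, such a cycle can look like this in practice (a made-up
sketch; numpy is used here only as a stand-in for any cpyext-backed package,
and any C-extension object that stores references to Python objects behaves
the same way):

    import numpy as np                   # numpy on PyPy goes through cpyext

    class PyWrapper(object):             # pure-Python object, the "Py_foo"
        pass

    w = PyWrapper()
    w.arr = np.empty(1, dtype=object)    # Py_foo -> C_bar (an object array)
    w.arr[0] = w                         # C_bar -> Py_foo: the cycle is closed

    # The GC will not detect this cycle, so it has to be broken by hand
    # before the objects can be freed:
    w.arr[0] = None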

Other than that: have you tried running it with PyPy 7.0 and/or 7.1?


On Thu, Mar 28, 2019 at 8:35 AM Robert Whitcher <robert.whitcher at rubrik.com>
wrote:

> So I have a process that uses PyPy and pymongo in a loop.
> It does basically the same thing on every loop: it queries a table via
> pymongo, does a few calculations that don't save anything, and then waits
> and loops again.
>
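> In outline, the loop is something like this (a simplified sketch; the
> database, collection, and field names are made up):
>
> import time
> import pymongo
>
> client = pymongo.MongoClient()      # default localhost connection
> coll = client.mydb.jobs             # made-up database/collection names
>
> while True:
>     # query the "table" and do a few calculations; nothing is written back
>     total = sum(doc.get('size', 0) for doc in coll.find({'state': 'pending'}))
>     print('pending bytes: %d' % total)
>     time.sleep(60)                  # wait, then loop again
>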
> The RSS of the process continually increased (PYPY_GC_MAX is set pretty
> high).
> So I hooked in the GC stats output per:
> http://doc.pypy.org/en/latest/gc_info.html
> I also made sure that gc.collect() is called at least every 3 minutes.
>
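> The hook itself is basically a periodic dump of gc.get_stats() alongside
> the RSS value (a simplified sketch, assuming the gc.get_stats() API
> documented on the page above; the logger name is a placeholder and
> get_rss_mem_usage() is the helper shown further down):
>
> import gc
> import logging
>
> log = logging.getLogger('async_worker_process')
>
> def flush_gc_and_log():
>     # gc.collect() is what gets called at least every 3 minutes.
>     gc.collect()
>     log.info('INFO_FLUSH: RSS: %d' % get_rss_mem_usage())
>     # str(gc.get_stats()) is the "Total memory consumed" block in the logs below.
>     log.info('DBG0: %s' % gc.get_stats())
>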
> What I see is that the memory, while high, is fairly constant for a long
> time:
>
> 2019-03-27 00:04:10.033-0600 [-] main_thread(29621)log
> (async_worker_process.py:304): INFO_FLUSH: RSS: 144244736
> ...
> 2019-03-27 01:01:46.841-0600 [-] main_thread(29621)log
> (async_worker_process.py:304): INFO_FLUSH: RSS: 144420864
> 2019-03-27 01:02:36.943-0600 [-] main_thread(29621)log
> (async_worker_process.py:304): INFO_FLUSH: RSS: 144269312
>
>
> Then it decides (and the exact per-loop behavior is the same each time) to
> chew up much more memory:
>
> 2019-03-27 01:04:17.184-0600 [-] main_thread(29621)log
> (async_worker_process.py:304): INFO_FLUSH: RSS: 145469440
> 2019-03-27 01:05:07.305-0600 [-] main_thread(29621)log
> (async_worker_process.py:304): INFO_FLUSH: RSS: 158175232
> 2019-03-27 01:05:57.401-0600 [-] main_thread(29621)log
> (async_worker_process.py:304): INFO_FLUSH: RSS: 173191168
> 2019-03-27 01:06:47.490-0600 [-] main_thread(29621)log
> (async_worker_process.py:304): INFO_FLUSH: RSS: 196943872
> 2019-03-27 01:07:37.575-0600 [-] main_thread(29621)log
> (async_worker_process.py:304): INFO_FLUSH: RSS: 205406208
> 2019-03-27 01:08:27.659-0600 [-] main_thread(29621)log
> (async_worker_process.py:304): INFO_FLUSH: RSS: 254562304
> 2019-03-27 01:09:17.770-0600 [-] main_thread(29621)log
> (async_worker_process.py:304): INFO_FLUSH: RSS: 256020480
> 2019-03-27 01:10:07.866-0600 [-] main_thread(29621)log
> (async_worker_process.py:304): INFO_FLUSH: RSS: 289779712
>
>
> That's roughly 140 MB of growth... Where is all that memory going?
> What's more, the PyPy GC stats do not show anything different:
>
> Here are the GC stats from GC-Complete when we were at *144MB*:
>
> 2019-03-26 23:55:49.127-0600 [-] main_thread(29621)log
> (async_worker_process.py:304): INFO_FLUSH: RSS: 140632064
> 2019-03-26 23:55:49.133-0600 [-] main_thread(29621)log
> (async_worker_process.py:308): DBG0: Total memory consumed:
>             GC used:            56.8MB (peak: 69.6MB)
>                in arenas:            39.3MB
>                rawmalloced:          14.5MB
>                nursery:              3.0MB
>             raw assembler used: 521.6kB
>             -----------------------------
>             Total:              57.4MB
>
>             Total memory allocated:
>             GC allocated:            63.0MB (peak: 71.2MB)
>                in arenas:            43.9MB
>                rawmalloced:          22.7MB
>                nursery:              3.0MB
>             raw assembler allocated: 1.0MB
>             -----------------------------
>             Total:                   64.0MB
>
>
> Here are the GC stats from GC-Complete when we are at *285MB*:
>
> 2019-03-27 01:42:41.751-0600 [-] main_thread(29621)log
> (async_worker_process.py:304): INFO_FLUSH: RSS: 285147136
> 2019-03-27 01:42:41.751-0600 [-] main_thread(29621)log
> (async_worker_process.py:308): DBG0: Total memory consumed:
>             GC used:            57.5MB (peak: 69.6MB)
>                in arenas:            39.9MB
>                rawmalloced:          14.6MB
>                nursery:              3.0MB
>             raw assembler used: 1.5MB
>             -----------------------------
>             Total:              58.9MB
>
>             Total memory allocated:
>             GC allocated:            63.1MB (peak: 71.2MB)
>                in arenas:            43.9MB
>                rawmalloced:          22.7MB
>                nursery:              3.0MB
>             raw assembler allocated: 2.0MB
>             -----------------------------
>             Total:                   65.1MB
>
>
> How is this possible?
>
> I am measuring RSS with:
>
> import os
> import psutil
>
> def get_rss_mem_usage():
>     '''
>     Get the RSS memory usage in bytes
>     @return: memory size in bytes; -1 if an error occurs
>     '''
>     try:
>         process = psutil.Process(os.getpid())
>         # get_memory_info() is the older psutil name; newer versions use memory_info()
>         return process.get_memory_info().rss
>     except Exception:
>         return -1
>
> And cross-referencing with "ps -orss -p <pid>" confirms that the RSS values
> reported are correct...
>
> I cannot figure out where to go from here, as it appears that PyPy is
> leaking this memory somehow...
> I have no idea how to proceed...
> I end up having memory problems and getting Memory Warnings for a process
> that just loops and queries via pymongo.
> The pymongo version is 3.7.1.
>
> This is:
>
> Python 2.7.13 (ab0b9caf307db6592905a80b8faffd69b39005b8, Apr 30 2018,
> 08:21:35)
> [PyPy 6.0.0 with GCC 7.2.0]
>
>
>