[Python-Dev] Changing pymalloc behaviour for long running processes

Evan Jones ejones at uwaterloo.ca
Tue Oct 19 19:25:28 CEST 2004


On Oct 19, 2004, at 12:14, Tim Peters wrote:
> True.  That's one major problem for some apps.  Another major problem
> for some apps is due to unbounded internal free lists outside of
> obmalloc.  Another is that the platform OS+libc may not shrink VM at
> times even when memory is returned to the system free().

There is absolutely nothing I can do about that, however. On the
platforms that matter to me (Mac OS X, Linux), sufficiently large
malloc() allocations are serviced by mmap(), and are returned to the
system as soon as free() is called. Hence, large blocks are
reclaimable. I have no knowledge of the implementation of malloc()
on Windows. Anyone care to enlighten me?
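
To illustrate the Linux side of that claim: with glibc, the mmap()
cutover point is even tunable. A throwaway demonstration (nothing to
do with the patch itself):

    #include <malloc.h>   /* glibc-specific mallopt() */
    #include <stdlib.h>

    int main(void)
    {
        char *big;

        /* glibc services requests larger than M_MMAP_THRESHOLD
         * (128 KB by default) with mmap(); free() then munmap()s
         * the block, returning it to the kernel immediately. */
        mallopt(M_MMAP_THRESHOLD, 64 * 1024);
        big = malloc(1024 * 1024);   /* serviced by mmap() */
        free(big);                   /* unmapped right away */
        return 0;
    }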

Another approach is not to free the memory, but instead to inform
the operating system that the pages are unused (on Unix, madvise(2)
with MADV_DONTNEED or MADV_FREE). When this happens, the operating
system *may* discard the pages, but the address range remains valid:
if it is touched again in the future, the OS will supply a fresh
zero-filled page. This would require some dramatic changes to
Python's internals.
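
For what it's worth, a minimal sketch of that approach, assuming the
arena is backed by page-aligned memory (MADV_FREE exists on the BSDs
and Mac OS X; MADV_DONTNEED is the Linux spelling):

    #include <sys/mman.h>
    #include <stddef.h>

    /* Hint to the kernel that these pages hold no useful data.  The
     * address range stays mapped; touching it later costs a soft
     * fault and a fresh zero-filled page. */
    static void
    release_pages(void *base, size_t len)
    {
    #ifdef MADV_FREE
        madvise(base, len, MADV_FREE);       /* BSD, Mac OS X */
    #else
        madvise(base, len, MADV_DONTNEED);   /* Linux */
    #endif
    }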

>> if the memory usage has been relatively constant, and has been well
>> below the amount of memory allocated.
> That's a possible implementation strategy.  I think you'll find it
> helpful to distinguish goals from implementations.

You are correct: this is an implementation detail. However, it is a 
relatively important one, as I do not want to change Python's 
aggressive memory-recycling behaviour.

> Maybe you just mean that you collapse adjacent free pools into a free
> pool of a larger size class, when possible?  If so, that's a possible
> step on the way toward identifying unused arenas, but I wouldn't call
> it an instance of decreasing memory fragmentation.

I am not moving Python objects around; I'm just dealing with free
pools and arenas in obmalloc.c at the moment. There are two separate
things I am doing:

1. Scanning through the free pool list and counting the number of
free pools in each arena. If an arena is completely unused, I free
it. If even one pool is in use, the arena cannot be freed.

2. Sorting the free pool list so that "nearly full" arenas are used 
before "nearly empty" arenas. Right now, when a pool is freed it is
pushed onto the list, and when one is needed it is popped off, which
gives LIFO reuse: the most recently freed pool is always handed out
first. What I am doing is removing all the free pools from the list
and putting them back on so that arenas with more free pools are
used later, while arenas with fewer free pools are used first (a toy
sketch of both steps follows below).
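
In toy form, the two steps together look something like this (the
pool struct, the arenaindex field, and every helper name here are
illustrative only, not the actual obmalloc identifiers; the real
free list links through pool headers inside the arenas):

    #include <stdlib.h>

    #define POOLS_PER_ARENA 64    /* e.g. a 256 KB arena of 4 KB pools */

    typedef struct pool {
        struct pool *nextpool;    /* singly-linked free-list link */
        unsigned int arenaindex;  /* arena that owns this pool */
    } pool;

    /* Step 1: count free pools per arena.  An arena whose entire
     * complement of pools is on the free list holds no live objects;
     * after unlinking those pools, it can be returned to the system
     * (arena i is reclaimable iff nfree[i] == POOLS_PER_ARENA). */
    static unsigned int *
    count_free_pools(pool *freepools, unsigned int narenas)
    {
        unsigned int *nfree = calloc(narenas, sizeof *nfree);
        pool *p;

        for (p = freepools; p != NULL; p = p->nextpool)
            ++nfree[p->arenaindex];
        return nfree;
    }

    /* Step 2: relink the free list so pools from nearly-full arenas
     * (few free pools) sit at the front, where they are popped
     * first. */
    static unsigned int *nfree_tab;   /* consulted by the comparator */

    static int
    by_arena_fullness(const void *a, const void *b)
    {
        const pool *pa = *(pool *const *)a;
        const pool *pb = *(pool *const *)b;
        return (int)nfree_tab[pa->arenaindex] -
               (int)nfree_tab[pb->arenaindex];
    }

    static pool *
    sort_free_pools(pool *freepools, unsigned int *nfree)
    {
        size_t n = 0, i;
        pool *p, **vec;

        for (p = freepools; p != NULL; p = p->nextpool)
            ++n;
        if (n == 0)
            return NULL;
        vec = malloc(n * sizeof *vec);
        for (p = freepools, i = 0; p != NULL; p = p->nextpool)
            vec[i++] = p;

        nfree_tab = nfree;
        qsort(vec, n, sizeof *vec, by_arena_fullness);

        for (i = 0; i + 1 < n; ++i)
            vec[i]->nextpool = vec[i + 1];
        vec[n - 1]->nextpool = NULL;
        p = vec[0];
        free(vec);
        return p;
    }

The point of the ordering is that popping from the head then nibbles
at arenas that were nearly full anyway, giving nearly-empty arenas a
chance to drain to zero and be reclaimed by step 1.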

In my crude tests, the second change increases the number of
completely free arenas. However, I suspect that differentiating
between free arenas and used arenas, as is already done for pools,
would be a good idea.

> In apps with steady states, between steady-state transitions it's not
> a good idea to "artificially" collapse free pools into free pools of
> larger size, because the app is going to want to reuse pools of the
> specific sizes it frees, and obmalloc optimizes for that case.

Absolutely: I am not touching that. I'm working from the assumption 
that pymalloc has been well tested and well tuned and is appropriate 
for Python workloads. I'm just trying to make it free memory 
occasionally.

> If the real point of this (whatever it is <wink>) is to identify free
> arenas, I expect that could be done a lot easier by keeping a count of
> allocated pools in each arena; e.g., maybe at the start of the arena,
> or by augmenting the vector of arena base addresses.

You are correct, and this is something I would like to play with.
This is, of course, a tradeoff between a little overhead on every
allocation and deallocation, and one big occasional overhead from
the "cleanup" process. I'm going to try to take a look at this
tonight, if I get some real work done this afternoon.
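
A sketch of that bookkeeping, with illustrative names (obmalloc's
actual arena vector is just a bare array of base addresses):

    typedef struct {
        void        *address;  /* arena base address */
        unsigned int nused;    /* pools currently allocated from it */
    } arena_entry;

    static arena_entry *arena_tab;  /* indexed by arena number */

    /* O(1) updates on the hot paths: */
    static void
    note_pool_alloc(unsigned int arenaindex)
    {
        ++arena_tab[arenaindex].nused;
    }

    static void
    note_pool_free(unsigned int arenaindex)
    {
        if (--arena_tab[arenaindex].nused == 0) {
            /* The whole arena is now unused: remember it as a
             * candidate for the next occasional cleanup pass. */
        }
    }

That turns "is this arena empty?" into a counter check maintained
incrementally, instead of the full scan of the free list in step 1
above.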

> But in some versions of reality, that isn't true.  The best available
> explanation is in new_arena()'s long internal comment block:  because
> of historical confusions about what Python's memory API *is*, it's
> possible that extension modules outside the core are incorrectly
> calling the obmalloc free() when they should be calling the system
> free(), and doing so without holding the GIL.

Let me just make sure I am clear on this: some extensions use native
threads; is that why this is a problem? Because, as far as I am
aware, the Python interpreter itself is not threaded. So how does
the cyclic garbage collector work? Doesn't it require that no other
execution is going on?

> Now all such insane uses have been officially deprecated, so you could
> be bold and just assume obmalloc is always entered by a thread holding
> the GIL now.

I would rather not break this property of obmalloc. However, this
leads to a big problem: I'm not sure it is possible for an
occasional cleanup task to be lockless and still co-operate nicely
with other threads, since by definition it needs to go and touch all
the arenas. One of the reasons that obmalloc *doesn't* have this
problem today is that it never releases memory.

> It's only a waste if it ultimately fails <wink>.

It is also a waste if the core Python developers decide it is a bad 
idea, and don't want to accept patches! :)

Thanks for your feedback,

Evan Jones

--
Evan Jones: http://evanjones.ca/
"Computers are useless. They can only give answers" - Pablo Picasso


