[Python-Dev] Changing pymalloc behaviour for long running processes
Evan Jones
ejones at uwaterloo.ca
Tue Oct 19 23:47:21 CEST 2004
First, let me thank you for this very detailed reply. It really helped
me understand a lot more about what is going on inside the Python
interpreter.
On Oct 19, 2004, at 16:53, Tim Peters wrote:
> It's stack-like: it reuses the pool most recently emptied, because
> the expectation is that the most recently emptied pool is the most
> likely of all empty pools to be highest in the memory hierarchy. I
> really don't know what LRU (or MRU) might mean in this context (it's
> not like we're evicting something from a cache).
Err... Right: MRU. It uses the most recently used free block. This is
totally a cache: It's a cache of free memory pages.
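A minimal sketch of the stack-like reuse discussed above, written in Python rather than obmalloc's C. The `PoolCache` class and its method names are hypothetical; they only model the reuse policy and are not obmalloc's actual data structures.

```python
class PoolCache:
    """Toy model of a LIFO cache of empty pools (hypothetical, not obmalloc's code)."""

    def __init__(self):
        self._empty = []  # used as a stack: last pool emptied is first reused

    def release(self, pool):
        # A pool just became empty; push it on top of the stack.
        self._empty.append(pool)

    def acquire(self):
        # Reuse the most recently emptied pool, or return None to signal
        # that a fresh pool would have to be carved out.
        return self._empty.pop() if self._empty else None


cache = PoolCache()
for name in ("pool_a", "pool_b", "pool_c"):
    cache.release(name)

# One more acquire than there are cached pools, to show the empty case.
reused = [cache.acquire() for _ in range(4)]
```

The pool emptied last comes back first, which is the behaviour Tim describes: the most recently touched memory is the most likely to still be high in the memory hierarchy.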
> Harder than it looked, eh <wink>?
Actually, much. I spent about 6 hours figuring out what was going on.
At this point, I think I have enough of a handle on the situation that
I might as well go about trying to improve it.
> Or it may be small overhead, if all it's trying to do is free() empty
> arenas. Indeed, if arenas "grow states" too, *arena* transitions
> should be so rare that perhaps they could afford to do extra
> processing right then to decide whether to free() an arena that just
> transitioned to its notion of an empty state.
That is true. However, I don't think freeing arenas immediately is the
best plan, as we don't really want to do that if the application is
cyclical in its memory consumption (i.e., it creates a ton of objects,
then releases them, then does it again). I still think that some sort
of periodic collection is best, as it will help Python adjust to
applications with a wide variety of memory profiles.
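To make the trade-off concrete, here is a rough Python model of deferring the release of empty arenas to a periodic pass. Everything in it (`ArenaTracker`, `alloc`, `free`, `collect`) is a made-up illustration under simplifying assumptions, not a proposed obmalloc design.

```python
class ArenaTracker:
    """Toy model: count live blocks per arena, free empty arenas only on collect()."""

    def __init__(self):
        self.live = {}      # arena id -> number of allocated blocks
        self.released = []  # arenas handed back to the system, in order

    def alloc(self, arena):
        self.live[arena] = self.live.get(arena, 0) + 1

    def free(self, arena):
        # An arena that drops to zero stays cached until the next collect().
        self.live[arena] -= 1

    def collect(self):
        # Periodic pass: release only the arenas that are empty right now.
        empty = [a for a, n in self.live.items() if n == 0]
        for a in empty:
            del self.live[a]
            self.released.append(a)
        return len(empty)


tracker = ArenaTracker()
# Cyclical workload: allocate, free everything, then do it again
# before any collection has run.
for _ in range(2):
    for arena in ("arena0", "arena1"):
        tracker.alloc(arena)
    for arena in ("arena0", "arena1"):
        tracker.free(arena)
tracker.alloc("arena0")          # arena0 is in use again when the pass runs
freed_count = tracker.collect()  # only arena1 is empty at this point
```

With immediate freeing, both arenas would have been released and re-requested on every cycle; in this model the periodic pass releases only the arena that is still empty when it runs.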
> If we changed PyMem_{Free, FREE, Del, DEL} to map to the system
> free(), all would be golden (except for broken old code mixing
> PyObject_ with PyMem_ calls). If any such broken code still exists,
> that remapping would lead to dramatic failures, easy to reproduce; and
> old code broken in the other, infinitely more subtle way (calling
> PyMem_{Free, FREE, Del, DEL} when not holding the GIL) would continue
> to work fine.
Hmm... This seems like a logical approach to me. It certainly gives me
a lot more freedom in reworking the memory allocator. Are there any
objections to this idea?
> Any number of threads can be running
> Python code in a single process, although the GIL serializes their
> execution *while* they're executing Python code. When a thread ends
> up in C code, it's up to the C code to decide whether to release the
> GIL and so allow other threads to run at the same time. If it does,
> that thread must reacquire the GIL before making another Python C API
> call (with very few exceptions, related to Python C API thread
> initialization and teardown functions).
Ah, now I understand! Creating a Python thread actually creates a
native thread; it's just that, because of the GIL, they run
sequentially when executing Python code. This is an interesting
approach! For some reason I was under the impression that the Python
interpreter used user-level threads to implement Python threads.
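One way to see this from Python itself is to ask each thread for its OS-level id. This is only a sketch, and it assumes a modern interpreter: `threading.get_native_id()` exists in Python 3.8 and later on the common platforms, long after this thread was written. The helper name `native_ids` is my own.

```python
import threading


def native_ids(count=3):
    """Return the OS-level thread id seen by each of `count` concurrently live threads."""
    ids = []
    lock = threading.Lock()
    barrier = threading.Barrier(count)

    def worker():
        with lock:
            ids.append(threading.get_native_id())
        # Wait until every worker has recorded its id, so all threads are
        # alive at the same time and the OS cannot recycle an id.
        barrier.wait()

    threads = [threading.Thread(target=worker) for _ in range(count)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return ids


ids = native_ids()
```

Each worker reports a different id, and none of them matches the main thread's, which is what you would expect if every Python thread is backed by its own native thread.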
> obmalloc doesn't have *that* problem, though -- nothing obmalloc does
> can cause Python code to get executed, so obmalloc can assume that the
> thread calling into it holds the GIL for as long as obmalloc wants.
> Except, again, for the crazy PyMem_{Free, FREE, Del, DEL} exception.
Terrific. This makes life much, much easier.
> I would -- it's backward compatibility hacks for insane code, which
> may not even exist anymore, and you'll find that it puts severe
> constraints on what you can do.
Again, does anyone object to this point of view before I begin working
from this assumption? This means that I can assume that only one thread
will call code in obmalloc at a time. I can do the same thing that the
current obmalloc implementation does: Add the macros for the locks, but
have them resolve to nothing.
Thanks for the tutorial in the Python interpreter internals,
Evan Jones
--
Evan Jones: http://evanjones.ca/
"Computers are useless. They can only give answers" - Pablo Picasso