[Python-Dev] The untuned tunable parameter ARENA_SIZE
Larry Hastings
larry at hastings.org
Fri Jun 2 16:05:21 EDT 2017
On 06/02/2017 02:38 AM, Antoine Pitrou wrote:
> I hope those are not the actual numbers you're intending to use ;-)
> I still think that allocating more than 1 or 2MB at once would be
> foolish. Remember this is data that's going to be carved up into
> (tens of) thousands of small objects. Large objects eschew the small
> object allocator (not to mention that third-party libraries like Numpy
> may be using different allocation routines when they allocate very
> large data).
Honest, I'm well aware of what obmalloc does and how it works. I bet
I've spent more time crawling around in it in the last year than anybody
else on the planet. Mainly because it works so well for CPython, nobody
else needed to bother!
I'm also aware, for example, that if your process grows to consume
gigabytes of memory, you're going to have tens of thousands of allocated
arenas. The idea that on systems with gigabytes of memory--90%+ of
current systems running CPython?--we should allocate memory forever in
256 KB chunks is faintly ridiculous. I agree that we should start small
and ramp up slowly, so Python continues to run well on small computers
and doesn't allocate tons of memory for small programs. But I also think
we should ramp up *eventually*, for programs that use tens or hundreds
of megabytes.
Also note that if we don't touch the allocated memory, smart modern OSes
won't actually commit any resources to it. All that happens when your
process allocates 1 GB is that the OS changes some integers around. It
doesn't actually commit any memory to your process until you attempt to
write to that memory, at which point it gets mapped in one
local-page-size chunk at a time (4 KB? 8 KB? something in that
neighborhood and power-of-2 sized). So if we allocate 32 MB and only
touch the first 1 MB, the other 31 MB doesn't consume any real
resources. I was planning on making the multi-arena code touch memory
only when it actually needs to, similar to the way obmalloc lazily
consumes memory inside an allocated pool (see the nextoffset field in
pool_header), to take advantage of this ubiquitous behavior.
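You can see that lazy-commit behavior from Python itself with an
anonymous mmap. A minimal sketch--the sizes here are just illustrative,
and whether the untouched pages stay uncommitted depends on the OS's
overcommit policy (Linux's default, for instance):

```python
import mmap

RESERVE = 32 * 1024 * 1024   # ask the OS for 32 MB of address space
TOUCH = 1 * 1024 * 1024      # dirty only the first 1 MB

# An anonymous private mapping reserves virtual address space; physical
# pages are committed only when they are first written.
buf = mmap.mmap(-1, RESERVE)

# Fault in roughly 1 MB of real pages by writing to them...
buf[:TOUCH] = b"\x00" * TOUCH

# ...while the remaining 31 MB still consumes no physical memory
# (on an overcommitting OS).
buf.close()
```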
If I write this multi-arena code, which I might, I was thinking I'd try
this approach:
* leave arenas themselves at 256 KB
* start with a 1 MB multi-arena size
* every time I allocate a new multi-arena, multiply the size of the
next multi-arena by 1.5 (rounding up to 256 KB each time)
* every time I free a multi-arena, divide the size of the next
multi-arena by 2 (rounding up to 256 KB each time)
* if allocation of a multi-arena fails, use a binary search algorithm
to allocate the largest multi-arena possible (rounding up to 256 KB at
each step)
* cap the size of multi-arenas at, let's say, 32 MB
So multi-arenas would be 1 MB, 1.5 MB, 2.25 MB, 3.5 MB (round up!), etc.
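For concreteness, here's a quick sketch of that sizing policy in plain
Python (function names are mine, not from any actual patch):

```python
ARENA = 256 * 1024        # arenas stay at 256 KB
CAP = 32 * 1024 * 1024    # proposed ceiling on multi-arena size

def round_up_to_arena(nbytes):
    # round up to the next multiple of the 256 KB arena size
    return -(-nbytes // ARENA) * ARENA

def grow(size):
    # after allocating a multi-arena: the next one is 1.5x larger,
    # rounded up to a whole number of arenas, capped at 32 MB
    return min(round_up_to_arena(int(size * 1.5)), CAP)

def shrink(size):
    # after freeing a multi-arena: the next one is half the size,
    # but never smaller than a single arena
    return max(round_up_to_arena(size // 2), ARENA)

# Reproduce the sequence from the post, in MB:
size = 1024 * 1024
seq = []
for _ in range(4):
    seq.append(size / (1024 * 1024))
    size = grow(size)
print(seq)   # [1.0, 1.5, 2.25, 3.5]  <- note 3.375 rounds up to 3.5
```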
Fun fact: Python allocates 16 arenas at the start of the program, just
to initialize obmalloc. That consumes 4 MB of memory. With the above
multi-arena approach, that'd allocate the first three multi-arenas,
pre-allocating 19 arenas and leaving 3 unused. It's *mildly* tempting to
make the first multi-arena be 4 MB, just so this is exactly right-sized,
but... naah.
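Checking that arithmetic (a sketch; the 16-arena startup figure is as
stated above):

```python
ARENA = 256 * 1024   # 256 KB per arena

startup_arenas = 16
startup_bytes = startup_arenas * ARENA   # 16 x 256 KB = 4 MB at startup

# The first three multi-arenas are 1 MB, 1.5 MB, and 2.25 MB, and each
# divides evenly into 256 KB arenas: 4 + 6 + 9 of them.
multi_arenas = [1024 * 1024, 1536 * 1024, 2304 * 1024]
arenas_held = sum(m // ARENA for m in multi_arenas)

print(startup_bytes // (1024 * 1024),   # 4   (MB consumed at startup)
      arenas_held,                      # 19  (arenas pre-allocated)
      arenas_held - startup_arenas)     # 3   (arenas left unused)
```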
//arry/