[Python-Dev] pymalloc killer

Tim Peters tim.one@comcast.net
Wed, 03 Apr 2002 14:02:05 -0500


>> #define ADDRESS_IN_RANGE(P, I) \
>>         ((I) < narenas && (uptr)(P) - arenas[I] < (uptr)ARENA_SIZE)

[Marangozov, Vladimir]
> Bravo! Very nice.

Surprisingly so, yes <win>.

> Now that the pool header is reduced in size by one slot, I can only
> suggest a small optimization to get rid of the multiplication when
> trying to extend the free list. Since you know the code by heart
> already, this should be self-explanatory. It's only 3 lines of code:
>
> 1) add a member in the pool struct, after capacity:
>
>       uint lastoffset;               /* free list tail block offset */
>
> 2) In malloc, in the "try to extend the free list" block:
>
>       size <<= ALIGNMENT_SHIFT;      /* block size */
>       pool->lastoffset += size;
>       pool->freeblock = (block *)pool + pool->lastoffset;
>
> 3) In the "init_pool" block:
>
>       pool->capacity = ...
>       pool->lastoffset = POOL_OVERHEAD + size;
>       UNLOCK();
>
> In other words, the lastoffset acts as an upper bound watermark.

I added a bunch of low-level overviews (arena mgmt, pool mgmt, block mgmt)
as comment blocks over the last week, and noted the possibility for this in
one of them.  Another possibility is to get rid of the runtime divisions,
and the pool_header capacity member, by exploiting that

	pool->capacity = (POOL_SIZE - POOL_OVERHEAD) / size;

is invariant across all pools with the same szidx; i.e., all references to
pool->capacity *could* be replaced by capacity[pool->szidx], where
"capacity" is a new small (at most 32 uints) file-static const vector
computed at compile-time.

> I didn't want to do that optimization before, because it would have
> resulted in a bigger pool header and waste of space. Now it's ok.

So now the tradeoff is a little muddier:  If I skipped the multiplication
optimization, but did the division optimization, each pool would suddenly
gain 8 more bytes (on 32-bit boxes) to use for blocks.  If I did both, the
pool_header size would remain as it is now, "halfway between"
multiple-of-8-byte sizes.  How do you judge the tradeoffs now (do only one,
or do both)?  I don't foresee that anyone will have enough time to do the
kinds of careful studies you did when first developing these algorithms, so
happy to defer to your educated intuition on this.