[Python-Dev] binary trees. Review obmalloc.c

Tim Peters tim.peters at gmail.com
Sat Apr 29 06:17:02 CEST 2006


[Vladimir 'Yu' Stepanov]
>> * Support allocating blocks of memory with a different alignment.
>> Right now the alignment is hard-coded to 8 bytes. As an alternative,
>> 4-byte alignment could be used, and the value could be set at
>> interpreter startup via arguments/environment variables/etc. In
>> testing, there was no appreciable speed difference between 4- and
>> 8-byte alignment.

[Martin v. Löwis]
> That depends on the hardware you use, of course. Some architectures
> absolutely cannot stand mis-aligned accesses, and programs will just
> crash if they try to perform such accesses.

Note that we _had_ to add a goofy "long double" to the PyGC_Head union
because some Python platform required 8-byte alignment for some types
... see rev 25454.  Any spelling of malloc() also needs to return
memory aligned for any legitimate purpose, and 8-byte alignment is the
strictest requirement we know of across current Python platforms.
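For the curious, the trick looks roughly like this -- a sketch from
memory, not the exact objimpl.h source, so the field names and the
type of gc_refs may well differ:

    /* Union-ing the real GC header fields with a "long double" forces
     * the whole header to the platform's strictest alignment, so the
     * object that follows it stays properly aligned too.
     * (Illustrative field names, not copied from objimpl.h.) */
    #include <stdio.h>

    typedef union _gc_head_sketch {
        struct {
            union _gc_head_sketch *gc_next;
            union _gc_head_sketch *gc_prev;
            long gc_refs;
        } gc;
        long double dummy;   /* forces worst-case alignment */
    } gc_head_sketch;

    int main(void)
    {
        printf("sizeof(gc_head_sketch) = %zu\n", sizeof(gc_head_sketch));
        return 0;
    }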

> So Python should err on the safe side, and only use a smaller alignment
> when it is known not to hurt.
>
> OTOH, I don't see the *advantage* in reducing the alignment.

It could cut wasted bytes.  There is no per-object memory overhead in
a release-build obmalloc:  the allocatable part of a pool is entirely
used for user-visible object memory, except when the alignment
requirement ends up forcing unused (by both the user and by obmalloc)
pad bytes.  For example, Python ints consume 12 bytes on 32-bit boxes,
but if space were allocated for them by obmalloc (it's not), obmalloc
would have to use 16 bytes per int to preserve 8-byte alignment.
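To make that arithmetic concrete (round_up here is just an
illustrative helper, not obmalloc's code):

    /* Round a request up to a multiple of the allocator's alignment.
     * With 8-byte alignment a 12-byte int object costs 16 bytes;
     * with 4-byte alignment it would cost only 12. */
    #include <stdio.h>

    static size_t round_up(size_t nbytes, size_t alignment)
    {
        return (nbytes + alignment - 1) & ~(alignment - 1);
    }

    int main(void)
    {
        printf("12 bytes @ align 8 -> %zu\n", round_up(12, 8));  /* 16 */
        printf("12 bytes @ align 4 -> %zu\n", round_up(12, 4));  /* 12 */
        return 0;
    }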

OTOH, obmalloc (unlike PyGC_Head) has always used 8-byte alignment,
because that's what worked best for Vladimir Marangozov during his
extensive timing tests.  It's not an isolated decision -- e.g., it's
also influenced by, and influences, "the best" pool size, and (of
course) cutting alignment to 4 would double the number of "size
classes" obmalloc has to juggle.

>> * Increase the maximum size that can be allocated by this module.
>> Currently the maximum size is 256 bytes.

> Right. This is somewhat deliberate, though; the expectation is that
> fragmentation will increase dramatically if even larger size classes
> are supported.

It's entirely deliberate ;-)  obmalloc is in no way trying to be a
general-purpose allocator.  It was designed specifically for the
common memory patterns in Python, aiming at large numbers of small
objects of the same size.  That's extremely common in Python, and
allocations larger than 256 bytes are not.  The maximum size was
actually 64 bytes at first.  After that, dicts grew an embedded
8-element table, which vastly increased the size of the dict struct.
obmalloc's limit was boosted to 256 then, although it could have
stopped at the size of a dict (in the rough ballpark of 150 bytes).
There was no reason (then or now) to go beyond 256.

>> * When the allocated size is close to the maximum, block allocation
>> becomes inefficient. For example, for block sizes of 248 and 256
>> bytes (blocksize), the number of elements per page
>> (PAGESIZE/blocksize) is 16 in both cases. I.e. the 248-byte blocks
>> could share the same page layout as the 256-byte ones.

> Good idea; that shouldn't be too difficult to implement: for sizes above
> 215, pools need to be spaced apart only at 16 bytes.

I'd rather drop the limit to 215 then <0.3 wink>.  Allocations that
large probably still aren't important to obmalloc, but it is important
that finding a requested allocation's "size index" be as cheap as
possible.  Uniformity helps that.
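To illustrate what "cheap" means here:  with uniform 8-byte spacing
the size index is a single shift, and the same arithmetic shows the
248/256 overlap Vladimir noticed (a sketch of the idea, not the exact
obmalloc source -- the pool header overhead is ignored):

    #include <stdio.h>

    #define ALIGNMENT_SHIFT 3      /* log2 of the 8-byte alignment */
    #define PAGESIZE        4096   /* pool size, header ignored */

    /* Map a request size to its size class with one shift, no table. */
    static unsigned int size_index(unsigned int nbytes)
    {
        return (nbytes - 1) >> ALIGNMENT_SHIFT;
    }

    int main(void)
    {
        /* 248 and 256 land in different classes (30 and 31) ... */
        printf("size_index(248) = %u, size_index(256) = %u\n",
               size_index(248), size_index(256));
        /* ... yet a pool holds 16 blocks either way. */
        printf("blocks per pool: %d vs %d\n",
               PAGESIZE / 248, PAGESIZE / 256);
        return 0;
    }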

On modern boxes, it may be desirable to boost POOL_SIZE -- nobody has
run timing experiments as comprehensive as Vladimir ran way back when,
and there's no particular reason to believe that the same set of
tradeoffs would look best on current architectures.
