[Patches] [ python-Patches-425242 ] Patch which "inlines" small dictionaries
noreply@sourceforge.net
Tue, 22 May 2001 12:15:27 -0700
Patches item #425242, was updated on 2001-05-18 10:15
You can respond by visiting:
http://sourceforge.net/tracker/?func=detail&atid=305470&aid=425242&group_id=5470
Category: core (C code)
Group: None
Status: Open
>Resolution: Accepted
Priority: 5
Submitted By: M.-A. Lemburg (lemburg)
>Assigned to: Tim Peters (tim_one)
>Summary: Patch which "inlines" small dictionaries
Initial Comment:
As discussed on python-dev, this patch inlines small
dictionary tables directly in the dictionary object.
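The core idea can be sketched as a struct whose smallest hash table lives inside the object itself, so small dicts need no separate malloc. The names below are illustrative, not the actual dictobject.c declarations:

```c
#include <assert.h>
#include <string.h>

/* Illustrative sketch of an "inlined small table" dict -- NOT the
 * real CPython declarations.  The first MINSIZE slots live inside
 * the object itself, so small dicts never touch the heap. */
#define MINSIZE 8

typedef struct {
    long  hash;
    void *key;
    void *value;
} dictentry;

typedef struct {
    int        fill;   /* slots that are or once were occupied */
    int        used;   /* live entries */
    int        size;   /* table size, always a power of 2 */
    dictentry *table;  /* == smalltable until the dict outgrows it */
    dictentry  smalltable[MINSIZE];
} smalldict;

static void smalldict_init(smalldict *d)
{
    d->fill = d->used = 0;
    d->size = MINSIZE;
    d->table = d->smalltable;  /* no heap allocation for small dicts */
    memset(d->smalltable, 0, sizeof(d->smalltable));
}
```

Only when the dict grows past the small table does `table` get repointed at a malloc'ed array; until then every lookup goes through memory that was allocated along with the object.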
----------------------------------------------------------------------
>Comment By: Jeremy Hylton (jhylton)
Date: 2001-05-22 12:15
Message:
Logged In: YES
user_id=31392
timdict2.patch looks good on the few little tests and real
applications that I reported below. So measurement suggests
it does no harm, and reason suggests it helps.
----------------------------------------------------------------------
Comment By: Tim Peters (tim_one)
Date: 2001-05-22 12:03
Message:
Logged In: YES
user_id=31435
You're suffering an off-by-10000x error <wink>:
test_mutants as-is actually creates 100 pairs of dicts (so
that's 200 dicts) of average size 50, and fills the keys
and values of each with instances of class Horrid (so
that's about 20,000 additional instance dicts).
If Jeremy is still keen to time things (I kinda doubt it:
we're not seeing any surprises), I'd much rather time real
programs now -- in real life dicts aren't just created and
destroyed, they're also accessed, and the small-but-real
getitem/setitem critical-path reductions are aimed at the
latter.
----------------------------------------------------------------------
Comment By: M.-A. Lemburg (lemburg)
Date: 2001-05-22 10:15
Message:
Logged In: YES
user_id=38388
Jeremy, I think you ought to test the patches against
benchmarks which actually make extensive use of dictionaries
(test_mutant only uses 2-3 dictionaries).
pybench's Instances.py looks like a good start....
----------------------------------------------------------------------
Comment By: Jeremy Hylton (jhylton)
Date: 2001-05-22 08:45
Message:
Logged In: YES
user_id=31392
Comparing three different dict impls on the same tests.
CVS -- Python 2.2 CVS repository as of May 21, 2001
TIM -- Tim's dict patches to inline small dicts
MAL -- MAL's dict patches to inline small dicts
Test machine: Jeremy's 800MHz PIII, 256MB RAM, Linux 2.2.17
Compiled with gcc 2.95.3 -O3
Tests:
pystone -- the standard pystone benchmark
mutants -- the test_mutants test from the Python regression test
compile -- the compiler package compiling itself
Results
Results for the macro benchmarks are the median of 9 runs. For the
mutants and compile benchmarks, the measurement is the combined user
and system time reported by the Unix time utility. For the pystone
benchmark, the result is the time reported by pystone.
Result table
test          CVS     TIM     MAL
test_mutant   1.49    1.48    1.53
pystone       0.94    0.91    0.94
compile       11.7    11.65   11.83
I tried running a few of the pybench tests with gprof enabled, but
the time the runs took varied wildly -- a range of a few seconds.
----------------------------------------------------------------------
Comment By: M.-A. Lemburg (lemburg)
Date: 2001-05-22 02:54
Message:
Logged In: YES
user_id=38388
I think I better leave the timing to Jeremy then. It'll be
another week or two before I get the new machine up and
running.
----------------------------------------------------------------------
Comment By: Tim Peters (tim_one)
Date: 2001-05-21 22:29
Message:
Logged In: YES
user_id=31435
I don't know, but I'd guess you're also using different
releases of compilers, libraries and an OS since you first
did this under 1.5. Any of those could account for it too
(e.g., in my Win9x days I traced many timing surprises to
bizarre system malloc() behavior); the Python codebase is
simply larger now too, thanks to mounds of useless new
stuff like Unicode <wink>.
BTW, bump the priority on getting that new box: I finally
did that myself earlier this year, and it's still a joy
that *everything* runs 5x faster. My old machine wasn't
quite quick enough to run Tomb Raider in high resolution,
but now I can do that and test Python at the same time.
----------------------------------------------------------------------
Comment By: M.-A. Lemburg (lemburg)
Date: 2001-05-21 14:01
Message:
Logged In: YES
user_id=38388
I was a bit surprised at my timings, since I would have
expected a speedup too (I did get a speedup for Python 1.5
back in the old days). I think this has to do with CPU
caches and branch prediction.
In any case, a new machine is in the making :-) This old box
just ain't no fun any more...
----------------------------------------------------------------------
Comment By: Tim Peters (tim_one)
Date: 2001-05-21 13:38
Message:
Logged In: YES
user_id=31435
On my WinTel Win2K box at work, current CVS is ~12350
pystone w/o my patch and ~12500 with it. Jeremy's timings
on his Linux+gcc box show small but reproducible
improvements on some other "realer" programs. I'm not
surprised if it actually slows things on *some* platforms,
because that's the nature of low-level optimization work.
But the changes simplify the code on frequent paths,
reducing the start-to-finish operation counts, and over
time that's the only thing that reliably improves x-
platform performance.
BTW, the pystone results suggest you could get a factor of
6 improvement in an hour by buying a midrange new machine
<wink>.
----------------------------------------------------------------------
Comment By: M.-A. Lemburg (lemburg)
Date: 2001-05-21 06:53
Message:
Logged In: YES
user_id=38388
I've tested both patches against the unpatched CVS version
with pystone. The results are surprising: CVS benchmarks at
2300 pystones while your patch runs at 2200 pystones and my
patch at 2100 pystones.
I think we need to put some more thought into this...
----------------------------------------------------------------------
Comment By: Tim Peters (tim_one)
Date: 2001-05-21 02:07
Message:
Logged In: YES
user_id=31435
There's more explanation about how to bump MINSIZE in the
patch than in current dictobject.c. Read the new
comments! You have to choose a power of 2 and fiddle the
polys[] vector to match. That's all.
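That coupling can be sketched as follows. The poly values below are illustrative placeholders, not the real dictobject.c table; the point is only that MINSIZE must stay a power of 2 and polys[] must start at the entry for that size:

```c
#include <assert.h>

/* Sketch of the MINSIZE/polys[] coupling: MINSIZE must be a power
 * of 2, and polys[] supplies one probe polynomial per supported
 * table size starting at MINSIZE.  Bumping MINSIZE therefore means
 * dropping the entries for the smaller sizes.  These poly values
 * are placeholders, not the real dictobject.c constants. */
#define MINSIZE 8

static long polys[] = { 11, 19, 37, 67 };  /* sizes 8,16,32,64 (placeholders) */

static int minsize_is_power_of_two(void)
{
    return MINSIZE > 0 && (MINSIZE & (MINSIZE - 1)) == 0;
}

static long poly_for_size(int size)
{
    int i = 0;
    int s = MINSIZE;
    while (s < size) {  /* walk up the power-of-2 sizes */
        s <<= 1;
        i++;
    }
    return polys[i];
}
```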
The code defines a ma_smalltable member just as yours did.
The rest is details. Primarily it's more aggressive about
exploiting the new member, because I found it was more
efficient to avoid undue work in the empty case by looping
on the ma_fill count rather than by continuing to
constantly fiddle around with NULL-pointer checks all over
dictobject.c (they've been replaced by asserts). It does
pay for a memset() of the ma_smalltable member during
creation, though.
It would be interesting to try both patches, but I've already
spent too much time on this to debug your patch too. It would
be good if you could upload a new version of your patch.
----------------------------------------------------------------------
Comment By: M.-A. Lemburg (lemburg)
Date: 2001-05-21 01:25
Message:
Logged In: YES
user_id=38388
Hmm, timdict2.patch seems to be a completely different
solution. What I am missing in that patch is the ability to
tune the implementation... what happens if I want to bump
MINSIZE to some higher value? I think the patch should at
least include an explanation of how this can be done.
----------------------------------------------------------------------
Comment By: Tim Peters (tim_one)
Date: 2001-05-20 15:31
Message:
Logged In: YES
user_id=31435
Just deleted my first patch.
----------------------------------------------------------------------
Comment By: Tim Peters (tim_one)
Date: 2001-05-20 15:30
Message:
Logged In: YES
user_id=31435
timdict2.patch is more aggressive about iterating until
we've seen fill non-virgin entries go by, rather than looping
over the full table size. This is never slower but is
sometimes a surprising win! For example, dicts keyed by a
contiguous range of small integers tend to fill a solid
initial segment of the dict. When resizing or copying, after
the patch it's common for the loop to get out after, e.g.,
~170 iterations instead of the current (and invariable) 256.
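A minimal sketch of that early-exit scan, with hypothetical names (here fill counts slots that are or ever were occupied, matching the invariant described above):

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    void *key;
    void *value;
} dictentry;

/* Early-exit scan: stop as soon as `fill` non-virgin slots have
 * been seen instead of always walking all `size` slots.  For dicts
 * whose entries cluster in an initial segment, this exits well
 * before the end of the table. */
static int count_scanned(dictentry *table, int size, int fill)
{
    int i, seen = 0;
    for (i = 0; seen < fill; i++) {
        assert(i < size);            /* fill slots must exist */
        if (table[i].key != NULL)    /* non-virgin slot */
            seen++;
        /* a real resize/copy would process table[i] here */
    }
    return i;  /* number of slots actually examined */
}
```

When the occupied slots form an initial segment, the loop examines exactly fill slots; only scattered entries force it deeper into the table.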
Also added some desperately needed comments about what the
GF(2^n)-{0} business means in pragmatic terms; + a few
assorted cleanups.
I think this is ready for prime time, but do have concerns
about the increased memory use (embedding an 8-slot table
in every dict object means every dict bites another 8*3*4
== 96 bytes on a 32-bit box; OTOH, dicts with up to and
including 5 entries never need more space than that, and
dicts with 3, 4 or 5 entries were actually larger before
due to the malloc overhead tagging along with their
dynamically allocated 8-slot table).
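The arithmetic above, spelled out (a 4-byte word, i.e. a 32-bit box, is assumed):

```c
/* Space cost of the embedded table on a 32-bit box: each slot holds
 * three words (hash, key, value), so the 8-slot table adds
 * 8 * 3 * 4 bytes to every dict object -- offset by the fact that
 * dicts of up to 5 entries no longer need a malloc'ed table at all. */
enum { SLOTS = 8, WORDS_PER_SLOT = 3, BYTES_PER_WORD = 4 };

static int embedded_table_bytes(void)
{
    return SLOTS * WORDS_PER_SLOT * BYTES_PER_WORD;
}
```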
----------------------------------------------------------------------
Comment By: Tim Peters (tim_one)
Date: 2001-05-19 20:40
Message:
Logged In: YES
user_id=31435
Assigned to Jeremy since he's been timing stuff lately
too: care to give it a try?
timdict.patch is in some ways more aggressive than MAL's
patch, taking advantage of the fact that we always have *some*
memory to point to now, which allows getting rid of a bunch
of table==NULL tests and the associated gotos and labels.
The newer patch passes all the tests I've thrown at it, in
debug and release builds. It was a major bitch to get
working correctly, due to an excruciating interaction among
PyDict_Clear() and garbage collection and module
finalization and the special treatment of __builtins__ (who
would have guessed that allocating a tuple could cause a
module to finalize!?).
I never figured out why MAL's patch died, and after
spending 10 hours figuring out why mine did, don't intend
to waste the rest of the weekend trying <wink>.
----------------------------------------------------------------------
Comment By: M.-A. Lemburg (lemburg)
Date: 2001-05-18 10:21
Message:
Logged In: YES
user_id=38388
This is just a quick update of the original patch which I
prepared for Python 1.5 some years ago. Since the dictionary
implementation has evolved somewhat since then, I'm not sure
whether it still works as robustly as it did for Python 1.5
(I've been using this for years without any problem).
Running the test suite, there appears to be a hang due to
some endless loop which occurs in test_popen2. I'm not
sure where Python hangs, but it could well be that the
resize routines have to be adapted a little for the current
dict implementation.
----------------------------------------------------------------------