Powersets of a list?

Bengt Richter bokr at accessone.com
Tue May 29 14:44:58 EDT 2001


On Mon, 28 May 2001 13:56:39 -0400, "Tim Peters" <tim.one at home.com>
wrote:

>[Bengt Richter]
>> What about the effect of garbage collection? To be fair, shouldn't all
>> timings be started after a fresh garbage collection, at least on an
>> algorithm basis, if not each invocation?
>
>The bulk of garbage collection in CPython is done by reference counting, and
>objects normally go away as soon as their last reference goes away.  That's
>"fair" across algorithms, in the sense that they'll pay for the amount of
>trash they create while they're running.
>
>There's another scheme, though, for cleaning up trash with cycles, which
>reference counting alone can't discover.  This is triggered by "excess"
>allocations:  if, since the last time this pass ran, the number of
>allocations is greater than the number of destructions by a certain amount,
>this other scheme kicks in, and searches *everything* for the possibility of
>trash cycles.  So when you're, say, creating oodles and oodles of lists, but
>not destroying them, the cycle-detection system gets triggered often.
>
>That's also fair (in some sense <wink>), but much subtler.  To see the
>effect on a particular algorithm, it's easiest to disable that form of gc
>for the duration:
>
>import gc
>gc.collect()	# force as consistent a state as possible?
>gc.disable()
># run the algorithm; report times
>gc.enable()
>
>It *usually* doesn't make much difference (a few percent).
>
I guess it would depend on how allocation is implemented:
whether there's leftover state to affect the current test
if you don't force a collection. Even then, could the pools
of available memory chunks contain un-consolidated small
chunks from a previous pass that would benefit the current one?
E.g., if big chunks are allocated from a different pool than
little chunks, and little-chunk pools are extended by allocating
big chunks and carving them up into a bunch of little ones.
I am just speculating, not having looked at the code.
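For what it's worth, Tim's collect/disable/enable pattern is easy to wrap in a small timing harness. A minimal sketch (the function name and repeat count are my own; it uses the modern time.perf_counter, so substitute clock() on older interpreters):

```python
import gc
import time

def time_algorithm(func, repeat=5):
    """Time func() with cyclic GC disabled, after a forced collection."""
    gc.collect()      # start from as consistent a heap state as possible
    gc.disable()      # keep the cycle detector from firing mid-run
    try:
        times = []
        for _ in range(repeat):
            t0 = time.perf_counter()
            func()
            times.append(time.perf_counter() - t0)
    finally:
        gc.enable()   # always restore the cycle detector
    return min(times)  # min is least polluted by extraneous activity

# Example: time building a modest list
elapsed = time_algorithm(lambda: [i * i for i in range(10000)])
```

Taking the minimum of several runs, rather than the mean, is one common way to discount the extraneous time discussed below.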

On Windows one should probably reboot into a configuration excluding
all automatically started applications and network connectivity, and
start a fresh instance of the Python interpreter, all for each
algorithm test, to be fair to each of them. Even then it would
probably be a good idea to run the tests in all possible permutations
of order ;-)
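Tongue-in-cheek or not, running the tests in every order is simple to sketch (the two algorithm functions below are invented purely for illustration):

```python
import itertools
import time

def run_in_all_orders(algorithms):
    """Run every permutation of the algorithms, recording per-order timings."""
    results = {}
    for order in itertools.permutations(algorithms):
        timings = []
        for func in order:
            t0 = time.perf_counter()
            func()
            timings.append(time.perf_counter() - t0)
        results[tuple(f.__name__ for f in order)] = timings
    return results

# Two toy "algorithms" to compare
def algo_a(): sum(range(1000))
def algo_b(): [i * i for i in range(1000)]

results = run_in_all_orders([algo_a, algo_b])
```

Comparing the timings across orders would expose any systematic advantage of going first or last (cache warming, pool state, etc.).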

When timing really fast things, it's also important to classify
the results as to what kind of extraneous time may be included
(since, UIAM, clock() measures elapsed wall time, not just user
process time), e.g., CPU cache loading, interrupt servicing,
context switches to higher-priority threads/processes, VM/disk
activity, etc.

It's interesting to store the values in an array and graph them.
They will often tend to happen in patterns, with values selected
from a small set.
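Collecting the raw samples for such a graph is straightforward; a sketch (the helper name is mine, and the plotting step is left as a comment):

```python
import time

def sample_timings(func, n=100):
    """Collect n individual timings of func() for later inspection/plotting."""
    samples = []
    for _ in range(n):
        t0 = time.perf_counter()
        func()
        samples.append(time.perf_counter() - t0)
    return samples

samples = sample_timings(lambda: sum(range(1000)), n=200)
# With matplotlib one could then plot the series, e.g. plt.plot(samples, '.')
```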

Hiccups in memory allocation should show up, even if no gc
was triggered. I am assuming that the allocation process is not
_absolutely_ constant time ;-)

If you use the time values as dict keys and count the frequency
of each, you can print an interesting frequency-sorted list.
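A sketch of that frequency count, using collections.Counter as a modern convenience (a plain dict with get()-based incrementing works just as well; the sample values here are made-up clock ticks):

```python
from collections import Counter

# Suppose these are raw timing values (e.g. tick counts from a clock)
samples = [5, 5, 6, 5, 7, 6, 5, 8, 5, 6]

freq = Counter(samples)

# Print a frequency-sorted list, most common timing value first
for value, count in freq.most_common():
    print(f"{value}: {count}")
```

The most common value typically represents the "clean" timing, with the rarer, larger values showing the hiccups.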



