preferring [] or () in list of error codes?

samwyse samwyse at gmail.com
Tue Jun 9 07:57:48 EDT 2009


On Jun 8, 8:57 pm, samwyse <samw... at gmail.com> wrote:
> I conclude that using constructors is generally a bad idea, since the
> compiler doesn't know if you're calling the builtin or something with
> an overloaded name.  I presume that the compiler will eventually
> optimize the second example to match the last, but both of them use
> the BUILD_SET opcode.  I expect that this can be expensive for long
> lists, so I don't think that it's a good idea to use set constants
> inside loops.  Instead it should be assigned to a global or class
> variable.

Time to test things!   I'm going to compare three things using Python
3.0:
  X={...}\nS=lambda x: x in X
  S=lambda x: x in {...}
  S=lambda x: x in (...)
where the ... is replaced by lists of integers of various lengths.
Here's the test bed:

from random import seed, sample
from timeit import Timer
maxint = 2**31-1
values = list(map(lambda n: 2**n-1, range(1,16)))
def append_numbers(k, setup):
    seed(1968740928)
    for i in sample(range(maxint), k):
        setup.append(str(i))
        setup.append(',')
print('==', 'separate set constant')
for n in values[::2]:
    print('===', n, 'values')
    setup = ['X={']
    append_numbers(n, setup)
    setup.append('}\nS=lambda x: x in X')
    t = Timer('S(88632719)', ''.join(setup))
    print(t.repeat())
print('==', 'in-line set constant')
for n in values[:4]:
    print('===', n, 'values')
    setup = ['S=lambda x: x in {']
    append_numbers(n, setup)
    setup.append('}')
    t = Timer('S(88632719)', ''.join(setup))
    print(t.repeat())
print('==', 'in-line list constant')
for n in values:
    print('===', n, 'values')
    setup = ['S=lambda x: x in (']
    append_numbers(n, setup)
    setup.append(')')
    t = Timer('S(88632719)', ''.join(setup))
    print(t.repeat())

And here are the results.  There's something interesting at the very
end.

== separate set constant
=== 1 values
[0.26937306277753176, 0.26113626173158877, 0.26000092190487889]
=== 7 values
[0.26583266867716426, 0.27223543774418268, 0.27681646689732919]
=== 31 values
[0.25089725090758752, 0.25562690230182894, 0.25844625504079444]
=== 127 values
[0.32404313956103392, 0.33048948958596691, 0.34487930728626104]
=== 511 values
[0.27574566041214732, 0.26991838348169983, 0.28309016928129083]
=== 2047 values
[0.27826162263639631, 0.27337357122204065, 0.26888752620793976]
=== 8191 values
[0.27479134917985437, 0.27955955295994261, 0.27740676538498654]
=== 32767 values
[0.26189725230441319, 0.25949247739587022, 0.2537356004743625]
== in-line set constant
=== 1 values
[0.43579086168772818, 0.4231755711968983, 0.42178740594125852]
=== 3 values
[0.54712875519095228, 0.55325048295244272, 0.54346991028189251]
=== 7 values
[1.1897654590178366, 1.1763383335032813, 1.2009900699669931]
=== 15 values
[1.7661906750718313, 1.7585005915556291, 1.7405896559478933]
== in-line list constant
=== 1 values
[0.23651385860493335, 0.24746972031361381, 0.23778469051234197]
=== 3 values
[0.23710750947396875, 0.23205630883254713, 0.23345592805789295]
=== 7 values
[0.24607764394636789, 0.23551903943099006, 0.24241377046524093]
=== 15 values
[0.2279376289444599, 0.22491908887861456, 0.24076747184349045]
=== 31 values
[0.22860084172708994, 0.233022074034551, 0.23138639128715965]
=== 63 values
[0.23671639831319169, 0.23404259479906031, 0.22269394573891077]
=== 127 values
[0.22754176857673158, 0.22818151468971593, 0.22711154629987718]
=== 255 values
[0.23503126794047802, 0.24493699618247788, 0.26690207833677349]
=== 511 values
[0.24518255811842238, 0.23878118587697728, 0.22844830837438934]
=== 1023 values
[0.23285585179122137, 0.24067220833932623, 0.23807439213642922]
=== 2047 values
[0.24206484343680756, 0.24352201187581102, 0.24366253252857462]
=== 4095 values
[0.24624526301527183, 0.23692145230748807, 0.23829956041899081]
=== 8191 values
[0.22246514570986164, 0.22435309515595137, 0.22220114567633331]
=== 16383 values
[194.29462683106374, 193.21789529116128, 193.25843228678508]
=== 32767 values

You will note that testing against a list constant is just as fast as
testing against a set.  This was surprising for me; apparently the
__contains__ operator turns a tuple into a set.  You will also note
that  performance to fall off drastically for the last set of values.
I'm not sure what happens there; I guess I'll file a bug report.



More information about the Python-list mailing list