[Python-Dev] Benchmarking "fun" (was Re: Python 2.1 slower than 2.0)

M.-A. Lemburg mal@lemburg.com
Wed, 31 Jan 2001 15:34:19 +0100


Michael Hudson wrote:
> 
> In the interest of generating some numbers (and filling up my hard
> drive), last night I wrote a script to build lots & lots of versions
> of python (many of which turned out to be redundant - eg. -O6 didn't
> seem to do anything different to -O3 and pybench doesn't work with
> 1.5.2), and then run pybench with them.  Summarised results below;
> first a key:
> 
> src-n: this morning's CVS (with Jeremy's f_localsplus optimisation)
>         (only built this with -O3)
> src: CVS from yesterday afternoon
> src-obmalloc: CVS from yesterday afternoon with Vladimir's obmalloc
>         patch applied.  More on this later...
> Python-2.0: you can guess what this is.
> 
> All runs are compared against Python-2.0-O2:
> 
> Benchmark: src-n-O3 (rounds=10, warp=20)
>             Average round time:   49029.00 ms              -0.86%
> Benchmark: src (rounds=10, warp=20)
>             Average round time:   67141.00 ms             +35.76%
> Benchmark: src-O (rounds=10, warp=20)
>             Average round time:   50167.00 ms              +1.44%
> Benchmark: src-O2 (rounds=10, warp=20)
>             Average round time:   49641.00 ms              +0.37%
> Benchmark: src-O3 (rounds=10, warp=20)
>             Average round time:   49104.00 ms              -0.71%
> Benchmark: src-O6 (rounds=10, warp=20)
>             Average round time:   49131.00 ms              -0.66%
> Benchmark: src-obmalloc (rounds=10, warp=20)
>             Average round time:   63276.00 ms             +27.94%
> Benchmark: src-obmalloc-O (rounds=10, warp=20)
>             Average round time:   46927.00 ms              -5.11%
> Benchmark: src-obmalloc-O2 (rounds=10, warp=20)
>             Average round time:   46146.00 ms              -6.69%
> Benchmark: src-obmalloc-O3 (rounds=10, warp=20)
>             Average round time:   46456.00 ms              -6.07%
> Benchmark: src-obmalloc-O6 (rounds=10, warp=20)
>             Average round time:   46450.00 ms              -6.08%
> Benchmark: Python-2.0 (rounds=10, warp=20)
>             Average round time:   68933.00 ms             +39.38%
> Benchmark: Python-2.0-O (rounds=10, warp=20)
>             Average round time:   49542.00 ms              +0.17%
> Benchmark: Python-2.0-O3 (rounds=10, warp=20)
>             Average round time:   48262.00 ms              -2.41%
> Benchmark: Python-2.0-O6 (rounds=10, warp=20)
>             Average round time:   48273.00 ms              -2.39%
> 
> My conclusion?  Python 2.1 is slower than Python 2.0, but not by
> enough to care about.
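Every percentage in this summary is relative to Python-2.0-O2, but the summary never prints that build's own round time. If pybench computes diff = (t - t_ref) / t_ref * 100 (an assumption about pybench's internals, not checked against its source), you can back out the reference time from any row. Here is a minimal Python sketch. `relative_change` and `implied_baseline` are hypothetical helper names, and the numbers are copied from the summary:

```python
# Average round times (ms) and reported diffs (%) copied from the summary;
# every row is relative to Python-2.0-O2, whose own time is not printed.
SUMMARY = {
    "src-n-O3": (49029.0, -0.86),
    "src-O3": (49104.0, -0.71),
    "src-obmalloc-O3": (46456.0, -6.07),
    "Python-2.0": (68933.0, +39.38),
    "Python-2.0-O3": (48262.0, -2.41),
}


def relative_change(t, t_ref):
    """Percentage change of round time t against reference time t_ref."""
    return (t - t_ref) / t_ref * 100.0


def implied_baseline(t, diff_percent):
    """Invert relative_change: recover t_ref from t and the reported diff."""
    return t / (1.0 + diff_percent / 100.0)


if __name__ == "__main__":
    for name, (t, diff) in SUMMARY.items():
        baseline = implied_baseline(t, diff)
        print(f"{name:16s} implied Python-2.0-O2 time: {baseline:8.1f} ms")
```

For the five rows listed, the implied Python-2.0-O2 time falls between roughly 49454 and 49458 ms. A spread that small is what rounding the percentages to two decimals would produce. The same formula also reproduces the +5.70% footer of the detailed comparison further down, from 49104 ms (src-O3) against 46456 ms (src-obmalloc-O3).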

What compiler did you use, and on which platform?

I have had a similar experience with -On for n > 3 compared to -O2
using pgcc (gcc optimized for PC processors). BTW, the Linux
kernel uses "-Wall -Wstrict-prototypes -O3 -fomit-frame-pointer"
as CFLAGS -- perhaps Python should do the same on Linux?!
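You can check which of the kernel's flags an existing interpreter was built with by reading its recorded CFLAGS. The sketch below uses the standard-library `sysconfig.get_config_var("CFLAGS")`, a modern API that postdates this thread. `build_cflags` and `uses_flags` are hypothetical helper names:

```python
import sysconfig


def build_cflags():
    """Return the compiler flags recorded when this interpreter was built.

    sysconfig.get_config_var() returns None for unknown variables
    (e.g. on some Windows builds), so fall back to an empty list.
    """
    cflags = sysconfig.get_config_var("CFLAGS") or ""
    return cflags.split()


def uses_flags(wanted, flags=None):
    """Map each wanted flag to True/False depending on whether it was used."""
    if flags is None:
        flags = build_cflags()
    return {flag: flag in flags for flag in wanted}


if __name__ == "__main__":
    kernel_style = ["-Wall", "-Wstrict-prototypes", "-O3", "-fomit-frame-pointer"]
    for flag, present in uses_flags(kernel_style).items():
        print(f"{flag:24s} {'yes' if present else 'no'}")
```

This only reports which flags were used. It says nothing about whether they help, which still needs a benchmark run like the ones above.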
 
Does anybody know what effect -fomit-frame-pointer has?
Would it cause problems, or produce code that is not compatible
with code compiled without this flag?

> Interestingly, adding obmalloc speeds things up.  Let's take a closer
> look:
> 
> $ python pybench.py -c src-obmalloc-O3 -s src-O3
> PYBENCH 0.7
> 
> Benchmark: src-O3 (rounds=10, warp=20)
> 
> Tests:                              per run    per oper.  diff *
> ------------------------------------------------------------------------
>           BuiltinFunctionCalls:     843.35 ms    6.61 us   +2.93%
>            BuiltinMethodLookup:     878.70 ms    1.67 us   +0.56%
>                  ConcatStrings:    1068.80 ms    7.13 us   -1.22%
>                  ConcatUnicode:    1373.70 ms    9.16 us   -1.24%
>                CreateInstances:    1433.55 ms   34.13 us   +9.06%
>        CreateStringsWithConcat:    1031.75 ms    5.16 us  +10.95%
>        CreateUnicodeWithConcat:    1277.85 ms    6.39 us   +3.14%
>                   DictCreation:    1275.80 ms    8.51 us  +44.22%
>                       ForLoops:    1415.90 ms  141.59 us   -0.64%
>                     IfThenElse:    1152.70 ms    1.71 us   -0.15%
>                    ListSlicing:     397.40 ms  113.54 us   -0.53%
>                 NestedForLoops:     789.75 ms    2.26 us   -0.37%
>           NormalClassAttribute:     935.15 ms    1.56 us   -0.41%
>        NormalInstanceAttribute:     961.15 ms    1.60 us   -0.60%
>            PythonFunctionCalls:    1079.65 ms    6.54 us   -1.00%
>              PythonMethodCalls:     908.05 ms   12.11 us   -0.88%
>                      Recursion:     838.50 ms   67.08 us   -0.00%
>                   SecondImport:     741.20 ms   29.65 us  +25.57%
>            SecondPackageImport:     744.25 ms   29.77 us  +18.66%
>          SecondSubmoduleImport:     947.05 ms   37.88 us  +25.60%
>        SimpleComplexArithmetic:    1129.40 ms    5.13 us  +114.92%
>         SimpleDictManipulation:    1048.55 ms    3.50 us   -0.00%
>          SimpleFloatArithmetic:     746.05 ms    1.36 us   -2.75%
>       SimpleIntFloatArithmetic:     823.35 ms    1.25 us   -0.37%
>        SimpleIntegerArithmetic:     823.40 ms    1.25 us   -0.37%
>         SimpleListManipulation:    1004.70 ms    3.72 us   +0.01%
>           SimpleLongArithmetic:     865.30 ms    5.24 us  +100.65%
>                     SmallLists:    1657.65 ms    6.50 us   +6.63%
>                    SmallTuples:    1143.95 ms    4.77 us   +2.90%
>          SpecialClassAttribute:     949.00 ms    1.58 us   -0.22%
>       SpecialInstanceAttribute:    1353.05 ms    2.26 us   -0.73%
>                 StringMappings:    1161.00 ms    9.21 us   +7.30%
>               StringPredicates:    1069.65 ms    3.82 us   -5.30%
>                  StringSlicing:     846.30 ms    4.84 us   +8.61%
>                      TryExcept:    1590.40 ms    1.06 us   -0.49%
>                 TryRaiseExcept:    1104.65 ms   73.64 us  +24.46%
>                   TupleSlicing:     681.10 ms    6.49 us   -3.13%
>                UnicodeMappings:    1021.70 ms   56.76 us   +0.79%
>              UnicodePredicates:    1308.45 ms    5.82 us   -4.79%
>              UnicodeProperties:    1148.45 ms    5.74 us  +13.67%
>                 UnicodeSlicing:     984.15 ms    5.62 us   -0.51%
> ------------------------------------------------------------------------
>             Average round time:   49104.00 ms              +5.70%
> 
> *) measured against: src-obmalloc-O3 (rounds=10, warp=20)
> 
> Words fail me slightly, but maybe some tuning of the memory allocation
> of longs & complex numbers would be in order?
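In the pybench table above, a positive diff means the plain src-O3 build was slower than the src-obmalloc-O3 reference. The largest positive entries therefore show where obmalloc helps most. The following Python sketch pulls those rows out of pasted pybench output. The row format is inferred from the table, not from pybench's source, and `parse_rows` and `regressions` are hypothetical helpers:

```python
import re

# One pybench result row as printed above, e.g.
#   "DictCreation:    1275.80 ms    8.51 us  +44.22%"
ROW = re.compile(
    r"^\s*(?P<name>\w+):\s+"
    r"(?P<per_run>[\d.]+) ms\s+"
    r"(?P<per_oper>[\d.]+) us\s+"
    r"(?P<diff>[+-][\d.]+)%"
)


def parse_rows(text):
    """Yield (test name, diff in percent) for every result row in text.

    Leading '>' quote markers are stripped; summary lines such as
    "Average round time" do not match because the name has spaces.
    """
    for line in text.splitlines():
        m = ROW.match(line.lstrip("> "))
        if m:
            yield m.group("name"), float(m.group("diff"))


def regressions(text, threshold=20.0):
    """(name, diff) pairs whose diff exceeds threshold, largest first."""
    hits = [(name, diff) for name, diff in parse_rows(text) if diff > threshold]
    return sorted(hits, key=lambda item: item[1], reverse=True)
```

Applied to the full table with the default 20% threshold, this selects the following rows, in descending order of diff:

- SimpleComplexArithmetic
- SimpleLongArithmetic
- DictCreation
- SecondSubmoduleImport
- SecondImport
- TryRaiseExcept

SecondPackageImport, at +18.66%, falls just below the cut.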

AFAIR, Vladimir's malloc implementation favours small objects.
All number objects (except longs) fall into this category.
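A toy model shows the idea behind a small-object allocator. Requests up to a size threshold are rounded up to a fixed alignment and served from a per-size-class free list, while larger requests go to the system malloc. The 8-byte alignment and 256-byte threshold below are illustrative assumptions, not values taken from Vladimir's patch. `sys.getsizeof` reflects today's CPython object layouts rather than those of 2001. `size_class`, `ToySmallObjectAllocator` and `is_small` are hypothetical names:

```python
import sys

ALIGNMENT = 8     # assumed size-class granularity in bytes (illustrative)
THRESHOLD = 256   # assumed largest "small" request in bytes (illustrative)


def size_class(nbytes):
    """Index of the size class serving an nbytes request, or None if the
    request is too large and would go to the system allocator instead."""
    if nbytes <= 0 or nbytes > THRESHOLD:
        return None
    return (nbytes - 1) // ALIGNMENT   # 1..8 -> 0, 9..16 -> 1, ...


class ToySmallObjectAllocator:
    """One free list per size class: freed blocks are reused for later
    requests of the same class instead of calling malloc again."""

    def __init__(self):
        self.free_lists = {}
        self.next_id = 0

    def alloc(self, nbytes):
        cls = size_class(nbytes)
        if cls is None:
            return ("large", nbytes)          # stand-in for a malloc() call
        free = self.free_lists.setdefault(cls, [])
        if free:
            return free.pop()                 # reuse a previously freed block
        self.next_id += 1
        return ("small", cls, self.next_id)   # stand-in for carving a new block

    def free(self, block):
        if block[0] == "small":
            self.free_lists.setdefault(block[1], []).append(block)


def is_small(obj):
    """Would an object of this size take the small-object path?"""
    return size_class(sys.getsizeof(obj)) is not None
```

On a current 64-bit CPython, `is_small(1.0)` and `is_small(1j)` hold while `is_small(10**1000)` does not. That matches the point that fixed-size number objects fit the small-object case, whereas arbitrary-precision longs can grow past any fixed threshold.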

Perhaps we should think about adding his library to the core?!

-- 
Marc-Andre Lemburg
______________________________________________________________________
Company:                                        http://www.egenix.com/
Consulting:                                    http://www.lemburg.com/
Python Pages:                           http://www.lemburg.com/python/