Performance of int/long in Python 3

Neil Hodgson nhodgson at iinet.net.au
Tue Apr 2 23:31:03 EDT 2013


Ian Kelly:

> Micro-benchmarks like the ones you have been reporting are *useful*
> when it comes to determining what operations can be better optimized,
> but they are not *important* in and of themselves.  What is important
> is that actual, real-world programs are not significantly slowed by
> these kinds of optimizations.  Until you can demonstrate that real
> programs are adversely affected by PEP 393, there is not in my opinion
> any regression that is worth worrying over.

    The problem with only responding to issues with real-world programs 
is that real-world programs are complex and their performance issues are 
often difficult to diagnose. See, for example, scons, which is written in 
Python and which has not been able to overcome its performance problems 
over several years. 
(http://www.electric-cloud.com/blog/2010/07/21/a-second-look-at-scons-performance/)

    Bottom-up performance work has advantages in that a narrow focus 
area can be more easily analyzed and tested and can produce widely 
applicable benefits.

    The choice of comparison for the script wasn't arbitrary. Comparison 
is one of the main building blocks of higher-level code. Sorting, for 
example, depends strongly on comparison performance: sorting n items 
takes on the order of n log n comparisons, so any decrease in comparison 
speed is multiplied when applied to sorting.
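To make the multiplication concrete, here is a small sketch (not from the 
original post) that counts how many times sorted() calls a comparison 
function. The helper name count_sort_comparisons and the generated 
"path/NNNNNNNN" strings are hypothetical; the point is only that the count 
grows roughly like n log2 n, so each slower comparison is paid for that 
many times.

```python
import functools
import math
import random


def count_sort_comparisons(items):
    """Sort a copy of items and return (sorted_list, number_of_comparisons).

    Hypothetical helper: wraps a three-way compare function with
    functools.cmp_to_key so that every comparison the sort makes is counted.
    """
    count = 0

    def compare(a, b):
        nonlocal count
        count += 1
        # Standard three-way comparison result: -1, 0 or 1.
        return (a > b) - (a < b)

    result = sorted(items, key=functools.cmp_to_key(compare))
    return result, count


if __name__ == "__main__":
    random.seed(0)
    for n in (1000, 10000, 100000):
        # Made-up path-like strings, only for illustration.
        data = ["path/%08d" % random.randrange(10 ** 8) for _ in range(n)]
        _, comparisons = count_sort_comparisons(data)
        # Ratio of actual comparisons to n*log2(n).
        print(n, comparisons, round(comparisons / (n * math.log2(n)), 2))
```

The printed ratio should stay roughly constant as n grows, which is what 
"multiplied by n log n" means in practice.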

    It's unfortunate that stringbench.py does not contain any comparison 
or sorting tests.
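As a rough idea of what such a test could look like, the following 
standalone timeit sketch times "a < b" for two strings that share a long 
prefix and differ only in the last character, which is close to what 
happens when sorting paths under a common directory. It is not written in 
stringbench.py's own format; the function name time_comparison and the 
use of '\u20ac' (a character that PEP 393 stores with 2 bytes per 
character) are assumptions for illustration.

```python
import timeit


def time_comparison(prefix_len, prefix_char="x", last_b="b",
                    number=100000, repeat=5):
    """Best time in seconds for `number` evaluations of a < b, where
    a = prefix + 'a' and b = prefix + last_b share a prefix of
    prefix_len copies of prefix_char. (Hypothetical helper.)"""
    # Built as a setup string rather than timeit's globals= argument,
    # which only exists from Python 3.5, so this also runs on 3.2/3.3.
    setup = "prefix = %r * %d; a = prefix + 'a'; b = prefix + %r" % (
        prefix_char, prefix_len, last_b)
    timer = timeit.Timer("a < b", setup=setup)
    return min(timer.repeat(repeat=repeat, number=number))


if __name__ == "__main__":
    cases = [
        # Both strings pure ASCII (1 byte per character).
        ("ASCII vs ASCII", "x", "b"),
        # Both strings contain U+20AC, so both use 2 bytes per character.
        ("2-byte vs 2-byte", "\u20ac", "b"),
        # a is ASCII, b ends in U+20AC: the two strings have different kinds.
        ("ASCII vs 2-byte", "x", "\u20ac"),
    ]
    for label, prefix_char, last_b in cases:
        for n in (10, 100, 1000):
            t = time_comparison(n, prefix_char, last_b)
            print("%-18s prefix=%5d  %.4f s" % (label, n, t))
```

Running the same script under each Python version on the same machine 
gives a like-for-like comparison; the absolute numbers mean little on 
their own.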

    Sorting a list of a million strings (all the file paths on a 
particular computer) went from 0.4 seconds with Python 3.2 to 0.78 
seconds with 3.3, so we're out of the 'not noticeable by humans' range. 
Perhaps this is still a 'micro-benchmark'; I'd just like to avoid adding 
email access to get this over the threshold.

    Here's some code. Replace the "if 1" with "if 0" on subsequent runs 
to avoid the costly file system walk.

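import os, time
from os.path import join
paths = []
if 1:
    # First run: walk the whole C: drive, collect every file path and
    # cache the list so later runs can skip the slow walk.
    for root, dirs, files in os.walk('c:\\'):
        for name in files:
            paths.append(join(root, name))
    # UTF-8 so that file names outside the locale's code page can be saved.
    with open("filelist.txt", "w", encoding="utf-8") as f:
        f.write("\n".join(paths))
else:
    # Subsequent runs: reload the cached list.
    with open("filelist.txt", "r", encoding="utf-8") as f:
        paths = f.read().split("\n")
print(len(paths))
# Time only the sort itself.
timeStart = time.time()
paths.sort()
timeEnd = time.time()
print("Time taken=", timeEnd - timeStart)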
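For anyone who wants to try this without walking a real disk, here is a 
sketch that sorts synthetic path-like strings instead. The directory 
layout in the hypothetical make_paths helper is made up and will not 
match a real file listing, so its times are not comparable with the 
figures above; it is only meant for comparing Python versions on the same 
machine. timeit.default_timer is used because it exists in both 3.2 and 
3.3 and picks a suitable clock on each platform.

```python
import random
import timeit


def make_paths(n, seed=0):
    """Build n made-up Windows-style paths that share directory prefixes,
    loosely imitating a file-system listing. (Hypothetical layout.)"""
    rng = random.Random(seed)
    dirs = ["c:\\data\\proj%d\\sub%d" % (i, j)
            for i in range(50) for j in range(20)]
    return ["%s\\file%06d.py" % (rng.choice(dirs), rng.randrange(10 ** 6))
            for _ in range(n)]


def time_sort(paths, repeat=3):
    """Best wall-clock time in seconds for sorting a copy of paths."""
    best = float("inf")
    for _ in range(repeat):
        data = list(paths)  # fresh unsorted copy, input list is left alone
        start = timeit.default_timer()
        data.sort()
        best = min(best, timeit.default_timer() - start)
    return best


if __name__ == "__main__":
    paths = make_paths(1000000)  # a million strings, as in the post
    print(len(paths))
    print("Time taken=", time_sort(paths))
```

Copying the list before each run matters: sorting an already sorted list 
is much faster, so reusing the same list would make every run after the 
first misleadingly quick.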

    Neil



More information about the Python-list mailing list