Speed quirk: redundant line gives six-fold speedup

Stelios Xanthakis sxanth at ceid.upatras.gr
Thu Aug 25 14:23:09 EDT 2005


Mark Dickinson wrote:
> I have a simple 192-line Python script that begins with the line:
> 
> dummy0 = 47
> 
> The script runs in less than 2.5 seconds.  The variable dummy0 is never
> referenced again, directly or indirectly, by the rest of the script.
> 
> Here's the surprise: if I remove or comment out this first line, the
> script takes more than 15 seconds to run.  So it appears that adding a
> redundant line produces a spectacular six-fold increase in speed!
> 
> (Actually, I had to add 29 dummy lines at the beginning of the code to
> get the speed increase; if any one of these lines is removed the
> running time reverts to around 15 seconds again.)
> 
> Questions:
> 
> (1) Can anyone else reproduce this behaviour, or is it just some quirk
>     of my setup?
> (2) Any possible explanations?  Is there some optimization that kicks
>     in at a certain number of lines, or at a certain length of
>     bytecode?
> (3) If (2), is there some way to force the optimization, so that I can
>     get the speed increase without having to add the extra lines?
>

Hi.

I haven't been able to reproduce this but I had a similar case
before (a program that some times crashed and some times worked
perfectly and the difference was whitespace in the comments!!!).

After lots of wondering and thinking that I must be dreaming
(luckily I had pyvm which also crashed but for different amounts
of whitespace), it was solved.  The explanation is this: hash
and comparison of objects depends on the state of the memory
allocator.  A sample case is this:

	class A: pass
	dummy0=47  # comment this to get a different result for min
	a=A()
	b=A()
	print min (a, b)

the result of 'min' is not only non-deterministic but also depends
on whether other things have been allocated before.  The same
thing can happen for 'dictionary.keys()' if the keys are objects
and 'iterate-over-set' when the set contains objects.

In the sudoku solver, there is a min (number, object) which is
probably what's affected by the extistance of the dummy variable.
Now, in sudoku puzzles some times the algorithm has to suppose
that in a box the right solution is one of the possible, try it
and if it fails then try the other one.  I guess that the result
of from the different 'min' leads the solver to first try the
correct solution in the fast case and therefore need not attempt
the wrong one.  Or at least this is what I think that happens.

By the way, since we started posting code, here is a more pythonic
sudoku solver.

#----The 'man with scissors runs around shifting barrels algorithm'

from sys import argv
SEMANTIC = 1'SEM' in argv

class Impossible:
     pass

Patterns = []
for r in range (9):
     for c in range (9):
	p = 27*(r/3) + 3*(c/3)
	pl = set (range (9*r, 9*r+9) + range (c, 81, 9) + [p+x for x in 
(0,1,2,9,10,11,18,19,20)])
	pl.remove (9*r+c)
	Patterns.append (tuple (sorted (list (pl))))

def initboard ():
     x = range (1, 10)
     return [ x [:] for i in xrange (9*9) ]

def printboard (board):
     if not SEMANTIC:
         return
     print 30*'-'
     for i in range (9):
	for j in board [9*i:9*(i+1)]:
	    if type (j) is list:
		#print 'X',
		print ''.join (map (str, j)),
	    else: print j,
	print
     print 30*'-'

def dupboard (board):
     B = []
     for i in board:
	if type (i) is list:
	    B.append (i [:])
	else:
	    B.append (i)
     return B

def solve (board, coords):
     while coords:
	p, v = coords.pop ()
	board [p] = v
	for i in Patterns [p]:
	    if type (board [i]) is list:
		if v in board [i]:
		    board [i].remove (v)
		    if len (board [i]) == 1:
			board [i] = board [i][0]
			coords.append ((i, board [i]))
	    else:
		if board [i] == v:
		    raise Impossible
     for p, i in enumerate (board):
	if type (i) is list:
	    for j in i:
		try:
		    return solve (dupboard (board), [(p, j)])
		except Impossible:
		    pass
	    raise Impossible
     return board


PP = [
[
"7xxxxxx19",
"46x19xxxx",
"xxx6827x4",
"x9xxxxxx7",
"xxx3xx4x5",
"xx67xxxxx",
"xx1xxxxxx",
"2xxx74xxx",
"xxx2xx3xx",
]
]

def puz2coord (P):
     if len (P) != 9:
	print "P must have 9 rows"
	raise SystemExit
     coords = []
     for r, i in enumerate (P):
	if len (i) != 9:
	    print "Row [%s] doesn't have 9 columns" %i
	    raise SystemExit
	for c, j in enumerate (list (i)):
	    if j != 'x':
		coords.append ((9*r + c, int (j)))
     return coords

try:
   if SEMANTIC:
    for i in xrange (10):
     for P in PP:
         printboard (solve (initboard (), puz2coord (P)))
   else:
    for i in xrange (TIMES):
     for P in PP:
         printboard (solve (initboard (), puz2coord (P)))
except Impossible:
     print "IMPOSSIBLY IMPOSSIBLE"




More information about the Python-list mailing list