Removing None objects from a sequence
bearophileHUGS at lycos.com
bearophileHUGS at lycos.com
Sat Dec 13 09:00:09 EST 2008
Steven D'Aprano:
> The algorithm is unclear: try explaining
> what you are doing in English, and see how difficult it is.
That algorithm is a standard one, and quite clear. Any programmer
worth his/her/hir salt has to be able to understand that.
I agree that the idiom with the list comp (algorithm2) is better for
Python, but you have to know that algorithm1 to use it in other
languages.
> Now, sure, most of the work in Tim's version is executed in fast C code
> instead of slow Python code. If we were programming in C, your version
> might perform better relative to Tim's.
Try Psyco too:
from timeit import default_timer as clock
def remove_nones1(alist):
n = len(alist)
i = j = 0
while i < n:
if alist[i] is not None:
alist[j] = alist[i]
j += 1
i += 1
alist[j : i] = []
def remove_nones2(alist):
alist[:] = [x for x in alist if x is not None]
def remove_nones3(alist):
for i in xrange(len(alist)-1, -1, -1):
if alist[i] is None:
del alist[i]
def test123(data):
data1 = data[:]
data2 = data[:]
data3 = data[:]
t = clock()
remove_nones1(data1)
print " remove_nones1:", round(clock() - t, 2), "s"
t = clock()
remove_nones2(data2)
print " remove_nones2:", round(clock() - t, 2), "s"
t = clock()
remove_nones3(data3)
print " remove_nones3:", round(clock() - t, 2), "s"
assert data1 == data2 == data3
assert data1 == data2
def test12(data):
data1 = data[:]
data2 = data[:]
data3 = data[:]
t = clock()
remove_nones1(data1)
print " remove_nones1:", round(clock() - t, 2), "s"
t = clock()
remove_nones2(data2)
print " remove_nones2:", round(clock() - t, 2), "s"
print " remove_nones3: (not tested)"
assert data1 == data2
def gen_data(N):
print " N:", N
data = [None]*2 + range(5) + [None]*3 + range(5, 10) + [None]*4 +
\
[10, None, 11, None, 12, None]
data *= N
data += range(N * 10)
data += [None] * (N * 10)
data += [None, 0] * (N * 10)
data += [1] * (N * 100)
return data
print "Without Psyco:"
test123(gen_data(1000))
test12(gen_data(30000))
print
print "With Psyco:"
import psyco; psyco.full()
test123(gen_data(1000))
test12(gen_data(30000))
Results on a Core2 2 GHz, Python 2.6.1, latest Psyco:
Without Psyco:
N: 1000
remove_nones1: 0.05 s
remove_nones2: 0.02 s
remove_nones3: 3.04 s
N: 30000
remove_nones1: 1.51 s
remove_nones2: 0.62 s
remove_nones3: (not tested)
With Psyco:
N: 1000
remove_nones1: 0.0 s
remove_nones2: 0.01 s
remove_nones3: 3.01 s
N: 30000
remove_nones1: 0.08 s
remove_nones2: 0.33 s
remove_nones3: (not tested)
As you see even with Psyco a (bit) lower level style of coding
wins :-)
In practice if that routine is important (or if it's a generic library
routine that you don't know how it will be used) you use two different
algorithms according to the availability of Psyco. Something like:
def remove_nones4(alist):
if psyco_available:
n = len(alist)
i = j = 0
while i < n:
if alist[i] is not None:
alist[j] = alist[i]
j += 1
i += 1
alist[j : i] = []
else:
alist[:] = [x for x in alist if x is not None]
Bye,
bearophile
More information about the Python-list
mailing list