Removing None objects from a sequence

bearophileHUGS at lycos.com bearophileHUGS at lycos.com
Sat Dec 13 09:00:09 EST 2008


Steven D'Aprano:
> The algorithm is unclear: try explaining
> what you are doing in English, and see how difficult it is.

That algorithm is a standard one, and quite clear. Any programmer
worth his/her/hir salt has to be able to understand that.

I agree that the idiom with the list comp (algorithm2) is better for
Python, but you have to know that algorithm1 to use it in other
languages.


> Now, sure, most of the work in Tim's version is executed in fast C code
> instead of slow Python code. If we were programming in C, your version
> might perform better relative to Tim's.

Try Psyco too:

from timeit import default_timer as clock

def remove_nones1(alist):
    n = len(alist)
    i = j = 0
    while i < n:
        if alist[i] is not None:
            alist[j] = alist[i]
            j += 1
        i += 1
    alist[j : i] = []


def remove_nones2(alist):
    alist[:] = [x for x in alist if x is not None]


def remove_nones3(alist):
    for i in xrange(len(alist)-1, -1, -1):
        if alist[i] is None:
            del alist[i]


def test123(data):
    data1 = data[:]
    data2 = data[:]
    data3 = data[:]

    t = clock()
    remove_nones1(data1)
    print "    remove_nones1:", round(clock() - t, 2), "s"

    t = clock()
    remove_nones2(data2)
    print "    remove_nones2:", round(clock() - t, 2), "s"

    t = clock()
    remove_nones3(data3)
    print "    remove_nones3:", round(clock() - t, 2), "s"

    assert data1 == data2 == data3
    assert data1 == data2


def test12(data):
    data1 = data[:]
    data2 = data[:]
    data3 = data[:]

    t = clock()
    remove_nones1(data1)
    print "    remove_nones1:", round(clock() - t, 2), "s"

    t = clock()
    remove_nones2(data2)
    print "    remove_nones2:", round(clock() - t, 2), "s"

    print "    remove_nones3: (not tested)"

    assert data1 == data2


def gen_data(N):
    print "  N:", N
    data = [None]*2 + range(5) + [None]*3 + range(5, 10) + [None]*4 +
\
                 [10, None, 11, None, 12, None]
    data *= N
    data += range(N * 10)
    data += [None] * (N * 10)
    data += [None, 0] * (N * 10)
    data += [1] * (N * 100)
    return data

print "Without Psyco:"
test123(gen_data(1000))
test12(gen_data(30000))
print

print "With Psyco:"
import psyco; psyco.full()
test123(gen_data(1000))
test12(gen_data(30000))


Results on a Core2 2 GHz, Python 2.6.1, latest Psyco:

Without Psyco:
  N: 1000
    remove_nones1: 0.05 s
    remove_nones2: 0.02 s
    remove_nones3: 3.04 s
  N: 30000
    remove_nones1: 1.51 s
    remove_nones2: 0.62 s
    remove_nones3: (not tested)

With Psyco:
  N: 1000
    remove_nones1: 0.0 s
    remove_nones2: 0.01 s
    remove_nones3: 3.01 s
  N: 30000
    remove_nones1: 0.08 s
    remove_nones2: 0.33 s
    remove_nones3: (not tested)

As you see even with Psyco a (bit) lower level style of coding
wins :-)
In practice if that routine is important (or if it's a generic library
routine that you don't know how it will be used) you use two different
algorithms according to the availability of Psyco. Something like:

def remove_nones4(alist):
    if psyco_available:
        n = len(alist)
        i = j = 0
        while i < n:
            if alist[i] is not None:
                alist[j] = alist[i]
                j += 1
            i += 1
        alist[j : i] = []
    else:
        alist[:] = [x for x in alist if x is not None]

Bye,
bearophile



More information about the Python-list mailing list