Speed up this code?

Ben Cartwright bencvt at gmail.com
Thu May 25 21:24:03 EDT 2006


aomighty at gmail.com wrote:
> I'm creating a program to calculate all primes numbers in a range of 0
> to n, where n is whatever the user wants it to be. I've worked out the
> algorithm and it works perfectly and is pretty fast, but the one thing
> seriously slowing down the program is the following code:
>
> def rmlist(original, deletions):
>    return [i for i in original if i not in deletions]
>
> original will be a list of odd numbers and deletions will be numbers
> that are not prime, thus this code will return all items in original
> that are not in deletions. For n > 100,000 or so, the program takes a
> very long time to run, whereas it's fine for numbers up to 10,000.
>
> Does anybody know a faster way to do this? (finding the difference all
> items in list a that are not in list b)?

The "in" operator is expensive for lists because Python has to check,
on average, half the items in the list.  Use a better data structure...
in this case, a set will do nicely.  See the docs:

http://docs.python.org/lib/types-set.html
http://docs.python.org/tut/node7.html#SECTION007400000000000000000

Oh, and you didn't ask for it, but I'm sure you're going to get a dozen
pet implementations of prime generators from other c.l.py'ers.  So
here's mine. :-)

def primes():
    """Generate prime numbers using the sieve of Eratosthenes."""
    yield 2
    marks = {}
    cur = 3
    while True:
        skip = marks.pop(cur, None)
        if skip is None:
            # unmarked number must be prime
            yield cur
            # mark ahead
            marks[cur*cur] = 2*cur
        else:
            n = cur + skip
            while n in marks:
                # x already marked as multiple of another prime
                n += skip
            # first unmarked multiple of this prime
            marks[n] = skip
        cur += 2

--Ben




More information about the Python-list mailing list