Prime number generator

Wed Jul 10 13:47:11 EDT 2013

On 10 July 2013 17:15, Chris Angelico <rosuav at gmail.com> wrote:
> On Thu, Jul 11, 2013 at 1:47 AM, bas <blswinkels at gmail.com> wrote:
>> On Wednesday, July 10, 2013 5:12:19 PM UTC+2, Chris Angelico wrote:
>>> Well, that does answer the question. Unfortunately the use of lambda
>>> there has a severe performance cost [ ...]
>> If you care about speed, you might want to check the heapq module. Removing the smallest item and inserting a new item in a heap both cost O(log(N)) time, while finding the minimum in a dictionary requires iterating over the whole dictionary, which cost O(N) time.

Actually, because it's a list under the hood I'd imagine push and pop
still take O(n) time :/.

> Ehh, speed isn't the ultimate. I was just trying to avoid something
> that worked out ridiculously slow (a Python function call IS quite
> slow). I haven't profiled the code to find out where the bulk of the
> time is spent, but switching in the lambda-based version doubled total
> run time, so I didn't like it :)
>
>> (untested)
>> #before loop
>> from heapq import *
>> primes = [(2,2)] #heap of tuples (multiple, prime). start with 1 item, so no need for heapify
>>
>> #during loop
>> smallest, prm = heappop(primes)
>> heappush(primes, (smallest+prm, prm))
>>
>> #when new prime found
>> heappush(primes, (i+i, i))
>
> Ahh, that's the bit I should have thought of! Of course.
>
> My original thought experiment had involved basically a really long
> list, like the classic Sieve but getting longer as time moves on, with
> composites replaced by None and primes with their next-markers, which
> I then collapsed to a dict. Always I was thinking in terms of indexing
> with the prime to get its next composite. Here's the code involving
> heapq:
>
> # -- start --
> def primes():
>         """Generate an infinite series of prime numbers."""
>         from heapq import heappush,heappop
>         i=2
>         yield 2
>         prime=[(2,2)] # Heap
>         while True:
>                 smallest, prm = heappop(prime)
>                 heappush(prime, (smallest+prm, prm))
>                 while i<smallest:
>                         yield i
>                         heappush(prime, (i+i, i))
>                         i+=1
>                 if i==smallest: i+=1
>
> gen=primes()
> print([next(gen) for i in range(10)])
> for i in range(1000):
>         next(gen) # Star Trek?
> print("The next prime number is:",next(gen))
> # -- end --
>
> And that's significantly shorter, clearer, AND faster than the original. Thanks!

AFAICT, that's exactly my code but using a less-efficient storage
medium and a *much* more efficient sorting mechanism. It'd be
interesting what could happen if heapq didn't reject blists -- they
have better efficiency for changing list sizes (which this code does a
lot of).

Thanks for the heads-up on heapq, by the way -- it's new to me and a
blimmin' good idea.

PS: It's faster to use heapreplace(...) than
heappop(...);heappush(...) but it only saves a few %.