Why is my code faster with append() in a loop than with a large list?

Vilya Harvey vilya.harvey at gmail.com
Mon Jul 6 05:21:07 EDT 2009


2009/7/6 Xavier Ho <contact at xavierho.com>:
> Why is version B of the code faster than version A? (Only three lines
> different)

Here's a guess:

As the number you're testing gets larger, version A is creating a very
big list. I'm not sure exactly how much overhead each list entry has
in Python, but I'd guess it's at least 8 bytes: a 32-bit reference for
each list entry, plus 32 bits to hold the int value (assuming a 32-bit
build of Python). The solution you're looking for is a large 8-digit
number; let's say 80,000,000 for the sake of easy calculation. That
means, as you get close to the solution, you'll be trying to allocate
almost 640 MB of memory for every number you're checking. That's going
to make the garbage collector work extremely hard. Also, depending on
how much memory your computer has free, you'll probably start hitting
virtual memory too, which will slow you down even further. Finally,
the reduce step has to process all 80,000,000 elements, which is
clearly going to take a while.
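The per-entry guess is easy to check directly with sys.getsizeof, by
the way (the numbers below assume a 64-bit build, which pays 8 bytes
per slot rather than 4):

```python
import sys

# Rough illustration of the per-entry cost guessed at above.
# [0] * n allocates exactly n slots; each slot holds one pointer
# (4 bytes on a 32-bit build, 8 on the 64-bit builds common today),
# and that's before counting the int objects the slots point at.
n = 1_000_000
per_slot = (sys.getsizeof([0] * n) - sys.getsizeof([])) / n
print(per_slot)  # pointer size in bytes, typically 8.0 on 64-bit
```

So even ignoring the int objects themselves, an 80,000,000-entry list
costs hundreds of megabytes in slot pointers alone.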

Version B creates a list which is only as long as the number of prime
factors found, so at worst its largest entry will be approx.
sqrt(80,000,000), which is approx. 8,900 - the list itself stays tiny,
on the order of a few dozen entries and well under a kilobyte of
memory - a much more manageable size.
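The original code isn't quoted above, so this is a guess at the shape
of version B, but an append-in-a-loop factorisation that never builds
anything bigger than the factor list might look like:

```python
def largest_prime_factor(n):
    # Version-B-style sketch (hypothetical reconstruction): append only
    # the factors we actually find, dividing each one out as we go, so
    # the list stays tiny and trial division stops at sqrt(n).
    factors = []
    i = 2
    while i * i <= n:
        while n % i == 0:
            factors.append(i)
            n //= i
        i += 1
    if n > 1:          # whatever is left over is itself prime
        factors.append(n)
    return factors[-1]

print(largest_prime_factor(13195))  # 13195 = 5 * 7 * 13 * 29, so: 29
```

Because each factor is divided out as soon as it's found, the list
holds at most a few dozen entries even for an 8-digit input.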

Hope that helps,

Vil.



More information about the Python-list mailing list