Most efficient way to "pre-grow" a list?

Mon Nov 9 23:56:30 EST 2009

En Sun, 08 Nov 2009 10:08:35 -0300, gil_johnson  
<gil_johnson at earthlink.net> escribió:
> On Nov 6, 8:46 pm, gil_johnson <gil_john... at earthlink.net> wrote:
>
> The problem I was solving was this: I wanted an array of 32-bit
> integers to be used as a bit array, and I wanted it initialized with
> all bits set, that is, each member of the array had to be set to
> 4294967295. Of course, you could set your initializer to 0, or any
> other 32-bit number.
>
> Originally I found that the doubling method I wrote about before was a
> LOT faster than appending the elements one at a time, and tonight I
> tried the "list = initializer * N" method. Running the code below, the
> doubling method is still fastest, at least on my system.
>
> Of course, as long as you avoid the 'one at a time' method, we're
> talking about fractions of a second, even for arrays that I think are
> huge, like the 536,870,912 byte beastie below.

I don't get your same results; one_item*N is faster on my tests. Note that  
you're comparing disparate things:

> # Doubling method, run time = 0.413938045502
> newArray = array.array('I')             # 32-bit unsigned
> integers

Method 1 creates an array object; this takes roughly 2**29 bytes

> # One at a time, run time = 28.5479729176
> newArray2 = array.array('I')
> for i in range(134217728):        # the same size as above
>     newArray2.append(4294967295)

This creates also an array object, *and* 134 million integer objects in  
the meantime.

> # List with "*", run time = 1.06160402298
> newList = [4294967295] * 134217728

And this creates a list object, not an array.

Note that all your 3 methods create global objects and you never delete  
them - so the last method is at a disadvantage.
A more fair test would use timeit, and run just one method at a time:

<code>
# Written in Python 3.x
# use xrange everywhere with Python 2.x

import array

INITIAL_VALUE = 4294967295
LOG2SIZE=25
SIZE = 2**LOG2SIZE

def method1a():
   newArray = array.array('I')
   newArray.append(INITIAL_VALUE)
   for i in range(LOG2SIZE):
     newArray.extend(newArray)
   assert len(newArray)==SIZE
   assert newArray[SIZE-1]==INITIAL_VALUE

def method2a():
   newArray = array.array('I')
   for i in range(SIZE):
     newArray.append(INITIAL_VALUE)
   assert len(newArray)==SIZE
   assert newArray[SIZE-1]==INITIAL_VALUE

def method3a():
   newArray = array.array('I', [INITIAL_VALUE]) * SIZE
   assert len(newArray)==SIZE
   assert newArray[SIZE-1]==INITIAL_VALUE

def method1l():
   newList = [INITIAL_VALUE]
   for i in range(LOG2SIZE):
     newList.extend(newList)
   assert len(newList)==SIZE
   assert newList[SIZE-1]==INITIAL_VALUE

def method2l():
   newList = []
   for i in range(SIZE):
     newList.append(INITIAL_VALUE)
   assert len(newList)==SIZE
   assert newList[SIZE-1]==INITIAL_VALUE

def method3l():
   newList = [INITIAL_VALUE] * SIZE
   assert len(newList)==SIZE
   assert newList[SIZE-1]==INITIAL_VALUE
</code>

D:\temp>python31 -m timeit -s "from arraygrow import method1a" "method1a()"
10 loops, best of 3: 411 msec per loop

D:\temp>python31 -m timeit -s "from arraygrow import method2a" "method2a()"
...aborted, too long...

D:\temp>python31 -m timeit -s "from arraygrow import method3a" "method3a()"
10 loops, best of 3: 377 msec per loop

D:\temp>python31 -m timeit -s "from arraygrow import method1l" "method1l()"
10 loops, best of 3: 628 msec per loop

D:\temp>python31 -m timeit -s "from arraygrow import method3l" "method3l()"
10 loops, best of 3: 459 msec per loop

So arrays are faster than lists, and in both cases one_item*N outperforms  
your doubling algorithm.
Adding one item at a time is -at least- several hundred times slower; I  
was not patient enough to wait.

> Finally, I just looked into calling C functions, and found
> PyMem_Malloc, PyMem_Realloc, PyMem_Free, etc. in the Memory Management
> section of the Python/C API Reference Manual. This gives you
> uninitialized memory, and should be really fast, but it's 6:45 AM
> here, and I don't have the energy to try it.

No, those are for internal use only, you can't use the resulting pointer  
in Python code. An array object is a contiguous memory block, so you don't  
miss anything.

-- 
Gabriel Genellina