Most efficient way to "pre-grow" a list?

gil_johnson gil_johnson at earthlink.net
Sun Nov 8 08:08:35 EST 2009


On Nov 6, 8:46 pm, gil_johnson <gil_john... at earthlink.net> wrote:

> I don't have the code with me, but for huge arrays, I have used
> something like:
>
> >>> arr = [initializer]
> >>> for i in range(N):
> ...     arr.extend(arr)
>
> This doubles the array every time through the loop, and you can add
> the powers of 2 to get the desired result.
> Gil

To all who asked about my previous post,

I'm sorry I posted such a cryptic note before; I should have waited
until I had the code available.

I am still new to Python, and I couldn't find a way to create an array
of size N, with each member initialized to a given value. If there is
one, please let me know.
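
(One thing I've since noticed: the array module supports "*"
repetition, just like lists, so an initialized array of size N can be
built in one step. A minimal sketch, untimed:)

[code]

import array

N = 1000
initializer = 4294967295

arr = array.array('I', [initializer]) * N    # N copies of initializer

print(len(arr), arr[N - 1])                  # confirm size and value

[/code]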

I think Paul Rubin and Paul Rudin are right (if they really are two
different people): an array is a better solution if you're dealing
with integers. They both mentioned numpy, which I know nothing about,
and the array module.
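
(For anyone who does know numpy, I gather it can build a filled array
directly. A minimal sketch, assuming numpy is installed and that
numpy.full behaves as documented:)

[code]

import numpy

N = 134217728

arr = numpy.full(N, 4294967295, dtype=numpy.uint32)   # N filled uint32s

print(arr[N - 1])                                     # confirm fill

[/code]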

kj, what *are* you going to do with this list/array? As others have
pointed out, there are differences between lists, arrays, and
dictionaries.

The problem I was solving was this: I wanted an array of 32-bit
integers to be used as a bit array, and I wanted it initialized with
all bits set, that is, each member of the array had to be set to
4294967295. Of course, you could set your initializer to 0, or any
other 32-bit number.
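
(For context, the bit operations on such an array look something like
this; a sketch with hypothetical helper names, not the code I actually
ran:)

[code]

def set_bit(arr, i):
    arr[i >> 5] |= 1 << (i & 31)                  # word i >> 5, bit i & 31

def clear_bit(arr, i):
    arr[i >> 5] &= 0xFFFFFFFF ^ (1 << (i & 31))   # mask keeps value unsigned

def test_bit(arr, i):
    return (arr[i >> 5] >> (i & 31)) & 1

[/code]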

Originally I found that the doubling method I wrote about before was a
LOT faster than appending the elements one at a time, and tonight I
tried the "list = [initializer] * N" method. Running the code below,
the doubling method is still the fastest, at least on my system.

Of course, as long as you avoid the one-at-a-time method, we're
talking about fractions of a second, even for arrays that I think are
huge, like the 536,870,912-byte beastie below.

[code]

# Written in Python 3.x

import array
import time

#* * * * * * * * * * * * * * * * * * * * * *
# Doubling method, run time = 0.413938045502

t0 = time.time()

newArray = array.array('I')             # 32-bit unsigned integers

newArray.append(4294967295)

for i in range(27):             # 2**27 integers, 2**29 bytes
    newArray.extend(newArray)

print(time.time() - t0)

print(newArray[134217727]) # confirm array is fully initialized

#* * * * * * * * * * * * * * * * * * * * * *
# One at a time, run time = 28.5479729176

t0 = time.time()

newArray2 = array.array('I')

for i in range(134217728):        # the same size as above
    newArray2.append(4294967295)

print(time.time() - t0)

print(newArray2[134217727])             # confirm array is fully initialized

#* * * * * * * * * * * * * * * * * * * * * *
# List with "*", run time = 1.06160402298

t0 = time.time()

newList = [4294967295] * 134217728

print(time.time() - t0)

print(newList[134217727])                      # confirm list

[/code]

If, instead of 134,217,728 integers, I want something different, like
100,000,000, the method I use is:

[code]

#* * * * * * * * * * * * * * * * * * * * * *
# Not a power of 2, run time = 0.752086162567

t0 = time.time()

newArray = array.array('I')

tempArray = array.array('I')

tempArray.append(4294967295)

size = 100000000

while size:                           # chew through 'size' until it's gone
    if (size & 1):                    # if this bit of 'size' is 1
        newArray.extend(tempArray)    # add a copy of the temp array
    size >>= 1                        # chew off one bit
    tempArray.extend(tempArray)       # double the size of the temp array

print(time.time() - t0)

print(newArray[99999999])

#* * * * * * * * * * * * * * * * * * * * * *
# Not a power of 2, list with "*", run time = 1.19271993637

t0 = time.time()

newList = [4294967295] * 100000000

print(time.time() - t0)

print(newList[99999999])

[/code]
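
(The same trick packaged as a reusable function; a sketch, with the
hypothetical name make_filled_array:)

[code]

import array

def make_filled_array(typecode, initializer, size):
    # Build an array of 'size' copies of 'initializer' by binary
    # decomposition: append the doubling temp array wherever the
    # corresponding bit of 'size' is set.
    result = array.array(typecode)
    temp = array.array(typecode, [initializer])
    while size:
        if size & 1:
            result.extend(temp)
        size >>= 1
        temp.extend(temp)
    return result

bits = make_filled_array('I', 4294967295, 100000000)
print(len(bits))                      # 100000000

[/code]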

I think it is interesting that the shorter list takes longer to build
than the one whose length is a power of 2. That may suggest that the
"[initializer] * N" method does something similar to the doubling
method internally.

Also, tempArray (above) gets reallocated frequently, and demonstrates
that reallocation is not a big problem.

Finally, I just looked into calling C functions, and found
PyMem_Malloc, PyMem_Realloc, PyMem_Free, etc. in the Memory Management
section of the Python/C API Reference Manual. These give you
uninitialized memory and should be really fast, but it's 6:45 AM here
and I don't have the energy to try it.
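
(For the record, here is an untested sketch of the same idea from pure
Python via ctypes, assuming CPython exports PyMem_Malloc and PyMem_Free
to ctypes.pythonapi. This is raw, uninitialized memory, so write before
you read, and remember to free it:)

[code]

import ctypes

api = ctypes.pythonapi
api.PyMem_Malloc.restype = ctypes.c_void_p       # pointer-sized return
api.PyMem_Malloc.argtypes = [ctypes.c_size_t]
api.PyMem_Free.argtypes = [ctypes.c_void_p]

count = 1000
buf = api.PyMem_Malloc(count * 4)    # room for 'count' 32-bit integers
words = ctypes.cast(ctypes.c_void_p(buf),
                    ctypes.POINTER(ctypes.c_uint32))

words[0] = 4294967295                # write before you read
print(words[0])

api.PyMem_Free(buf)

[/code]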

Gil


