Fast list traversal

Steven D'Aprano steve at REMOVE-THIS-cybersource.com.au
Sun Nov 2 04:13:43 EST 2008


On Sun, 02 Nov 2008 00:25:13 -0700, dineshv wrote:

> I want to see if there is an alternative method for fast list traversal.
>  The code is very simple:
> 
> dict_long_lists = defaultdict(list)
> for long_list in dict_long_lists.itervalues():
>         for element in long_list:
>                 array_a[element] = m + n + p    # m, n, p are variable numbers


It might help if you showed some sample data, because your explanation is 
confusing. You are asking about traversing lists, but your data is in a 
defaultdict. Why is it in a dict, when you don't seem to be using the key 
anywhere? Putting that aside, I'm going to make a guess and assume your 
data looks something like this...

dict_long_lists = {
    'key1': [0, 1, 2, 3, 4, 5],
    'key2': [16, 17, 18, 19],
    'key3': [7, 9, 11, 13, 15], 
    'key4': [6, 8, 10, 12, 14] }

Then you do something like this:

array_a = [None]*20

Then after running your code, you end up with:

array_a == [x0, x1, x2, x3, .... , x19]
where each x is calculated from m + n + p.
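
If that guess is right, a minimal runnable version of what I think you're 
doing would look something like this (m, n and p are just placeholder 
numbers here, since you haven't said what they are):

from collections import defaultdict

# Sample data in the shape I'm guessing at: each value is a sorted
# list of integers that index into array_a.
dict_long_lists = defaultdict(list)
dict_long_lists.update({
    'key1': [0, 1, 2, 3, 4, 5],
    'key2': [16, 17, 18, 19],
    'key3': [7, 9, 11, 13, 15],
    'key4': [6, 8, 10, 12, 14],
    })

m, n, p = 2, 3, 5   # stand-ins for your "variable numbers"
array_a = [None]*20

for long_list in dict_long_lists.itervalues():
    for element in long_list:
        array_a[element] = m + n + p

print array_a

If your real code is meaningfully different from that, please post it.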



> The long_list's are read from a defaultdict(list) dictionary and so
> don't need initializing.  The elements of long_list are integers and
> ordered (sorted before placing in dictionary).  There are > 20,000
> long_list's each with a variable number of elements (>5,000).  The
> elements of long_list are immutable (ie. don't change).  The above code
> is within a def function.
> 
> I've tried set() using defaultdict(set) but the elements are not
> ordered.

It's not clear what you have tried to do with set(), or why the elements 
need to be ordered.
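
For what it's worth, if you do use defaultdict(set), you have to impose 
the order yourself at traversal time, because sets don't remember any 
order. A rough sketch of what that might look like (again with made-up 
values for m, n and p):

from collections import defaultdict

m, n, p = 2, 3, 5
array_a = [None]*20

dict_long_sets = defaultdict(set)
dict_long_sets['key1'].update([5, 3, 1, 0, 4, 2])

for long_set in dict_long_sets.itervalues():
    # sets are unordered, so sort before traversing
    for element in sorted(long_set):
        array_a[element] = m + n + p

But sorting on every traversal will only make things slower, so if the 
order matters, stick with lists.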


> What is the fastest way to traverse these long_list's sequentially from
> the beginning to the end?  Maybe there is another data structure that
> can be used instead of a list.

I doubt you'll find anything faster than a list.

You have > 20,000 lists of > 5,000 items each, which means you have a 
*minimum* of 100,000,000 items to traverse. If each iteration takes 0.1 
millisecond, which is not unreasonably slow depending on how much 
computation each iteration does, that will take 10,000 seconds or nearly 
three hours. The only solutions to that are to reduce the amount of 
computation in each loop, reduce the number of items, or get a faster 
computer.
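
You can get a rough per-iteration figure with timeit on a single list 
before extrapolating to the full data set. Something like this, where the 
loop body is just a stand-in for whatever calculation you actually do:

from timeit import Timer

setup = '''
long_list = range(5000)
array_a = [None]*5000
m, n, p = 2, 3, 5
'''
stmt = '''
for element in long_list:
    array_a[element] = m + n + p
'''
best = min(Timer(stmt, setup).repeat(3, 100))/100
print "%.3f microseconds per iteration" % (best/5000*1e6)

Multiply that by the number of items you really have and you'll know 
whether the loop itself is the problem or whether the time is going 
somewhere else.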

Have you tried running the profiler to see where the time is actually 
going? I suggest you write a small set of test data (say, 1000 items), 
and profile it, then write a longer set of test data (say, 100,000 
items), and if it's still not clear where the time is being lost, ask for 
help.
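
The standard library's profiler will tell you where the time is going. 
Assuming your function is called build_array (substitute whatever your 
def is actually named), something like:

import cProfile, pstats

cProfile.run('build_array(dict_long_lists)', 'profile.out')
stats = pstats.Stats('profile.out')
stats.sort_stats('cumulative').print_stats(20)

will print the twenty most expensive calls, which should make it obvious 
whether the time is in the loop, in building the data, or somewhere you 
haven't thought of.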


-- 
Steven


