[Tutor] Faster procedure to filter two lists . Please help

Danny Yoo dyoo at hkn.eecs.berkeley.edu
Sat Jan 15 00:51:33 CET 2005



On Fri, 14 Jan 2005, kumar s wrote:

> >>>for i in range(len(what)):
> 	ele = split(what[i],'\t')
> 	cor1 = ele[0]
> 	for k in range(len(my_report)):
> 		cols = split(my_report[k],'\t')
> 		cor = cols[0]
> 		if cor1 == cor:
> 			print cor+'\t'+ele[1]+'\t'+cols[1]+'\t'+cols[2]



Hi Kumar,


Ok, this calls for the use of an "associative map" or "dictionary".


The main time sink is the loop here:

> 	for k in range(len(my_report)):
> 		cols = split(my_report[k],'\t')
> 		cor = cols[0]
> 		if cor1 == cor:
> 			print cor+'\t'+ele[1]+'\t'+cols[1]+'\t'+cols[2]

Conceptually, my_report can be considered a list of key/value pairs.  For
each element in 'my_report', the "key" is the first column (cols[0]), and
the "value" is the rest of the columns (cols[1:]).


The loop above can, in a pessimistic world, require a search across the
whole of 'my_report'.  This can take time that is proportional to the
length of 'my_report'.  You mentioned earlier that each list might be of
length 249502, so we're looking into a process whose overall cost is
gigantic.


[Notes on calculating runtime cost: when the structure of the code looks
like:

    for element1 in list1:
        for element2 in list2:
            some_operation_that_costs_K_time()

then the overall cost of running this loop will be

    K * len(list1) * len(list2)
]


We can do much better than this if we use a "dictionary" data structure. A
"dictionary" can reduce the time it takes to do a lookup search down from
a linear-time operation to an atomic-time one.  Do you know about
dictionaries yet?  You can take a look at:

    http://www.ibiblio.org/obp/thinkCSpy/chap10.htm

which will give an overview of a dictionary.  It doesn't explain why
dictionary lookup is fast, but we can talk about that later if you want.


Please feel free to ask any questions about dictionaries and their use.
Learning how to use a dictionary data structure is a skill that pays back
extraordinarily well.


Good luck!



More information about the Tutor mailing list