Fast Data Comparison (dict v. list. v string)

george young gry at ll.mit.edu
Fri Apr 16 11:08:57 EDT 2004


"Scott Brady Drummonds" <scott.b.drummonds.nospam at intel.com> wrote in message news:<c5k8rp$ql6$1 at news01.intel.com>...
> Hi, everyone,
> 
> I'm a relative novice at Python and am working on some optimizations in my
> first Python project.  At its core, this tool I'm writing needs to perform
> many comparisons to fixed-size lists of fixed-length strings.
> 
> Currently, my implementation uses dictionaries to store each string.  I
> maintain a separate "key-mapping" dictionary that maps keys from one
> dictionary to the other.  Then, all I have to do to compare the two
> dictionaries is as follows:
>   for metaKey in keyMap.keys():
>     if dict1[metaKey] != dict2[keyMap[metaKey]]:
>       #  Do some processing
> 
> Since the sizes of the dictionaries never change, I tried implementing this
> using lists.  The solution looks something like this (and assumes that a
> pre-processing phase has sorted the contents of each list so their indexes
> are the same):
>   for i in len(list1):
>     if list1[i] != list2[i]:
>       #  Do some processing
> 
> As it turns out, this implementation appears to be about 33% faster than the
> dictionary-based one.  Now, assuming that the datum being stored at each
> index can fit into one character, I could do a string-based implementation
> like this:
>   for i in len(string1):
>     if string1[i] != string[i]:
>       #  Do some processing
> 
> This last solution actually runs about the same as the dictionary, which
> takes 50% longer than the list implementation.
> 
> Now, my questions are:
> 1)  Does anyone have another suggestion as to how I can organize these data
> so that I can compare many elements many times?
> 2)  Is there a string comparison operator that will return which indexes
> have different values?  Maybe it would be faster than the iterative
> comparison approach for the third implementation.
> 3)  Since my data are changing between the successive executions of the code
> snippets above, I need a way of having the other parts of the program update
> it.  But, it appears that strings are constant as I can't assign individual
> characters with "string1[i] = '0'".  Is there any way around this?

You might also take a look at psyco: http://psyco.sourceforge.net.  It is
very easy to use, non-intrusive, and claims large performance improvements for
code like this.  I have hopes that things like psyco can let us keep code
in the simplest and *clearest* form, and let external optimization mechanisms
make it fast enough when needed.

-- George Young



More information about the Python-list mailing list