Fast Data Comparison (dict v. list. v string)
george young
gry at ll.mit.edu
Fri Apr 16 11:08:57 EDT 2004
"Scott Brady Drummonds" <scott.b.drummonds.nospam at intel.com> wrote in message news:<c5k8rp$ql6$1 at news01.intel.com>...
> Hi, everyone,
>
> I'm a relative novice at Python and am working on some optimizations in my
> first Python project. At its core, this tool I'm writing needs to perform
> many comparisons to fixed-size lists of fixed-length strings.
>
> Currently, my implementation uses dictionaries to store each string. I
> maintain a separate "key-mapping" dictionary that maps keys from one
> dictionary to the other. Then, all I have to do to compare the two
> dictionaries is as follows:
> for metaKey in keyMap.keys():
> if dict1[metaKey] != dict2[keyMap[metaKey]]:
> # Do some processing
>
> Since the sizes of the dictionaries never change, I tried implementing this
> using lists. The solution looks something like this (and assumes that a
> pre-processing phase has sorted the contents of each list so their indexes
> are the same):
> for i in len(list1):
> if list1[i] != list2[i]:
> # Do some processing
>
> As it turns out, this implementation appears to be about 33% faster than the
> dictionary-based one. Now, assuming that the datum being stored at each
> index can fit into one character, I could do a string-based implementation
> like this:
> for i in len(string1):
> if string1[i] != string[i]:
> # Do some processing
>
> This last solution actually runs about the same as the dictionary, which
> takes 50% longer than the list implementation.
>
> Now, my questions are:
> 1) Does anyone have another suggestion as to how I can organize these data
> so that I can compare many elements many times?
> 2) Is there a string comparison operator that will return which indexes
> have different values? Maybe it would be faster than the iterative
> comparison approach for the third implementation.
> 3) Since my data are changing between the successive executions of the code
> snippets above, I need a way of having the other parts of the program update
> it. But, it appears that strings are constant as I can't assign individual
> characters with "string1[i] = '0'". Is there any way around this?
You might also take a look at psyco: http://psyco.sourceforge.net. It is
very easy to use, non-intrusive, and claims large performance improvements for
code like this. I have hopes that things like psyco can let us keep code
in the simplest and *clearest* form, and let external optimization mechanisms
make it fast enough when needed.
-- George Young
More information about the Python-list
mailing list