Fast Data Comparison (dict v. list. v string)

Larry Bates lbates at swamisoft.com
Thu Apr 15 10:03:47 EDT 2004


Scott,

You should take a close look at list comprehensions
and possibly sets (new in V2.3).  I'm not all that
familiar with sets, but your problem sounds like
they might come in handy.

Example list comprehension:

list1notinlist2=[x for x in list1 if x not in list2]
for item in list1notinlist:
    # Do some processing

Note: In Python if you find yourself using [i]
indexes into lists, there is "almost" always a
better way (at least that is my experience).

Larry Bates
Syscon, Inc.

"Scott Brady Drummonds" <scott.b.drummonds.nospam at intel.com> wrote in
message news:c5k8rp$ql6$1 at news01.intel.com...
> Hi, everyone,
>
> I'm a relative novice at Python and am working on some optimizations in my
> first Python project.  At its core, this tool I'm writing needs to perform
> many comparisons to fixed-size lists of fixed-length strings.
>
> Currently, my implementation uses dictionaries to store each string.  I
> maintain a separate "key-mapping" dictionary that maps keys from one
> dictionary to the other.  Then, all I have to do to compare the two
> dictionaries is as follows:
>   for metaKey in keyMap.keys():
>     if dict1[metaKey] != dict2[keyMap[metaKey]]:
>       #  Do some processing
>
> Since the sizes of the dictionaries never change, I tried implementing
this
> using lists.  The solution looks something like this (and assumes that a
> pre-processing phase has sorted the contents of each list so their indexes
> are the same):
>   for i in len(list1):
>     if list1[i] != list2[i]:
>       #  Do some processing
>
> As it turns out, this implementation appears to be about 33% faster than
the
> dictionary-based one.  Now, assuming that the datum being stored at each
> index can fit into one character, I could do a string-based implementation
> like this:
>   for i in len(string1):
>     if string1[i] != string[i]:
>       #  Do some processing
>
> This last solution actually runs about the same as the dictionary, which
> takes 50% longer than the list implementation.
>
> Now, my questions are:
> 1)  Does anyone have another suggestion as to how I can organize these
data
> so that I can compare many elements many times?
> 2)  Is there a string comparison operator that will return which indexes
> have different values?  Maybe it would be faster than the iterative
> comparison approach for the third implementation.
> 3)  Since my data are changing between the successive executions of the
code
> snippets above, I need a way of having the other parts of the program
update
> it.  But, it appears that strings are constant as I can't assign
individual
> characters with "string1[i] = '0'".  Is there any way around this?
>
> Thanks!
> Scott
>
> -- 
> Remove ".nospam" from the user ID in my e-mail to reply via e-mail.
>
>





More information about the Python-list mailing list