Anyway to clarify this code? (dictionaries)

Mike Meyer mwm at mired.org
Tue Nov 22 22:17:08 EST 2005


"javuchi" <javuchi at gmail.com> writes:

> I've been searching thru the library documentation, and this is the
> best code I can produce for this alogorithm:
>
> I'd like to return a dictionary which is a copy of 'another' dictionary
> whoes values are bigger than 'x' and has the keys 'keys':
>
> def my_search (another, keys, x):
>         temp = another.fromkeys(keys)
>         return dict([[k,v] for k in temp.keys() for v in temp.values()
> if v>=x])
>
> Is there any way to improve this code?

Lots of them. Let's start by pointing out two bugs:

You're creating a list that's the "cross product" of keys and values,
then handing that to dict. You're handing it a list with entries for
the same key. That behavior may be defined, but I wouldn't count on
it.

fromkeys sets the values in temp to the same value - None. So
temp.values() is a list of None's, so v is None every time through the
loop.

So you could do what that code does with:

def my_search(another, keys, x):
   if None >= x:
      return another.fromkeys(keys)
   else:
      return dict()
        

You probably want something like:

def my_search(another, keys, x):
    return dict([[k,another[k]] for k in keys if another[k] >= x]

If you really want to avoid indexing another twice, you could do:

def my_search(another, keys, x):
   return dict([[k, v] for k, v in another.items() if v >= x and k in keys])

But then you're looking through all the keys in another, and searching
through keys multiple times, which probably adds up to a lot more
wasted work than indexing another twice.

> I want to avoid converting the dictionary to a list and then to a
> dictionay. Are there speed penalties for such a conversion?

No. But it does take time to do the conversion. I think I'd write it
out "longhand":

def my_search(another, keys, x):
     new = dict()
     for k in keys:
         if another[k] >= x:
            new[k] = another[k]
     return new

This makes it clear that you only index another twice if you actually
use it. The list comprehension will do the loop in C, but it means you
have to scan two lists instead of one. If you're worried about which
is faster, measure it on your target platform.

        <mike
-- 
Mike Meyer <mwm at mired.org>			http://www.mired.org/home/mwm/
Independent WWW/Perforce/FreeBSD/Unix consultant, email for more information.



More information about the Python-list mailing list