A small proposed change to dictionaries' "get" method

Gareth McCaughan Gareth.McCaughan at pobox.com
Wed Aug 2 18:26:55 EDT 2000


Consider the following piece of code, which takes a file
and prepares a concordance saying on which lines each word
in the file appears. (For real use it would need to be
made more sophisticated.)

    import string

    word2lines = {}

    line_number = 0
    for line in open(filename).readlines():
      line_number = line_number+1
      for word in map(string.lower, string.split(line)):
        existing_lines = word2lines.get(word, [])   # |
        existing_lines.append(line_number)          # | ugh!
        word2lines[word] = existing_lines           # |

    words = word2lines.keys()
    words.sort()
    for word in words:
      print "%s\t%s" % (word, string.join(map(str, word2lines[word]), " "))

All very simple and elegant. There's just one wart: the
three lines I've labelled "ugh!" seem much more verbose
and inefficient than they should be. We have to search
word2lines twice for "word" (once to see if it's there,
once to insert); we reassign even when there's no need
to; a simple operation is taking three lines; we've had
to introduce a new variable. Yuck. (The efficiency issues
are probably very minor in practice, but *gratuitous*
inefficiencies are almost always painful to behold.)

This (update a mutable object that's a value in a dictionary,
initialising it first if appropriate) is a common idiom, at
least in my programs. It's a shame that it should be so ugly.

                          *   *   *

I suggest a minor change: another optional argument to
"get" so that

    dict.get(item,default,flag)

is equivalent to

    if dict.has_key(item):
      VALUE IS dict[item]
    else:
      if flag: dict[item] = default    <-- This is all that's new
      VALUE IS default

but presumably more efficient. This would enable those
painful three lines to be written as

    word2lines.get(word, [], 1).append(line_number)

(Ahhhh, much better!)
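
The semantics can be tried out without touching the interpreter. What
follows is only a minimal sketch: the function name get_or_store and the
sample input are made up for illustration, it is written to run under a
recent Python rather than the one used above, and being plain Python it
shows the behaviour only, not the hoped-for speed.

```python
def get_or_store(d, item, default=None, flag=0):
    """Hypothetical stand-in for the proposed dict.get(item, default, flag)."""
    try:
        return d[item]          # key present: hand back the stored value
    except KeyError:
        if flag:
            d[item] = default   # the only new behaviour: remember the default
        return default


word2lines = {}
lines = ["the cat", "the dog"]  # made-up input standing in for a file

line_number = 0
for line in lines:
    line_number = line_number + 1
    for word in line.lower().split():
        # One line replaces the three "ugh!" lines.
        get_or_store(word2lines, word, [], 1).append(line_number)
```

With the flag false or omitted, the function behaves like the existing
"get": the default is returned but nothing is stored in the dictionary.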

The same issue arises, of course, if it's some other kind
of mutable thing that's stored in the dictionary. For instance,
if you didn't want to see duplicated line numbers, you could
use a dictionary of dictionaries, and instead of

    existing_lines = word2lines.get(word, {})
    existing_lines[line_number] = 1
    word2lines[word] = existing_lines

we'd write

    word2lines.get(word, {}, 1)[line_number] = 1
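
As a rough illustration of the dictionary-of-dictionaries variant, here
is a small sketch. Again get_or_store is a hypothetical helper standing
in for the proposed three-argument "get", the input lines are invented,
and the syntax assumes a recent Python.

```python
def get_or_store(d, item, default=None, flag=0):
    # Hypothetical helper mirroring the proposed get(item, default, flag).
    if item in d:
        return d[item]
    if flag:
        d[item] = default       # store the default so later calls reuse it
    return default


word2lines = {}
lines = ["spam spam eggs", "eggs"]  # "spam" occurs twice on line 1

for line_number, line in enumerate(lines, 1):
    for word in line.lower().split():
        # The inner dictionary's keys are line numbers, so a repeated
        # word on the same line is recorded only once.
        get_or_store(word2lines, word, {}, 1)[line_number] = 1

for word in sorted(word2lines):
    print("%s\t%s" % (word, " ".join(map(str, sorted(word2lines[word])))))
```

Here "spam" ends up with the single entry for line 1 even though it
appears twice on that line.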


                          *   *   *

Comments?

-- 
Gareth McCaughan  Gareth.McCaughan at pobox.com
sig under construction


