[Tutor] Finding the shortest word in a list of words

Tue Jan 20 16:33:40 CET 2009

"Finding the shortest word among a list of words" sounds like something of a
trick question to me.  I think a more complete problem statement would be
"Find the list of words that are the shortest", since there is no guarantee
that the list does not contain two words of the same shortest length.  If
you just add "me" to your sample set, you now get misleading answers:

words = "man woman children he me".split()
print min(words, key=len)

prints:
he

What happened to "me"?  It is just as short as "he"!

To get *all* the words that are the shortest length, I'll show two
approaches.  The first uses a defaultdict, from the collections module of
the Python stdlib.  With a defaultdict, I can have new dict entries get
initialized with a default value using a specified factory function, without
first having to check to see if that entry exists.  I would like to create a
dict of lists of words, keyed by the lengths of the words.  Something that
would give me this dict:

{ 2 : ['he', 'me'], 3 : ['man'], ... etc. }

from which I could then find the minimum length using
min(wordlendict.keys()).  By using a defaultdict, I can just build things up
by iterating over the list and adding each word to the entry for the current
word's length - if the current word is the first one for this length to be
found, the defaultdict will initialize the value to an empty list for me.
This allows me to safely append the current word regardless of whether there
was an existing entry or not.  Here is the code that does this:

from collections import defaultdict
wordlendict = defaultdict(list)
for w in words: 
    wordlendict[len(w)].append(w)
minlen = min(wordlendict.keys())
minlist = wordlendict[minlen]
print minlist

prints:
['he', 'me']

Now we are getting a more complete answer!

A second approach uses the groupby method of the itertools module in the
stdlib.  groupby usually takes me several attempts to get everything right:
the input must be sorted by the grouping feature, and then the results
returned by groupby are in the form of an iterator that returns key-iterator
tuples.  I need to go through some mental gymnastics to unwind the data to
get the group I'm really interested in.  Here is a step-by-step code to use
groupby:

from itertools import groupby
grpsbylen = groupby(sorted(words,key=len),len)
mingrp = grpsbylen.next()
minlen = mingrp[0]
minlist = list(mingrp[1])
print minlist

The input list of words gets sorted, and then passed to groupby, which uses
len as the grouping criteria (groupby is not inherently aware of how the
input list was sorted, so len must be repeated).  The first group tuple is
pulled from the grpsbylen iterator using next().  The 0'th tuple element is
the grouping value - in this case, it is the minimum length 2.  The 1'th
tuple element is the group itself, given as an iterator.  Passing this to
the list constructor gives us the list of all the 2-character words, which
then gets printed:
['he', 'me']

Again, 'me' no longer gets left out.

Maybe this will get you some extra credit...

-- Paul