anagram finder / dict mapping question

George Sakkis george.sakkis at gmail.com
Sat May 10 03:43:22 EDT 2008


On May 9, 11:19 pm, dave <squareswallo... at 1ya2hoo3.net> wrote:
> On 2008-05-09 18:53:19 -0600, George Sakkis <george.sak... at gmail.com> said:
>
>
>
> > On May 9, 5:19 pm, umpsu... at gmail.com wrote:
> >>>> What would be the best method to print the top results, the ones that
>
> >>>> had the highest amount of anagrams??  Create a new histogram dict?
>
> >>> You can use the max() function to find the biggest list of anagrams:
>
> >>> top_results = max(anagrams.itervalues(), key=len)
>
> >>> --
> >>> Arnaud
>
> >> That is the biggest list of anagrams, what if I wanted the 3 biggest
> >> lists?  Is there a way to specify the top three w/ a max command??
>
> >>>> import heapq
> >>>> help(heapq.nlargest)
> > Help on function nlargest in module heapq:
>
> > nlargest(n, iterable, key=None)
> >     Find the n largest elements in a dataset.
>
> >     Equivalent to:  sorted(iterable, key=key, reverse=True)[:n]
>
> > HTH,
> > George
>
> I tried both the 'nlargest' and the 'sorted' methods.. I could only get the
> sorted to work.. however it would only return values like (3,  edam)
> not the list of words..
>
> Here is what I have now.. Am I over-analyzing this?  It doesn't seem
> like it should be this hard to get the program to print the largest set
> of anagrams first...
>
> def anafind():
>         fin = open('text.txt')
>         mapdic = {}
>         finalres = []                   # top answers go here
>         for line in fin:
>                 line = line.strip()
>                 alphaword = ''.join(sorted(line))       #sorted word as key
>                 if alphaword not in mapdic:
>                         mapdic[alphaword] = [line]
>                 else:
>                         mapdic[alphaword].append(line)
>         topans = sorted((len(mapdic[key]), key) for key in mapdic.keys())[-3:]  #top 3 answers
>         for key, value in topans:       #
>                 finalres.append(mapdic[value])
>         print finalres

Here is a working, cleaned-up version:

from heapq import nlargest
from collections import defaultdict

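def anagrams(words, top=None):
    # Map each sorted-letter key to the set of words sharing those letters.
    key2words = defaultdict(set)
    for word in words:
        word = word.strip()
        key = ''.join(sorted(word))
        key2words[key].add(word)
    if top is None:
        # No limit given: return every group, largest first.
        return sorted(key2words.itervalues(), key=len, reverse=True)
    else:
        # Only the `top` largest groups are wanted.
        return nlargest(top, key2words.itervalues(), key=len)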

if __name__ == '__main__':
    wordlist = ['live', 'evil', 'one', 'nose', 'vile', 'neo']
    for words in anagrams(wordlist, 2):
        print words


By the way, try to generalize your functions (and the rest of the code,
for that matter) so that they can be reused easily. For example,
hardcoding the input file name in the function's body is usually
undesirable. The same goes for other constants like 'get top 3 answers';
it doesn't cost you anything to replace 3 with 'top' and pass it as an
argument (or make it a default, top=3, if that's the common case).
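
To make that concrete, here is a rough sketch of one way it could look.
It is only an illustration under a few assumptions: the function name
top_anagrams is made up, the default top=3 mirrors your original, and it
is written for Python 3 (values() and print() instead of itervalues()
and the print statement).

```python
from collections import defaultdict
from heapq import nlargest


def top_anagrams(words, top=3):
    """Return the `top` largest anagram groups found in `words`.

    `words` can be any iterable of strings: a list, a generator,
    or an open file object with one word per line.
    (Hypothetical helper name, used here for illustration only.)
    """
    groups = defaultdict(set)
    for word in words:
        word = word.strip()
        if not word:
            continue  # skip blank lines
        # Anagrams share the same sorted-letter key.
        groups[''.join(sorted(word))].add(word)
    return nlargest(top, groups.values(), key=len)


if __name__ == '__main__':
    sample = ['live', 'evil', 'one', 'nose', 'vile', 'neo']
    for group in top_anagrams(sample, top=2):
        print(sorted(group))
    # Reading from a file is then the caller's decision, e.g.:
    #     with open('text.txt') as fin:
    #         print(top_anagrams(fin))
```

Since the function takes any iterable and never opens a file itself,
the same code works on a word list in memory or on your text.txt, and
the caller picks how many groups to get back.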

HTH,
George