Sorting distionary by value

Brian McErlean b_mcerlean at yahoo.com
Fri Mar 22 10:33:55 EST 2002


Artur Skura <arturs at iidea.pl> wrote in message news:<slrna9m5sc.ai9.arturs at aph.waw.pdi.net>...
> No, and it seems the problem is not with sorting.
> I wanted to write a compact word counting script (well, in shell
> it can be done in a 5 lines or so), just  for fun.
> 
> and
> 
> import string,sys
> 
> 
> a = string.split(open(sys.argv[1],'r').read())
> 
> known = []
> times = 0
> output = {}
> 
> for i in a:
>     if i not in known:
>         for l in a:
>             if l == i:
>                 times = times + 1
>         known.append(i)
>         output[i] = times
>     times = 0
 

The first 3 lines are probably the major performance problem - 
for every element in the list, you are scanning through the list of
words again
(Your algorithm is O(N**2) with regard to the number of words)

Instead, you can do it in one pass, storing the count for each word in
a dictionary( which has approx O(1) (constant time) access.  Words you
haven't
counted yet get put into the dictionary with a count of 1, and
existing words
get their count incremented.  The loop then becomes:

for word in a:
    count = output.get(word, 0) # Get count of word, or 0 if not
present
    output[word] = count + 1    # Increment count.

The output can remain the same:

> items = [ (output[k], k) for k in output ]
> items.sort()
> items.reverse()
> items = [ (k, v) for (v, k) in items ]
> for k in items:
>     print k[0], k[1]
> 
> it seems it's slow not because of sorting...
> 
> Regards,
> Artur

Hope this helps,
Brian.



More information about the Python-list mailing list