Sorting distionary by value
Brian McErlean
b_mcerlean at yahoo.com
Fri Mar 22 10:33:55 EST 2002
Artur Skura <arturs at iidea.pl> wrote in message news:<slrna9m5sc.ai9.arturs at aph.waw.pdi.net>...
> No, and it seems the problem is not with sorting.
> I wanted to write a compact word counting script (well, in shell
> it can be done in a 5 lines or so), just for fun.
>
> and
>
> import string,sys
>
>
> a = string.split(open(sys.argv[1],'r').read())
>
> known = []
> times = 0
> output = {}
>
> for i in a:
> if i not in known:
> for l in a:
> if l == i:
> times = times + 1
> known.append(i)
> output[i] = times
> times = 0
The first 3 lines are probably the major performance problem -
for every element in the list, you are scanning through the list of
words again
(Your algorithm is O(N**2) with regard to the number of words)
Instead, you can do it in one pass, storing the count for each word in
a dictionary( which has approx O(1) (constant time) access. Words you
haven't
counted yet get put into the dictionary with a count of 1, and
existing words
get their count incremented. The loop then becomes:
for word in a:
count = output.get(word, 0) # Get count of word, or 0 if not
present
output[word] = count + 1 # Increment count.
The output can remain the same:
> items = [ (output[k], k) for k in output ]
> items.sort()
> items.reverse()
> items = [ (k, v) for (v, k) in items ]
> for k in items:
> print k[0], k[1]
>
> it seems it's slow not because of sorting...
>
> Regards,
> Artur
Hope this helps,
Brian.
More information about the Python-list
mailing list