Question on list

Quinn Dunkan quinn at retch.ugcs.caltech.edu
Thu Jun 27 13:51:47 EDT 2002


On Thu, 27 Jun 2002 15:32:28 +0000, SiverFish <occeanlinux at linuxmail.org> wrote:
>On Thu, 27 Jun 2002 05:11:06 +0000, Quinn Dunkan wrote:
>
>> On Thu, 27 Jun 2002 14:36:03 +0000, SiverFish
>> <occeanlinux at linuxmail.org> wrote:
>>>I have a list of words, and I want to count how many times each word
>>>occurs in it (case-insensitively). Can anyone show me how to do it? For
>>>example, for ls = ['abc','cd','abc','adf','abc','dwc','cd'] it should
>>>show abc = 3, cd = 2, adf = 1, dwc = 1
>> 
>> import string
>> d = {}
>> for e in map(string.lower, ls):
>>     d[e] = d.get(e, 0) + 1
>> freqs = [ (count, w) for w, count in d.items() ]
>> freqs.sort()
>> freqs.reverse()
>> print ', '.join([ '%s = %d' % (w, count) for count, w in freqs ])
>
>Could you do it in Python 1.5.2 and explain it clearly to me, please?

The English version is "use a dictionary".  Go through all the words and put
each one in the dictionary with the value 1.  If the word is already in the
dictionary, increment the value.  When you are done, you have a dict mapping
words to frequency of occurrence.
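
As a rough sketch of that (the name `histogram` is just something I made up
for illustration), assuming string methods are available, which means 2.0 or
later; on 1.5.2 you would write string.lower(w) instead of w.lower():

```python
def histogram(words):
    """Return a dict mapping each lowercased word to how often it occurs."""
    d = {}
    for w in words:
        w = w.lower()              # fold case so 'ABC' and 'abc' count together
        d[w] = d.get(w, 0) + 1     # 0 if the word is new, else the count so far
    return d

ls = ['abc', 'cd', 'abc', 'adf', 'ABC', 'dwc', 'cd']
counts = histogram(ls)
```

For that list, counts maps 'abc' to 3, 'cd' to 2, and 'adf' and 'dwc' to 1
each.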

>How would we do the same thing when the words are read in from multiple
>files? Thanks a lot.

The words can come from anywhere.  Just write a function that takes a list
of words and returns a histogram dict as above.  Then all you need to do is
convert your list of files into a list of words.
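
One way that split might look, as a sketch only: `words_in_files` and
`histogram` are hypothetical names, and I am assuming `files` is a list of
file names and that splitting on whitespace is good enough:

```python
def words_in_files(files):
    """Read every file named in `files` and return all the words as one list."""
    words = []
    for name in files:
        fp = open(name)
        try:
            for line in fp.readlines():
                words.extend(line.split())   # split on any whitespace
        finally:
            fp.close()                       # close even if reading fails
    return words

def histogram(words):
    """Return a dict mapping each lowercased word to its count."""
    d = {}
    for w in words:
        w = w.lower()
        d[w] = d.get(w, 0) + 1
    return d
```

You would then call something like histogram(words_in_files(names)), where
names is whatever list of file names you have.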

If you have large files and are concerned about memory usage, you could
upgrade to 2.2 and investigate iterators, and pass an appropriate iterator
to the function, or stay with 1.5.2 and rewrite the function to modify a
dict, and then pass in smaller lists consecutively.  Or just lose the function
and inline the whole thing.  Up to you.

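d = {}
for name in files:                  # files: a list of file names
    fp = open(name)
    for line in fp.readlines():
        for w in line.split():      # split on whitespace
            w = w.lower()           # count case-insensitively
            d[w] = d.get(w, 0) + 1
    fp.close()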
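
If you go the 2.2 route instead, the iterator version might look roughly like
this.  The names are hypothetical, and on 2.2 itself you need
"from __future__ import generators" at the top of the module before `yield`
will work:

```python
def iter_words(files):
    """Yield lowercased words one at a time, without building a big list."""
    for name in files:
        fp = open(name)
        for line in fp:                # file objects are iterable from 2.2 on
            for w in line.split():
                yield w.lower()
        fp.close()

def histogram(words):
    """Count items from any iterable, not just a list."""
    d = {}
    for w in words:
        d[w] = d.get(w, 0) + 1
    return d
```

The counting function does not care whether it gets a list or a generator, so
histogram(iter_words(files)) reads the files a line at a time.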

Of course, if you have to worry about punctuation and hyphenation, things get
more complicated...
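
For a first cut at the punctuation part, a regular expression may be enough.
This is only a sketch: `split_words` is a made-up name, and the pattern bakes
in my assumption that a word is a run of ASCII letters, digits, and
apostrophes:

```python
import re

# Assumption: a "word" is a run of lowercase letters, digits, or apostrophes.
WORD = re.compile(r"[a-z0-9']+")

def split_words(line):
    """Lowercase the line and return its words with punctuation stripped."""
    return WORD.findall(line.lower())

words = split_words("Hello, world! It's a well-known fact.")
```

Here words comes out as ['hello', 'world', "it's", 'a', 'well', 'known',
'fact'].  Note that "well-known" is split in two, and a word hyphenated across
a line break is not rejoined, so hyphenation still needs its own handling.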


