help make it faster please

Bengt Richter bokr at oz.net
Fri Nov 11 22:47:44 EST 2005


On 10 Nov 2005 10:43:04 -0800, bearophileHUGS at lycos.com wrote:

>This can be faster, it avoids doing the same things more times:
>
>from string import maketrans, ascii_lowercase, ascii_uppercase
>
>def create_words(afile):
>    stripper = """'[",;<>{}_&?!():[]\.=+-*\t\n\r^%0123456789/"""
>    mapper = maketrans(stripper + ascii_uppercase,
>                       " "*len(stripper) + ascii_lowercase)
A good way to prepare for split(): punctuation and digits become spaces,
and uppercase is folded to lowercase, all in one pass.
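
For anyone following along, a quick interactive check of what that
translation table does (Python 2; the sample string is invented):

 >>> from string import maketrans, ascii_lowercase, ascii_uppercase
 >>> stripper = """'[",;<>{}_&?!():[]\.=+-*\t\n\r^%0123456789/"""
 >>> mapper = maketrans(stripper + ascii_uppercase,
 ...                    " "*len(stripper) + ascii_lowercase)
 >>> "Hello, World 42!".translate(mapper).split()
 ['hello', 'world']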

>    countDict = {}
>    for line in afile:
>        for w in line.translate(mapper).split():
>            if w:
I suspect it's not possible to get '' in the list from somestring.split()
with no arguments, since runs of whitespace count as single separators,
so the if w: test is unnecessary.
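
A quick check (sample strings invented):

 >>> "  spam   eggs\t\n".split()
 ['spam', 'eggs']
 >>> "".split()
 []

so that test (and its extra indentation level) can be dropped.
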
>                if w in countDict:
>                    countDict[w] += 1
>                else:
>                    countDict[w] = 1
Does that beat the try/except and dict.get versions? I.e. (untested):
                 try: countDict[w] += 1
                 except KeyError: countDict[w] = 1
or
                 countDict[w] = countDict.get(w, 0) + 1
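
One rough way to check is with the timeit module. The word list below is
invented, so real text with a different repeat ratio may rank these
differently:

import timeit

# invented sample data; a real file will have a different repeat ratio
setup = "words = ['spam', 'eggs', 'spam', 'ham', 'spam', 'eggs'] * 1000"

stmts = [
    ('if-in', """
d = {}
for w in words:
    if w in d:
        d[w] += 1
    else:
        d[w] = 1
"""),
    ('try-except', """
d = {}
for w in words:
    try: d[w] += 1
    except KeyError: d[w] = 1
"""),
    ('get', """
d = {}
for w in words:
    d[w] = d.get(w, 0) + 1
"""),
]

for name, stmt in stmts:
    print name, timeit.Timer(stmt, setup).timeit(number=100)

Roughly, the try/except version tends to win when most words are repeats
(exceptions stay rare), while the if/in and get versions pay an extra
lookup or method call but never the cost of raising; which comes out
ahead depends on the data.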

>    word_freq = countDict.items()
>    word_freq.sort()
>    for word, freq in word_freq:
>        print word, freq
>
>create_words(file("test.txt"))
>
>
>If you can load the whole file in memory then it can be made a little
>faster...
>
>Bear hugs,
>bearophile
>
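
Re the whole-file-in-memory idea: untested, but I imagine something along
these lines -- one read, one translate, one split (function name is mine):

from string import maketrans, ascii_lowercase, ascii_uppercase

def create_words_inmem(afile):
    stripper = """'[",;<>{}_&?!():[]\.=+-*\t\n\r^%0123456789/"""
    mapper = maketrans(stripper + ascii_uppercase,
                       " "*len(stripper) + ascii_lowercase)
    countDict = {}
    # one pass over the whole text instead of line by line
    for w in afile.read().translate(mapper).split():
        countDict[w] = countDict.get(w, 0) + 1
    word_freq = countDict.items()
    word_freq.sort()
    for word, freq in word_freq:
        print word, freq

create_words_inmem(file("test.txt"))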

Regards,
Bengt Richter


