Optimizing a text statistics function

Scott David Daniels Scott.Daniels at Acm.Org
Wed Apr 21 17:38:08 EDT 2004


Peter Otten wrote:
> Nickolay Kolev wrote:
Playing along, simply because it's fun.

> def main(filename):
>     ...

#>     words = file(filename).read().translate(tr).split()
#>     histogram = {}
#>     wordCount = len(words)
#>     for word in words:
#>         histogram[word] = histogram.get(word, 0) + 1

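Better to avoid the several huge string allocations above (I suspect):
read() and translate() each build a full-size string, and split() then
builds the whole word list. Working line by line also lets you handle
files too large to read into memory: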

       wordCount = 0
       histogram = {}

       for line in file(filename):
           words = line.translate(tr).split()
           wordCount += len(words)
           for word in words:
               histogram[word] = histogram.get(word, 0) + 1

>     ...
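
For anyone trying this on a current Python, here is a rough Python 3 sketch of the same line-at-a-time idea. The original `tr` table is elided above, so the `TR` table below is only an assumed stand-in (punctuation to spaces, ASCII uppercase to lowercase), and the name `word_stats` and the use of collections.Counter are my own choices, not part of the original code:

```python
import string
from collections import Counter

# Assumed stand-in for the elided `tr` table: every punctuation character
# becomes a space and ASCII uppercase letters become lowercase.
TR = str.maketrans(
    string.punctuation + string.ascii_uppercase,
    " " * len(string.punctuation) + string.ascii_lowercase,
)


def word_stats(lines):
    """Return (word_count, histogram) for an iterable of text lines.

    Only one line is held in memory at a time, so `lines` can be an
    open file object of any size.
    """
    word_count = 0
    histogram = Counter()
    for line in lines:
        words = line.translate(TR).split()
        word_count += len(words)
        histogram.update(words)  # adds 1 to the count of each word
    return word_count, histogram
```

Usage would look something like `with open(filename) as f: word_count, histogram = word_stats(f)`. Counter.update does the same job as the histogram.get(word, 0) + 1 loop; whether it is actually faster on a given file is something to measure rather than assume.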


- Scott David Daniels
Scott.Daniels at Acm.Org


