Optimizing a text statistics function
Scott David Daniels
Scott.Daniels at Acm.Org
Wed Apr 21 17:38:08 EDT 2004
Peter Otten wrote:
> Nickolay Kolev wrote:
Playing along, simply because it's fun.
> def main(filename):
>     ...
#>    words = file(filename).read().translate(tr).split()
#>    histogram = {}
#>    wordCount = len(words)
#>    for word in words:
#>        histogram[word] = histogram.get(word, 0) + 1
Better not to do several huge string allocs above (I suspect): read(),
translate(), and split() each build an object about the size of the whole file.
Going line by line also lets you work on files too large to read into memory:
    wordCount = 0
    histogram = {}
    for line in file(filename):
        words = line.translate(tr).split()
        wordCount += len(words)
        for word in words:
            histogram[word] = histogram.get(word, 0) + 1
> ...
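
For anyone reading this on a current Python, here is a rough sketch of the same line-at-a-time idea. The translation table `tr` is not shown anywhere in this thread, so the `TR` table below is only a stand-in I am assuming (lowercase the letters, turn ASCII punctuation into spaces), and the name `word_histogram` is made up for illustration. It also swaps the plain dict for `collections.Counter`, which is a convenience and not something the code above used.

```python
import string
from collections import Counter

# Assumption: the real `tr` table is not shown in the thread. This stand-in
# lowercases ASCII letters and maps ASCII punctuation to spaces.
TR = str.maketrans(
    string.ascii_uppercase + string.punctuation,
    string.ascii_lowercase + " " * len(string.punctuation),
)


def word_histogram(lines):
    """Count words from any iterable of text lines, one line at a time."""
    word_count = 0
    histogram = Counter()
    for line in lines:
        words = line.translate(TR).split()
        word_count += len(words)
        histogram.update(words)  # adds 1 to the count of each word
    return word_count, histogram


def main(filename):
    # Iterating over the file object yields one line at a time, so the
    # whole file is never held in memory at once.
    with open(filename, encoding="utf-8") as f:
        return word_histogram(f)
```

Since `word_histogram` accepts any iterable of lines, it can be tried on a small list of strings before pointing `main` at a large file.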
- Scott David Daniels
Scott.Daniels at Acm.Org