for in benchmark interested

Jeremy Hylton jeremy at cnri.reston.va.us
Thu Apr 15 15:25:37 EDT 1999


The Python version would be faster if you used sys.stdin.read instead
of sys.stdin.readlines.  I'm not sure why you need to split the input
into lines before you split it into words; it seems like an
unnecessary step.
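
A small check of that claim, written as a sketch in present-day Python 3 rather than the Python of this message. The helper names words_via_lines and words_via_read are made up for illustration. Since str.split() with no argument splits on any run of whitespace, newlines included, both routes should give the same word list:

```python
import io


def words_via_lines(stream):
    # Split into lines first, then split each line into words.
    words = []
    for line in stream.readlines():
        words.extend(line.split())
    return words


def words_via_read(stream):
    # Read everything at once and split on any whitespace.
    return stream.read().split()


sample = "the quick brown fox\njumps over\n\nthe lazy dog\n"
assert words_via_lines(io.StringIO(sample)) == words_via_read(io.StringIO(sample))
```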

The version below is 25% faster on my machine than your fastest Python 
version.  (And I'm not even an expert Python optimizer :-).

import sys
import string


def run():
    dict = {}
    dict_get = dict.get
    read = sys.stdin.read
    string_split = string.split
    while 1:
        buf = read(500000)
        if buf:
            for key in string_split(buf):
                dict[key] = dict_get(key, 0) + 1
        else:
            return dict


dict = run()
write = sys.stdout.write
for word in dict.keys():
    write("%4d\t%s\n" % (dict[word], word))
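
One caveat with reading in fixed-size chunks: a word that straddles a chunk boundary is split in two and counted as two different keys. The following is a sketch of one way to handle that, assuming present-day Python 3 (where string.split no longer exists and str.split is used instead). The function name count_words and the chunk_size parameter are illustrative, not part of the code above. The idea is to hold back a possibly incomplete last word and prepend it to the next chunk:

```python
import io


def count_words(stream, chunk_size=500000):
    counts = {}
    get = counts.get      # cache bound methods in locals, as in the code above
    read = stream.read
    tail = ""             # possibly incomplete word carried over from the last chunk
    while True:
        buf = read(chunk_size)
        if not buf:
            break
        buf = tail + buf
        words = buf.split()
        # If the chunk does not end in whitespace, its last word may continue
        # in the next chunk, so hold it back instead of counting it now.
        if words and not buf[-1].isspace():
            tail = words.pop()
        else:
            tail = ""
        for word in words:
            counts[word] = get(word, 0) + 1
    if tail:
        # Whatever is left at end of input is a complete word.
        counts[tail] = get(tail, 0) + 1
    return counts


# A tiny chunk size forces words to straddle chunk boundaries.
demo = count_words(io.StringIO("spam eggs spam\nham spam\n"), chunk_size=4)
assert demo == {"spam": 3, "eggs": 1, "ham": 1}
```

To count words on standard input, pass sys.stdin as the stream and print the result with the same "%4d\t%s\n" format used above.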


Jeremy



