interested in benchmark
Jeremy Hylton
jeremy at cnri.reston.va.us
Thu Apr 15 18:39:11 EDT 1999
Doh!
I guess you could read it all at once, which would be fine for a file
that's only 6MB or so. If you want correctness (how important is
that in a benchmark, anyway?) and still want to read fixed-size chunks,
then you need to check whether each buffer ends in the middle of
a word or between words. With that checking added, the code is a bit
more complex but still about 20% faster.
#!/usr/local/bin/python
import sys
import string

def run():
    dict = {}
    dict_get = dict.get
    read = sys.stdin.read
    string_split = string.split
    prev = ''
    while 1:
        buf = read(500000)
        if buf:
            parts = string_split(buf)
            if not parts:
                # chunk was all whitespace, so any pending word is complete
                if prev:
                    dict[prev] = dict_get(prev, 0) + 1
                    prev = ''
                continue
            # buffer started mid-word: first part continues the previous word
            if buf[0] == parts[0][0]:
                parts[0] = prev + parts[0]
            elif prev:
                dict[prev] = dict_get(prev, 0) + 1
            for key in parts[:-1]:
                dict[key] = dict_get(key, 0) + 1
            # did the buffer end between words?
            if buf[-1] != parts[-1][-1]:
                key = parts[-1]
                dict[key] = dict_get(key, 0) + 1
                prev = ''
            else:
                prev = parts[-1]
        else:
            # EOF: count any word left over from the last chunk
            if prev:
                dict[prev] = dict_get(prev, 0) + 1
            return dict

dict = run()
write = sys.stdout.write
for word in dict.keys():
    write("%4d\t%s\n" % (dict[word], word))
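For comparison, here's a sketch of the read-it-all-at-once approach
mentioned above (my addition, in current Python rather than 1.5-era
style; `count_words` is just a name I made up). With the whole file in
memory there are no chunk boundaries, so none of the word-splicing logic
is needed:

```python
import sys

def count_words(text):
    # split() with no arguments splits on any run of whitespace,
    # so there are no partial words to worry about
    counts = {}
    for word in text.split():
        counts[word] = counts.get(word, 0) + 1
    return counts

if __name__ == "__main__":
    counts = count_words(sys.stdin.read())
    for word, n in counts.items():
        sys.stdout.write("%4d\t%s\n" % (n, word))
```

Simpler, but it costs you a copy of the whole file in memory, which is
why the chunked version above is the interesting one for big inputs.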
Jeremy