[Tutor] Bigrams and nested dictionaries
Michael Broe
mbroe at columbus.rr.com
Tue Apr 4 06:37:21 CEST 2006
Well coming up with this has made me really love Python. I worked on
this with my online pythonpenpal Kyle, and here is what we came up
with. Thanks to all for input so far.
My first idea was to use a C-type indexing for-loop, to grab a two-
element sequence [i, i+1]:
dict = {}
for i in range(len(t) - 1):
if not dict.has_key(t[i]):
dict[t[i]] = {}
if not dict[t[i]].has_key(t[i+1]):
dict[t[i]][t[i+1]] = 1
else:
dict[t[i]][t[i+1]] += 1
Which works, but. Kyle had an alternative take, with no indexing, and
after we worked on this strategy it seemed very Pythonesque, and ran
almost twice as fast.
----
#!/usr/local/bin/python
import sys
file = open(sys.argv[1], 'rb').read()
# We imagine a 2-byte 'window' moving over the text from left to right
#
# +-------+
# L o n | d o | n . M i c h a e l m a s t e
r m ...
# +-------+
#
# At any given point, we call the leftmost byte visible in the window
'L', and the
# rightmost byte 'R'.
#
# +-----------+
# L o n | L=d R=o | n . M i c h a e l m a s t
e r m ...
# +-----------+
#
# When the program begins, the first byte is preloaded into L, and we
position R
# at the second byte of the file.
#
dict = {}
L = file[0]
for R in file[1:]: # move right edge of window across the file
if not L in dict:
dict[L] = {}
if not R in dict[L]:
dict[L][R] = 1
else:
dict[L][R] += 1
L = R # move character in R over to L
# that's it. here's a printout strategy:
for entry in dict:
print entry, ':', sum(dict[entry].values())
print dict[entry]
print
----
More information about the Tutor
mailing list