[Tutor] Bigrams and nested dictionaries

Michael Broe mbroe at columbus.rr.com
Tue Apr 4 06:37:21 CEST 2006


Well coming up with this has made me really love Python. I worked on  
this with my online pythonpenpal Kyle, and here is what we came up  
with. Thanks to all for input so far.

My first idea was to use a C-type indexing for-loop, to grab a two- 
element sequence [i, i+1]:

dict = {}
for i in range(len(t) - 1):
	if not dict.has_key(t[i]):
		dict[t[i]] = {}
	if not dict[t[i]].has_key(t[i+1]):
		dict[t[i]][t[i+1]] = 1
	else:
		dict[t[i]][t[i+1]] += 1

Which works, but. Kyle had an alternative take, with no indexing, and  
after we worked on this strategy it seemed very Pythonesque, and ran  
almost twice as fast.

----

#!/usr/local/bin/python

import sys
file = open(sys.argv[1], 'rb').read()

# We imagine a 2-byte 'window' moving over the text from left to right
#
#          +-------+
# L  o  n  | d   o |  n  .  M  i  c  h  a  e  l  m  a  s      t  e   
r  m ...
#          +-------+
#
# At any given point, we call the leftmost byte visible in the window  
'L', and the
# rightmost byte 'R'.
#
#          +-----------+
# L  o  n  | L=d   R=o |  n  .  M  i  c  h  a  e  l  m  a  s      t   
e  r  m ...
#          +-----------+
#
# When the program begins, the first byte is preloaded into L, and we  
position R
# at the second byte of the file.
#

dict = {}

L = file[0]
for R in file[1:]:	# move right edge of window across the file
	if not L in dict:
		dict[L] = {}
		
	if not R in dict[L]:
		dict[L][R] = 1
	else:
		dict[L][R] += 1
		
	L = R	        # move character in R over to L

# that's it. here's a printout strategy:
	
for entry in dict:
	print entry, ':', sum(dict[entry].values())
	print dict[entry]
	print

----






More information about the Tutor mailing list