[Edu-sig] CP4E (continued)

Kirby Urner urnerk at qwest.net
Sat Apr 9 20:49:06 CEST 2005


Projected in front of class (teacher explaining her process for grabbing
cities and states, making a simple plaintext file for later reuse):

 >>> import urllib2
 >>> fo = urllib2.urlopen(
 "http://www.w3.org/2000/10/swap/test/dbork/data/USRegionState.daml")
 >>> fo  # <-- a file-like object
 <addinfourl at 21988896 whose fp = <socket._fileobject object at 
 0x00CA1490>>

 
 >>> allcapitals = []  # must exist before we append to it
 >>> for i in fo:  # grab the strings we'll need to parse
         if '<capital' in i:
             allcapitals.append(i)

		
 >>> fo.close()

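 >>> def getcities():  # snip off the fat
         cities = []
         for city in allcapitals:
             st = city.find("#")    # the name starts right after the '#'
             fn = city.find('"/>')  # and ends at the closing quote
             cities.append(city[st+1:fn])
         return cities

 >>> cities = getcities()  # module-level list, read by getcitystate()

 >>> def getcitystate():  # separate city and state
         citystate = []
         for e in cities:
             city = e[:-2]   # everything except the last two letters
             state = e[-2:]  # trailing two-letter state code
             citystate.append((city, state))
         return citystate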

 >>> cs = getcitystate()
 >>> cs
 [('montgomery', 'al'), ('juneau', 'ak'), ('phoenix', 'az'), ('littlerock',
'ar'), ('sacramento', 'ca'), ('denver', 'co'), ('hartford', 'ct'),
('washington', 'dc'), ('dover', 'de'), ('tallahassee', 'fl'), ('atlanta',
'ga'), ('honolulu', 'hi'), ('boise', 'id'), ('springfield', 'il'),
('indianapolis', 'in'), ('desmoines', 'ia'), ('topeka', 'ks'), ('frankfort',
'ky'), ('batonrouge', 'la'), ('augusta', 'me'), ('annapolis', 'md'),
('boston', 'ma'), ('lansing', 'mi'), ('stpaul', 'mn'), ('jackson', 'ms'),
('jeffersoncity', 'mo'), ('helena', 'mt'), ('lincoln', 'ne'), ('carsoncity',
'nv'), ('concord', 'nh'), ('trenton', 'nj'), ('santafe', 'nm'), ('albany',
'ny'), ('raleighdurham', 'nc'), ('bismarck', 'nd'), ('columbus', 'oh'),
('oklahomacity', 'ok'), ('salem', 'or'), ('harrisburg', 'pa'),
('providence', 'ri'), ('columbia', 'sc'), ('pierre', 'sd'), ('nashville',
'tn'), ('austin', 'tx'), ('saltlakecity', 'ut'), ('montpelier', 'vt'),
('richmond', 'va'), ('olympia', 'wa'), ('charleston', 'wv'), ('madison',
'wi'), ('cheyenne', 'wy')]

>>> 

etc. (You still have to hand-space 'jefferson city' and the like once you get
your plaintext written out; hey, we humans have a role to play, why not?)
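
The write-out step itself isn't shown in the session above, so here is a
minimal sketch of one way it might go. The comma-separated layout, the
file name capitals.txt, and the helper names write_citystate and
read_citystate are my assumptions, not part of the original session, and
the code targets a current Python 3 rather than the 2.x interpreter used
above. A short stand-in list replaces the full cs.

```python
import os
import tempfile

# Stand-in for the full `cs` list built in the session above.
cs = [('montgomery', 'al'), ('juneau', 'ak'), ('littlerock', 'ar')]

def write_citystate(pairs, path):
    """Write one 'city,state' line per pair (hypothetical file layout)."""
    with open(path, "w") as out:
        for city, state in pairs:
            out.write("%s,%s\n" % (city, state))

def read_citystate(path):
    """Read the file back into a list of (city, state) tuples."""
    with open(path) as src:
        return [tuple(line.strip().split(",")) for line in src if line.strip()]

# Use a scratch directory so the sketch doesn't clobber anything.
path = os.path.join(tempfile.mkdtemp(), "capitals.txt")
write_citystate(cs, path)
roundtrip = read_citystate(path)
print(roundtrip)
```

Once the file exists, the hand-spacing can be done in any text editor, and
read_citystate picks up the edited names on the next run.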

Actually, this whole exercise might be fun for the kids to mess with -- I'll
forward it to the police in Hillsboro as a lab activity (Red Hat 9 lab, West
Precinct).

We could show 'em XML parsing and regular expressions (alternative, more
sophisticated ways to suck strings) in later lessons.
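
As a rough preview of those later lessons, here is a sketch of both
approaches run against a tiny made-up document. The sample markup is my
guess at the shape of the data, inferred only from the '#' and '"/>'
markers the slicing code searches for; the rdf:resource attribute, the
State wrapper element, and the variable names are assumptions. It is
written for Python 3 rather than the 2.x session above.

```python
import re
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Hypothetical sample; the real file's layout may differ.
sample = '''<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <State><capital rdf:resource="#montgomeryal"/></State>
  <State><capital rdf:resource="#juneauak"/></State>
  <State><capital rdf:resource="#littlerockar"/></State>
</rdf:RDF>'''

# Regular expression: capture the city letters and the trailing
# two-letter state code that sit between '#' and the closing '"/>'.
pattern = re.compile(r'<capital[^>]*#([a-z]+)([a-z]{2})"\s*/>')
pairs_re = pattern.findall(sample)

# XML parsing: walk the tree, pick out the capital elements, and
# split the attribute value the same way the slicing code does.
pairs_xml = []
for el in ET.fromstring(sample).iter():
    if el.tag.split('}')[-1] == 'capital':
        name = el.get('{%s}resource' % RDF_NS).lstrip('#')
        pairs_xml.append((name[:-2], name[-2:]))

print(pairs_re)
print(pairs_xml)
```

Both routes lean on the same trick as the original: the last two letters
are the state, everything before them is the city.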

Kirby



