urllib question (assembling an HTTP POST request)

Omri Schwarz ocscwar at h-after-ocsc.mit.edu
Tue Mar 26 15:08:50 EST 2002


Hi, all,

I'm trying to parse an HTML form in Python,
assemble a POST request from it, and send it off.
Here is what my script does right now:

# a regex to gather the inputs: 
findinput = re.compile('\<input type="(.*?)" *name="(.*?)" *value="(.*?)" *\>|\<input *type="(.*?)" *name="(.*?)" *(.*?)\>')
# then later on: 
	c = urllib.urlopen(b).read()
	subject = findsubject.search(c)
	if subject:
		print "Is this spam?\n"
		print subject.groups()[0]
		if (raw_input()[0] =='y') : 
# here's the part that may be going wrong:
			for cb in findinput.findall(c):
				print cb
				if cb[3] == 'checkbox':
					if cb[5] == 'checked' :
						poststring[cb[4]]= 'on'
				if cb[0] == 'hidden': 
					poststring[cb[1]]=cb[2]
			d = urllib.urlencode(poststring)
			print d
			res = urllib.urlopen('http://www.spamcop.net/sc',d).read()

The form has hidden inputs, checkboxes, 
and the submit button. 

When I use a browser (Netscape or Lynx),
everything is fine. When I use this
script, I get server-side errors that
tell me nothing useful.
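
One way to get more out of such errors is to read the body the server
sends back along with the error status, since it sometimes names the
field it objected to. Below is a minimal sketch in present-day Python 3
(urllib.request and urllib.error), not the Python 2 urllib used in the
script above. The function name post_and_report and its opener parameter
are hypothetical, added only so the sketch can be tried without a network.

```python
import urllib.error
import urllib.request


def post_and_report(url, body, opener=urllib.request.urlopen):
    """POST a urlencoded string and return (status, text), even on HTTP errors.

    `opener` is a hypothetical hook for substituting a fake in tests;
    by default the real urllib.request.urlopen is used.
    """
    data = body.encode("ascii")  # urlencoded data is plain ASCII
    try:
        with opener(url, data) as resp:
            return resp.status, resp.read().decode("utf-8", "replace")
    except urllib.error.HTTPError as err:
        # HTTPError is also a file-like response: its body often
        # carries the server's own explanation of what went wrong.
        return err.code, err.read().decode("utf-8", "replace")
```

Printing the returned text for a failing request shows whatever the
server said, which may or may not be more informative than the status.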

So, has anyone already written a form parser
for this purpose? Can anyone think of other problems
that might be happening?
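
For comparison, here is a minimal sketch of a form parser built on the
standard library's HTML parser instead of a regex, written in present-day
Python 3 (html.parser, urllib.parse) and not the Python 2 of the script
above. The names FormInputCollector and form_to_post_body are
hypothetical. It assumes the form has a single submit button and only
<input> elements. It differs from the script in three ways that may or
may not matter to the server: attribute order is irrelevant, a checked
checkbox sends its own value attribute (falling back to "on" only when
none is given), and the submit button's name and value are included.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode


class FormInputCollector(HTMLParser):
    """Collect the (name, value) pairs a browser would likely submit."""

    def __init__(self):
        super().__init__()
        self.fields = []  # (name, value) pairs in document order

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        a = dict(attrs)  # attribute names arrive lowercased
        name = a.get("name")
        if not name:
            return  # inputs without a name are not submitted
        kind = (a.get("type") or "text").lower()
        value = a.get("value")
        if kind in ("checkbox", "radio"):
            # Only checked boxes are sent; "on" is the default value.
            if "checked" in a:
                self.fields.append((name, "on" if value is None else value))
        elif kind in ("hidden", "text", "submit"):
            # Assumption: exactly one submit button, so it is always sent.
            self.fields.append((name, value or ""))


def form_to_post_body(html):
    """Return the urlencoded POST body for the inputs found in `html`."""
    parser = FormInputCollector()
    parser.feed(html)
    return urlencode(parser.fields)
```

Feeding it the page source gives a string that can be compared with the
output of the script's own "print d" to spot missing or different fields.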

Thanks in advance.
-- 
Omri Schwarz --- ocscwar at mit.edu ('h' before war) 
Timeless wisdom of biomedical engineering: "Noise is principally
due to the presence of the patient." -- R.F. Farr



