[Mechanize.ClientForm] double reading from urllib2.urlopen

Wed Jan 7 14:12:15 EST 2009

tiktak.hodiki at gmail.com wrote:
> Hello, folks!
> I use mechanize's ClientForm to parse HTML forms. I first check the
> response by calling response.read().find("..."). But when the response
> is then passed to ClientForm.ParseResponse, nothing can be parsed
> because response.read() now returns zero-length text. The problem is
> that ClientForm.ParseResponse accepts only the response object, not
> its text.
> 
> Example:
> 
> import urllib
> from ClientForm import ParseResponse
> response = urllib.urlopen("http://yandex.ru")
> if -1 != response.read().find("foobar"):
>     pass
> form = ParseResponse(response)[1]  # <-- IndexError is raised here
> 
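read() consumes the response data, so there is nothing left for the
second read() that ParseResponse performs. It then finds no forms, and
indexing the empty list raises IndexError. Read the body once and keep
it. Try: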

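import urllib
from StringIO import StringIO
from ClientForm import ParseFile

response = urllib.urlopen("http://yandex.ru")
text = response.read()  # read the body once and keep it
if "foobar" in text:  # preferred to find()
    pass
# The parser wants a file-like object, not a string, so wrap the text.
form = ParseFile(StringIO(text), response.geturl())[1]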
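
The one-shot behaviour of read() can be seen without any network access.
Here is a minimal sketch that uses io.BytesIO as a stand-in for the
response object. The HTML snippet and the variable names are made up
for illustration, and a real urlopen() response is assumed to behave the
same way for sequential reads:

```python
import io

# Stand-in for a network response; any file-like stream behaves alike.
response = io.BytesIO(b"<html><form action='/search'></form></html>")

first = response.read()   # consumes the whole stream
second = response.read()  # nothing is left, so this is empty

# Keep the bytes from the first read and wrap them in a fresh file-like
# object whenever a parser insists on calling read() itself.
replay = io.BytesIO(first)
third = replay.read()

print(len(first), len(second), len(third))
```

The second read() comes back empty, while the wrapped copy yields the
full body again, which is why saving the text and re-wrapping it works.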