Problem accessing a web page

Tim Chase python.list at tim.thechases.com
Mon Dec 15 15:55:59 EST 2008


> I'm able to grab the problem webpage via Python just fine, albeit with
> a bit of a delay. So, don't know what your exact problem is, maybe
> your connection?

When you get the second page, are you getting the same content 
back that you get if you do a search in your favorite browser?

Using just a plain GET, I get an error page back:

   content = urllib.urlopen(url2).read()
   'Error' in content # True
   'Friedrich' in content # False

However, when you browse to the page, those two should be inverted:

   'Error' in content # False
   'Friedrich' in content # True

I also tried sending the form parameters via POST:

   params = urllib.urlencode([
     ('params.forzaQuery', 'N'),
...
     ('layout', 'busquedaisbn'),
     ])
   content = urllib.urlopen(url2, params).read()

However, this too fails because the underlying engine expects a 
session ID in the URL.  I finally got it to work with the code below:

   import urllib

   data = [
     ('params.forzaQuery', 'N'),
     ('params.cdispo', 'A'),
     ('params.cisbnExt', '8484031128'),
     ('params.liConceptosExt[0].texto', ''),
     ('params.orderByFormId', '1'),
     ('action', 'Buscar'),
     ('language', 'es'),
     ('prev_layout', 'busquedaisbn'),
     ('layout', 'busquedaisbn'),
     ]

   params = urllib.urlencode(data)

   url = ('http://www.mcu.es/webISBN/tituloSimpleDispatch.do'
          ';jsessionid=5E8D9A11E4A28BDF0BA6B254D0118262')

   fp = urllib.urlopen(url, params)
   content = fp.read()
   fp.close()


but I had to hard-code the jsessionid parameter in the URL.  In a 
real script it would have to be extracted from the response to the 
first request: the initial URL returns a <FORM> element whose action 
is the URL to POST to, including this magic jsessionid parameter.
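
A rough sketch of that extraction step, not tested against the real 
site.  The helper name find_form_action, the regex, and the start_url 
placeholder are my own inventions for illustration, and I'm assuming 
the form tag carries a quoted action="..." attribute.  The try/except 
import only exists so the helper runs on both Python 2 and 3.

```python
import re

try:
    from urlparse import urljoin          # Python 2
except ImportError:
    from urllib.parse import urljoin      # Python 3

# Assumption: the <FORM> tag has a quoted action="..." attribute.
FORM_ACTION_RE = re.compile(
    r'<form\b[^>]*?\baction\s*=\s*["\']([^"\']+)["\']',
    re.IGNORECASE)


def find_form_action(html, base_url):
    """Return the absolute URL of the first <form action=...>, or None."""
    match = FORM_ACTION_RE.search(html)
    if match is None:
        return None
    # The action may be relative, so resolve it against the page URL.
    return urljoin(base_url, match.group(1))


# Hypothetical usage with the Python 2 urllib calls from this message
# (start_url stands for whatever page serves the search form):
#
#   html = urllib.urlopen(start_url).read()
#   url = find_form_action(html, start_url)
#   content = urllib.urlopen(url, params).read()
```

A regex is fragile against unusual markup; a real HTML parser would 
be sturdier, but for a single known form this may be enough.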

Hope this helps nudge you (the OP) in the right direction to get 
what you're looking for.

-tkc








