Problem in reading a URL

scum scumitchell at gmail.com
Fri Dec 8 12:16:54 EST 2006


This is not a Python problem.  That is the text the site returns when you
go to it.  The site uses cookies to store a session for your visit.  A
plain Python request does not send that cookie, so the site responds with
an error.  You will be better served using the mechanize library.

http://wwwsearch.sourceforge.net/mechanize/
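
For illustration only, here is roughly what session-cookie handling
involves, using the Python 3 standard library rather than mechanize
(mechanize takes care of this for you).  The helper name
make_cookie_opener is made up for this sketch, and whether cookies alone
are enough for this site is an assumption based on the description above.

```python
import http.cookiejar
import urllib.request


def make_cookie_opener():
    """Return (opener, jar); the opener stores and resends cookies via jar."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(jar)
    )
    return opener, jar


# Hypothetical usage (needs network access, so it is left commented out):
# opener, jar = make_cookie_opener()
# response = opener.open("http://www.ncbi.nlm.nih.gov/entrez/")
# print(len(jar), "cookie(s) stored for the session")
```

Every request made through the same opener shares one cookie jar, which
is what lets a server recognize a continuing session.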

Even so, you may not get far, since the site you are attempting to
search prevents robots from scouring it:

>>> import mechanize
>>> from mechanize import Browser
>>> br = Browser()
>>> br.open("http://www.ncbi.nlm.nih.gov/entrez/")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/mechanize-0.1.4b-py2.4.egg/mechanize/_mechanize.py", line 156, in open
  File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/mechanize-0.1.4b-py2.4.egg/mechanize/_mechanize.py", line 214, in _mech_open
mechanize._response.httperror_seek_wrapper: HTTP Error 403: request disallowed by robots.txt
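
The error message says the request was refused because of the site's
robots.txt.  A minimal sketch of that kind of check, using the Python 3
standard library (the module is named robotparser in Python 2), might
look like the following.  The rules in SAMPLE_RULES and the example.com
URLs are invented for the example and are not the real NCBI file;
is_allowed is a hypothetical helper name.

```python
from urllib.robotparser import RobotFileParser

# Invented rules for illustration; not the actual robots.txt of any site.
SAMPLE_RULES = [
    "User-agent: *",
    "Disallow: /entrez/",
]


def is_allowed(rules, user_agent, url):
    """Return True if `rules` (robots.txt lines) let user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(rules)  # parse in-memory lines; no network access needed
    return parser.can_fetch(user_agent, url)


print(is_allowed(SAMPLE_RULES, "my-script", "http://www.example.com/entrez/"))
print(is_allowed(SAMPLE_RULES, "my-script", "http://www.example.com/about.html"))
```

With these sample rules the first call prints False and the second
prints True, since only paths under /entrez/ are disallowed.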



