Problem in reading a URL
scum
scumitchell at gmail.com
Fri Dec 8 12:16:54 EST 2006
This is not a Python problem. The text you are getting back is what the
site itself returns for that URL. The site uses cookies to keep track of
your session, and a plain Python request does not send that cookie back,
so the site answers with an error page. You will be better served by the
mechanize library, which handles cookies for you:
http://wwwsearch.sourceforge.net/mechanize/
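If you would rather stay with the standard library, a cookie-aware opener
is one way to keep the session cookie between requests. This is only a
minimal sketch, not something I have run against the site in question.
The helper name make_cookie_opener is made up for illustration, and the
module names are the Python 3 ones (in Python 2.5 they are cookielib and
urllib2).

```python
import http.cookiejar
import urllib.request


def make_cookie_opener():
    """Return an opener that stores cookies and sends them back, plus its jar.

    make_cookie_opener is a hypothetical helper name used for illustration.
    """
    jar = http.cookiejar.CookieJar()
    # HTTPCookieProcessor saves Set-Cookie headers into the jar and
    # attaches matching cookies to later requests made with this opener.
    opener = urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(jar)
    )
    return opener, jar


opener, jar = make_cookie_opener()
# No request has been made yet, so the jar starts out empty.
# A later opener.open(some_url) would fill it with the site's cookies.
```

Every request made through the same opener shares the jar, which is what
lets the server recognise the session.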
You may still not get far, though, since the site you are trying to
search disallows robots in its robots.txt:
>>> import mechanize
>>> from mechanize import Browser
>>> br = Browser()
>>> br.open("http://www.ncbi.nlm.nih.gov/entrez/")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/mechanize-0.1.4b-py2.4.egg/mechanize/_mechanize.py", line 156, in open
  File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/mechanize-0.1.4b-py2.4.egg/mechanize/_mechanize.py", line 214, in _mech_open
mechanize._response.httperror_seek_wrapper: HTTP Error 403: request disallowed by robots.txt
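The 403 above comes from mechanize checking robots.txt before it fetches
the page. You can do the same kind of check yourself with the standard
library's robots.txt parser. The sketch below is an assumption-laden
example: SAMPLE_ROBOTS_TXT is made-up content, not the site's real
robots.txt, and is_allowed and the "my-script" user agent are
hypothetical names. The import path is the Python 3 one (the module is
called robotparser in Python 2.5).

```python
from urllib.robotparser import RobotFileParser

# Made-up rules for illustration only; NOT the real robots.txt of any site.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /entrez/
"""


def is_allowed(robots_txt, user_agent, url):
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


blocked = is_allowed(
    SAMPLE_ROBOTS_TXT, "my-script", "http://www.ncbi.nlm.nih.gov/entrez/"
)
allowed = is_allowed(
    SAMPLE_ROBOTS_TXT, "my-script", "http://www.ncbi.nlm.nih.gov/"
)
```

With these sample rules, the /entrez/ path is refused and the site root
is permitted. Against a live site you would feed the parser the site's
actual robots.txt instead of the sample text.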