get google scholar using python

Nick Cash nick.cash at npcinternational.com
Mon Oct 1 12:51:24 EDT 2012


> urllib2.urlopen('http://scholar.google.co.uk/scholar?q=albert
>...
> urllib2.HTTPError: HTTP Error 403: Forbidden
> >>>
> 
> Will you kindly explain me the way to get rid of this?

It looks like Google blocks non-browser user agents from retrieving this query. You *could* work around it by setting the User-Agent header to a fake, browser-like value, but you're almost certainly breaking Google's Terms of Service if you do so.
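For reference, the User-Agent Google most likely sees is the default one urllib adds to every request. Below is a minimal sketch in Python 3 (urllib.request, the successor to urllib2) that prints that default header without making any network request. That Google keys its block on exactly this header is an assumption, consistent with the 403 above but not confirmed.

```python
import urllib.request

# The opener behind urlopen() adds one default header to every request:
# ('User-agent', 'Python-urllib/<major>.<minor>'), which plainly
# identifies the client as a script rather than a browser.
opener = urllib.request.build_opener()
print(opener.addheaders)
```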

Should you really really want to, urllib2 makes it easy:
import urllib2
urllib2.urlopen(urllib2.Request("http://scholar.google.co.uk/scholar?q=albert+einstein%2B1905&btnG=&hl=en&as_sdt=0%2C5&as_sdtp=", headers={"User-Agent":"Mozilla/5.0 Cheater/1.0"}))
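In Python 3, urllib2 was folded into urllib.request. A roughly equivalent sketch follows. It builds the query with urllib.parse.urlencode instead of a hand-encoded URL and keeps only the q and hl parameters from the original. The helper name build_scholar_request is hypothetical, and the request is only constructed here, not sent; the Terms of Service caveat above still applies.

```python
import urllib.parse
import urllib.request

SCHOLAR_URL = "http://scholar.google.co.uk/scholar"

def build_scholar_request(query, user_agent="Mozilla/5.0 Cheater/1.0"):
    """Hypothetical helper: build (but do not send) a Scholar query."""
    # urlencode escapes the query for us: spaces become '+', and a literal
    # '+' becomes %2B, matching the hand-encoded URL in the urllib2 line.
    params = urllib.parse.urlencode({"q": query, "hl": "en"})
    return urllib.request.Request(SCHOLAR_URL + "?" + params,
                                  headers={"User-Agent": user_agent})

req = build_scholar_request("albert einstein+1905")
print(req.full_url)
print(req.get_header("User-agent"))

# To actually send it:
# with urllib.request.urlopen(req) as resp:
#     html = resp.read().decode("utf-8", errors="replace")
```

Request stores header names via str.capitalize(), which is why get_header() is called with "User-agent" rather than "User-Agent".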

-Nick Cash

