simple spider in python

Michael Bentley michael at jedimindworks.com
Thu Aug 23 16:48:50 EDT 2007


On Aug 23, 2007, at 6:33 AM, gmcalendar at gmail.com wrote:

> Hi everybody, I'm new to the forum so: hello everybody (should I say
> "world"?) ^_^
> I'm trying to write a simple spider in Python which:
>
> 1) asks Google a query
> 2) parses the data
>
> I'm a Python newbie so *any* help would be very, very welcome.
> Thanks in advance!

First thing to know is that Google doesn't like the User-Agent header
urllib2 sends by default, so you'll have to masquerade as a browser.
Google returns a 403 error if you connect as
'User-Agent: Python-urllib/2.5'; look into urllib2.build_opener() to
set a different header. Second thing to know is that the interesting
results have their class attribute set to "l".
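
Here is a minimal sketch of both points, not a definitive
implementation. It assumes Python 3, where urllib2 was merged into
urllib.request, and it assumes the results are <a> tags carrying
class "l" (the post only says the class attribute is "l"). The
User-Agent string, the search URL, and the names make_opener,
ResultLinkParser, extract_result_links and search are all made up for
illustration, and Google's markup may not match this at all.

```python
# Python 3 sketch: urllib2 from the post lives in urllib.request here.
from html.parser import HTMLParser
from urllib.parse import urlencode
from urllib.request import build_opener

# Assumption: a hypothetical browser-like User-Agent string.
BROWSER_UA = "Mozilla/5.0 (X11; Linux x86_64)"


def make_opener(user_agent=BROWSER_UA):
    """Return an opener that sends user_agent instead of Python-urllib/x.y."""
    opener = build_opener()
    # addheaders is a list of (name, value) pairs sent with every request.
    opener.addheaders = [("User-Agent", user_agent)]
    return opener


class ResultLinkParser(HTMLParser):
    """Collect href values of <a> tags whose class attribute includes "l"."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attrs = dict(attrs)
        # The class attribute may be missing or valueless, hence the "or".
        classes = (attrs.get("class") or "").split()
        if "l" in classes and attrs.get("href"):
            self.links.append(attrs["href"])


def extract_result_links(html):
    """Return the hrefs of all class "l" links found in an HTML string."""
    parser = ResultLinkParser()
    parser.feed(html)
    parser.close()
    return parser.links


def search(query, opener=None):
    """Fetch a results page for query and return its links (needs network)."""
    opener = opener or make_opener()
    # Assumption: the search URL format is not given in the post.
    url = "http://www.google.com/search?" + urlencode({"q": query})
    with opener.open(url) as response:
        html = response.read().decode("utf-8", errors="replace")
    return extract_result_links(html)
```

Calling search("python spider") would do the fetch and the parse in
one go; extract_result_links() can be tried on any saved HTML string
without touching the network.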

Hope this helps,
Michael

---
Asking a person who he *is* ... is not Pythonic!  --Anton Vredegoor






