using urllib2

Alexnb alexnbryan at gmail.com
Fri Jun 27 13:41:13 EDT 2008


Okay, I tried to follow that, and it is kinda hard. But since you obviously
know what you are doing, where did you learn this? Or where can I learn
this?


Maric Michaud wrote:
> 
> On Friday 27 June 2008 10:43:06, Alexnb wrote:
>> I have never used urllib or urllib2. I really have looked online and on
>> mailing lists for help on this issue, but I can't figure out my problem
>> because people haven't been helping me, which is why I am here! :]
>> Okay, so basically I want to be able to submit a word to dictionary.com
>> and then get the definitions. However, to start off learning urllib2, I
>> just want to do a simple Google search. Before you get mad, what I have
>> found on urllib2 hasn't helped me. Anyway, how would you go about doing
>> this? No, I did not post the HTML, but if you want it, right-click in
>> your browser and hit "view source" on the Google homepage. Basically,
>> what I want to know is how to submit a value (the search term) and then
>> search for that value. Here's what I know:
>>
>> import urllib2
>> response = urllib2.urlopen("http://www.google.com/")
>> html = response.read()
>> print html
>>
>> Now I know that all this does is print the source, but that's about all I
>> know. I know it may be a lot to ask to have someone show/help me, but I
>> really would appreciate it.
> 
> This example is for Google. Of course, using pygoogle is easier in this
> case, but this is a valid example for the general case:
> 
>>>>[207]: import urllib, urllib2
> 
> You need to trick the server with an imaginary User-Agent.
> 
>>>>[208]: def google_search(terms) :
>     return urllib2.urlopen(urllib2.Request(
>         "http://www.google.com/search?" + urllib.urlencode({'hl':'fr', 'q':terms}),
>         headers={'User-Agent':'MyNav 1.0 (compatible; MSIE 6.0; Linux'})
>     ).read()
>    .....:
> 
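For readers on Python 3, where urllib2 no longer exists, the same idea can be sketched with urllib.parse and urllib.request. This is only an illustration of how the query string and the User-Agent header end up in the request; the function name build_search_request is made up for this note, and nothing is actually sent over the network here.

```python
from urllib.parse import urlencode, urlsplit, parse_qs
from urllib.request import Request


def build_search_request(terms, lang="fr"):
    """Build (but do not send) a GET request like the one in the message above.

    build_search_request is a hypothetical helper name, not part of any library.
    """
    # urlencode escapes spaces and '&' so the terms survive as one 'q' value.
    query = urlencode({"hl": lang, "q": terms})
    return Request(
        "http://www.google.com/search?" + query,
        # User-Agent string copied verbatim from the message above.
        headers={"User-Agent": "MyNav 1.0 (compatible; MSIE 6.0; Linux"},
    )


req = build_search_request("python & co")
print(req.full_url)

# Decoding the query string gives back the original search terms.
decoded = parse_qs(urlsplit(req.full_url).query)
print(decoded["q"])
```

Passing req to urllib.request.urlopen() and calling .read() on the result would then correspond to the urllib2 call above. Whether the server accepts such a request today is a separate question.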
>>>>[212]: res = google_search("python & co")
> 
> Now you have the whole HTML response; you'll have to parse it to recover
> the data. A quick & dirty try on the Google response page:
> 
>>>>[213]: import re
> 
>>>>[214]: [ re.sub('<.+?>', '', e) for e in re.findall('<h2 class=r>.*?</h2>', res) ]
> ...[229]:
> ['Python Gallery',
>  'Coffret Monty Python And Co 3 DVD : La Premi\xe8re folie des Monty ...',
>  'Re: os x, panther, python & co: msg#00041',
>  'Re: os x, panther, python & co: msg#00040',
>  'Cardiff Web Site Design, Professional web site design services ...',
>  'Python Properties',
>  'Frees < Programs < Python < Bin-Co',
>  'Torb: an interface between Tcl and CORBA',
>  'Royal Python Morphs',
>  'Python & Co']
> 
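To see what the two regular expressions do without fetching anything, here is a small sketch that runs them on a hand-written string. The sample_html content and the name extract_titles are invented for illustration; the real markup of a results page is an assumption and may well differ. The re.DOTALL flag is an addition so that a heading spanning several lines would still match.

```python
import re

# Hypothetical stand-in for a results page; not real Google markup.
sample_html = (
    '<h2 class=r><a href="http://example.com/a">Python <b>Gallery</b></a></h2>'
    '<p>snippet text</p>'
    '<h2 class=r><a href="http://example.com/b">Royal Python Morphs</a></h2>'
)


def extract_titles(html):
    # Step 1: grab each <h2 class=r>...</h2> block (non-greedy, so one per heading).
    headings = re.findall(r'<h2 class=r>.*?</h2>', html, re.DOTALL)
    # Step 2: strip every tag inside the block, leaving only the visible text.
    return [re.sub(r'<.+?>', '', h) for h in headings]


titles = extract_titles(sample_html)
print(titles)
```

As the message says, this is quick and dirty: a regular expression tied to one exact tag spelling breaks as soon as the page layout changes, so a real HTML parser is usually the sturdier choice.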
> 
> -- 
> _____________
> 
> Maric Michaud
> 
> 

-- 
View this message in context: http://www.nabble.com/using-urllib2-tp18150669p18160312.html
Sent from the Python - python-list mailing list archive at Nabble.com.



