using urllib2

Alexnb alexnbryan at gmail.com
Fri Jun 27 19:36:57 EDT 2008


I have read that multiple times. It is hard to understand, but it did help a
little. I found a bit of a workaround for now, which is not what I
ultimately want. However, even when I can get to the page I want, let's say
"http://dictionary.reference.com/browse/cheese", I look in Firebug, an
extension, and see the definition in what I think is JavaScript:

<table class="luna-Ent">
<tbody>
<tr>
<td class="dn" valign="top">1.</td>
<td valign="top">the curd of milk separated from the whey and prepared in
many ways as a food. </td>

The problem is that if I use code like this to get the HTML of that page
in Python:

import urllib2

# Fetch the HTML exactly as the server sends it; urllib2 does not run JavaScript
response = urllib2.urlopen("the website....")
html = response.read()
print html

Then I get a lot of output, but it doesn't include the table that the
definition is in. So how do I access this JavaScript? Also, could someone
point me to a better reference than the last one, whether it's a book or
anything else? That one really doesn't tell me much.
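Firebug shows the page after the browser has processed it, while urlopen() only returns the bytes the server sent; nothing runs JavaScript on them. If the table is in that HTML, a parser can pull the definitions out. Below is a minimal sketch in Python 3 (where urllib2 became urllib.request) using the standard html.parser module. It assumes the markup from the Firebug snippet above (`<table class="luna-Ent">`, with the entry number in `<td class="dn">`); those class names may not match the live site, and `DefinitionParser` and `extract_definitions` are hypothetical names.

```python
from html.parser import HTMLParser


class DefinitionParser(HTMLParser):
    """Collect definition text from <table class="luna-Ent"> cells.

    Assumption: the entry number sits in <td class="dn"> and the
    definition text in the other <td> cells, as in the Firebug snippet.
    """

    def __init__(self):
        super().__init__()
        self.in_table = False   # inside <table class="luna-Ent">
        self.in_cell = False    # inside a definition <td>
        self.current = []       # text chunks of the current cell
        self.definitions = []   # finished definitions

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "table" and attrs.get("class") == "luna-Ent":
            self.in_table = True
        elif self.in_table and tag == "td" and attrs.get("class") != "dn":
            self.in_cell = True
            self.current = []

    def handle_endtag(self, tag):
        if tag == "td" and self.in_cell:
            # Collapse runs of whitespace (including line breaks) to one space
            text = " ".join("".join(self.current).split())
            if text:
                self.definitions.append(text)
            self.in_cell = False
        elif tag == "table":
            self.in_table = False

    def handle_data(self, data):
        if self.in_cell:
            self.current.append(data)


def extract_definitions(html):
    parser = DefinitionParser()
    parser.feed(html)
    parser.close()
    return parser.definitions


sample = """<table class="luna-Ent"><tbody><tr>
<td class="dn" valign="top">1.</td>
<td valign="top">the curd of milk separated from the whey and prepared in
many ways as a food. </td></tr></tbody></table>"""

print(extract_definitions(sample))
```

If extract_definitions() finds nothing in the downloaded page, searching the raw HTML for `luna-Ent` is a quick check: if it isn't there either, the table is not in what the server sent to urllib2 (it may be built by JavaScript, or served differently to non-browser clients), and parsing alone cannot recover it.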

Jeff McNeil-2 wrote:
> 
> I stumbled across this a while back:
> http://www.voidspace.org.uk/python/articles/urllib2.shtml.
> It covers quite a bit. The urllib2 module is pretty straightforward
> once you've used it a few times.  Some of the class naming and whatnot
> takes a bit of getting used to (I found that to be the most confusing
> bit).
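To make the class naming a little more concrete: urlopen() is a shortcut that uses a default OpenerDirector, which is assembled from handler classes; build_opener() returns such an opener, and Request objects carry a URL plus per-request headers. A short sketch in Python 3's urllib.request (the same classes exist in urllib2); the URL and User-Agent string are placeholders, and nothing is sent over the network.

```python
import urllib.request

# build_opener() returns an OpenerDirector with the default handlers
# (HTTP, HTTPS, redirects, ...) plus any handlers passed in.
opener = urllib.request.build_opener()

# Headers in addheaders are sent with every request made through this
# opener. The User-Agent value here is only an example.
opener.addheaders = [("User-agent", "MyScript/0.1")]

# A Request object carries a URL, optional data and per-request headers.
req = urllib.request.Request("http://www.example.com/",
                             headers={"Accept-Language": "en"})

print(type(opener).__name__)   # OpenerDirector
print(req.get_full_url())      # http://www.example.com/
print(req.header_items())

# Sending the request needs network access, so it is left commented out:
# html = opener.open(req).read()
```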
> 
> On Jun 27, 1:41 pm, Alexnb <alexnbr... at gmail.com> wrote:
>> Okay, I tried to follow that, and it is kinda hard. But since you
>> obviously know what you are doing, where did you learn this? Or where
>> can I learn this?
>>
>> Maric Michaud wrote:
>>
>> > On Friday 27 June 2008 10:43:06, Alexnb wrote:
>> >> I have never used urllib or urllib2. I really have looked online
>> >> for help on this issue, and on mailing lists, but I can't figure out my
>> >> problem because people haven't been helping me, which is why I am here!
>> >> :].
>> >> Okay, so basically I want to be able to submit a word to dictionary.com
>> >> and then get the definitions. However, to start off learning urllib2, I
>> >> just want to do a simple Google search. Before you get mad, what I have
>> >> found on urllib2 hasn't helped me. Anyway, how would you go about doing
>> >> this? No, I did not post the HTML, but if you want, right-click in your
>> >> browser and hit View Source on the Google homepage. Basically, what I
>> >> want to know is how to submit the values (the search term) and then
>> >> search for that value. Here's what I know:
>>
>> >> import urllib2
>> >> response = urllib2.urlopen("http://www.google.com/")
>> >> html = response.read()
>> >> print html
>>
>> >> Now I know that all this does is print the source, but that's about all
>> >> I know. I know it may be a lot to ask to have someone show/help me, but
>> >> I really would appreciate it.
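Submitting a search term means URL-encoding it as a form field. A GET form (Google's search box works this way, as the /search?q=... URL in Maric's reply shows) puts the encoded fields in the query string; a POST form sends them in the request body via the data argument. A sketch in Python 3, where urllib.urlencode became urllib.parse.urlencode; the example.com URL is a placeholder, and a real form's field names come from the name attributes in its HTML source.

```python
from urllib.parse import urlencode
from urllib.request import Request

fields = {"q": "python & co"}

# GET: the encoded fields become the query string of the URL.
query = urlencode(fields)          # spaces -> '+', '&' -> '%26'
get_req = Request("http://www.google.com/search?" + query)

# POST: the same fields are sent as the request body instead.
# The URL is a placeholder for whatever the form's action attribute says.
post_req = Request("http://www.example.com/submit",
                   data=query.encode("ascii"))

print(query)                    # q=python+%26+co
print(get_req.get_method())     # GET
print(post_req.get_method())    # POST
```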
>>
>> > This example is for Google; of course, using pygoogle is easier in this
>> > case, but this is a valid example for the general case:
>>
>> >>>>[207]: import urllib, urllib2
>>
>> > You need to trick the server with a made-up User-Agent header.
>>
>> >>>>[208]: def google_search(terms):
>> >     return urllib2.urlopen(urllib2.Request(
>> >         "http://www.google.com/search?" + urllib.urlencode({'hl': 'fr', 'q': terms}),
>> >         headers={'User-Agent': 'MyNav 1.0 (compatible; MSIE 6.0; Linux)'})
>> >     ).read()
>> >    .....:
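For readers on Python 3, the same function can be ported to urllib.request and urllib.parse; read() returns bytes there, so the sketch decodes them. Splitting out build_search_request() (a hypothetical helper, not part of any library) is a design choice so the URL and headers can be checked without touching the network. Google's results markup and its handling of automated queries have probably changed since 2008, so treat this as an illustration of the technique.

```python
import urllib.parse
import urllib.request

# Made-up browser-like User-Agent, as in the example above; some servers
# may refuse urllib's default "Python-urllib/3.x" value.
USER_AGENT = "MyNav 1.0 (compatible; MSIE 6.0; Linux)"


def build_search_request(terms, lang="fr"):
    """Build (but do not send) the search request for `terms`."""
    query = urllib.parse.urlencode({"hl": lang, "q": terms})
    return urllib.request.Request("http://www.google.com/search?" + query,
                                  headers={"User-Agent": USER_AGENT})


def google_search(terms):
    """Send the request and return the page as text (needs network access)."""
    with urllib.request.urlopen(build_search_request(terms)) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


req = build_search_request("python & co")
print(req.full_url)   # http://www.google.com/search?hl=fr&q=python+%26+co
```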
>>
>> >>>>[212]: res = google_search("python & co")
>>
>> > Now you have the whole HTML response; you'll have to parse it to recover
>> > the data. Here's a quick and dirty try on Google's response page:
>>
>> >>>>[213]: import re
>>
>> >>>>[214]: [ re.sub('<.+?>', '', e) for e in re.findall('<h2 class=r>.*?</h2>', res) ]
>> > ...[229]:
>> > ['Python Gallery',
>> >  'Coffret Monty Python And Co 3 DVD : La Premi\xe8re folie des Monty ...',
>> >  'Re: os x, panther, python & co: msg#00041',
>> >  'Re: os x, panther, python & co: msg#00040',
>> >  'Cardiff Web Site Design, Professional web site design services ...',
>> >  'Python Properties',
>> >  'Frees < Programs < Python < Bin-Co',
>> >  'Torb: an interface between Tcl and CORBA',
>> >  'Royal Python Morphs',
>> >  'Python & Co']
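The regex one-liner above misses headings that wrap onto a second line, because `.` does not match a newline by default, and it leaves entities such as `&amp;` in the text. Here is a slightly sturdier version of the same quick-and-dirty approach in Python 3, using re.S and html.unescape; the `<h2 class=r>` markup is assumed from the example above, `result_titles` is a hypothetical name, and the sample page is invented. For anything beyond a quick hack, an HTML parser (such as html.parser) is more reliable than regular expressions.

```python
import html
import re


def result_titles(page):
    """Quick-and-dirty extraction of <h2 class=r> headings from a results page.

    re.S lets '.' match newlines, so headings split across lines are kept;
    html.unescape turns entities such as &amp; back into characters.
    """
    headings = re.findall(r"<h2 class=r>(.*?)</h2>", page, re.S)
    titles = []
    for h in headings:
        text = re.sub(r"<.+?>", "", h, flags=re.S)   # strip tags
        text = html.unescape(text)                    # decode entities
        titles.append(" ".join(text.split()))         # normalize whitespace
    return titles


sample = ('<h2 class=r><a href="/a">Python &amp; Co</a></h2>\n'
          '<h2 class=r><a href="/b">Python\nGallery</a></h2>')
print(result_titles(sample))   # ['Python & Co', 'Python Gallery']
```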
>>
>> > --
>> > _____________
>>
>> > Maric Michaud
>> > --
>> >http://mail.python.org/mailman/listinfo/python-list
>>
>> --
>> View this message in context:
>> http://www.nabble.com/using-urllib2-tp18150669p18160312.html
>> Sent from the Python - python-list mailing list archive at Nabble.com.




