using urllib2

Alexnb alexnbryan at gmail.com
Fri Jun 27 22:26:15 EDT 2008


Okay, so I copied your code (just so you know, I am on a Mac right now,
using PyDev in Eclipse), and I got these errors. Any idea what is up?

Traceback (most recent call last):
  File "/Users/Alex/Documents/workspace/beautifulSoup/src/firstExample.py", line 14, in <module>
    print list(get_defs("cheese"))
  File "/Users/Alex/Documents/workspace/beautifulSoup/src/firstExample.py", line 9, in get_defs
    dictionary.reference.com/search?q=%s' % term))
  File "/System/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/urllib.py", line 82, in urlopen
    return opener.open(url)
  File "/System/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/urllib.py", line 190, in open
    return getattr(self, name)(url)
  File "/System/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/urllib.py", line 325, in open_http
    h.endheaders()
  File "/System/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/httplib.py", line 856, in endheaders
    self._send_output()
  File "/System/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/httplib.py", line 728, in _send_output
    self.send(msg)
  File "/System/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/httplib.py", line 695, in send
    self.connect()
  File "/System/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/httplib.py", line 663, in connect
    socket.SOCK_STREAM):
IOError: [Errno socket error] (8, 'nodename nor servname provided, or not known')

Sorry if it is hard to read. 
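
One thing that may be worth checking: frame "line 9" in the traceback starts
with `dictionary.reference.com/...`, which suggests the URL string ended up
split across two lines when the code was pasted. A mangled host name is one
plausible cause of the "nodename nor servname" error (no network connection
would be another). Below is a minimal sketch of building the URL on a single
line and checking it for stray whitespace. It assumes Python 3 module names
(`urllib.parse`) rather than the Python 2.5 ones in the thread, and the helper
names `build_search_url` and `has_whitespace` are made up for illustration.

```python
from urllib.parse import urlencode

# Base URL taken from the quoted code; kept on ONE line so no newline
# or indentation can sneak into the host name.
BASE_URL = "http://dictionary.reference.com/search"


def build_search_url(term):
    """Return the search URL for `term`, with the term URL-encoded."""
    return BASE_URL + "?" + urlencode({"q": term})


def has_whitespace(url):
    """Return True if the URL contains any space, tab, or newline."""
    return any(ch.isspace() for ch in url)


url = build_search_url("cheese")
print(url)
print(has_whitespace(url))

# A URL broken across lines, as a mail client's line wrap might produce:
broken = "http://\ndictionary.reference.com/search?q=cheese"
print(has_whitespace(broken))
```

Running a check like `has_whitespace` on the URL before opening it is a cheap
way to rule out a paste problem before looking at the network itself.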


Jeff McNeil-2 wrote:
> 
> Well, what about pulling that data out using Beautiful soup? If you
> know the table name and whatnot, try something like this:
> 
> #!/usr/bin/python
> 
> import urllib
> from BeautifulSoup import BeautifulSoup
> 
> 
> def get_defs(term):
>     soup = BeautifulSoup(urllib.urlopen(
>         'http://dictionary.reference.com/search?q=%s' % term))
> 
>     for tabs in soup.findAll('table', {'class': 'luna-Ent'}):
>         yield tabs.findAll('td')[-1].contents[-1].string
> 
> print list(get_defs("frog"))
> 
> jeff at martian:~$ python test.py
> [u'any tailless, stout-bodied amphibian of the order Anura, including
> the smooth, moist-skinned frog species that live in a damp or
> semiaquatic habitat and the warty, drier-skinned toad species that are
> mostly terrestrial as adults. ', u' ', u' ', u'a French person or a
> person of French descent. ', u'a small holder made of heavy material,
> placed in a bowl or vase to hold flower stems in position. ', u'a
> recessed panel on one of the larger faces of a brick or the like. ',
> u' ', u'to hunt and catch frogs. ', u'French or Frenchlike. ', u'an
> ornamental fastening for the front of a coat, consisting of a button
> and a loop through which it passes. ', u'a sheath suspended from a
> belt and supporting a scabbard. ', u'a device at the intersection of
> two tracks to permit the wheels and flanges on one track to cross or
> branch from the other. ', u'a triangular mass of elastic, horny
> substance in the middle of the sole of the foot of a horse or related
> animal. ']
> 
> HTH,
> 
> Jeff
> 
> On Jun 27, 7:28 pm, Alexnb <alexnbr... at gmail.com> wrote:
>> I have read that multiple times. It is hard to understand, but it did help
>> a little. I found a bit of a work-around for now, which is not what I
>> ultimately want. However, even when I can get to the page I want, let's
>> say "Http://dictionary.reference.com/browse/cheese", I look in Firebug, an
>> extension, and see the definition in JavaScript,
>>
>> <table class="luna-Ent">
>> <tbody>
>> <tr>
>> <td class="dn" valign="top">1.</td>
>> <td valign="top">the curd of milk separated from the whey and prepared in
>> many ways as a food. </td>
>>
>>
>>
>> Jeff McNeil-2 wrote:
>>
>> > the problem being that if I use code like this to get the html of that
>> > page in python:
>>
>> > response = urllib2.urlopen("the webiste....")
>> > html = response.read()
>> > print html
>>
>> > then, I get a bunch of stuff, but it doesn't show me the code with the
>> > table that the definition is in. So I am asking how do I access this
>> > JavaScript. Also, if someone could point me to a better reference than
>> > the last one, because that really doesn't tell me much, whether it be a
>> > book or anything.
>>
>> > I stumbled across this a while back:
>> >http://www.voidspace.org.uk/python/articles/urllib2.shtml.
>> > It covers quite a bit. The urllib2 module is pretty straightforward
>> > once you've used it a few times.  Some of the class naming and whatnot
>> > takes a bit of getting used to (I found that to be the most confusing
>> > bit).
>>
>> > On Jun 27, 1:41 pm, Alexnb <alexnbr... at gmail.com> wrote:
>> >> Okay, I tried to follow that, and it is kinda hard. But since you
>> >> obviously know what you are doing, where did you learn this? Or where
>> >> can I learn this?
>>
>> >> Maric Michaud wrote:
>>
>> >> > On Friday 27 June 2008 10:43:06 Alexnb, you wrote:
>> >> >> I have never used urllib or urllib2. I really have looked online
>> >> >> for help on this issue, and on mailing lists, but I can't figure out
>> >> >> my problem because people haven't been helping me, which is why I am
>> >> >> here! :]
>> >> >> Okay, so basically I want to be able to submit a word to
>> >> >> dictionary.com and then get the definitions. However, to start off
>> >> >> learning urllib2, I just want to do a simple Google search. Before
>> >> >> you get mad, what I have found on urllib2 hasn't helped me. Anyway,
>> >> >> how would you go about doing this? No, I did not post the HTML, but
>> >> >> if you want, right-click in your browser and hit "view source" on
>> >> >> the Google homepage. Basically, what I want to know is how to submit
>> >> >> the values (the search term) and then search for that value. Here's
>> >> >> what I know:
>>
>> >> >> import urllib2
>> >> >> response = urllib2.urlopen("http://www.google.com/")
>> >> >> html = response.read()
>> >> >> print html
>>
>> >> >> Now I know that all this does is print the source, but that's about
>> >> >> all I know. I know it may be a lot to ask to have someone show/help
>> >> >> me, but I really would appreciate it.
>>
>> >> > This example is for Google. Of course, using pygoogle is easier in
>> >> > this case, but this is a valid example for the general case:
>>
>> >> >>>>[207]: import urllib, urllib2
>>
>> >> > You need to trick the server with an imaginary User-Agent.
>>
>> >> >>>>[208]: def google_search(terms) :
>> >> >     return urllib2.urlopen(urllib2.Request(
>> >> >         "http://www.google.com/search?"
>> >> >         + urllib.urlencode({'hl':'fr', 'q':terms}),
>> >> >         headers={'User-Agent':'MyNav 1.0 (compatible; MSIE 6.0; Linux'})
>> >> >     ).read()
>> >> >    .....:
>>
>> >> >>>>[212]: res = google_search("python & co")
>>
>> >> > Now that you have the whole HTML response, you'll have to parse it to
>> >> > recover the data. A quick & dirty attempt on the Google results page:
>>
>> >> >>>>[213]: import re
>>
>> >> >>>>[214]: [ re.sub('<.+?>', '', e) for e in re.findall('<h2 class=r>.*?</h2>', res) ]
>> >> > ...[229]:
>> >> > ['Python Gallery',
>> >> >  'Coffret Monty Python And Co 3 DVD : La Premi\xe8re folie des Monty ...',
>> >> >  'Re: os x, panther, python & co: msg#00041',
>> >> >  'Re: os x, panther, python & co: msg#00040',
>> >> >  'Cardiff Web Site Design, Professional web site design services ...',
>> >> >  'Python Properties',
>> >> >  'Frees < Programs < Python < Bin-Co',
>> >> >  'Torb: an interface between Tcl and CORBA',
>> >> >  'Royal Python Morphs',
>> >> >  'Python & Co']
>>
>> >> > --
>> >> > _____________
>>
>> >> > Maric Michaud
>> >> > --
>> >> >http://mail.python.org/mailman/listinfo/python-list
>>
>> >> --
>> >> View this message in
>> >> context:http://www.nabble.com/using-urllib2-tp18150669p18160312.html
>> >> Sent from the Python - python-list mailing list archive at Nabble.com.
>>
> 
> 
> --
> http://mail.python.org/mailman/listinfo/python-list
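
For reference, the request-building step from the quoted `google_search`
example can be sketched without touching the network. This is only an
illustration under assumptions: it uses the Python 3 names
(`urllib.request.Request`, `urllib.parse.urlencode`) instead of the Python 2
`urllib2`/`urllib` ones in the thread, the function name
`build_search_request` is hypothetical, and the User-Agent string is copied
from the quoted message as-is.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_search_request(terms):
    """Build (but do not send) a GET request with a custom User-Agent."""
    # urlencode escapes spaces and '&' in the search terms.
    query = urlencode({"hl": "fr", "q": terms})
    return Request(
        "http://www.google.com/search?" + query,
        # User-Agent value copied verbatim from the quoted example.
        headers={"User-Agent": "MyNav 1.0 (compatible; MSIE 6.0; Linux"},
    )


req = build_search_request("python & co")
print(req.full_url)
print(req.get_method())
# Request stores header names capitalized as "User-agent".
print(req.get_header("User-agent"))
```

Passing `req` to `urllib.request.urlopen` would then perform the actual
fetch; inspecting the request first makes it easier to see what is sent.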
> 
> 
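
The table-scraping idea in the quoted code can also be tried offline against
the HTML fragment posted earlier in the thread. The sketch below is not the
BeautifulSoup approach from the quoted message: it swaps in the standard
library's `html.parser` so that it runs without extra packages, and it keeps
the whole text of the last `<td>` of each `luna-Ent` table, which is a
simplification. The class name `LunaEntParser` is made up, the closing tags in
`SAMPLE` were added to complete the fragment, and nested tables are not
handled.

```python
from html.parser import HTMLParser


class LunaEntParser(HTMLParser):
    """Collect the text of the last <td> in each <table class="luna-Ent">."""

    def __init__(self):
        super().__init__()
        self.definitions = []      # one entry per matching table
        self._in_table = False     # currently inside a luna-Ent table?
        self._in_td = False        # currently inside a <td> of that table?
        self._last_td_text = None  # text of the most recent closed <td>
        self._buf = []             # text chunks of the current <td>

    def handle_starttag(self, tag, attrs):
        if tag == "table" and ("class", "luna-Ent") in attrs:
            self._in_table = True
            self._last_td_text = None
        elif tag == "td" and self._in_table:
            self._in_td = True
            self._buf = []

    def handle_data(self, data):
        if self._in_td:
            self._buf.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._in_td:
            self._in_td = False
            # Collapse line breaks and repeated spaces into single spaces.
            self._last_td_text = " ".join("".join(self._buf).split())
        elif tag == "table" and self._in_table:
            self._in_table = False
            if self._last_td_text is not None:
                self.definitions.append(self._last_td_text)


# Fragment from the thread; the closing tags are an assumption.
SAMPLE = """
<table class="luna-Ent">
<tbody>
<tr>
<td class="dn" valign="top">1.</td>
<td valign="top">the curd of milk separated from the whey and prepared in
many ways as a food. </td>
</tr>
</tbody>
</table>
"""

parser = LunaEntParser()
parser.feed(SAMPLE)
print(parser.definitions)
```

Working from a saved fragment like this separates the parsing question from
the network question, so each can be debugged on its own.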

-- 
View this message in context: http://www.nabble.com/using-urllib2-tp18150669p18166785.html
Sent from the Python - python-list mailing list archive at Nabble.com.



