Question about using urllib2 to load a url

Mon Apr 2 00:31:59 EDT 2007

On Apr 2, 2:52 am, "ken" <ken.carl... at gmail.com> wrote:
> Hi,
>
> I have the following code to load a URL.
> My question is: what if I try to load an invalid URL
> ("http://www.heise.de/")?  Will I get an IOException, or will it wait
> forever?
>

It depends on why the URL is invalid.  If the URL refers to a
non-existent domain, the DNS lookup fails and you will get a
"urllib2.URLError: <urlopen error (-2, 'Name or service not known')>".
If the name resolves but the host is not reachable, the connect call
will eventually time out and result in a
"urllib2.URLError: <urlopen error (113, 'No route to host')>".  If the
host exists but does not have a web server running, you will get a
"urllib2.URLError: <urlopen error (111, 'Connection refused')>".  If a
web server is running but the requested page does not exist, you will
get a "urllib2.HTTPError: HTTP Error 404: Not Found".  (The error
numbers shown are the Linux values; other platforms may report
different codes.)  All of these are subclasses of IOError (HTTPError
derives from URLError, which derives from IOError), so the
"except IOError" clause in your code will catch every one of them.

The URL you gave above does not meet any of these conditions, so it
results in a valid handle to read from.
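To tell a 404 apart from a connection failure, check for HTTPError before URLError, because every HTTPError is also a URLError.  A minimal sketch of that check, assuming a one-line description is all you want; describe_open_error is a hypothetical helper name, and the fallback import from urllib.error is only there so the same code also runs on Python 3:

```python
try:
    # Python 2: the exception classes live in urllib2
    from urllib2 import HTTPError, URLError
except ImportError:
    # Python 3 moved them to urllib.error
    from urllib.error import HTTPError, URLError


def describe_open_error(exc):
    """Return a short description of an exception raised by urlopen().

    HTTPError subclasses URLError, so it must be tested first; otherwise
    every HTTP error status would be reported as a connection problem.
    """
    if isinstance(exc, HTTPError):
        # The server answered, but with an error status such as 404.
        return "HTTP error %d" % exc.code
    if isinstance(exc, URLError):
        # No HTTP response at all: DNS failure, refused connection, etc.
        return "connection problem: %s" % (exc.reason,)
    # Any other IOError (for example a failure while reading the body).
    return "other I/O error: %s" % (exc,)
```

In the code from the question, this could replace the first print statement inside the "except IOError, e:" clause, e.g. "print describe_open_error(e)".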

If, at any time, an error response fails to reach your machine, the
code will have to wait for a timeout.  It should not have to wait
forever.
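To put your own upper bound on that wait instead of relying on the operating system's defaults, one option is a socket-level default timeout; sockets that urllib2 opens use it unless a timeout is passed explicitly.  A minimal sketch, assuming a process-wide setting is acceptable (socket.setdefaulttimeout() affects every socket created afterwards, in all threads); default_socket_timeout is a hypothetical helper name:

```python
import socket
from contextlib import contextmanager

# On Python 2.5, using "with" also requires
# "from __future__ import with_statement" at the top of the module.


@contextmanager
def default_socket_timeout(seconds):
    """Temporarily set the default timeout for newly created sockets.

    Sockets created inside the block (including those urlopen() opens)
    raise an error instead of blocking longer than `seconds` on any
    single connect or read.  The previous default is restored on exit,
    even if the block raises.
    """
    previous = socket.getdefaulttimeout()
    socket.setdefaulttimeout(seconds)
    try:
        yield
    finally:
        socket.setdefaulttimeout(previous)
```

Usage: wrap the urlopen() call, e.g. "with default_socket_timeout(10): handle = urllib2.urlopen(req)".  From Python 2.6 on, urlopen() also accepts a timeout argument directly (urllib2.urlopen(req, timeout=10)), which avoids touching the global default.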

> Thanks for any help.
>
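> import cookielib
> import urllib2
>
> def load_url(url, txdata=None):
>     # Keep cookies across requests made through the installed opener
>     cj = cookielib.CookieJar()
>     opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
>     urllib2.install_opener(opener)
>
>     txheaders = {'User-agent': 'Mozilla/5.0 (X11; U; Linux i686; '
>                  'en-US; rv:1.8.1.3) Gecko/20070309 Firefox/2.0.0.3'}
>
>     try:
>         # txdata=None sends a GET; an urlencoded string sends a POST
>         req = urllib2.Request(url, txdata, txheaders)
>         handle = urllib2.urlopen(req)
>     except IOError, e:
>         print e
>         print 'Failed to open %s' % url
>         return 0
>     return handle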

--
Kushal