python 3.3 urllib.request

Hans Mulder hansmu at xs4all.nl
Sat Dec 8 07:21:51 EST 2012


On 8/12/12 07:20:55, Terry Reedy wrote:
> On 12/7/2012 12:27 PM, Hans Mulder wrote:
>> On 7/12/12 13:52:52, Steeve C wrote:
>>> hello,
>>>
>>> I have a python3 script using urllib.request which behaves
>>> strangely. Here is the script:
>>>
>>> +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>>
>>> #!/usr/bin/env python3
>>> # -*- coding: utf-8 -*-
>>>
>>> import urllib.request
>>> import sys, time
>>>
>>>
>>> url = 'http://google.com'
>>>
>>> def make_some_stuff(page, url):
>>>      sys.stderr.write(time.strftime("%d/%m/%Y %H:%M:%S -> page from \"")
>>> + url + "\"\n")
>>>      sys.stderr.write(str(page) + "\"\n")
>>>      return True
>>>
>>> def get_page(url):
>>>      while 1:
>>>          try:
>>>              page = urllib.request.urlopen(url)
>>>              yield page
>>>
>>>          except urllib.error.URLError as e:
>>>              sys.stderr.write(time.strftime("%d/%m/%Y %H:%M:%S -> "
>>>                  "impossible to access to \"") + url + "\"\n")
>>>              time.sleep(5)
>>>              continue
>>>
>>> def main():
>>>      print('in main')
>>>      for page in get_page(url):
>>>          make_some_stuff(page, url)
>>>          time.sleep(5)
>>>
>>> if __name__ == '__main__':
>>>      main()
>>> +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>>
>>>
>>> If the computer is connected to the internet (with an ethernet
>>> connection, for example) and I run this script, it works like a charm:
>>> - urllib.request.urlopen returns the page
>>> - make_some_stuff writes to stderr
>>> - when the ethernet cable is unplugged, the except block handles the
>>> error for as long as the cable stays unplugged, and when the cable is
>>> plugged back in, urllib.request.urlopen returns the page and
>>> make_some_stuff writes to stderr
>>>
>>> this is the normal behavior (for me, imho).
>>>
>>> But if the computer is not connected to the internet (ethernet cable
>>> unplugged) when I run this script, the except block handles the error
>>> (normal), but when I plug the cable in, the script continues looping
>>> and urllib.request.urlopen never returns the page (so it always
>>> goes to the except block).
>>>
>>> What can I do to handle that?

> Don't do that '-).

>> On my laptop, your script works as you'd hope: if I plug in the
>> network cable, then the next urllib request sometimes fails, but
>> the request after that succeeds.
>> This is using Python 3.3 on MacOS X 10.5.
>> What version are you running?
>>
>> What happens if you start the script with the network cable
>> plugged in, then unplug it when the first request has succeeded,
>> and then plug it in again when the next request has failed?

> I believe he said that that worked.

You're right: he said that.

> But unplugging cables is not a good idea ;-)
>
> I remember when it was recommended that all cables be plugged in and
> the connected devices turned on when the computer was turned on and when
> devices might not be recognized unless plugged in and on when the
> computer was booted or rebooted. In other words, ports were scanned once
> as part of the boot process and adding a device required a reboot.

I also remember the time when that was true.  But these days, many
devices are designed to be plugged in with the computer running,
and the OS continuously scans for new devices.

> It certainly was not that long ago when I had to reboot after the
> Internet Service went down and the cable modem had to reset.

That's a configuration problem: when the cable modem is reset, your
computer needs to rerun its "network up" script to renew its DHCP lease.
If it isn't configured to do that automatically, and you don't know
how to run it manually, then rebooting may be your only option.

This is a common problem on desktop computers (where losing the
connection to the cable modem is rare).  Laptops are typically
configured to deal with connections appearing and disappearing
on both the wired and the wireless interface.

> Ethernet and usb ports and modern OSes are more forgiving. But it does
> not surprise me if on some systems something has to be present at
> process startup to even be visible to the process.

His system may be caching the outcome of the IP address lookup.

If that's the case, I'd expect different error messages, depending
on whether the first lookup succeeded or not.  But since his script
carefully avoids printing the exception message, it's hard to tell.
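
As a minimal sketch of how the except block could report the cause:
URLError keeps the underlying problem in its "reason" attribute, and a
failed name lookup normally shows up there as a socket.gaierror.  The
helper name describe_url_error and the labels it prints are made up for
illustration; they are not part of the original script.

```python
import socket
import urllib.error


def describe_url_error(exc):
    """Return a one-line description of a URLError, including its reason.

    Hypothetical helper: the three labels below are illustrative only.
    """
    reason = exc.reason
    # socket.gaierror is a subclass of OSError, so test for it first.
    if isinstance(reason, socket.gaierror):
        kind = "name lookup failed"
    elif isinstance(reason, OSError):
        kind = "connection failed"
    else:
        # urllib sometimes passes a plain string as the reason.
        kind = "other failure"
    return "{}: {!r}".format(kind, reason)
```

Inside the script's except block, something like
sys.stderr.write(describe_url_error(e) + "\n") would then show which
of the two cases is occurring.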

> I believe this is all beyond Python's control. So the only thing to do
> might be to change hardware and/or OS or have the program restart itself
> if it gets repeated errors.

I think that it would depend on the error message.  If the error is
"Network is unreachable" or "No route to host", then sleeping and
trying again might work. If the error is "nodename nor servname
provided, or not known", then the script would have to restart itself.
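
The distinction above could be sketched roughly as follows.  This
assumes that a name-lookup failure arrives as a socket.gaierror in the
URLError's "reason" attribute, and that replacing the process with
os.execv counts as "restarting"; whether a restart actually cures the
problem on his system is untested.  The names should_restart and
restart_script are hypothetical.

```python
import os
import socket
import sys
import urllib.error


def should_restart(exc):
    """Return True if the error looks like a failed name lookup.

    Assumption: lookup failures surface as socket.gaierror; anything
    else (e.g. "Network is unreachable") is treated as retryable.
    """
    reason = getattr(exc, "reason", None)
    return isinstance(reason, socket.gaierror)


def restart_script():
    """Replace the current process with a fresh run of the same script."""
    # os.execv does not return on success; the new interpreter starts
    # with the same command-line arguments.
    os.execv(sys.executable, [sys.executable] + sys.argv)
```

In the except block one might call restart_script() when
should_restart(e) is true, and otherwise sleep and try again as the
script already does.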


Hope this helps,

-- HansM