newbie question about confusing exception handling in urllib
Peter Otten
__peter__ at web.de
Tue Apr 9 08:19:06 EDT 2013
cabbar at gmail.com wrote:
> Hi,
>
> I have been using Java/Perl professionally for many years and have been
> trying to learn python3 recently. As my first program, I tried writing a
> class for a small project, and I am having a really hard time understanding
> exception handling in urllib and in python in general... Basically, what I
> want to do is very simple, try to fetch something
> "try urllib.request.urlopen(request)", and:
> - If request times out or connection is reset, re-try n times
> - If it fails, return an error
> - If it works return the content.
>
> But, this simple requirement became a nightmare for me. I am really
> confused about how I should be checking this because:
> - When connection times out, I sometimes get URLError with "reason"
>   field set to socket.timeout, and checking (isinstance(exception.reason,
>   socket.timeout)) works fine
> - But sometimes I get socket.timeout exception directly, and it has no
>   "reason" field, so above statement fails, since there is no reason
>   field there.
> - Connection reset is a totally different exception
> - Not to mention, some exceptions have msg / reason / errno fields but
>   some don't, so there is no way of knowing exception details unless you
>   check them one by one. The only common thing I could find was to call
>   __str__()?
> - Since there are too many possible exceptions, you need to catch
>   BaseException (I received URLError, socket.timeout,
>   ConnectionRefusedError, ConnectionResetError, BadStatusLine, and none
>   share a common parent). And, catching the top level exception is not a
>   good thing.
>
> So, I ended up writing the following, but from everything I know, this
> looks really ugly and wrong???
>
> try:
>     response = urllib.request.urlopen(request)
>     content = response.read()
> except BaseException as ue:
>     if (isinstance(ue, socket.timeout) or (hasattr(ue, "reason")
>             and isinstance(ue.reason, socket.timeout)) or isinstance(ue,
>             ConnectionResetError)):
>         print("REQUEST TIMED OUT")
>
> or, something like:
>
> except:
>     (a1,a2,a3) = sys.exc_info()
>     errorString = a2.__str__()
>     if ((errorString.find("Connection reset by peer") >= 0) or
>             (errorString.find("error timed out") >= 0)):
>
> Am I missing something here? I mean, is this really how I should be doing
> it?
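
On the "none share a common parent" point: assuming Python 3.3 or later
(which the mention of ConnectionResetError suggests), most of the exceptions
listed do have a common base class below BaseException, namely OSError. The
following is only a quick check of the class hierarchy, not a recommendation
to catch everything; the variable names are made up for illustration.

```python
import http.client
import socket
import urllib.error

# The network-level exceptions mentioned in the question.
network_errors = [
    urllib.error.URLError,
    socket.timeout,
    ConnectionRefusedError,
    ConnectionResetError,
]

# On Python 3.3+ each of these is a subclass of OSError.
all_are_oserror = all(issubclass(exc, OSError) for exc in network_errors)

# BadStatusLine is the odd one out: it derives from
# http.client.HTTPException, not from OSError.
bad_status_is_oserror = issubclass(http.client.BadStatusLine, OSError)
bad_status_is_http_exc = issubclass(
    http.client.BadStatusLine, http.client.HTTPException
)

print(all_are_oserror, bad_status_is_oserror, bad_status_is_http_exc)
```

So an "except OSError" clause should cover the network-level cases without
resorting to BaseException, while BadStatusLine would still need a clause of
its own.
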
Does it help if you reorganize your code a bit? For example:
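
import socket
import urllib.error
import urllib.request

def read_content(request):
    try:
        response = urllib.request.urlopen(request)
        content = response.read()
    except socket.timeout:
        # timeout raised directly
        return None
    except urllib.error.URLError as ue:
        # timeout wrapped in a URLError; anything else is re-raised
        if isinstance(ue.reason, socket.timeout):
            return None
        raise
    return content

for i in range(max_tries):
    content = read_content(request)
    if content is not None:
        break
else:
    # the loop ran out of tries without hitting "break"
    print("Could not download", request)
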
Instead of returning an out-of-band response (None) you could also raise a
custom exception (called MyTimeoutError below). The retry-loop would then
become

for i in range(max_tries):
    try:
        content = read_content(request)
    except MyTimeoutError:
        pass
    else:
        break
else:
    print("Could not download", request)
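
For completeness, here is a rough sketch of what read_content() might look
like in the exception-raising variant. MyTimeoutError is the custom exception
named above; the "opener" parameter is purely an assumption added here so the
function can be tried without a network connection, and is not part of
urllib.

```python
import socket
import urllib.error
import urllib.request


class MyTimeoutError(Exception):
    """Raised by read_content() when the request timed out."""


def read_content(request, opener=urllib.request.urlopen):
    # "opener" is a hypothetical hook for testing; by default the
    # real urllib.request.urlopen is used.
    try:
        response = opener(request)
        return response.read()
    except socket.timeout as exc:
        # timeout raised directly
        raise MyTimeoutError(request) from exc
    except urllib.error.URLError as ue:
        # timeout wrapped in a URLError
        if isinstance(ue.reason, socket.timeout):
            raise MyTimeoutError(request) from ue
        # any other URLError is not a timeout, so let it propagate
        raise
```

Both flavours of timeout are turned into the same exception, so the retry
loop only has to know about MyTimeoutError.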