[Patches] [ python-Patches-849407 ] urllib reporthook could be more informative

SourceForge.net noreply at sourceforge.net
Tue Nov 21 19:29:44 CET 2006


Patches item #849407, was opened at 2003-11-26 04:41
Message generated for change (Comment added) made by loewis
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=849407&group_id=5470

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: Library (Lib)
Group: Python 2.3
>Status: Closed
>Resolution: Rejected
Priority: 5
Private: No
Submitted By: Allan B. Wilson (allanbwilson)
Assigned to: Nobody/Anonymous (nobody)
Summary: urllib reporthook could be more informative

Initial Comment:
A reporthook in urllib.urlretrieve() (in 2.3.2) is
given the max number of characters accepted ("bs") per
.read() as its second argument. It would be more
helpful to receive the number of characters actually
retrieved in the most recent block.

While this might break some existing code (though I can't
imagine how), the minor patch below allows progress
updates, etc. that are accurate.

Thanks

Allan Wilson

------------

*** urllib.py.old Tue Nov 25 17:42:55 2003
--- urllib.py Tue Nov 25 18:00:50 2003
***************
*** 236,248 ****
              reporthook(0, bs, size)
          block = fp.read(bs)
          if reporthook:
!             reporthook(1, bs, size)
          while block:
              tfp.write(block)
              block = fp.read(bs)
              blocknum = blocknum + 1
              if reporthook:
!                 reporthook(blocknum, bs, size)
          fp.close()
          tfp.close()
          del fp
--- 236,248 ----
              reporthook(0, bs, size)
          block = fp.read(bs)
          if reporthook:
!             reporthook(1, len(block), size)
          while block:
              tfp.write(block)
              block = fp.read(bs)
              blocknum = blocknum + 1
              if reporthook:
!                 reporthook(blocknum, len(block), size)
          fp.close()
          tfp.close()
          del fp
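
To illustrate what the diff above changes, here is a minimal sketch of the
copy loop, run against in-memory streams rather than a socket. The helper
name copy_with_hook and its report_actual flag are hypothetical and not
part of urllib; they only mirror the old and patched call patterns.

```python
import io


def copy_with_hook(src, dst, bs, size, reporthook, report_actual=False):
    """Copy src to dst in blocks of bs, calling reporthook like the diff does.

    Hypothetical helper: report_actual=False mimics the old code (second
    argument is always bs), report_actual=True mimics the patched code
    (second argument is len(block) after the initial call).
    """
    blocknum = 1
    reporthook(0, bs, size)          # initial call, before anything is read
    block = src.read(bs)
    reporthook(1, len(block) if report_actual else bs, size)
    while block:
        dst.write(block)
        block = src.read(bs)
        blocknum = blocknum + 1
        reporthook(blocknum, len(block) if report_actual else bs, size)


# 20 bytes copied in blocks of 8: compare the second argument in each mode.
payload = b"x" * 20
for mode in (False, True):
    seen = []
    copy_with_hook(io.BytesIO(payload), io.BytesIO(), 8, len(payload),
                   lambda num, nbytes, total: seen.append(nbytes),
                   report_actual=mode)
    print("patched" if mode else "old", seen)
```

With the old calls the hook sees 8 every time; with the patched calls it
sees 8, 8, 8, 4, 0, where the trailing 0 comes from the empty read that
ends the loop.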


----------------------------------------------------------------------

>Comment By: Martin v. Löwis (loewis)
Date: 2006-11-21 19:29

Message:
Logged In: YES 
user_id=21627
Originator: NO

Discussion on python-dev revealed that read() on a socket will always give
you blocksize data, except for the last block. So this doesn't really
change anything in practice; applications that find that the data read
(blocksize*blocknumber) exceeds the amount of data expected should
conclude that they saw the last block.

Rejecting this patch.
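
A minimal sketch of the approach described above, using the existing
(blocknum, blocksize, totalsize) hook arguments. The function name
bytes_so_far is hypothetical; the sketch assumes totalsize is -1 when the
server did not report a size, as urllib does.

```python
def bytes_so_far(blocknum, blocksize, totalsize):
    """Estimate bytes retrieved from the unpatched reporthook arguments."""
    received = blocknum * blocksize
    if totalsize >= 0:
        # Exceeding the expected size means the last block was short,
        # so clamp the estimate to the expected total.
        received = min(received, totalsize)
    return received


def reporthook(blocknum, blocksize, totalsize):
    # Could be passed as the reporthook argument of urlretrieve().
    done = bytes_so_far(blocknum, blocksize, totalsize)
    if totalsize > 0:
        print("%d of %d bytes (%d%%)" % (done, totalsize, 100 * done // totalsize))
    else:
        print("%d bytes (total size unknown)" % done)
```

When the total size is unknown, the estimate can only be an upper bound,
since the final block may be shorter than blocksize.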

----------------------------------------------------------------------

Comment By: Guido van Rossum (gvanrossum)
Date: 2006-11-19 17:44

Message:
Logged In: YES 
user_id=6380
Originator: NO

I notice that the patch doesn't apply to the svn head (2.6a0).  But that's
easily fixed and the idea still applies.

As the original author of the code being patched I believe my reason for
doing it the old way was that I wanted the report hook to be called before
the first block, which would let a GUI open up a dialog box before anything
was read.  The idea was that if the reads are really slow, you'd want the
dialog box there right from the start.  But this was rather naive, since
the most likely source of delay is making the connection and getting the
response header back, and the report hook isn't being called at all until
all the headers have been seen.

The changed API to reporthook() needs to be documented very clearly. 
There's one call to reporthook() that still passes the block size instead
of the actual data size.  A naive implementation could be confused by this
call, although it is easily recognized because it is the first call and the
only one with blocknum equal to zero.
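
As a sketch of how a hook could cope with that first call, assuming the
patched behaviour (which was never applied to urllib): the class name
ByteCounter is hypothetical, and the second argument is treated as the
block size only when blocknum is zero.

```python
class ByteCounter:
    """Hypothetical reporthook for the patched API proposed in this item."""

    def __init__(self):
        self.blocksize = None
        self.received = 0

    def __call__(self, blocknum, nbytes, totalsize):
        if blocknum == 0:
            # Initial call: nbytes is the maximum block size, and
            # nothing has been read yet.
            self.blocksize = nbytes
            self.received = 0
            return
        # Later calls: nbytes is len(block) for the block just read.
        self.received += nbytes
```

A naive hook that added the second argument on every call would overcount
by one block size, which is the confusion described above.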

I think this is a fine change -- as long as it isn't backported, since it
is clearly a feature change.  I do wonder "why bother", since most people
using urllib don't care all that much about extreme details (I can't
remember the last time I specified a reporthook), and most people caring
about details don't like urllib and use something else (e.g. httplib, or
urllib2).

So I guess I'm somewhere between +0 and -0 on this.

----------------------------------------------------------------------

Comment By: Martin v. Löwis (loewis)
Date: 2003-11-27 20:41

Message:
Logged In: YES 
user_id=21627

Can you please attach the patch, instead of pasting it?

----------------------------------------------------------------------
