[Tutor] Hook (Urlretrieve)

Kent Johnson kent_johnson at skillsoft.com
Sun Aug 29 20:43:37 CEST 2004


The information you need is in the documentation for urlretrieve. I guess 
it needs some additional explanation, though :-)

Here are the relevant pieces:
urlretrieve( url[, filename[, reporthook[, data]]])
"The third argument, if present, is a hook function that will be called 
once on establishment of the network connection and once after each block 
read thereafter. The hook will be passed three arguments; a count of blocks 
transferred so far, a block size in bytes, and the total size of the file. 
The third argument may be -1 on older FTP servers which do not return a 
file size in response to a retrieval request. "
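The three values the hook receives are enough to compute a rough percentage. Here is a minimal sketch. `percent_done` is a hypothetical helper, not part of urllib. It assumes, as the documentation warns, that a total size of -1 means the size is unknown:

```python
def percent_done(count, block_size, total_size):
    """Approximate percentage transferred, or None when the size is unknown."""
    if total_size <= 0:
        # -1 means the server did not report a size (e.g. older FTP servers);
        # 0 would cause a division by zero, so treat it as unknown as well.
        return None
    # count * block_size is an upper bound: the last block is usually partial,
    # so clamp the result at 100.
    return min(100.0, 100.0 * count * block_size / total_size)
```

Inside a hook you could print `percent_done(count, blockSize, totalSize)` instead of the raw numbers. The clamp matters because the hook only ever sees the nominal block size, never the number of bytes actually read.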

In other words, urlretrieve() takes up to four arguments. The first 
argument (url) is required; the others are optional.

The third argument, if present, should be a reference to a function of 
three arguments. If you are not used to using first-class functions, this 
could be confusing, but actually it is very easy to use.
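In Python, a function name written without parentheses is an ordinary object. You can store it in a variable, pass it to another function, and call it later. The small illustration below uses hypothetical names (`show_progress`, `run_with_hook`) that stand in for your hook and for urlretrieve:

```python
def show_progress(count, block_size, total_size):
    # An ordinary three-argument function; nothing special marks it as a "hook".
    return "%d %d %d" % (count, block_size, total_size)

def run_with_hook(hook):
    # The receiver decides when to call the hook and with which arguments.
    return hook(0, 8192, 7955)

hook = show_progress          # no parentheses: this names the function, it does not call it
print(run_with_hook(hook))    # prints: 0 8192 7955
```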

Define the hook function you want, for example:

 >>> def myReportHook(count, blockSize, totalSize):
...     print count, blockSize, totalSize
...

Now in your call to urlretrieve, pass 'myReportHook' as the third argument 
(or as a keyword argument):
 >>> import urllib
 >>> urllib.urlretrieve('http://personalpages.tds.net/~kent37/Python/PythonResources.html', reporthook=myReportHook)
0 8192 7955
1 8192 7955
2 8192 7955
('c:\\docume~1\\ktjohn~1\\locals~1\\temp\\tmpwxgvn_.html', <httplib.HTTPMessage instance at 0x0088EDF0>)

This result surprised me at first: why is it reading three blocks? But the 
first line is output _before_ any blocks are read (count=0); the second 
line is the actual read, and the third line is the final, empty read that 
ends the operation. If you look at the code for urllib.urlretrieve(), you 
see that it calls reporthook after each attempted read, so the hook always 
gets one extra call at the end.
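The three calls can be reproduced without any network access by simulating the loop described above. In the sketch, a string stands in for the downloaded data and slicing stands in for `fp.read(bs)`. `simulated_urlretrieve` is a hypothetical name and not part of urllib. It assumes the "call after every attempted read" behavior described above. Later versions of urllib restructured this loop, so the trailing call may not happen there, and a hook should not depend on it.

```python
def simulated_urlretrieve(data, reporthook, bs=8192):
    """Mimic the read loop: one hook call before reading, then one after
    every read attempt, including the final empty read. Returns bytes read."""
    size = len(data)
    pos = 0
    blocknum = 0
    reporthook(blocknum, bs, size)        # before any data is read (count=0)
    while True:
        block = data[pos:pos + bs]        # stand-in for fp.read(bs)
        pos += len(block)
        blocknum += 1
        reporthook(blocknum, bs, size)    # called even when block is empty
        if not block:                     # empty read: end of data
            break
    return pos
```

With 7955 bytes of data, the hook receives counts 0, 1 and 2, which matches the three lines in the session above.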

Kent

At 07:29 PM 8/29/2004 +0200, Øyvind wrote:
>Hello,
>
>    I have a few thousand objects I would like to retrieve from the web and
>store locally. Is urllib.urlretrieve the most efficient way to do so?
>
>    The Python documentation mentions that urlretrieve has a hook
>function. As I understand it, that function can provide information
>about the size and the blocks transferred. How do I use that hook
>function? I would like to present a counter that shows how far the
>download has progressed. But, as far as I have seen, there
>is no information on how to use that hook. The closest I found was
>PyCCmdTarget.HookCommand and its description. But to be honest, I
>didn't understand at all what it is, how it works, or whether it is
>related, and how I could use it. Searching my Python books as well as
>Google proved fruitless. Is there a simple way to explain hooks? Or
>does anyone have some simple examples of how they may be used?
>
>Thanks in advance...
>
>_______________________________________________
>Tutor maillist  -  Tutor at python.org
>http://mail.python.org/mailman/listinfo/tutor


