corrupt download with urllib2

Peter Otten __peter__ at web.de
Tue Nov 10 11:48:48 EST 2015


Ulli Horlacher wrote:

> Ulli Horlacher <framstag at rus.uni-stuttgart.de> wrote:
>> Peter Otten <__peter__ at web.de> wrote:
> 
>> > - consider shutil.copyfileobj to limit memory usage when dealing with
>> > data
>> >   of arbitrary size.
>> > 
>> > Putting it together:
>> > 
>> >     with open(sz, "wb") as szo:
>> >         shutil.copyfileobj(u, szo)
>> 
>> This writes the HTTP stream to the file in binary, without handling it
>> manually chunk by chunk?
> 
> I have a problem with it: There is no feedback for the user about the
> progress of the transfer, which can last several hours.
> 
> For small files shutil.copyfileobj() is a good idea, but not for huge
> ones.

Indeed. Have a look at the source code:

def copyfileobj(fsrc, fdst, length=16*1024):
    """copy data from file-like object fsrc to file-like object fdst"""
    while 1:
        buf = fsrc.read(length)
        if not buf:
            break
        fdst.write(buf)

As simple as can be. I suggested the function as an alternative to writing 
the loop yourself when your example code basically showed

dest.write(source.read())
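
If all that is missing is user feedback, you can keep copyfileobj() and wrap
the source in a small file-like object that reports every chunk. A minimal
sketch, assuming a urllib2 response object; the ProgressReader name and the
placeholder URL are my own inventions:

import shutil
import sys
import urllib2

class ProgressReader(object):
    """Wrap a file-like object and print a dot for every chunk read."""
    def __init__(self, fileobj):
        self._fileobj = fileobj

    def read(self, size=-1):
        data = self._fileobj.read(size)
        if data:
            sys.stderr.write(".")  # copyfileobj() calls read() once per chunk
        return data

u = urllib2.urlopen("http://example.com/bigfile")  # placeholder URL
with open("bigfile", "wb") as szo:
    shutil.copyfileobj(ProgressReader(u), szo)
sys.stderr.write("\n")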

For the huge downloads that you intend to cater to, you probably want more
than a script that prints a dot on every iteration: you need the expected
remaining time, checksums, the ability to stop and resume a download, and
whatnot.
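
As a starting point, here is a rough sketch of such a loop, assuming Python 2
and urllib2 as in your original code; the chunk size, the SHA-256 checksum,
and the download() name are arbitrary choices of mine, and resuming is not
handled:

import hashlib
import sys
import time
import urllib2

def download(url, filename, chunk_size=16 * 1024):
    response = urllib2.urlopen(url)
    # Content-Length may be missing; fall back to printing dots in that case.
    total = int(response.info().getheader("Content-Length", 0))
    digest = hashlib.sha256()
    received = 0
    start = time.time()
    with open(filename, "wb") as outfile:
        while True:
            chunk = response.read(chunk_size)
            if not chunk:
                break
            outfile.write(chunk)
            digest.update(chunk)
            received += len(chunk)
            if total:
                elapsed = time.time() - start
                remaining = elapsed * (total - received) / received
                sys.stderr.write("\r%3d%% done, about %d s remaining"
                                 % (100 * received // total, remaining))
            else:
                sys.stderr.write(".")
    sys.stderr.write("\n")
    return digest.hexdigest()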

Does the Perl code offer that? If so, why rewrite it?

Or are there Python libraries that do that out of the box? Can you reuse 
them?
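
For example, the third-party requests library streams a response in chunks
and, together with an HTTP Range header, can resume a partial file. A hedged
sketch (the resume_download() name is mine, and the server must actually
support Range requests):

import os
import requests

def resume_download(url, filename, chunk_size=16 * 1024):
    # Ask only for the missing bytes if a partial file is already on disk.
    have = os.path.getsize(filename) if os.path.exists(filename) else 0
    headers = {"Range": "bytes=%d-" % have} if have else {}
    response = requests.get(url, headers=headers, stream=True)
    response.raise_for_status()
    # 206 means the server honored the Range request; append in that case.
    mode = "ab" if response.status_code == 206 else "wb"
    with open(filename, mode) as outfile:
        for chunk in response.iter_content(chunk_size):
            outfile.write(chunk)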
