classification
Title: fix for 1016880 urllib.urlretrieve silently truncates dwnld
Type: Stage:
Components: Library (Lib) Versions: Python 2.4
process
Status: closed Resolution: accepted
Dependencies: Superseder:
Assigned To: georg.brandl Nosy List: georg.brandl, irmen, jlgijsbers, loewis, rhettinger
Priority: high Keywords: patch

Created on 2004-11-07 20:15 by irmen, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
File name Uploaded Description
urllib.patch2 irmen, 2004-11-07 20:54 improved patch
urllib-truncate.diff georg.brandl, 2005-07-15 08:51
Messages (11)
msg47256 - (view) Author: Irmen de Jong (irmen) (Python triager) Date: 2004-11-07 20:15
The patch makes urllib.urlretrieve raise an IOError if
the actual download size is different from the expected
size (taken from the content-length header). 
msg47257 - (view) Author: Irmen de Jong (irmen) (Python triager) Date: 2004-11-07 20:54
NOTE:
urllib.patch2 may be a bit better. It fixes a misspelling
and is more relaxed about a 'wrong' download size.
To be more precise: it treats Content-Length as a lower
bound (just like wget and Firefox appear to do). So if
there's more data to read, it reads more data, but if less
data is available, it raises an IOError.
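
The lower-bound check described in this message can be sketched as follows. This is a minimal illustration, not the code in urllib.patch2: the name `copy_checked`, its parameters, and the error message are assumptions made for the example, and it reads from any file-like object instead of a real HTTP response. It is written for Python 3, where IOError is an alias of OSError.

```python
import io


def copy_checked(source, dest, expected_size=None, block_size=8192):
    """Copy source to dest, treating expected_size as a lower bound.

    Hypothetical helper: expected_size stands in for the value of a
    Content-Length header, or None if the header was absent.
    """
    read = 0
    while True:
        block = source.read(block_size)
        if not block:
            break
        dest.write(block)
        read += len(block)
    # More data than announced is accepted; less data is an error.
    if expected_size is not None and read < expected_size:
        raise IOError("retrieval incomplete: got only %d out of %d bytes"
                      % (read, expected_size))
    return read


# Extra data beyond the announced size is simply copied.
copy_checked(io.BytesIO(b"abcdef"), io.BytesIO(), expected_size=4)

# A short read raises IOError.
try:
    copy_checked(io.BytesIO(b"abc"), io.BytesIO(), expected_size=10)
except IOError as exc:
    print("truncated:", exc)
```

Without an expected size the helper cannot verify anything and just returns the number of bytes copied.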
msg47258 - (view) Author: Johannes Gijsbers (jlgijsbers) * (Python triager) Date: 2004-12-06 21:48
Sorry Irmen, I'm a bit late with this, but now is the time
to get new "features" checked into the trunk. Could you add
a doc patch that explains the behavior as you did in your
previous message and a tests patch? I can check it in then.
msg47259 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2004-12-21 00:00
Irmen, please attach the new patch.

If you're having trouble with the docs, that's fine, just
include the text you want changed.

Do include tests with your patch.
msg47260 - (view) Author: Irmen de Jong (irmen) (Python triager) Date: 2004-12-24 14:10
Yes I'm having trouble building the docs from source, so I
will just add the text that I would like to change in the docs.
When I have some time left (it's Christmas after all :) )
I'll also create a regression test for the new behavior.
In the meantime, the "urllib.patch" may be deleted because
"urllib.patch2" is the correct patch (I seem to be unable to
delete the attachment myself)
msg47261 - (view) Author: Irmen de Jong (irmen) (Python triager) Date: 2004-12-24 14:30
Suggested addition to the doc of urllib (liburllib.tex, if
I'm not mistaken):

"""
urlretrieve will raise IOError when it detects that the amount of data
available was less than the expected amount (which is the size reported
by a Content-Length header). This can occur, for example, when the
download is interrupted.

The Content-Length is treated as a lower bound (just like tools such as
wget and Firefox appear to do): if there's more data to read,
urlretrieve reads more data, but if less data is available, it raises
IOError.

If no Content-Length header was supplied, urlretrieve cannot check the
size of the data it has downloaded, and just returns it. In this case
you just have to assume that the download was successful.
"""
msg47262 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2005-02-24 21:07
I think the patch is essentially right. However, I'm
concerned with losing the data that got just downloaded - I
propose to stick them into the IOError (or, better, subclass
IOError to keep the data, and document where to find them).
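
The suggestion to subclass IOError so that the partial data is not lost can be sketched like this. It is an illustration only: the class name `TruncatedDownloadError`, the attribute name `content`, and the helper `read_checked` are hypothetical and are not taken from any attached patch.

```python
import io


class TruncatedDownloadError(IOError):
    """Hypothetical IOError subclass that keeps the partial download."""

    def __init__(self, message, content):
        IOError.__init__(self, message)
        # The data received before the transfer stopped short.
        self.content = content


def read_checked(source, expected_size):
    """Read everything from source; raise if fewer bytes than expected."""
    data = source.read()
    if len(data) < expected_size:
        raise TruncatedDownloadError(
            "got only %d out of %d bytes" % (len(data), expected_size),
            data)
    return data


try:
    read_checked(io.BytesIO(b"partial"), expected_size=100)
except TruncatedDownloadError as exc:
    # Existing "except IOError" handlers still catch this exception,
    # and the caller can decide what to do with exc.content.
    print(exc, len(exc.content))
```

Because the new class derives from IOError, code that already catches IOError keeps working, while callers who care can recover the partial data from the exception.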
msg47263 - (view) Author: Georg Brandl (georg.brandl) * (Python committer) Date: 2005-07-15 08:51
Attaching new patch which implements Martin's suggestion
(urllib-truncate.diff). Please review.
msg47264 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2005-08-24 07:11
Martin, is this okay?
msg47265 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2005-08-24 08:49
The patch is fine, please apply. As a minor nit: I don't
think we should have SF bug report numbers in the code. The
comments in the code should be self-contained, without the
need to look at a bug report. If somebody really wants to
find out what triggered addition to the code, they should do
"cvs ann", and then look at the commit message (which should
have the bugid).

I also think there should be a \versionadded for the new
exception. Not sure whether it can go into the 2.4 branch
because of the new exception.
msg47266 - (view) Author: Georg Brandl (georg.brandl) * (Python committer) Date: 2005-08-24 18:47
Changed according to your nits and committed only to 2.5 branch.

Lib/urllib.py 1.167
Doc/lib/liburllib.tex 1.58
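
For readers on current Python, a caller can react to a truncated download roughly as follows. This sketch assumes Python 3, where the function lives at `urllib.request.urlretrieve` and, as far as the Python 3 documentation describes it, a short download is reported as `urllib.error.ContentTooShortError`. The wrapper name `retrieve_or_report` is hypothetical, and nothing here is taken from the committed revisions listed above.

```python
import urllib.error
import urllib.request


def retrieve_or_report(url, filename):
    """Download url to filename, reporting a truncated transfer.

    Hypothetical wrapper; the caller supplies the URL and target path.
    """
    try:
        return urllib.request.urlretrieve(url, filename)
    except urllib.error.ContentTooShortError as exc:
        # Fewer bytes arrived than the Content-Length header announced.
        print("download truncated:", exc)
        raise
```

In Python 3 this exception is a subclass of `urllib.error.URLError`, which in turn derives from OSError, so handlers written for IOError still see it.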
History
Date User Action Args
2022-04-11 14:56:08 admin set github: 41147
2004-11-07 20:15:32 irmen create