This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: tests and processors patch for urllib2
Type: Stage:
Components: Library (Lib) Versions: Python 2.4
process
Status: closed Resolution: accepted
Dependencies: Superseder:
Assigned To: jhylton Nosy List: jepler, jhylton, jjlee, rhettinger
Priority: normal Keywords: patch

Created on 2003-12-03 00:53 by jjlee, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
File name Uploaded Description Edit
urllib2.py.patch jjlee, 2003-12-03 00:53 v1
test_urllib2.py.patch jjlee, 2003-12-03 00:58
liburllib2.tex.patch jjlee, 2003-12-03 00:59 v1
NEWS.patch jjlee, 2003-12-14 16:11
Messages (9)
msg45002 - (view) Author: John J Lee (jjlee) Date: 2003-12-03 00:53
Here are some unit tests for urllib2 and a revised version 
of my urllib2 "processors" patch (originally posted as 
RFE 759792 -- I'm posting it here since it is a patch, not 
just a wish).  The tests depend on the patch, but test 
more than just the changes introduced by the patch. 
 
A fuller discussion is in the original RFE tracker item, but 
briefly: the patch makes it possible to implement 
functionality like HTTP cookie handling, Refresh 
handling, etc. etc. using handler objects.  At the moment 
urllib2's handler objects aren't quite up to the job, which 
results in a lot of cut-n-paste and subclassing.  I believe 
the changes are backwards-compatible, with the 
exception of people who've reimplemented 
build_opener()'s functionality -- those people would need 
to call opener.add_handler(HTTPErrorProcessor). 
 
The main change is allowing handlers to implement 
methods like: 
 
 http_request(request) 
 http_response(request, response) 
 
In addition to the usual 
 
 http_open(request) 
 http_error{_*}(...) 
 
I call handlers that implement these methods 
"processors".  These methods get called for *every* 
processor (in contrast to the ordinary handler methods, 
where the OpenerDirector stops calling the methods as 
soon as the first handler handles the request by returning 
a response) to pre-process requests and post-process 
responses. 
 
If this is accepted, I can submit patches for handlers 
(processors) that do HTTP Refresh redirection, cookie 
handling etc. 
 
 
Changes in the patch: 
 
 
-OpenerDirector changes to call new <protocol>_request 
and <protocol>_response methods.  I haven't put all the 
documentation for this interface in this set of patches 
because there's no obvious place for it: handlers aren't 
really documented either.  The urllib2 docs need a 
cleanup, but I'll do that in a separate patch. 
 
-Added .unredirected_hdrs dict to Request, together 
with .add_unredirected_headers() and .has_header() 
methods.  These headers don't get copied to redirected 
requests.  I didn't add this as a feature for people calling 
urlopen on a Request.  Rather, the motivation comes from 
the fact that processors need to explicitly add headers to 
Requests (Cookie, Referer, Content-Length, etc.), rather 
than directly sending them over the wire.  The problem is, 
if they add them to the regular .headers attribute of 
requests, processors will end up clobbering headers 
added by the user who called urlopen (which would 
break backwards-compatibility).  Having processors use 
a separate set of headers that never get redirected 
makes this problem go away: users can add headers 
(with either .add_header() or .add_unredirected_header(), 
since processors don't clobber either) and know that they 
won't get clobbered by any handler. 
 
-HTTPErrorProcessor is necessary to allow response 
processors to see responses before redirections &c 
happen, by moving the call to parent.error() out of 
AbstractHTTPHandler.do_open().  It has the side-effect of 
stopping people grumbling that 200 is not the only 
success code in HTTP <0.5 wink>, since it makes it 
feasible to override urllib2's behaviour of raising an 
exception unless the HTTP code == 200. 
 
-Split part of AbstractHTTPHandler.do_open (which 
implements http_open / https_open in the 
HTTP/HTTPSHandler subclasses) into a new .do_request 
(which implements http_request in the subclasses).  Just 
because I could, really (with the new *_request methods).  
It seems clearer to me. 
 
-Single string-formatting-character change to 
OpenerDirector.error() to allow "refresh" as an error 
code. 
 
-Added .code and .msg attributes to HTTP response 
objects, so that processors can know what the response 
code and message are.  I haven't documented these, 
because they're HTTP-specific. 
 
-Renamed HTTPRedirectHandler.error_302_dict 
--> .redirect_dict 
 
-Finally, there's one bugfix to HTTPRedirectHandler 
included in the patch, because the tests test for it: 
multiple visits to a single URL with different redirect codes 
is no longer erroneously detected as a loop. 
 
 http://a.com/a --> 302 --> http://a.com/b --> Refresh --> 
http://a.com/a 
 
 Yes, I have seen a site where this really happens! 
 
 
There are a few other bugs that turned up while writing 
the tests, and those tests are commented out ATM.  I'll 
file bug reports for those separately after this one is 
sorted out. 
msg45003 - (view) Author: John J Lee (jjlee) Date: 2003-12-08 00:04
Logged In: YES 
user_id=261020

Oops, I left a couple of lines in urllib2.py that add an HTTP 
debugging method and constructor arg.  Please ignore that 
part of the patch. 
msg45004 - (view) Author: Jeremy Hylton (jhylton) (Python triager) Date: 2003-12-14 05:28
Logged In: YES 
user_id=31392

Committed as rev 1.57 of urllib2.py
msg45005 - (view) Author: John J Lee (jjlee) Date: 2003-12-14 15:17
Logged In: YES 
user_id=261020

Just a note that my justification of 
Request.add_unredirected_hdrs above makes no sense. :-/ I 
mis-remembered my own reasons for adding this.  My original 
comment from 759792 is correct: 
 
Some headers (Content-Length, Referer, ...) mustn't be copied 
to requests for a redirected URL. This requires the addition of a 
new dict to Request. I've added an add_unredirected_headers 
method, too. The old implementation just sends these headers 
directly, but that's not possible if you want to use procesors to 
implement these things. 
msg45006 - (view) Author: John J Lee (jjlee) Date: 2003-12-14 16:11
Logged In: YES 
user_id=261020

Reopening to add a note to Misc/NEWS about the 
backwards-incompatibility for people who don't use 
build_opener().  I presume that's the right place for this note? 
msg45007 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2004-01-01 04:00
Logged In: YES 
user_id=80475

FWIW, the test for urllib2 is currently failing and needs to be 
fixed.
msg45008 - (view) Author: John J Lee (jjlee) Date: 2004-01-04 23:51
Logged In: YES 
user_id=261020

Fixed tests in patch 870606. 
msg45009 - (view) Author: Jeff Epler (jepler) Date: 2005-02-08 13:09
Logged In: YES 
user_id=2772

The fedora folks claim that this patch has caused "yum", an
apt-like tool for RPMs, to break in python 2.4.0.  

I haven't been able to verify this, and nobody has come up
with a simple, standalone script to produce the problem.

https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=138535
msg45010 - (view) Author: John J Lee (jjlee) Date: 2005-02-12 03:14
Logged In: YES 
user_id=261020

I added a detailed comment on the yum/urllib2 issue to the
bugzilla tracker (as user jjl at pobox dot com).  As I note
there, though this is strictly a yum bug, if there is a
good, safe way of fixing urllib2 here to be more forgiving,
I think it should be done.  If I think of one when I look at
it again, will submit a patch...
History
Date User Action Args
2022-04-11 14:56:01adminsetgithub: 39648
2003-12-03 00:53:50jjleecreate