[Patches] [ python-Patches-852995 ] tests and processors patch for urllib2
SourceForge.net
noreply at sourceforge.net
Tue Dec 2 19:53:50 EST 2003
Patches item #852995, was opened at 2003-12-03 00:53
Message generated for change (Tracker Item Submitted) made by Item Submitter
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=852995&group_id=5470
Category: Library (Lib)
Group: Python 2.4
Status: Open
Resolution: None
Priority: 5
Submitted By: John J Lee (jjlee)
Assigned to: Nobody/Anonymous (nobody)
Summary: tests and processors patch for urllib2
Initial Comment:
Here are some unit tests for urllib2 and a revised version
of my urllib2 "processors" patch (originally posted as
RFE 759792 -- I'm posting it here since it is a patch, not
just a wish). The tests depend on the patch, but test
more than just the changes introduced by the patch.
A fuller discussion is in the original RFE tracker item, but
briefly: the patch makes it possible to implement
functionality such as HTTP cookie handling and Refresh
handling using handler objects. At the moment
urllib2's handler objects aren't quite up to the job, which
results in a lot of cut-n-paste and subclassing. I believe
the changes are backwards-compatible, except for
people who've reimplemented build_opener()'s
functionality -- those people would need to call
opener.add_handler(HTTPErrorProcessor()).
The main change is allowing handlers to implement
methods like:
    http_request(request)
    http_response(request, response)
in addition to the usual:
    http_open(request)
    http_error{_*}(...)
I call handlers that implement these methods
"processors". These methods get called for *every*
processor (in contrast to the ordinary handler methods,
where the OpenerDirector stops calling the methods as
soon as the first handler handles the request by returning
a response) to pre-process requests and post-process
responses.
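As a rough sketch of the calling convention described above, the
following uses Python 3's urllib.request (the successor to urllib2,
which provides these hooks) so that it runs as-is. The class names
FakeHTTPHandler and LoggingProcessor and the example.invalid URL are
made up for illustration; nothing here is taken from the patch itself.

```python
import io
import urllib.request
from email.message import Message
from urllib.response import addinfourl


class FakeHTTPHandler(urllib.request.BaseHandler):
    """Hypothetical stand-in for HTTPHandler: canned response, no network."""

    def http_open(self, request):
        return addinfourl(io.BytesIO(b"hello"), Message(),
                          request.get_full_url(), code=200)


class LoggingProcessor(urllib.request.BaseHandler):
    """Hypothetical processor that records when its hooks are called."""

    def __init__(self, name, log):
        self.name = name
        self.log = log

    def http_request(self, request):
        self.log.append(("request", self.name))
        return request  # hand the (possibly modified) request on

    def http_response(self, request, response):
        self.log.append(("response", self.name))
        return response  # hand the (possibly modified) response on


log = []
opener = urllib.request.OpenerDirector()
for handler in (LoggingProcessor("first", log),
                LoggingProcessor("second", log),
                FakeHTTPHandler()):
    opener.add_handler(handler)

response = opener.open("http://example.invalid/")
body = response.read()
```

Both processors see the request and the response, whereas only one
handler's http_open() gets to produce the response.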
If this is accepted, I can submit patches for handlers
(processors) that do HTTP Refresh redirection, cookie
handling etc.
Changes in the patch:
-OpenerDirector changes to call new <protocol>_request
and <protocol>_response methods. I haven't put all the
documentation for this interface in this set of patches
because there's no obvious place for it: handlers aren't
really documented either. The urllib2 docs need a
cleanup, but I'll do that in a separate patch.
-Added .unredirected_hdrs dict to Request, together
with .add_unredirected_header() and .has_header()
methods. These headers don't get copied to redirected
requests. I didn't add this as a feature for people calling
urlopen on a Request. Rather, the motivation comes from
the fact that processors need to explicitly add headers to
Requests (Cookie, Referer, Content-Length, etc.), rather
than directly sending them over the wire. The problem is,
if they add them to the regular .headers attribute of
requests, processors will end up clobbering headers
added by the user who called urlopen (which would
break backwards-compatibility). Having processors use
a separate set of headers that never get redirected
makes this problem go away: users can add headers
(with either .add_header() or .add_unredirected_header(),
since processors don't clobber either) and know that they
won't get clobbered by any handler.
-HTTPErrorProcessor is necessary to allow response
processors to see responses before redirections etc.
happen: the call to parent.error() moves out of
AbstractHTTPHandler.do_open() and into this new
processor. It has the side-effect of
stopping people grumbling that 200 is not the only
success code in HTTP <0.5 wink>, since it makes it
feasible to override urllib2's behaviour of raising an
exception unless the HTTP code == 200.
-Split part of AbstractHTTPHandler.do_open (which
implements http_open / https_open in the
HTTP/HTTPSHandler subclasses) into a new .do_request
(which implements http_request in the subclasses). Just
because I could, really (with the new *_request methods).
It seems clearer to me.
-Single string-formatting-character change to
OpenerDirector.error() to allow "refresh" as an error
code.
-Added .code and .msg attributes to HTTP response
objects, so that processors can know what the response
code and message are. I haven't documented these,
because they're HTTP-specific.
-Renamed HTTPRedirectHandler.error_302_dict
--> .redirect_dict
-Finally, there's one bugfix to HTTPRedirectHandler
included in the patch, because the tests test for it:
multiple visits to a single URL with different redirect codes
are no longer erroneously detected as a loop. For example:
http://a.com/a --> 302 --> http://a.com/b --> Refresh -->
http://a.com/a
Yes, I have seen a site where this really happens!
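To illustrate the unredirected-headers item in the list above, here is
a minimal sketch of a processor that adds a header without clobbering
one set by the caller. It is written against Python 3's urllib.request
so that it runs as-is; CookieProcessor, the cookie values and the
example.invalid URL are hypothetical.

```python
import urllib.request


class CookieProcessor(urllib.request.BaseHandler):
    """Hypothetical processor; the cookie value is made up."""

    def http_request(self, request):
        # Leave alone any Cookie header the caller set explicitly.
        if not request.has_header("Cookie"):
            request.add_unredirected_header("Cookie", "session=abc")
        return request


processor = CookieProcessor()

# No user-supplied Cookie: the processor adds an unredirected one.
plain = processor.http_request(
    urllib.request.Request("http://example.invalid/"))

# User-supplied Cookie: the processor leaves the request untouched.
custom = urllib.request.Request("http://example.invalid/")
custom.add_header("Cookie", "mine=1")
custom = processor.http_request(custom)
```

The processor's header ends up in .unredirected_hdrs, while the
caller's header stays in .headers.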
There are a few other bugs that turned up while writing
the tests, and those tests are commented out for now. I'll
file bug reports for those separately after this one is
sorted out.
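Returning to the HTTPErrorProcessor item in the change list: the
sketch below shows how having error handling in a processor makes the
raise-on-error behaviour replaceable. It is written against Python 3's
urllib.request so that it runs as-is; Fake404Handler, LenientProcessor,
make_opener and the example.invalid URL are hypothetical names for
illustration, not part of the patch.

```python
import io
import urllib.error
import urllib.request
from email.message import Message
from urllib.response import addinfourl


class Fake404Handler(urllib.request.BaseHandler):
    """Hypothetical stand-in for HTTPHandler: always answers 404."""

    def http_open(self, request):
        response = addinfourl(io.BytesIO(b"not here"), Message(),
                              request.get_full_url(), code=404)
        response.msg = "Not Found"  # processors read .code and .msg
        return response


class LenientProcessor(urllib.request.BaseHandler):
    """Hypothetical replacement for HTTPErrorProcessor: never raises."""

    def http_response(self, request, response):
        return response


def make_opener(*handlers):
    # Build the opener by hand, as someone bypassing build_opener() would.
    opener = urllib.request.OpenerDirector()
    for handler in handlers:
        opener.add_handler(handler)
    return opener


strict = make_opener(Fake404Handler(),
                     urllib.request.HTTPDefaultErrorHandler(),
                     urllib.request.HTTPErrorProcessor())
lenient = make_opener(Fake404Handler(), LenientProcessor())

try:
    strict.open("http://example.invalid/")
    strict_outcome = "returned"
except urllib.error.HTTPError as err:
    strict_outcome = err.code

lenient_code = lenient.open("http://example.invalid/").code
```

With HTTPErrorProcessor installed the 404 becomes an exception; with
the lenient processor the caller just gets the response and can
inspect its code.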
----------------------------------------------------------------------