|Title:||Extended HTTP functionality and WebDAV|
|Last-Modified:||2007-06-19 04:20:07 +0000 (Tue, 19 Jun 2007)|
|Author:||gstein at lyra.org (Greg Stein)|
This PEP has been rejected. It has failed to generate sufficient community support in the six years since its proposal.
This PEP discusses new modules and extended functionality for Python's HTTP support. Notably, the addition of authenticated requests, proxy support, authenticated proxy usage, and WebDAV  capabilities.
Python has been quite popular as a result of its "batteries included" positioning. One of the most heavily used protocols, HTTP (see RFC 2616), has been included with Python for years (httplib). However, this support has not kept up with the full needs and requirements of many HTTP-based applications and systems. In addition, new protocols based on HTTP, such as WebDAV and XML-RPC, are becoming useful and are seeing increasing usage. Supplying this functionality meets Python's "batteries included" role and also keeps Python at the leading edge of new technologies.
While authentication and proxy support are two very notable features missing from Python's core HTTP processing, they are minimally handled as part of Python's URL handling (urllib and urllib2). However, applications that need fine-grained or sophisticated HTTP handling cannot make use of the features while they reside in urllib. Refactoring these features into a location where they can be directly associated with an HTTP connection will improve their utility for both urllib and for sophisticated applications.
The motivation for this PEP was from several people requesting these features directly, and from a number of feature requests on SourceForge. Since the exact form of the modules to be provided and the classes/architecture used could be subject to debate, this PEP was created to provide a focal point for those discussions.
Two modules will be added to the standard library: httpx (HTTP extended functionality), and davlib (WebDAV library).
[ suggestions for module names are welcome; davlib has some precedence, but something like webdav might be desirable ]
The httpx module will provide a mixin for performing HTTP authentication (for both proxy and origin server authentication). This mixin (httpx.HandleAuthentication) can be combined with the HTTPConnection and the HTTPSConnection classes (the mixin may possibly work with the HTTP and HTTPS compatibility classes, but that is not a requirement).
The mixin will delegate the authentication process to one or more "authenticator" objects, allowing multiple connections to share authenticators. The use of a separate object allows for a long term connection to an authentication system (e.g. LDAP). An authenticator for the Basic and Digest mechanisms (see RFC 2617) will be provided. User-supplied authenticator subclasses can be registered and used by the connections.
A "credentials" object (httpx.Credentials) is also associated with the mixin, and stores the credentials (e.g. username and password) needed by the authenticators. Subclasses of Credentials can be created to hold additional information (e.g. NT domain).
The mixin overrides the getresponse() method to detect 401 (Unauthorized) and 407 (Proxy Authentication Required) responses. When this is found, the response object, the connection, and the credentials are passed to the authenticator corresponding with the authentication scheme specified in the response (multiple authenticators are tried in decreasing order of security if multiple schemes are in the response). Each authenticator can examine the response headers and decide whether and how to resend the request with the correct authentication headers. If no authenticator can successfully handle the authentication, then an exception is raised.
Resending a request, with the appropriate credentials, is one of the more difficult portions of the authentication system. The difficulty arises in recording what was sent originally: the request line, the headers, and the body. By overriding putrequest, putheader, and endheaders, we can capture all but the body. Once the endheaders method is called, then we capture all calls to send() (until the next putrequest method call) to hold the body content. The mixin will have a configurable limit for the amount of data to hold in this fashion (e.g. only hold up to 100k of body content). Assuming that the entire body has been stored, then we can resend the request with the appropriate authentication information.
If the body is too large to be stored, then the getresponse() simply returns the response object, indicating the 401 or 407 error. Since the authentication information has been computed and cached (into the Credentials object; see below), the caller can simply regenerate the request. The mixin will attach the appropriate credentials.
A "protection space" (see RFC 2617, section 1.2) is defined as a tuple of the host, port, and authentication realm. When a request is initially sent to an HTTP server, we do not know the authentication realm (the realm is only returned when authentication fails). However, we do have the path from the URL, and that can be useful in determining the credentials to send to the server. The Basic authentication scheme is typically set up hierarchically: the credentials for /path can be tried for /path/subpath. The Digest authentication scheme has explicit support for the hierarchical setup. The httpx.Credentials object will store credentials for multiple protection spaces, and can be looked up in two differents ways:
- looked up using (host, port, path) -- this lookup scheme is used when generating a request for a path where we don't know the authentication realm.
- looked up using (host, port, realm) -- this mechanism is used during the authentication process when the server has specified that the Request-URI resides within a specific realm.
The HandleAuthentication mixin will override putrequest() to automatically insert credentials, if available. The URL from the putrequest is used to determine the appropriate authentication information to use.
It is also important to note that two sets of credentials are used, and stored by the mixin. One set for any proxy that may be used, and one used for the target origin server. Since proxies do not have paths, the protection spaces in the proxy credentials will always use "/" for storing and looking up via a path.
The httpx module will provide a mixin for using a proxy to perform HTTP(S) operations. This mixin (httpx.UseProxy) can be combined with the HTTPConnection and the HTTPSConnection classes (the mixin may possibly work with the HTTP and HTTPS compatibility classes, but that is not a requirement).
The mixin will record the (host, port) of the proxy to use. XXX will be overridden to use this host/port combination for connections and to rewrite request URLs into the absoluteURIs referring to the origin server (these URIs are passed to the proxy server).
Proxy authentication is handled by the httpx.HandleAuthentication class since a user may directly use HTTP(S)Connection to speak with proxies.
The davlib module will provide a mixin for sending WebDAV requests to a WebDAV-enabled server. This mixin (davlib.DAVClient) can be combined with the HTTPConnection and the HTTPSConnection classes (the mixin may possibly work with the HTTP and HTTPS compatibility classes, but that is not a requirement).
A custom response object is used to decode 207 (Multi-Status) responses. The response object will use the standard library's xml package to parse the multistatus XML information, producing a simple structure of objects to hold the multistatus data. Multiple parsing schemes will be tried/used, in order of decreasing speed.
The actual (future/final) implementation is being developed in the /nondist/sandbox/Lib directory, until it is accepted and moved into the main Lib directory.
This document has been placed in the public domain.