[Python-checkins] r56756 - sandbox/trunk/urilib/libcgi.tex sandbox/trunk/urilib/liburllib2.tex sandbox/trunk/urilib/liburlparse.tex sandbox/trunk/urilib/test_urllib2.py sandbox/trunk/urilib/urllib2.py

senthil.kumaran python-checkins at python.org
Sun Aug 5 23:42:34 CEST 2007


Author: senthil.kumaran
Date: Sun Aug  5 23:42:33 2007
New Revision: 56756

Added:
   sandbox/trunk/urilib/libcgi.tex   (contents, props changed)
   sandbox/trunk/urilib/liburllib2.tex   (contents, props changed)
   sandbox/trunk/urilib/liburlparse.tex   (contents, props changed)
   sandbox/trunk/urilib/test_urllib2.py   (contents, props changed)
   sandbox/trunk/urilib/urllib2.py   (contents, props changed)
Log:
SoC Tasks update. Added Docs, urllib2 cache redirection

Added: sandbox/trunk/urilib/libcgi.tex
==============================================================================
--- (empty file)
+++ sandbox/trunk/urilib/libcgi.tex	Sun Aug  5 23:42:33 2007
@@ -0,0 +1,609 @@
+\section{\module{cgi} ---
+         Common Gateway Interface support.}
+\declaremodule{standard}{cgi}
+
+\modulesynopsis{Common Gateway Interface support, used to interpret
+forms in server-side scripts.}
+
+\indexii{WWW}{server}
+\indexii{CGI}{protocol}
+\indexii{HTTP}{protocol}
+\indexii{MIME}{headers}
+\index{URL}
+
+
+Support module for Common Gateway Interface (CGI) scripts.%
+\index{Common Gateway Interface}
+
+This module defines a number of utilities for use by CGI scripts
+written in Python.
+
+\subsection{Introduction}
+\nodename{cgi-intro}
+
+A CGI script is invoked by an HTTP server, usually to process user
+input submitted through an HTML \code{<FORM>} or \code{<ISINDEX>} element.
+
+Most often, CGI scripts live in the server's special \file{cgi-bin}
+directory.  The HTTP server places all sorts of information about the
+request (such as the client's hostname, the requested URL, the query
+string, and lots of other goodies) in the script's shell environment,
+executes the script, and sends the script's output back to the client.
+
+The script's input is connected to the client too, and sometimes the
+form data is read this way; at other times the form data is passed via
+the ``query string'' part of the URL.  This module is intended
+to take care of the different cases and provide a simpler interface to
+the Python script.  It also provides a number of utilities that help
+in debugging scripts, and the latest addition is support for file
+uploads from a form (if your browser supports it).
+
+The output of a CGI script should consist of two sections, separated
+by a blank line.  The first section contains a number of headers,
+telling the client what kind of data is following.  Python code to
+generate a minimal header section looks like this:
+
+\begin{verbatim}
+print "Content-Type: text/html"     # HTML is following
+print                               # blank line, end of headers
+\end{verbatim}
+
+The second section is usually HTML, which allows the client software
+to display nicely formatted text with headers, in-line images, etc.
+Here's Python code that prints a simple piece of HTML:
+
+\begin{verbatim}
+print "<TITLE>CGI script output</TITLE>"
+print "<H1>This is my first CGI script</H1>"
+print "Hello, world!"
+\end{verbatim}
+
+\subsection{Using the cgi module}
+\nodename{Using the cgi module}
+
+Begin by writing \samp{import cgi}.  Do not use \samp{from cgi import
+*} --- the module defines all sorts of names for its own use or for
+backward compatibility that you don't want in your namespace.
+
+When you write a new script, consider adding the line:
+
+\begin{verbatim}
+import cgitb; cgitb.enable()
+\end{verbatim}
+
+This activates a special exception handler that will display detailed
+reports in the Web browser if any errors occur.  If you'd rather not
+show the guts of your program to users of your script, you can have
+the reports saved to files instead, with a line like this:
+
+\begin{verbatim}
+import cgitb; cgitb.enable(display=0, logdir="/tmp")
+\end{verbatim}
+
+It's very helpful to use this feature during script development.
+The reports produced by \refmodule{cgitb} provide information that
+can save you a lot of time in tracking down bugs.  You can always
+remove the \code{cgitb} line later when you have tested your script
+and are confident that it works correctly.
+
+To get at submitted form data,
+it's best to use the \class{FieldStorage} class.  The other classes
+defined in this module are provided mostly for backward compatibility.
+Instantiate it exactly once, without arguments.  This reads the form
+contents from standard input or the environment (depending on the
+value of various environment variables set according to the CGI
+standard).  Since it may consume standard input, it should be
+instantiated only once.
+
+The \class{FieldStorage} instance can be indexed like a Python
+dictionary, and also supports the standard dictionary methods
+\method{has_key()} and \method{keys()}.  The built-in \function{len()}
+is also supported.  Form fields containing empty strings are ignored
+and do not appear in the dictionary; to keep such values, provide
+a true value for the optional \var{keep_blank_values} keyword
+parameter when creating the \class{FieldStorage} instance.
+
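+For example, a minimal sketch of retaining blank fields:
+
+\begin{verbatim}
+form = cgi.FieldStorage(keep_blank_values=True)
+\end{verbatim}
+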
+For instance, the following code (which assumes that the 
+\mailheader{Content-Type} header and blank line have already been
+printed) checks that the fields \code{name} and \code{addr} are both
+set to a non-empty string:
+
+\begin{verbatim}
+form = cgi.FieldStorage()
+if not (form.has_key("name") and form.has_key("addr")):
+    print "<H1>Error</H1>"
+    print "Please fill in the name and addr fields."
+else:
+    print "<p>name:", form["name"].value
+    print "<p>addr:", form["addr"].value
+    # ...further form processing here...
+\end{verbatim}
+
+Here the fields, accessed through \samp{form[\var{key}]}, are
+themselves instances of \class{FieldStorage} (or
+\class{MiniFieldStorage}, depending on the form encoding).
+The \member{value} attribute of the instance yields the string value
+of the field.  The \method{getvalue()} method returns this string value
+directly; it also accepts an optional second argument as a default to
+return if the requested key is not present.
+
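+For example (the field name \code{when} and the fallback value used
+here are hypothetical):
+
+\begin{verbatim}
+when = form.getvalue("when", "now")
+\end{verbatim}
+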
+If the submitted form data contains more than one field with the same
+name, the object retrieved by \samp{form[\var{key}]} is not a
+\class{FieldStorage} or \class{MiniFieldStorage}
+instance but a list of such instances.  Similarly, in this situation,
+\samp{form.getvalue(\var{key})} would return a list of strings.
+If you expect this possibility
+(when your HTML form contains multiple fields with the same name), use
+the \method{getlist()} method, which always returns a list of values (so that you
+do not need to special-case the single item case).  For example, this
+code concatenates any number of username fields, separated by
+commas:
+
+\begin{verbatim}
+value = form.getlist("username")
+usernames = ",".join(value)
+\end{verbatim}
+
+If a field represents an uploaded file, accessing the value via the
+\member{value} attribute or the \method{getvalue()} method reads the
+entire file in memory as a string.  This may not be what you want.
+You can test for an uploaded file by testing either the \member{filename}
+attribute or the \member{file} attribute.  You can then read the data at
+leisure from the \member{file} attribute:
+
+\begin{verbatim}
+fileitem = form["userfile"]
+if fileitem.file:
+    # It's an uploaded file; count lines
+    linecount = 0
+    while 1:
+        line = fileitem.file.readline()
+        if not line: break
+        linecount = linecount + 1
+\end{verbatim}
+
+The file upload draft standard entertains the possibility of uploading
+multiple files from one field (using a recursive
+\mimetype{multipart/*} encoding).  When this occurs, the item will be
+a dictionary-like \class{FieldStorage} item.  This can be determined
+by testing its \member{type} attribute, which should be
+\mimetype{multipart/form-data} (or perhaps another MIME type matching
+\mimetype{multipart/*}).  In this case, it can be iterated over
+recursively just like the top-level form object.
+
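+As a sketch, such an item could be processed recursively along these
+lines (the \code{handle_field()} helper is hypothetical):
+
+\begin{verbatim}
+def walk(item):
+    if item.type is not None and item.type.startswith("multipart/"):
+        for part in item.list:        # nested FieldStorage items
+            walk(part)
+    else:
+        handle_field(item)            # hypothetical per-field handler
+\end{verbatim}
+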
+When a form is submitted in the ``old'' format (as the query string or
+as a single data part of type
+\mimetype{application/x-www-form-urlencoded}), the items will actually
+be instances of the class \class{MiniFieldStorage}.  In this case, the
+\member{list}, \member{file}, and \member{filename} attributes are
+always \code{None}.
+
+
+\subsection{Higher Level Interface}
+
+\versionadded{2.2}
+
+The previous section explains how to read CGI form data using the
+\class{FieldStorage} class.  This section describes a higher level
+interface which was added to this class to allow one to do it in a
+more readable and intuitive way.  The interface doesn't make the
+techniques described in previous sections obsolete --- they are still
+useful to process file uploads efficiently, for example.
+
+The interface consists of two simple methods.  Using these methods
+you can process form data in a generic way, without the need to worry
+about whether one or more values were posted under one name.
+
+In the previous section, you learned to write the following code any
+time you expected a user to post more than one value under one name:
+
+\begin{verbatim}
+item = form.getvalue("item")
+if isinstance(item, list):
+    # The user is requesting more than one item.
+else:
+    # The user is requesting only one item.
+\end{verbatim}
+
+This situation is common, for example, when a form contains a group of
+multiple checkboxes with the same name:
+
+\begin{verbatim}
+<input type="checkbox" name="item" value="1" />
+<input type="checkbox" name="item" value="2" />
+\end{verbatim}
+
+In most situations, however, there's only one form control with a
+particular name in a form and then you expect and need only one value
+associated with this name.  So you write a script containing for
+example this code:
+
+\begin{verbatim}
+user = form.getvalue("user").upper()
+\end{verbatim}
+
+The problem with the code is that you should never expect that a
+client will provide valid input to your scripts.  For example, if a
+curious user appends another \samp{user=foo} pair to the query string,
+then the script would crash, because in this situation the
+\code{getvalue("user")} method call returns a list instead of a
+string.  Calling the \method{upper()} method on a list is not valid
+(since lists do not have a method of this name) and results in an
+\exception{AttributeError} exception.
+
+Therefore, the appropriate way to read form data values used to be to
+always write code which checks whether the obtained value is a single value
+or a list of values.  That's annoying and leads to less readable
+scripts.
+
+A more convenient approach is to use the methods \method{getfirst()}
+and \method{getlist()} provided by this higher level interface.
+
+\begin{methoddesc}[FieldStorage]{getfirst}{name\optional{, default}}
+  This method always returns only one value associated with form field
+  \var{name}.  It returns only the first value if more than one value
+  was posted under that name.  Please note that the order
+  in which the values are received may vary from browser to browser
+  and should not be counted on.\footnote{Note that some recent
+      versions of the HTML specification do state what order the
+      field values should be supplied in, but knowing whether a
+      request was received from a conforming browser, or even from a
+      browser at all, is tedious and error-prone.}  If no such form
+  field or value exists then the method returns the value specified by
+  the optional parameter \var{default}.  This parameter defaults to
+  \code{None} if not specified.
+\end{methoddesc}
+
+\begin{methoddesc}[FieldStorage]{getlist}{name}
+  This method always returns a list of values associated with form
+  field \var{name}.  The method returns an empty list if no such form
+  field or value exists for \var{name}.  It returns a list consisting
+  of one item if only one such value exists.
+\end{methoddesc}
+
+Using these methods you can write nice compact code:
+
+\begin{verbatim}
+import cgi
+form = cgi.FieldStorage()
+user = form.getfirst("user", "").upper()    # This way it's safe.
+for item in form.getlist("item"):
+    do_something(item)
+\end{verbatim}
+
+
+\subsection{Old classes}
+
+These classes, present in earlier versions of the \module{cgi} module,
+are still supported for backward compatibility.  New applications
+should use the \class{FieldStorage} class.
+
+\class{SvFormContentDict} stores single-value form content as a
+dictionary; it assumes each field name occurs in the form only once.
+
+\class{FormContentDict} stores multiple-value form content as a
+dictionary (the form items are lists of values).  Useful if your form
+contains multiple fields with the same name.
+
+Other classes (\class{FormContent}, \class{InterpFormContentDict}) are
+present for backwards compatibility with really old applications only.
+If you still use these and would be inconvenienced if they
+disappeared from a future version of this module, drop me a note.
+
+
+\subsection{Functions}
+\nodename{Functions in cgi module}
+
+These are useful if you want more control, or if you want to employ
+some of the algorithms implemented in this module in other
+circumstances.
+
+\begin{funcdesc}{parse}{fp\optional{, keep_blank_values\optional{,
+                        strict_parsing}}}
+  Parse a query in the environment or from a file (the file defaults
+  to \code{sys.stdin}).  The \var{keep_blank_values} and
+  \var{strict_parsing} parameters are passed to \function{parse_qs()}
+  unchanged.
+\end{funcdesc}
+
+\begin{funcdesc}{parse_qs}{qs\optional{, keep_blank_values\optional{,
+                           strict_parsing}}}
+Parse a query string given as a string argument (data of type 
+\mimetype{application/x-www-form-urlencoded}).  Data are
+returned as a dictionary.  The dictionary keys are the unique query
+variable names and the values are lists of values for each name.
+
+The optional argument \var{keep_blank_values} is
+a flag indicating whether blank values in
+URL encoded queries should be treated as blank strings.  
+A true value indicates that blanks should be retained as 
+blank strings.  The default false value indicates that
+blank values are to be ignored and treated as if they were
+not included.
+
+The optional argument \var{strict_parsing} is a flag indicating what
+to do with parsing errors.  If false (the default), errors
+are silently ignored.  If true, errors raise a \exception{ValueError}
+exception.
+
+Use the \function{\refmodule{urllib}.urlencode()} function to convert
+such dictionaries into query strings.
+
+This function calls \function{\refmodule{urlparse}.parse_qs()}
+internally to parse the query string.
+
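+For example (the order of dictionary keys in the output may vary):
+
+\begin{verbatim}
+>>> import cgi
+>>> cgi.parse_qs("name=Joe&item=1&item=2")
+{'name': ['Joe'], 'item': ['1', '2']}
+\end{verbatim}
+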
+\end{funcdesc}
+
+\begin{funcdesc}{parse_qsl}{qs\optional{, keep_blank_values\optional{,
+                            strict_parsing}}}
+Parse a query string given as a string argument (data of type 
+\mimetype{application/x-www-form-urlencoded}).  Data are
+returned as a list of name, value pairs.
+
+The optional argument \var{keep_blank_values} is
+a flag indicating whether blank values in
+URL encoded queries should be treated as blank strings.  
+A true value indicates that blanks should be retained as 
+blank strings.  The default false value indicates that
+blank values are to be ignored and treated as if they were
+not included.
+
+The optional argument \var{strict_parsing} is a flag indicating what
+to do with parsing errors.  If false (the default), errors
+are silently ignored.  If true, errors raise a \exception{ValueError}
+exception.
+
+Use the \function{\refmodule{urllib}.urlencode()} function to convert
+such lists of pairs into query strings.
+
+This function calls \function{\refmodule{urlparse}.parse_qsl()}
+internally to parse the query string into a list of pairs.
+
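+For example:
+
+\begin{verbatim}
+>>> import cgi
+>>> cgi.parse_qsl("name=Joe&item=1&item=2")
+[('name', 'Joe'), ('item', '1'), ('item', '2')]
+\end{verbatim}
+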
+\end{funcdesc}
+
+\begin{funcdesc}{parse_multipart}{fp, pdict}
+Parse input of type \mimetype{multipart/form-data} (for 
+file uploads).  Arguments are \var{fp} for the input file and
+\var{pdict} for a dictionary containing other parameters in
+the \mailheader{Content-Type} header.
+
+Returns a dictionary just like \function{parse_qs()}: keys are the
+field names, each value is a list of values for that field.  This is
+easy to use but not much good if you are expecting megabytes to be
+uploaded --- in that case, use the \class{FieldStorage} class instead
+which is much more flexible.
+
+Note that this does not parse nested multipart parts --- use
+\class{FieldStorage} for that.
+\end{funcdesc}
+
+\begin{funcdesc}{parse_header}{string}
+Parse a MIME header (such as \mailheader{Content-Type}) into a main
+value and a dictionary of parameters.
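+
+For example:
+
+\begin{verbatim}
+>>> import cgi
+>>> cgi.parse_header('text/html; charset=utf-8')
+('text/html', {'charset': 'utf-8'})
+\end{verbatim}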
+\end{funcdesc}
+
+\begin{funcdesc}{test}{}
+Robust test CGI script, usable as main program.
+Writes minimal HTTP headers and formats all information provided to
+the script in HTML form.
+\end{funcdesc}
+
+\begin{funcdesc}{print_environ}{}
+Format the shell environment in HTML.
+\end{funcdesc}
+
+\begin{funcdesc}{print_form}{form}
+Format a form in HTML.
+\end{funcdesc}
+
+\begin{funcdesc}{print_directory}{}
+Format the current directory in HTML.
+\end{funcdesc}
+
+\begin{funcdesc}{print_environ_usage}{}
+Print a list of useful (used by CGI) environment variables in
+HTML.
+\end{funcdesc}
+
+\begin{funcdesc}{escape}{s\optional{, quote}}
+Convert the characters
+\character{\&}, \character{<} and \character{>} in string \var{s} to
+HTML-safe sequences.  Use this if you need to display text that might
+contain such characters in HTML.  If the optional flag \var{quote} is
+true, the quotation mark character (\character{"}) is also translated;
+this helps for inclusion in an HTML attribute value, as in \code{<A
+HREF="...">}.  If the value to be quoted might include single- or
+double-quote characters, or both, consider using the
+\function{quoteattr()} function in the \refmodule{xml.sax.saxutils}
+module instead.
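+
+For example:
+
+\begin{verbatim}
+>>> import cgi
+>>> cgi.escape('<a href="x">&</a>', quote=True)
+'&lt;a href=&quot;x&quot;&gt;&amp;&lt;/a&gt;'
+\end{verbatim}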
+\end{funcdesc}
+
+
+\subsection{Caring about security \label{cgi-security}}
+
+\indexii{CGI}{security}
+
+There's one important rule: if you invoke an external program (via the
+\function{os.system()} or \function{os.popen()} functions, or others
+with similar functionality), make very sure you don't pass arbitrary
+strings received from the client to the shell.  This is a well-known
+security hole whereby clever hackers anywhere on the Web can exploit a
+gullible CGI script to invoke arbitrary shell commands.  Even parts of
+the URL or field names cannot be trusted, since the request doesn't
+have to come from your form!
+
+To be on the safe side, if you must pass a string gotten from a form
+to a shell command, you should make sure the string contains only
+alphanumeric characters, dashes, underscores, and periods.
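+
+One way to enforce this is a whitelist check before the string ever
+reaches a shell command (a sketch only; \code{value} stands for the
+string obtained from the form):
+
+\begin{verbatim}
+import re
+# Allow only letters, digits, dashes, underscores and periods:
+if not re.match(r'^[A-Za-z0-9._-]+$', value):
+    raise ValueError("unsafe string in form input")
+\end{verbatim}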
+
+
+\subsection{Installing your CGI script on a \UNIX\ system}
+
+Read the documentation for your HTTP server and check with your local
+system administrator to find the directory where CGI scripts should be
+installed; usually this is in a directory \file{cgi-bin} in the server tree.
+
+Make sure that your script is readable and executable by ``others''; the
+\UNIX{} file mode should be \code{0755} octal (use \samp{chmod 0755
+\var{filename}}).  Make sure that the first line of the script contains
+\code{\#!} starting in column 1 followed by the pathname of the Python
+interpreter, for instance:
+
+\begin{verbatim}
+#!/usr/local/bin/python
+\end{verbatim}
+
+Make sure the Python interpreter exists and is executable by ``others''.
+
+Make sure that any files your script needs to read or write are
+readable or writable, respectively, by ``others'' --- their mode
+should be \code{0644} for readable and \code{0666} for writable.  This
+is because, for security reasons, the HTTP server executes your script
+as user ``nobody'', without any special privileges.  It can only read
+(write, execute) files that everybody can read (write, execute).  The
+current directory at execution time is also different (it is usually
+the server's cgi-bin directory) and the set of environment variables
+is also different from what you get when you log in.  In particular, don't
+count on the shell's search path for executables (\envvar{PATH}) or
+the Python module search path (\envvar{PYTHONPATH}) to be set to
+anything interesting.
+
+If you need to load modules from a directory which is not on Python's
+default module search path, you can change the path in your script,
+before importing other modules.  For example:
+
+\begin{verbatim}
+import sys
+sys.path.insert(0, "/usr/home/joe/lib/python")
+sys.path.insert(0, "/usr/local/lib/python")
+\end{verbatim}
+
+(This way, the directory inserted last will be searched first!)
+
+Instructions for non-\UNIX{} systems will vary; check your HTTP server's
+documentation (it will usually have a section on CGI scripts).
+
+
+\subsection{Testing your CGI script}
+
+Unfortunately, a CGI script will generally not run when you try it
+from the command line, and a script that works perfectly from the
+command line may fail mysteriously when run from the server.  There's
+one reason why you should still test your script from the command
+line: if it contains a syntax error, the Python interpreter won't
+execute it at all, and the HTTP server will most likely send a cryptic
+error to the client.
+
+Assuming your script has no syntax errors, yet it does not work, you
+have no choice but to read the next section.
+
+
+\subsection{Debugging CGI scripts} \indexii{CGI}{debugging}
+
+First of all, check for trivial installation errors --- reading the
+section above on installing your CGI script carefully can save you a
+lot of time.  If you wonder whether you have understood the
+installation procedure correctly, try installing a copy of this module
+file (\file{cgi.py}) as a CGI script.  When invoked as a script, the file
+will dump its environment and the contents of the form in HTML form.
+Give it the right mode, etc., and send it a request.  If it's installed
+in the standard \file{cgi-bin} directory, it should be possible to send it a
+request by entering a URL into your browser of the form:
+
+\begin{verbatim}
+http://yourhostname/cgi-bin/cgi.py?name=Joe+Blow&addr=At+Home
+\end{verbatim}
+
+If this gives an error of type 404, the server cannot find the script
+-- perhaps you need to install it in a different directory.  If it
+gives another error, there's an installation problem that
+you should fix before trying to go any further.  If you get a nicely
+formatted listing of the environment and form content (in this
+example, the fields should be listed as ``addr'' with value ``At Home''
+and ``name'' with value ``Joe Blow''), the \file{cgi.py} script has been
+installed correctly.  If you follow the same procedure for your own
+script, you should now be able to debug it.
+
+The next step could be to call the \module{cgi} module's
+\function{test()} function from your script: replace its main code
+with the single statement
+
+\begin{verbatim}
+cgi.test()
+\end{verbatim}
+
+This should produce the same results as those gotten from installing
+the \file{cgi.py} file itself.
+
+When an ordinary Python script raises an unhandled exception (for
+whatever reason: a typo in a module name, a file that can't be
+opened, etc.), the Python interpreter prints a nice traceback and
+exits.  While the Python interpreter will still do this when your CGI
+script raises an exception, most likely the traceback will end up in
+one of the HTTP server's log files, or be discarded altogether.
+
+Fortunately, once you have managed to get your script to execute
+\emph{some} code, you can easily send tracebacks to the Web browser
+using the \refmodule{cgitb} module.  If you haven't done so already,
+just add the line:
+
+\begin{verbatim}
+import cgitb; cgitb.enable()
+\end{verbatim}
+
+to the top of your script.  Then try running it again; when a
+problem occurs, you should see a detailed report that will
+likely make apparent the cause of the crash.
+
+If you suspect that there may be a problem in importing the
+\refmodule{cgitb} module, you can use an even more robust approach
+(which only uses built-in modules):
+
+\begin{verbatim}
+import sys
+sys.stderr = sys.stdout
+print "Content-Type: text/plain"
+print
+...your code here...
+\end{verbatim}
+
+This relies on the Python interpreter to print the traceback.  The
+content type of the output is set to plain text, which disables all
+HTML processing.  If your script works, the raw HTML will be displayed
+by your client.  If it raises an exception, most likely after the
+first two lines have been printed, a traceback will be displayed.
+Because no HTML interpretation is going on, the traceback will be
+readable.
+
+
+\subsection{Common problems and solutions}
+
+\begin{itemize}
+\item Most HTTP servers buffer the output from CGI scripts until the
+script is completed.  This means that it is not possible to display a
+progress report on the client's display while the script is running.
+
+\item Check the installation instructions above.
+
+\item Check the HTTP server's log files.  (\samp{tail -f logfile} in a
+separate window may be useful!)
+
+\item Always check a script for syntax errors first, by doing something
+like \samp{python script.py}.
+
+\item If your script does not have any syntax errors, try adding
+\samp{import cgitb; cgitb.enable()} to the top of the script.
+
+\item When invoking external programs, make sure they can be found.
+Usually, this means using absolute path names --- \envvar{PATH} is
+usually not set to a very useful value in a CGI script.
+
+\item When reading or writing external files, make sure they can be read
+or written by the userid under which your CGI script will be running:
+this is typically the userid under which the web server is running, or some
+explicitly specified userid for a web server's \samp{suexec} feature.
+
+\item Don't try to give a CGI script a set-uid mode.  This doesn't work on
+most systems, and is a security liability as well.
+\end{itemize}
+

Added: sandbox/trunk/urilib/liburllib2.tex
==============================================================================
--- (empty file)
+++ sandbox/trunk/urilib/liburllib2.tex	Sun Aug  5 23:42:33 2007
@@ -0,0 +1,873 @@
+\section{\module{urllib2} ---
+         extensible library for opening URLs}
+
+\declaremodule{standard}{urllib2}
+\moduleauthor{Jeremy Hylton}{jhylton@users.sourceforge.net}
+\sectionauthor{Moshe Zadka}{moshez@users.sourceforge.net}
+
+\modulesynopsis{An extensible library for opening URLs using a variety of 
+                protocols}
+
+The \module{urllib2} module defines functions and classes which help
+in opening URLs (mostly HTTP) in a complex world --- basic and digest
+authentication, redirections, cookies and more.
+
+The \module{urllib2} module defines the following functions:
+
+\begin{funcdesc}{urlopen}{url\optional{, data}\optional{, timeout}}
+Open the URL \var{url}, which can be either a string or a \class{Request}
+object.
+
+\var{data} may be a string specifying additional data to send to the
+server, or \code{None} if no such data is needed. 
+Currently HTTP requests are the only ones that use \var{data};
+the HTTP request will be a POST instead of a GET when the \var{data}
+parameter is provided.  \var{data} should be a buffer in the standard
+\mimetype{application/x-www-form-urlencoded} format.  The
+\function{urllib.urlencode()} function takes a mapping or sequence of
+2-tuples and returns a string in this format.
+
+The optional \var{timeout} parameter specifies a timeout in seconds for the
+connection attempt (if not specified, or passed as None, the global default
+timeout setting will be used). This actually only works for HTTP, HTTPS, FTP
+and FTPS connections.
+
+This function returns a file-like object with two additional methods:
+
+\begin{itemize}
+  \item \method{geturl()} --- return the URL of the resource retrieved
+  \item \method{info()} --- return the meta-information of the page, as
+                            a dictionary-like object
+\end{itemize}
+
+Raises \exception{URLError} on errors.
+
+Note that \code{None} may be returned if no handler handles the
+request (though the default installed global \class{OpenerDirector}
+uses \class{UnknownHandler} to ensure this never happens).
+
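+A minimal sketch of typical use (the URL is an example; the
+\var{timeout} argument requires Python 2.6):
+
+\begin{verbatim}
+import urllib2
+f = urllib2.urlopen('http://www.example.com/', timeout=10)
+print f.geturl()        # final URL, after any redirects
+print f.info()          # the response headers
+data = f.read()
+\end{verbatim}
+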
+\versionchanged[\var{timeout} was added]{2.6}
+\end{funcdesc}
+
+\begin{funcdesc}{install_opener}{opener}
+Install an \class{OpenerDirector} instance as the default global
+opener.  Installing an opener is only necessary if you want urlopen to
+use that opener; otherwise, simply call \method{OpenerDirector.open()}
+instead of \function{urlopen()}.  The code does not check for a real
+\class{OpenerDirector}, and any class with the appropriate interface
+will work.
+\end{funcdesc}
+
+\begin{funcdesc}{build_opener}{\optional{handler, \moreargs}}
+Return an \class{OpenerDirector} instance, which chains the
+handlers in the order given. \var{handler}s can be either instances
+of \class{BaseHandler}, or subclasses of \class{BaseHandler} (in
+which case it must be possible to call the constructor without
+any parameters).  Instances of the following classes will be in
+front of the \var{handler}s, unless the \var{handler}s contain
+them, instances of them or subclasses of them:
+\class{ProxyHandler}, \class{UnknownHandler}, \class{HTTPHandler},
+\class{HTTPDefaultErrorHandler}, \class{HTTPRedirectHandler},
+\class{FTPHandler}, \class{FileHandler}, \class{HTTPErrorProcessor}.
+
+If the Python installation has SSL support (\function{socket.ssl()}
+exists), \class{HTTPSHandler} will also be added.
+
+Beginning in Python 2.3, a \class{BaseHandler} subclass may also
+change its \member{handler_order} member variable to modify its
+position in the handlers list.
+\end{funcdesc}
+
+
+The following exceptions are raised as appropriate:
+
+\begin{excdesc}{URLError}
+The handlers raise this exception (or derived exceptions) when they
+run into a problem.  It is a subclass of \exception{IOError}.
+\end{excdesc}
+
+\begin{excdesc}{HTTPError}
+A subclass of \exception{URLError}, it can also function as a 
+non-exceptional file-like return value (the same thing that
+\function{urlopen()} returns).  This is useful when handling exotic
+HTTP errors, such as requests for authentication.
+\end{excdesc}
+
+
+The following classes are provided:
+
+\begin{classdesc}{Request}{url\optional{, data}\optional{, headers}
+    \optional{, origin_req_host}\optional{, unverifiable}}
+This class is an abstraction of a URL request.
+
+\var{url} should be a string containing a valid URL.  
+
+\var{data} may be a string specifying additional data to send to the
+server, or \code{None} if no such data is needed. 
+Currently HTTP requests are the only ones that use \var{data};
+the HTTP request will be a POST instead of a GET when the \var{data}
+parameter is provided.  \var{data} should be a buffer in the standard
+\mimetype{application/x-www-form-urlencoded} format.  The
+\function{urllib.urlencode()} function takes a mapping or sequence of
+2-tuples and returns a string in this format.
+
+\var{headers} should be a dictionary, and will be treated as if
+\method{add_header()} was called with each key and value as arguments.
+
+The final two arguments are only of interest for correct handling of
+third-party HTTP cookies:
+
+\var{origin_req_host} should be the request-host of the origin
+transaction, as defined by \rfc{2965}.  It defaults to
+\code{cookielib.request_host(self)}.  This is the host name or IP
+address of the original request that was initiated by the user.  For
+example, if the request is for an image in an HTML document, this
+should be the request-host of the request for the page containing the
+image.
+
+\var{unverifiable} should indicate whether the request is
+unverifiable, as defined by \rfc{2965}.  It defaults to \code{False}.  An
+unverifiable request is one whose URL the user did not have the option
+to approve.  For example, if the request is for an image in an HTML
+document, and the user had no option to approve the automatic fetching
+of the image, this should be true.
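+
+A short sketch of constructing a POST request (the URL, form fields
+and header value are hypothetical):
+
+\begin{verbatim}
+import urllib, urllib2
+data = urllib.urlencode({'name': 'Joe', 'addr': 'At Home'})
+req = urllib2.Request('http://www.example.com/form', data,
+                      {'User-Agent': 'example-client/0.1'})
+\end{verbatim}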
+\end{classdesc}
+
+\begin{classdesc}{OpenerDirector}{}
+The \class{OpenerDirector} class opens URLs via \class{BaseHandler}s
+chained together. It manages the chaining of handlers, and recovery
+from errors.
+\end{classdesc}
+
+\begin{classdesc}{BaseHandler}{}
+This is the base class for all registered handlers --- and handles only
+the simple mechanics of registration.
+\end{classdesc}
+
+\begin{classdesc}{HTTPDefaultErrorHandler}{}
+A class which defines a default handler for HTTP error responses; all
+responses are turned into \exception{HTTPError} exceptions.
+\end{classdesc}
+
+\begin{classdesc}{HTTPRedirectHandler}{}
+A class to handle redirections.
+\end{classdesc}
+
+\begin{classdesc}{HTTPCookieProcessor}{\optional{cookiejar}}
+A class to handle HTTP Cookies.
+\end{classdesc}
+
+\begin{classdesc}{ProxyHandler}{\optional{proxies}}
+Cause requests to go through a proxy.
+If \var{proxies} is given, it must be a dictionary mapping
+protocol names to URLs of proxies.
+The default is to read the list of proxies from the environment
+variables \envvar{<protocol>_proxy}.
+\end{classdesc}
+
+\begin{classdesc}{HTTPPasswordMgr}{}
+Keep a database of 
+\code{(\var{realm}, \var{uri}) -> (\var{user}, \var{password})}
+mappings.
+\end{classdesc}
+
+\begin{classdesc}{HTTPPasswordMgrWithDefaultRealm}{}
+Keep a database of 
+\code{(\var{realm}, \var{uri}) -> (\var{user}, \var{password})} mappings.
+A realm of \code{None} is considered a catch-all realm, which is searched
+if no other realm fits.
+\end{classdesc}
+
+\begin{classdesc}{AbstractBasicAuthHandler}{\optional{password_mgr}}
+This is a mixin class that helps with HTTP authentication, both
+to the remote host and to a proxy.
+\var{password_mgr}, if given, should be something that is compatible
+with \class{HTTPPasswordMgr}; refer to section~\ref{http-password-mgr}
+for information on the interface that must be supported.
+\end{classdesc}
+
+\begin{classdesc}{HTTPBasicAuthHandler}{\optional{password_mgr}}
+Handle authentication with the remote host.
+\var{password_mgr}, if given, should be something that is compatible
+with \class{HTTPPasswordMgr}; refer to section~\ref{http-password-mgr}
+for information on the interface that must be supported.
+\end{classdesc}
+
+\begin{classdesc}{ProxyBasicAuthHandler}{\optional{password_mgr}}
+Handle authentication with the proxy.
+\var{password_mgr}, if given, should be something that is compatible
+with \class{HTTPPasswordMgr}; refer to section~\ref{http-password-mgr}
+for information on the interface that must be supported.
+\end{classdesc}
+
+\begin{classdesc}{AbstractDigestAuthHandler}{\optional{password_mgr}}
+This is a mixin class that helps with HTTP authentication, both
+to the remote host and to a proxy.
+\var{password_mgr}, if given, should be something that is compatible
+with \class{HTTPPasswordMgr}; refer to section~\ref{http-password-mgr}
+for information on the interface that must be supported.
+\end{classdesc}
+
+\begin{classdesc}{HTTPDigestAuthHandler}{\optional{password_mgr}}
+Handle authentication with the remote host.
+\var{password_mgr}, if given, should be something that is compatible
+with \class{HTTPPasswordMgr}; refer to section~\ref{http-password-mgr}
+for information on the interface that must be supported.
+\end{classdesc}
+
+\begin{classdesc}{ProxyDigestAuthHandler}{\optional{password_mgr}}
+Handle authentication with the proxy.
+\var{password_mgr}, if given, should be something that is compatible
+with \class{HTTPPasswordMgr}; refer to section~\ref{http-password-mgr}
+for information on the interface that must be supported.
+\end{classdesc}
+
+\begin{classdesc}{HTTPHandler}{}
+A class to handle opening of HTTP URLs.
+\end{classdesc}
+
+\begin{classdesc}{HTTPSHandler}{}
+A class to handle opening of HTTPS URLs.
+\end{classdesc}
+
+\begin{classdesc}{FileHandler}{}
+Open local files.
+\end{classdesc}
+
+\begin{classdesc}{FTPHandler}{}
+Open FTP URLs.
+\end{classdesc}
+
+\begin{classdesc}{CacheFTPHandler}{}
+Open FTP URLs, keeping a cache of open FTP connections to minimize
+delays.
+\end{classdesc}
+
+\begin{classdesc}{UnknownHandler}{}
+A catch-all class to handle unknown URLs.
+\end{classdesc}
+
+
+\subsection{Request Objects \label{request-objects}}
+
+The following methods describe all of \class{Request}'s public interface,
+and so all must be overridden in subclasses.
+
+\begin{methoddesc}[Request]{add_data}{data}
+Set the \class{Request} data to \var{data}.  This is ignored by all
+handlers except HTTP handlers --- and there it should be a byte
+string, and will change the request to be \code{POST} rather than
+\code{GET}.
+\end{methoddesc}
+
+\begin{methoddesc}[Request]{get_method}{}
+Return a string indicating the HTTP request method.  This is only
+meaningful for HTTP requests, and currently always returns
+\code{'GET'} or \code{'POST'}.
+\end{methoddesc}
+
+\begin{methoddesc}[Request]{has_data}{}
+Return whether the instance has non-\code{None} data.
+\end{methoddesc}
+
+\begin{methoddesc}[Request]{get_data}{}
+Return the instance's data.
+\end{methoddesc}
+
+\begin{methoddesc}[Request]{add_header}{key, val}
+Add another header to the request.  Headers are currently ignored by
+all handlers except HTTP handlers, where they are added to the list
+of headers sent to the server.  Note that there cannot be more than
+one header with the same name, and later calls will overwrite
+previous calls in case the \var{key} collides.  Currently, this is
+no loss of HTTP functionality, since all headers which have meaning
+when used more than once have a (header-specific) way of gaining the
+same functionality using only one header.
+\end{methoddesc}
+
+\begin{methoddesc}[Request]{add_unredirected_header}{key, header}
+Add a header that will not be added to a redirected request.
+\versionadded{2.4}
+\end{methoddesc}
+
+\begin{methoddesc}[Request]{has_header}{header}
+Return whether the instance has the named header (checks both regular
+and unredirected).
+\versionadded{2.4}
+\end{methoddesc}
+
+\begin{methoddesc}[Request]{get_full_url}{}
+Return the URL given in the constructor.
+\end{methoddesc}
+
+\begin{methoddesc}[Request]{get_type}{}
+Return the type of the URL --- also known as the scheme.
+\end{methoddesc}
+
+\begin{methoddesc}[Request]{get_host}{}
+Return the host to which a connection will be made.
+\end{methoddesc}
+
+\begin{methoddesc}[Request]{get_selector}{}
+Return the selector --- the part of the URL that is sent to
+the server.
+\end{methoddesc}
+
+\begin{methoddesc}[Request]{set_proxy}{host, type}
+Prepare the request by connecting to a proxy server. The \var{host}
+and \var{type} will replace those of the instance, and the instance's
+selector will be the original URL given in the constructor.
+\end{methoddesc}
+
+\begin{methoddesc}[Request]{get_origin_req_host}{}
+Return the request-host of the origin transaction, as defined by
+\rfc{2965}.  See the documentation for the \class{Request}
+constructor.
+\end{methoddesc}
+
+\begin{methoddesc}[Request]{is_unverifiable}{}
+Return whether the request is unverifiable, as defined by \rfc{2965}.
+See the documentation for the \class{Request} constructor.
+\end{methoddesc}
+
+
+\subsection{OpenerDirector Objects \label{opener-director-objects}}
+
+\class{OpenerDirector} instances have the following methods:
+
+\begin{methoddesc}[OpenerDirector]{add_handler}{handler}
+\var{handler} should be an instance of \class{BaseHandler}.  The
+following methods are searched, and added to the possible chains (note
+that HTTP errors are a special case).
+
+\begin{itemize}
+  \item \method{\var{protocol}_open()} ---
+    signal that the handler knows how to open \var{protocol} URLs.
+  \item \method{http_error_\var{type}()} ---
+    signal that the handler knows how to handle HTTP errors with HTTP
+    error code \var{type}.
+  \item \method{\var{protocol}_error()} ---
+    signal that the handler knows how to handle errors from
+    (non-\code{http}) \var{protocol}.
+  \item \method{\var{protocol}_request()} ---
+    signal that the handler knows how to pre-process \var{protocol}
+    requests.
+  \item \method{\var{protocol}_response()} ---
+    signal that the handler knows how to post-process \var{protocol}
+    responses.
+\end{itemize}
+\end{methoddesc}
+
+\begin{methoddesc}[OpenerDirector]{open}{url\optional{, data}\optional{, timeout}}
+Open the given \var{url} (which can be a request object or a string),
+optionally passing the given \var{data}.
+Arguments, return values and exceptions raised are the same as those
+of \function{urlopen()} (which simply calls the \method{open()} method
+on the currently installed global \class{OpenerDirector}).  The optional
+\var{timeout} parameter specifies a timeout in seconds for the connection 
+attempt (if not specified, or passed as None, the global default timeout 
+setting will be used; this actually only works for HTTP, HTTPS, FTP
+and FTPS connections).
+
+\versionchanged[\var{timeout} was added]{2.6}
+\end{methoddesc}
+
+\begin{methoddesc}[OpenerDirector]{error}{proto\optional{,
+                                          arg\optional{, \moreargs}}}
+Handle an error of the given protocol.  This will call the registered
+error handlers for the given protocol with the given arguments (which
+are protocol specific).  The HTTP protocol is a special case which
+uses the HTTP response code to determine the specific error handler;
+refer to the \method{http_error_*()} methods of the handler classes.
+
+Return values and exceptions raised are the same as those
+of \function{urlopen()}.
+\end{methoddesc}
+
+OpenerDirector objects open URLs in three stages:
+
+The order in which these methods are called within each stage is
+determined by sorting the handler instances.
+
+\begin{enumerate}
+  \item Every handler with a method named like
+    \method{\var{protocol}_request()} has that method called to
+    pre-process the request.
+
+  \item Handlers with a method named like
+    \method{\var{protocol}_open()} are called to handle the request.
+    This stage ends when a handler either returns a
+    non-\constant{None} value (i.e.\ a response), or raises an exception
+    (usually \exception{URLError}).  Exceptions are allowed to propagate.
+
+    In fact, the above algorithm is first tried for methods named
+    \method{default_open()}.  If all such methods return
+    \constant{None}, the algorithm is repeated for methods named like
+    \method{\var{protocol}_open()}.  If all such methods return
+    \constant{None}, the algorithm is repeated for methods named
+    \method{unknown_open()}.
+
+    Note that the implementation of these methods may involve calls of
+    the parent \class{OpenerDirector} instance's \method{.open()} and
+    \method{.error()} methods.
+
+  \item Every handler with a method named like
+    \method{\var{protocol}_response()} has that method called to
+    post-process the response.
+
+\end{enumerate}
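+
+As an illustration of the first stage, here is a minimal sketch of a
+processor that logs every HTTP request before it is sent (the class
+name is hypothetical):
+
+\begin{verbatim}
+import urllib2
+
+class LoggingProcessor(urllib2.BaseHandler):
+    # Hypothetical pre-processor, called during stage 1 for HTTP requests.
+    def http_request(self, req):
+        print "fetching", req.get_full_url()
+        return req          # pre-processors must return the Request
+
+opener = urllib2.build_opener(LoggingProcessor)
+\end{verbatim}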
+
+\subsection{BaseHandler Objects \label{base-handler-objects}}
+
+\class{BaseHandler} objects provide a couple of methods that are
+directly useful, and others that are meant to be used by derived
+classes.  These are intended for direct use:
+
+\begin{methoddesc}[BaseHandler]{add_parent}{director}
+Add a director as parent.
+\end{methoddesc}
+
+\begin{methoddesc}[BaseHandler]{close}{}
+Remove any parents.
+\end{methoddesc}
+
+The following members and methods should only be used by classes
+derived from \class{BaseHandler}.  \note{The convention has been
+adopted that subclasses defining \method{\var{protocol}_request()} or
+\method{\var{protocol}_response()} methods are named
+\class{*Processor}; all others are named \class{*Handler}.}
+
+
+\begin{memberdesc}[BaseHandler]{parent}
+A valid \class{OpenerDirector}, which can be used to open using a
+different protocol, or handle errors.
+\end{memberdesc}
+
+\begin{methoddesc}[BaseHandler]{default_open}{req}
+This method is \emph{not} defined in \class{BaseHandler}, but
+subclasses should define it if they want to catch all URLs.
+
+This method, if implemented, will be called by the parent
+\class{OpenerDirector}.  It should return a file-like object as
+described in the return value of the \method{open()} of
+\class{OpenerDirector}, or \code{None}.  It should raise
+\exception{URLError}, unless a truly exceptional thing happens (for
+example, \exception{MemoryError} should not be mapped to
+\exception{URLError}).
+
+This method will be called before any protocol-specific open method.
+\end{methoddesc}
+
+\begin{methoddescni}[BaseHandler]{\var{protocol}_open}{req}
+This method is \emph{not} defined in \class{BaseHandler}, but
+subclasses should define it if they want to handle URLs with the given
+protocol.
+
+This method, if defined, will be called by the parent
+\class{OpenerDirector}.  Return values should be the same as for 
+\method{default_open()}.
+\end{methoddescni}
+
+\begin{methoddesc}[BaseHandler]{unknown_open}{req}
+This method is \emph{not} defined in \class{BaseHandler}, but
+subclasses should define it if they want to catch all URLs with no
+specific registered handler to open them.
+
+This method, if implemented, will be called by the \member{parent} 
+\class{OpenerDirector}.  Return values should be the same as for 
+\method{default_open()}.
+\end{methoddesc}
+
+\begin{methoddesc}[BaseHandler]{http_error_default}{req, fp, code, msg, hdrs}
+This method is \emph{not} defined in \class{BaseHandler}, but
+subclasses should override it if they intend to provide a catch-all
+for otherwise unhandled HTTP errors.  It will be called automatically
+by the  \class{OpenerDirector} getting the error, and should not
+normally be called in other circumstances.
+
+\var{req} will be a \class{Request} object, \var{fp} will be a
+file-like object with the HTTP error body, \var{code} will be the
+three-digit code of the error, \var{msg} will be the user-visible
+explanation of the code and \var{hdrs} will be a mapping object with
+the headers of the error.
+
+Return values and exceptions raised should be the same as those
+of \function{urlopen()}.
+\end{methoddesc}
+
+\begin{methoddesc}[BaseHandler]{http_error_\var{nnn}}{req, fp, code, msg, hdrs}
+\var{nnn} should be a three-digit HTTP error code.  This method is
+also not defined in \class{BaseHandler}, but will be called, if it
+exists, on an instance of a subclass, when an HTTP error with code
+\var{nnn} occurs.
+
+Subclasses should override this method to handle specific HTTP
+errors.
+
+Arguments, return values and exceptions raised should be the same as
+for \method{http_error_default()}.
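+
+For instance, a sketch of a handler that returns 404 bodies as
+ordinary responses instead of raising \exception{HTTPError} (the class
+name is hypothetical):
+
+\begin{verbatim}
+import urllib2
+
+class Return404Handler(urllib2.BaseHandler):
+    # Hypothetical: hand the 404 body back as a normal response object.
+    def http_error_404(self, req, fp, code, msg, hdrs):
+        return fp
+
+opener = urllib2.build_opener(Return404Handler)
+\end{verbatim}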
+\end{methoddesc}
+
+\begin{methoddescni}[BaseHandler]{\var{protocol}_request}{req}
+This method is \emph{not} defined in \class{BaseHandler}, but
+subclasses should define it if they want to pre-process requests of
+the given protocol.
+
+This method, if defined, will be called by the parent
+\class{OpenerDirector}.  \var{req} will be a \class{Request} object.
+The return value should be a \class{Request} object.
+\end{methoddescni}
+
+\begin{methoddescni}[BaseHandler]{\var{protocol}_response}{req, response}
+This method is \emph{not} defined in \class{BaseHandler}, but
+subclasses should define it if they want to post-process responses of
+the given protocol.
+
+This method, if defined, will be called by the parent
+\class{OpenerDirector}.  \var{req} will be a \class{Request} object.
+\var{response} will be an object implementing the same interface as
+the return value of \function{urlopen()}.  The return value should
+implement the same interface as the return value of
+\function{urlopen()}.
+\end{methoddescni}
+
+\subsection{HTTPRedirectHandler Objects \label{http-redirect-handler}}
+
+\note{Some HTTP redirections require action from this module's client
+  code.  If this is the case, \exception{HTTPError} is raised.  See
+  \rfc{2616} for details of the precise meanings of the various
+  redirection codes.}
+
+\begin{methoddesc}[HTTPRedirectHandler]{redirect_request}{req,
+                                          fp, code, msg, hdrs, newurl}
+Return a \class{Request} or \code{None} in response to a redirect.
+This is called by the default implementations of the
+\method{http_error_30*()} methods when a redirection is received from
+the server.  If a redirection should take place, return a new
+\class{Request} to allow \method{http_error_30*()} to perform the
+redirect.  Otherwise, raise \exception{HTTPError} if no other handler
+should try to handle this URL, or return \code{None} if you can't but
+another handler might.
+
+\begin{notice}
+ The default implementation of this method does not strictly
+ follow \rfc{2616}, which says that 301 and 302 responses to \code{POST}
+ requests must not be automatically redirected without confirmation by
+ the user.  In reality, browsers do allow automatic redirection of
+ these responses, changing the POST to a \code{GET}, and the default
+ implementation reproduces this behavior.
+\end{notice}
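+
+A sketch of a subclass that refuses all redirects by raising
+\exception{HTTPError} (the class name is hypothetical; note the
+trailing \var{newurl} argument):
+
+\begin{verbatim}
+import urllib2
+
+class NoRedirectHandler(urllib2.HTTPRedirectHandler):
+    # Hypothetical: turn every redirect into an HTTPError.
+    def redirect_request(self, req, fp, code, msg, hdrs, newurl):
+        raise urllib2.HTTPError(req.get_full_url(), code, msg, hdrs, fp)
+\end{verbatim}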
+\end{methoddesc}
+
+
+\begin{methoddesc}[HTTPRedirectHandler]{http_error_301}{req,
+                                                  fp, code, msg, hdrs}
+Redirect to the \code{Location:} URL.  This method is called by
+the parent \class{OpenerDirector} when getting an HTTP
+`moved permanently' response. The 301 redirection is cached as per
+\rfc{2616}.
+\end{methoddesc}
+
+\begin{methoddesc}[HTTPRedirectHandler]{http_error_302}{req,
+                                                  fp, code, msg, hdrs}
+The same as \method{http_error_301()}, but called for the
+`found' response.
+\end{methoddesc}
+
+\begin{methoddesc}[HTTPRedirectHandler]{http_error_303}{req,
+                                                  fp, code, msg, hdrs}
+The same as \method{http_error_301()}, but called for the
+`see other' response.
+\end{methoddesc}
+
+\begin{methoddesc}[HTTPRedirectHandler]{http_error_307}{req,
+                                                  fp, code, msg, hdrs}
+The same as \method{http_error_301()}, but called for the
+`temporary redirect' response.
+\end{methoddesc}
+
+
+\subsection{HTTPCookieProcessor Objects \label{http-cookie-processor}}
+
+\versionadded{2.4}
+
+\class{HTTPCookieProcessor} instances have one attribute:
+
+\begin{memberdesc}[HTTPCookieProcessor]{cookiejar}
+The \class{cookielib.CookieJar} in which cookies are stored.
+\end{memberdesc}
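+
+A minimal sketch of cookie-aware opening (the URL is an example):
+
+\begin{verbatim}
+import cookielib, urllib2
+
+jar = cookielib.CookieJar()
+opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(jar))
+opener.open('http://www.example.com/')   # cookies are stored in jar
+\end{verbatim}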
+
+
+\subsection{ProxyHandler Objects \label{proxy-handler}}
+
+\begin{methoddescni}[ProxyHandler]{\var{protocol}_open}{request}
+The \class{ProxyHandler} will have a method
+\method{\var{protocol}_open()} for every \var{protocol} which has a
+proxy in the \var{proxies} dictionary given in the constructor.  The
+method will modify requests to go through the proxy, by calling
+\code{request.set_proxy()}, and call the next handler in the chain to
+actually execute the protocol.
+\end{methoddescni}
+
+
+\subsection{HTTPPasswordMgr Objects \label{http-password-mgr}}
+
+These methods are available on \class{HTTPPasswordMgr} and
+\class{HTTPPasswordMgrWithDefaultRealm} objects.
+
+\begin{methoddesc}[HTTPPasswordMgr]{add_password}{realm, uri, user, passwd}
+\var{uri} can be either a single URI, or a sequence of URIs. \var{realm},
+\var{user} and \var{passwd} must be strings. This causes
+\code{(\var{user}, \var{passwd})} to be used as authentication tokens
+when authentication is requested for \var{realm} and a super-URI of any
+of the given URIs.
+\end{methoddesc}  
+
+\begin{methoddesc}[HTTPPasswordMgr]{find_user_password}{realm, authuri}
+Get user/password for given realm and URI, if any.  This method will
+return \code{(None, None)} if there is no matching user/password.
+
+For \class{HTTPPasswordMgrWithDefaultRealm} objects, the realm
+\code{None} will be searched if the given \var{realm} has no matching
+user/password.
+\end{methoddesc}
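+
+For example, a sketch using the catch-all realm (the host and
+credentials are hypothetical):
+
+\begin{verbatim}
+import urllib2
+
+mgr = urllib2.HTTPPasswordMgrWithDefaultRealm()
+mgr.add_password(None, 'http://www.example.com/', 'klem', 'secret')
+# Any realm matches, because None was registered as the realm:
+print mgr.find_user_password('Some Realm', 'http://www.example.com/private')
+\end{verbatim}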
+
+
+\subsection{AbstractBasicAuthHandler Objects
+            \label{abstract-basic-auth-handler}}
+
+\begin{methoddesc}[AbstractBasicAuthHandler]{http_error_auth_reqed}
+                                            {authreq, host, req, headers}
+Handle an authentication request by getting a user/password pair, and
+re-trying the request.  \var{authreq} should be the name of the header
+where the information about the realm is included in the request,
+\var{host} specifies the URL and path to authenticate for, \var{req}
+should be the (failed) \class{Request} object, and \var{headers}
+should be the error headers.
+
+\var{host} is either an authority (e.g. \code{"python.org"}) or a URL
+containing an authority component (e.g. \code{"http://python.org/"}).
+In either case, the authority must not contain a userinfo component
+(so, \code{"python.org"} and \code{"python.org:80"} are fine,
+\code{"joe:password at python.org"} is not).
+\end{methoddesc}
+
+
+\subsection{HTTPBasicAuthHandler Objects
+            \label{http-basic-auth-handler}}
+
+\begin{methoddesc}[HTTPBasicAuthHandler]{http_error_401}{req, fp, code, 
+                                                        msg, hdrs}
+Retry the request with authentication information, if available.
+\end{methoddesc}
+
+
+\subsection{ProxyBasicAuthHandler Objects
+            \label{proxy-basic-auth-handler}}
+
+\begin{methoddesc}[ProxyBasicAuthHandler]{http_error_407}{req, fp, code, 
+                                                        msg, hdrs}
+Retry the request with authentication information, if available.
+\end{methoddesc}
+
+
+\subsection{AbstractDigestAuthHandler Objects
+            \label{abstract-digest-auth-handler}}
+
+\begin{methoddesc}[AbstractDigestAuthHandler]{http_error_auth_reqed}
+                                            {authreq, host, req, headers}
+\var{authreq} should be the name of the header where the information about
+the realm is included in the request, \var{host} should be the host to
+authenticate to, \var{req} should be the (failed) \class{Request}
+object, and \var{headers} should be the error headers.
+\end{methoddesc}
+
+
+\subsection{HTTPDigestAuthHandler Objects
+            \label{http-digest-auth-handler}}
+
+\begin{methoddesc}[HTTPDigestAuthHandler]{http_error_401}{req, fp, code, 
+                                                        msg, hdrs}
+Retry the request with authentication information, if available.
+\end{methoddesc}
+
+
+\subsection{ProxyDigestAuthHandler Objects
+            \label{proxy-digest-auth-handler}}
+
+\begin{methoddesc}[ProxyDigestAuthHandler]{http_error_407}{req, fp, code, 
+                                                        msg, hdrs}
+Retry the request with authentication information, if available.
+\end{methoddesc}
+
+
+\subsection{HTTPHandler Objects \label{http-handler-objects}}
+
+\begin{methoddesc}[HTTPHandler]{http_open}{req}
+Send an HTTP request, which can be either GET or POST, depending on
+\code{\var{req}.has_data()}.
+\end{methoddesc}
+
+
+\subsection{HTTPSHandler Objects \label{https-handler-objects}}
+
+\begin{methoddesc}[HTTPSHandler]{https_open}{req}
+Send an HTTPS request, which can be either GET or POST, depending on
+\code{\var{req}.has_data()}.
+\end{methoddesc}
+
+
+\subsection{FileHandler Objects \label{file-handler-objects}}
+
+\begin{methoddesc}[FileHandler]{file_open}{req}
+Open the file locally, if there is no host name, or
+the host name is \code{'localhost'}. Change the
+protocol to \code{ftp} otherwise, and retry opening
+it using \member{parent}.
+\end{methoddesc}
+
+
+\subsection{FTPHandler Objects \label{ftp-handler-objects}}
+
+\begin{methoddesc}[FTPHandler]{ftp_open}{req}
+Open the FTP file indicated by \var{req}.
+The login is always done with empty username and password.
+\end{methoddesc}
+
+
+\subsection{CacheFTPHandler Objects \label{cacheftp-handler-objects}}
+
+\class{CacheFTPHandler} objects are \class{FTPHandler} objects with
+the following additional methods:
+
+\begin{methoddesc}[CacheFTPHandler]{setTimeout}{t}
+Set timeout of connections to \var{t} seconds.
+\end{methoddesc}
+
+\begin{methoddesc}[CacheFTPHandler]{setMaxConns}{m}
+Set maximum number of cached connections to \var{m}.
+\end{methoddesc}
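+
+A short sketch of tuning the cache (the values are illustrative):
+
+\begin{verbatim}
+import urllib2
+
+handler = urllib2.CacheFTPHandler()
+handler.setTimeout(60)      # close cached connections after 60 seconds
+handler.setMaxConns(5)      # keep at most five connections open
+opener = urllib2.build_opener(handler)
+\end{verbatim}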
+
+
+\subsection{UnknownHandler Objects \label{unknown-handler-objects}}
+
+\begin{methoddesc}[UnknownHandler]{unknown_open}{}
+Raise a \exception{URLError} exception.
+\end{methoddesc}
+
+
+\subsection{HTTPErrorProcessor Objects \label{http-error-processor-objects}}
+
+\versionadded{2.4}
+
+\begin{methoddesc}[HTTPErrorProcessor]{http_response}{request, response}
+Process HTTP error responses.
+
+For 200 response codes, the response object is returned immediately.
+
+For non-200 error codes, this simply passes the job on to the
+\method{\var{protocol}_error_\var{code}()} handler methods, via
+\method{OpenerDirector.error()}.  Eventually,
+\class{urllib2.HTTPDefaultErrorHandler} will raise an
+\exception{HTTPError} if no other handler handles the error.
+\end{methoddesc}
+
+
+\subsection{Examples \label{urllib2-examples}}
+
+This example gets the python.org main page and displays the first 100
+bytes of it:
+
+\begin{verbatim}
+>>> import urllib2
+>>> f = urllib2.urlopen('http://www.python.org/')
+>>> print f.read(100)
+<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
+<?xml-stylesheet href="./css/ht2html
+\end{verbatim}
+
+Here we are sending a data-stream to the stdin of a CGI and reading
+the data it returns to us. Note that this example will only work when the
+Python installation supports SSL.
+
+\begin{verbatim}
+>>> import urllib2
+>>> req = urllib2.Request(url='https://localhost/cgi-bin/test.cgi',
+...                       data='This data is passed to stdin of the CGI')
+>>> f = urllib2.urlopen(req)
+>>> print f.read()
+Got Data: "This data is passed to stdin of the CGI"
+\end{verbatim}
+
+The code for the sample CGI used in the above example is:
+
+\begin{verbatim}
+#!/usr/bin/env python
+import sys
+data = sys.stdin.read()
+print 'Content-type: text/plain\n\nGot Data: "%s"' % data
+\end{verbatim}
+
+
+Use of Basic HTTP Authentication:
+
+\begin{verbatim}
+import urllib2
+# Create an OpenerDirector with support for Basic HTTP Authentication...
+auth_handler = urllib2.HTTPBasicAuthHandler()
+auth_handler.add_password(realm='PDQ Application',
+                          uri='https://mahler:8092/site-updates.py',
+                          user='klem',
+                          passwd='kadidd!ehopper')
+opener = urllib2.build_opener(auth_handler)
+# ...and install it globally so it can be used with urlopen.
+urllib2.install_opener(opener)
+urllib2.urlopen('http://www.example.com/login.html')
+\end{verbatim}
+
+\function{build_opener()} provides many handlers by default, including a
+\class{ProxyHandler}.  By default, \class{ProxyHandler} uses the
+environment variables named \code{<scheme>_proxy}, where \code{<scheme>}
+is the URL scheme involved.  For example, the \envvar{http_proxy}
+environment variable is read to obtain the HTTP proxy's URL.
+
+This example replaces the default \class{ProxyHandler} with one that uses
+programmatically supplied proxy URLs, and adds proxy authorization support
+with \class{ProxyBasicAuthHandler}.
+
+\begin{verbatim}
+proxy_handler = urllib2.ProxyHandler({'http': 'http://www.example.com:3128/'})
+proxy_auth_handler = urllib2.ProxyBasicAuthHandler()
+proxy_auth_handler.add_password('realm', 'host', 'username', 'password')
+
+opener = urllib2.build_opener(proxy_handler, proxy_auth_handler)
+# This time, rather than install the OpenerDirector, we use it directly:
+opener.open('http://www.example.com/login.html')
+\end{verbatim}
+
+
+Adding HTTP headers:
+
+Use the \var{headers} argument to the \class{Request} constructor, or:
+
+\begin{verbatim}
+import urllib2
+req = urllib2.Request('http://www.example.com/')
+req.add_header('Referer', 'http://www.python.org/')
+r = urllib2.urlopen(req)
+\end{verbatim}
+
+\class{OpenerDirector} automatically adds a \mailheader{User-Agent}
+header to every \class{Request}.  To change this:
+
+\begin{verbatim}
+import urllib2
+opener = urllib2.build_opener()
+opener.addheaders = [('User-agent', 'Mozilla/5.0')]
+opener.open('http://www.example.com/')
+\end{verbatim}
+
+Also, remember that a few standard headers
+(\mailheader{Content-Length}, \mailheader{Content-Type} and
+\mailheader{Host}) are added when the \class{Request} is passed to
+\function{urlopen()} (or \method{OpenerDirector.open()}).

Added: sandbox/trunk/urilib/liburlparse.tex
==============================================================================
--- (empty file)
+++ sandbox/trunk/urilib/liburlparse.tex	Sun Aug  5 23:42:33 2007
@@ -0,0 +1,310 @@
+\section{\module{urlparse} ---
+         Parse URLs into components}
+\declaremodule{standard}{urlparse}
+
+\modulesynopsis{Parse URLs into components.}
+
+\index{WWW}
+\index{World Wide Web}
+\index{URL}
+\indexii{URL}{parsing}
+\indexii{relative}{URL}
+
+
+This module defines a standard interface to break Uniform Resource
+Locator (URL) strings up into components (addressing scheme, network
+location, path, etc.), to combine the components back into a URL
+string, and to convert a ``relative URL'' to an absolute URL given a
+``base URL.''
+
+The module has been designed to match the Internet RFC on Relative
+Uniform Resource Locators (and discovered a bug in an earlier
+draft!). It supports the following URL schemes:
+\code{file}, \code{ftp}, \code{gopher}, \code{hdl}, \code{http}, 
+\code{https}, \code{imap}, \code{mailto}, \code{mms}, \code{news}, 
+\code{nntp}, \code{prospero}, \code{rsync}, \code{rtsp}, \code{rtspu}, 
+\code{sftp}, \code{shttp}, \code{sip}, \code{sips}, \code{snews}, \code{svn}, 
+\code{svn+ssh}, \code{telnet}, \code{wais}.
+
+\versionadded[Support for the \code{sftp} and \code{sips} schemes]{2.5}
+
+The \module{urlparse} module defines the following functions:
+
+\begin{funcdesc}{urlparse}{urlstring\optional{,
+                           default_scheme\optional{, allow_fragments}}}
+Parse a URL into six components, returning a 6-tuple.  This
+corresponds to the general structure of a URL:
+\code{\var{scheme}://\var{netloc}/\var{path};\var{parameters}?\var{query}\#\var{fragment}}.
+Each tuple item is a string, possibly empty.
+The components are not broken up in smaller parts (for example, the network
+location is a single string), and \% escapes are not expanded.
+The delimiters as shown above are not part of the result,
+except for a leading slash in the \var{path} component, which is
+retained if present.  For example:
+
+\begin{verbatim}
+>>> from urlparse import urlparse
+>>> o = urlparse('http://www.cwi.nl:80/%7Eguido/Python.html')
+>>> o
+('http', 'www.cwi.nl:80', '/%7Eguido/Python.html', '', '', '')
+>>> o.scheme
+'http'
+>>> o.port
+80
+>>> o.geturl()
+'http://www.cwi.nl:80/%7Eguido/Python.html'
+\end{verbatim}
+
+If the \var{default_scheme} argument is specified, it gives the
+default addressing scheme, to be used only if the URL does not
+specify one.  The default value for this argument is the empty string.
+
+If the \var{allow_fragments} argument is false, fragment identifiers
+are not allowed, even if the URL's addressing scheme normally does
+support them.  The default value for this argument is \constant{True}.
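+
+For example, with fragments disabled, the fragment identifier is left
+attached to the preceding component:
+
+\begin{verbatim}
+>>> urlparse('http://www.python.org/doc/#intro', allow_fragments=False)
+('http', 'www.python.org', '/doc/#intro', '', '', '')
+\end{verbatim}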
+
+The return value is actually an instance of a subclass of
+\pytype{tuple}.  This class has the following additional read-only
+convenience attributes:
+
+\begin{tableiv}{l|c|l|c}{member}{Attribute}{Index}{Value}{Value if not present}
+  \lineiv{scheme}  {0} {URL scheme specifier}             {empty string}
+  \lineiv{netloc}  {1} {Network location part}            {empty string}
+  \lineiv{path}    {2} {Hierarchical path}                {empty string}
+  \lineiv{params}  {3} {Parameters for last path element} {empty string}
+  \lineiv{query}   {4} {Query component}                  {empty string}
+  \lineiv{fragment}{5} {Fragment identifier}              {empty string}
+  \lineiv{username}{ } {User name}                        {\constant{None}}
+  \lineiv{password}{ } {Password}                         {\constant{None}}
+  \lineiv{hostname}{ } {Host name (lower case)}           {\constant{None}}
+  \lineiv{port}    { } {Port number as integer, if present} {\constant{None}}
+\end{tableiv}
+
+See section~\ref{urlparse-result-object}, ``Results of
+\function{urlparse()} and \function{urlsplit()},'' for more
+information on the result object.
+
+\versionchanged[Added attributes to return value]{2.5}
+\end{funcdesc}
+
+\begin{funcdesc}{urlunparse}{parts}
+Construct a URL from a tuple as returned by \function{urlparse()}.
+The \var{parts} argument can be any six-item iterable.
+This may result in a slightly different, but equivalent URL, if the
+URL that was parsed originally had unnecessary delimiters (for example,
+a ? with an empty query; the RFC states that these are equivalent).
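+
+For example, a \code{?} with an empty query is not reproduced:
+
+\begin{verbatim}
+>>> from urlparse import urlparse, urlunparse
+>>> urlunparse(urlparse('http://www.cwi.nl/%7Eguido/Python.html?'))
+'http://www.cwi.nl/%7Eguido/Python.html'
+\end{verbatim}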
+\end{funcdesc}
+
+\begin{funcdesc}{urlsplit}{urlstring\optional{,
+                           default_scheme\optional{, allow_fragments}}}
+This is similar to \function{urlparse()}, but does not split the
+params from the URL.  This should generally be used instead of
+\function{urlparse()} if the more recent URL syntax allowing
+parameters to be applied to each segment of the \var{path} portion of
+the URL (see \rfc{2396}) is wanted.  A separate function is needed to
+separate the path segments and parameters.  This function returns a
+5-tuple: (addressing scheme, network location, path, query, fragment
+identifier).
+
+The return value is actually an instance of a subclass of
+\pytype{tuple}.  This class has the following additional read-only
+convenience attributes:
+
+\begin{tableiv}{l|c|l|c}{member}{Attribute}{Index}{Value}{Value if not present}
+  \lineiv{scheme}   {0} {URL scheme specifier}   {empty string}
+  \lineiv{netloc}   {1} {Network location part}  {empty string}
+  \lineiv{path}     {2} {Hierarchical path}      {empty string}
+  \lineiv{query}    {3} {Query component}        {empty string}
+  \lineiv{fragment} {4} {Fragment identifier}    {empty string}
+  \lineiv{username} { } {User name}              {\constant{None}}
+  \lineiv{password} { } {Password}               {\constant{None}}
+  \lineiv{hostname} { } {Host name (lower case)} {\constant{None}}
+  \lineiv{port}     { } {Port number as integer, if present} {\constant{None}}
+  \lineiv{parsedquery} { } {Parsed query string}  {empty string}
+  \lineiv{parsedquerylist} { } {Parsed query, as a list of pairs} {empty string}
+\end{tableiv}
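+
+For example (note that a path segment parameter such as \code{;type=a}
+stays in the \member{path} component):
+
+\begin{verbatim}
+>>> from urlparse import urlsplit
+>>> urlsplit('http://www.example.com/foo;type=a?q=1#frag')
+('http', 'www.example.com', '/foo;type=a', 'q=1', 'frag')
+\end{verbatim}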
+
+See section~\ref{urlparse-result-object}, ``Results of
+\function{urlparse()} and \function{urlsplit()},'' for more
+information on the result object.
+
+\versionadded{2.2}
+\versionchanged[Added attributes to return value]{2.5}
+\end{funcdesc}
+
+\begin{funcdesc}{urlunsplit}{parts}
+Combine the elements of a tuple as returned by \function{urlsplit()}
+into a complete URL as a string.
+The \var{parts} argument can be any five-item iterable.
+This may result in a slightly different, but equivalent URL, if the
+URL that was parsed originally had unnecessary delimiters (for example,
+a ? with an empty query; the RFC states that these are equivalent).
+\versionadded{2.2}
+\end{funcdesc}
+
+\begin{funcdesc}{urljoin}{base, url\optional{, allow_fragments}}
+Construct a full (``absolute'') URL by combining a ``base URL''
+(\var{base}) with another URL (\var{url}).  Informally, this
+uses components of the base URL, in particular the addressing scheme,
+the network location and (part of) the path, to provide missing
+components in the relative URL.  For example:
+
+\begin{verbatim}
+>>> from urlparse import urljoin
+>>> urljoin('http://www.cwi.nl/%7Eguido/Python.html', 'FAQ.html')
+'http://www.cwi.nl/%7Eguido/FAQ.html'
+\end{verbatim}
+
+The \var{allow_fragments} argument has the same meaning and default as
+for \function{urlparse()}.
+
+\note{If \var{url} is an absolute URL (that is, starting with \code{//}
+      or \code{scheme://}), the \var{url}'s host name and/or scheme
+      will be present in the result.  For example:}
+
+\begin{verbatim}
+>>> urljoin('http://www.cwi.nl/%7Eguido/Python.html',
+...         '//www.python.org/%7Eguido')
+'http://www.python.org/%7Eguido'
+\end{verbatim}
+      
+If you do not want that behavior, preprocess
+the \var{url} with \function{urlsplit()} and \function{urlunsplit()},
+removing possible \emph{scheme} and \emph{netloc} parts.
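+
+A minimal sketch of that preprocessing (the URLs here are only
+illustrative):
+
+\begin{verbatim}
+>>> from urlparse import urljoin, urlsplit, urlunsplit
+>>> url = '//www.python.org/%7Eguido'
+>>> relative = urlunsplit(('', '') + urlsplit(url)[2:])
+>>> relative
+'/%7Eguido'
+>>> urljoin('http://www.cwi.nl/%7Eguido/Python.html', relative)
+'http://www.cwi.nl/%7Eguido'
+\end{verbatim}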
+\end{funcdesc}
+
+\begin{funcdesc}{urldefrag}{url}
+If \var{url} contains a fragment identifier, returns a modified
+version of \var{url} with no fragment identifier, and the fragment
+identifier as a separate string.  If there is no fragment identifier
+in \var{url}, returns \var{url} unmodified and an empty string.
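+
+For example:
+
+\begin{verbatim}
+>>> from urlparse import urldefrag
+>>> urldefrag('http://www.python.org/doc/#intro')
+('http://www.python.org/doc/', 'intro')
+>>> urldefrag('http://www.python.org/doc/')
+('http://www.python.org/doc/', '')
+\end{verbatim}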
+\end{funcdesc}
+
+\begin{funcdesc}{parse_qs}{qs\optional{, keep_blank_values\optional{,
+                           strict_parsing}}}
+Parse a query string given as a string argument (data of type 
+\mimetype{application/x-www-form-urlencoded}).  Data are
+returned as a dictionary.  The dictionary keys are the unique query
+variable names and the values are lists of values for each name.
+
+The optional argument \var{keep_blank_values} is
+a flag indicating whether blank values in
+URL encoded queries should be treated as blank strings.  
+A true value indicates that blanks should be retained as 
+blank strings.  The default false value indicates that
+blank values are to be ignored and treated as if they were
+not included.
+
+The optional argument \var{strict_parsing} is a flag indicating what
+to do with parsing errors.  If false (the default), errors
+are silently ignored.  If true, errors raise a \exception{ValueError}
+exception.
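+
+For example:
+
+\begin{verbatim}
+>>> from urlparse import parse_qs
+>>> parse_qs('key=val1&key=val2')
+{'key': ['val1', 'val2']}
+\end{verbatim}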
+
+Use the \function{\refmodule{urllib}.urlencode()} function to convert
+such dictionaries into query strings. 
+
+This is also available as the \member{parsedquery} attribute of the
+result objects returned by \function{urlsplit()}.
+
+\end{funcdesc}
+
+\begin{funcdesc}{parse_qsl}{qs\optional{, keep_blank_values\optional{,
+                            strict_parsing}}}
+Parse a query string given as a string argument (data of type 
+\mimetype{application/x-www-form-urlencoded}).  Data are
+returned as a list of name, value pairs.
+
+The optional argument \var{keep_blank_values} is
+a flag indicating whether blank values in
+URL encoded queries should be treated as blank strings.  
+A true value indicates that blanks should be retained as 
+blank strings.  The default false value indicates that
+blank values are to be ignored and treated as if they were
+not included.
+
+The optional argument \var{strict_parsing} is a flag indicating what
+to do with parsing errors.  If false (the default), errors
+are silently ignored.  If true, errors raise a \exception{ValueError}
+exception.
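+
+For example:
+
+\begin{verbatim}
+>>> from urlparse import parse_qsl
+>>> parse_qsl('key=val1&key=val2')
+[('key', 'val1'), ('key', 'val2')]
+\end{verbatim}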
+
+Use the \function{\refmodule{urllib}.urlencode()} function to convert
+such lists of pairs into query strings.
+
+This is also available as the \member{parsedquerylist} attribute of the
+result objects returned by \function{urlsplit()}.
+\end{funcdesc}
+
+
+\begin{seealso}
+  \seerfc{1738}{Uniform Resource Locators (URL)}{
+        This specifies the formal syntax and semantics of absolute
+        URLs.}
+  \seerfc{1808}{Relative Uniform Resource Locators}{
+        This Request For Comments includes the rules for joining an
+        absolute and a relative URL, including a fair number of
+        ``Abnormal Examples'' which govern the treatment of border
+        cases.}
+  \seerfc{2396}{Uniform Resource Identifiers (URI): Generic Syntax}{
+        Document describing the generic syntactic requirements for
+        both Uniform Resource Names (URNs) and Uniform Resource
+        Locators (URLs).}
+\end{seealso}
+
+
+\subsection{Results of \function{urlparse()} and \function{urlsplit()}
+            \label{urlparse-result-object}}
+
+The result objects from the \function{urlparse()} and
+\function{urlsplit()} functions are subclasses of the \pytype{tuple}
+type.  These subclasses add the attributes described in those
+functions, as well as provide an additional method:
+
+\begin{methoddesc}[ParseResult]{geturl}{}
+  Return the re-combined version of the original URL as a string.
+  This may differ from the original URL in that the scheme will always
+  be normalized to lower case and empty components may be dropped.
+  Specifically, empty parameters, queries, and fragment identifiers
+  will be removed.
+
+  The result of this method is a fixpoint if passed back through the
+  original parsing function:
+
+\begin{verbatim}
+>>> import urlparse
+>>> url = 'HTTP://www.Python.org/doc/#'
+
+>>> r1 = urlparse.urlsplit(url)
+>>> r1.geturl()
+'http://www.Python.org/doc/'
+
+>>> r2 = urlparse.urlsplit(r1.geturl())
+>>> r2.geturl()
+'http://www.Python.org/doc/'
+\end{verbatim}
+
+\versionadded{2.5}
+\end{methoddesc}
+
+The following classes provide the implementations of the parse results:
+
+\begin{classdesc*}{BaseResult}
+  Base class for the concrete result classes.  This provides most of
+  the attribute definitions.  It does not provide a \method{geturl()}
+  method.  It is derived from \class{tuple}, but does not override the
+  \method{__init__()} or \method{__new__()} methods.
+\end{classdesc*}
+
+
+\begin{classdesc}{ParseResult}{scheme, netloc, path, params, query, fragment}
+  Concrete class for \function{urlparse()} results.  The
+  \method{__new__()} method is overridden to support checking that the
+  right number of arguments are passed.
+\end{classdesc}
+
+
+\begin{classdesc}{SplitResult}{scheme, netloc, path, query, fragment}
+  Concrete class for \function{urlsplit()} results.  The
+  \method{__new__()} method is overridden to support checking that the
+  right number of arguments are passed.
+\end{classdesc}

Added: sandbox/trunk/urilib/test_urllib2.py
==============================================================================
--- (empty file)
+++ sandbox/trunk/urilib/test_urllib2.py	Sun Aug  5 23:42:33 2007
@@ -0,0 +1,1089 @@
+import unittest
+from test import test_support
+
+import os, socket
+import StringIO
+
+import urllib2
+from urllib2 import Request, OpenerDirector
+
+# XXX
+# Request
+# CacheFTPHandler (hard to write)
+# parse_keqv_list, parse_http_list, HTTPDigestAuthHandler
+
+class TrivialTests(unittest.TestCase):
+    def test_trivial(self):
+        # A couple trivial tests
+
+        self.assertRaises(ValueError, urllib2.urlopen, 'bogus url')
+
+        # XXX Name hacking to get this to work on Windows.
+        fname = os.path.abspath(urllib2.__file__).replace('\\', '/')
+        if fname[1:2] == ":":
+            fname = fname[2:]
+        # And more hacking to get it to work on MacOS. This assumes
+        # urllib.pathname2url works, unfortunately...
+        if os.name == 'mac':
+            fname = '/' + fname.replace(':', '/')
+        elif os.name == 'riscos':
+            import string
+            fname = os.expand(fname)
+            fname = fname.translate(string.maketrans("/.", "./"))
+
+        file_url = "file://%s" % fname
+        f = urllib2.urlopen(file_url)
+
+        buf = f.read()
+        f.close()
+
+    def test_parse_http_list(self):
+        tests = [('a,b,c', ['a', 'b', 'c']),
+                 ('path"o,l"og"i"cal, example', ['path"o,l"og"i"cal', 'example']),
+                 ('a, b, "c", "d", "e,f", g, h', ['a', 'b', '"c"', '"d"', '"e,f"', 'g', 'h']),
+                 ('a="b\\"c", d="e\\,f", g="h\\\\i"', ['a="b"c"', 'd="e,f"', 'g="h\\i"'])]
+        for string, list in tests:
+            self.assertEquals(urllib2.parse_http_list(string), list)
+
+
+def test_request_headers_dict():
+    """
+    The Request.headers dictionary is not a documented interface.  It should
+    stay that way, because the complete set of headers are only accessible
+    through the .get_header(), .has_header(), .header_items() interface.
+    However, .headers pre-dates those methods, and so real code will be using
+    the dictionary.
+
+    The introduction in 2.4 of those methods was a mistake for the same reason:
+    code that previously saw all (urllib2 user)-provided headers in .headers
+    now sees only a subset (and the function interface is ugly and incomplete).
+    A better change would have been to replace .headers dict with a dict
+    subclass (or UserDict.DictMixin instance?)  that preserved the .headers
+    interface and also provided access to the "unredirected" headers.  It's
+    probably too late to fix that, though.
+
+
+    Check .capitalize() case normalization:
+
+    >>> url = "http://example.com"
+    >>> Request(url, headers={"Spam-eggs": "blah"}).headers["Spam-eggs"]
+    'blah'
+    >>> Request(url, headers={"spam-EggS": "blah"}).headers["Spam-eggs"]
+    'blah'
+
+    Currently, Request(url, "Spam-eggs").headers["Spam-Eggs"] raises KeyError,
+    but that could be changed in future.
+
+    """
+
+def test_request_headers_methods():
+    """
+    Note the case normalization of header names here, to .capitalize()-case.
+    This should be preserved for backwards-compatibility.  (In the HTTP case,
+    normalization to .title()-case is done by urllib2 before sending headers to
+    httplib).
+
+    >>> url = "http://example.com"
+    >>> r = Request(url, headers={"Spam-eggs": "blah"})
+    >>> r.has_header("Spam-eggs")
+    True
+    >>> r.header_items()
+    [('Spam-eggs', 'blah')]
+    >>> r.add_header("Foo-Bar", "baz")
+    >>> items = r.header_items()
+    >>> items.sort()
+    >>> items
+    [('Foo-bar', 'baz'), ('Spam-eggs', 'blah')]
+
+    Note that e.g. r.has_header("spam-EggS") is currently False, and
+    r.get_header("spam-EggS") returns None, but that could be changed in
+    future.
+
+    >>> r.has_header("Not-there")
+    False
+    >>> print r.get_header("Not-there")
+    None
+    >>> r.get_header("Not-there", "default")
+    'default'
+
+    """
+
+
+def test_password_manager(self):
+    """
+    >>> mgr = urllib2.HTTPPasswordMgr()
+    >>> add = mgr.add_password
+    >>> add("Some Realm", "http://example.com/", "joe", "password")
+    >>> add("Some Realm", "http://example.com/ni", "ni", "ni")
+    >>> add("c", "http://example.com/foo", "foo", "ni")
+    >>> add("c", "http://example.com/bar", "bar", "nini")
+    >>> add("b", "http://example.com/", "first", "blah")
+    >>> add("b", "http://example.com/", "second", "spam")
+    >>> add("a", "http://example.com", "1", "a")
+    >>> add("Some Realm", "http://c.example.com:3128", "3", "c")
+    >>> add("Some Realm", "d.example.com", "4", "d")
+    >>> add("Some Realm", "e.example.com:3128", "5", "e")
+
+    >>> mgr.find_user_password("Some Realm", "example.com")
+    ('joe', 'password')
+    >>> mgr.find_user_password("Some Realm", "http://example.com")
+    ('joe', 'password')
+    >>> mgr.find_user_password("Some Realm", "http://example.com/")
+    ('joe', 'password')
+    >>> mgr.find_user_password("Some Realm", "http://example.com/spam")
+    ('joe', 'password')
+    >>> mgr.find_user_password("Some Realm", "http://example.com/spam/spam")
+    ('joe', 'password')
+    >>> mgr.find_user_password("c", "http://example.com/foo")
+    ('foo', 'ni')
+    >>> mgr.find_user_password("c", "http://example.com/bar")
+    ('bar', 'nini')
+
+    Actually, this is really undefined ATM
+##     Currently, we use the highest-level path where more than one match:
+
+##     >>> mgr.find_user_password("Some Realm", "http://example.com/ni")
+##     ('joe', 'password')
+
+    Use latest add_password() in case of conflict:
+
+    >>> mgr.find_user_password("b", "http://example.com/")
+    ('second', 'spam')
+
+    No special relationship between a.example.com and example.com:
+
+    >>> mgr.find_user_password("a", "http://example.com/")
+    ('1', 'a')
+    >>> mgr.find_user_password("a", "http://a.example.com/")
+    (None, None)
+
+    Ports:
+
+    >>> mgr.find_user_password("Some Realm", "c.example.com")
+    (None, None)
+    >>> mgr.find_user_password("Some Realm", "c.example.com:3128")
+    ('3', 'c')
+    >>> mgr.find_user_password("Some Realm", "http://c.example.com:3128")
+    ('3', 'c')
+    >>> mgr.find_user_password("Some Realm", "d.example.com")
+    ('4', 'd')
+    >>> mgr.find_user_password("Some Realm", "e.example.com:3128")
+    ('5', 'e')
+
+    """
+    pass
+
+
+def test_password_manager_default_port(self):
+    """
+    >>> mgr = urllib2.HTTPPasswordMgr()
+    >>> add = mgr.add_password
+
+    The point to note here is that we can't guess the default port if there's
+    no scheme.  This applies to both add_password and find_user_password.
+
+    >>> add("f", "http://g.example.com:80", "10", "j")
+    >>> add("g", "http://h.example.com", "11", "k")
+    >>> add("h", "i.example.com:80", "12", "l")
+    >>> add("i", "j.example.com", "13", "m")
+    >>> mgr.find_user_password("f", "g.example.com:100")
+    (None, None)
+    >>> mgr.find_user_password("f", "g.example.com:80")
+    ('10', 'j')
+    >>> mgr.find_user_password("f", "g.example.com")
+    (None, None)
+    >>> mgr.find_user_password("f", "http://g.example.com:100")
+    (None, None)
+    >>> mgr.find_user_password("f", "http://g.example.com:80")
+    ('10', 'j')
+    >>> mgr.find_user_password("f", "http://g.example.com")
+    ('10', 'j')
+    >>> mgr.find_user_password("g", "h.example.com")
+    ('11', 'k')
+    >>> mgr.find_user_password("g", "h.example.com:80")
+    ('11', 'k')
+    >>> mgr.find_user_password("g", "http://h.example.com:80")
+    ('11', 'k')
+    >>> mgr.find_user_password("h", "i.example.com")
+    (None, None)
+    >>> mgr.find_user_password("h", "i.example.com:80")
+    ('12', 'l')
+    >>> mgr.find_user_password("h", "http://i.example.com:80")
+    ('12', 'l')
+    >>> mgr.find_user_password("i", "j.example.com")
+    ('13', 'm')
+    >>> mgr.find_user_password("i", "j.example.com:80")
+    (None, None)
+    >>> mgr.find_user_password("i", "http://j.example.com")
+    ('13', 'm')
+    >>> mgr.find_user_password("i", "http://j.example.com:80")
+    (None, None)
+
+    """
+
+class MockOpener:
+    addheaders = []
+    def open(self, req, data=None):
+        self.req, self.data = req, data
+    def error(self, proto, *args):
+        self.proto, self.args = proto, args
+
+class MockFile:
+    def read(self, count=None): pass
+    def readline(self, count=None): pass
+    def close(self): pass
+
+class MockHeaders(dict):
+    def getheaders(self, name):
+        return self.values()
+
+class MockResponse(StringIO.StringIO):
+    def __init__(self, code, msg, headers, data, url=None):
+        StringIO.StringIO.__init__(self, data)
+        self.code, self.msg, self.headers, self.url = code, msg, headers, url
+    def info(self):
+        return self.headers
+    def geturl(self):
+        return self.url
+
+class MockCookieJar:
+    def add_cookie_header(self, request):
+        self.ach_req = request
+    def extract_cookies(self, response, request):
+        self.ec_req, self.ec_r = request, response
+
+class FakeMethod:
+    def __init__(self, meth_name, action, handle):
+        self.meth_name = meth_name
+        self.handle = handle
+        self.action = action
+    def __call__(self, *args):
+        return self.handle(self.meth_name, self.action, *args)
+
+class MockHandler:
+    # useful for testing handler machinery
+    # see add_ordered_mock_handlers() docstring
+    handler_order = 500
+    def __init__(self, methods):
+        self._define_methods(methods)
+    def _define_methods(self, methods):
+        for spec in methods:
+            if len(spec) == 2: name, action = spec
+            else: name, action = spec, None
+            meth = FakeMethod(name, action, self.handle)
+            setattr(self.__class__, name, meth)
+    def handle(self, fn_name, action, *args, **kwds):
+        self.parent.calls.append((self, fn_name, args, kwds))
+        if action is None:
+            return None
+        elif action == "return self":
+            return self
+        elif action == "return response":
+            res = MockResponse(200, "OK", {}, "")
+            return res
+        elif action == "return request":
+            return Request("http://blah/")
+        elif action.startswith("error"):
+            code = action[action.rfind(" ")+1:]
+            try:
+                code = int(code)
+            except ValueError:
+                pass
+            res = MockResponse(200, "OK", {}, "")
+            return self.parent.error("http", args[0], res, code, "", {})
+        elif action == "raise":
+            raise urllib2.URLError("blah")
+        assert False
+    def close(self): pass
+    def add_parent(self, parent):
+        self.parent = parent
+        self.parent.calls = []
+    def __lt__(self, other):
+        if not hasattr(other, "handler_order"):
+            # No handler_order, leave in original order.  Yuck.
+            return True
+        return self.handler_order < other.handler_order
+
+def add_ordered_mock_handlers(opener, meth_spec):
+    """Create MockHandlers and add them to an OpenerDirector.
+
+    meth_spec: list of lists of tuples and strings defining methods to define
+    on handlers.  eg:
+
+    [["http_error", "ftp_open"], ["http_open"]]
+
+    defines methods .http_error() and .ftp_open() on one handler, and
+    .http_open() on another.  These methods just record their arguments and
+    return None.  Using a tuple instead of a string causes the method to
+    perform some action (see MockHandler.handle()), eg:
+
+    [["http_error"], [("http_open", "return request")]]
+
+    defines .http_error() on one handler (which simply returns None), and
+    .http_open() on another handler, which returns a Request object.
+
+    """
+    handlers = []
+    count = 0
+    for meths in meth_spec:
+        class MockHandlerSubclass(MockHandler): pass
+        h = MockHandlerSubclass(meths)
+        h.handler_order += count
+        h.add_parent(opener)
+        count = count + 1
+        handlers.append(h)
+        opener.add_handler(h)
+    return handlers
+
+def build_test_opener(*handler_instances):
+    opener = OpenerDirector()
+    for h in handler_instances:
+        opener.add_handler(h)
+    return opener
+
+class MockHTTPHandler(urllib2.BaseHandler):
+    # useful for testing redirections and auth
+    # sends supplied headers and code as first response
+    # sends 200 OK as second response
+    def __init__(self, code, headers):
+        self.code = code
+        self.headers = headers
+        self.reset()
+    def reset(self):
+        self._count = 0
+        self.requests = []
+    def http_open(self, req):
+        import mimetools, httplib, copy
+        from StringIO import StringIO
+        self.requests.append(copy.deepcopy(req))
+        if self._count == 0:
+            self._count = self._count + 1
+            name = httplib.responses[self.code]
+            msg = mimetools.Message(StringIO(self.headers))
+            return self.parent.error(
+                "http", req, MockFile(), self.code, name, msg)
+        else:
+            self.req = req
+            msg = mimetools.Message(StringIO("\r\n\r\n"))
+            return MockResponse(200, "OK", msg, "", req.get_full_url())
+
+class MockPasswordManager:
+    def add_password(self, realm, uri, user, password):
+        self.realm = realm
+        self.url = uri
+        self.user = user
+        self.password = password
+    def find_user_password(self, realm, authuri):
+        self.target_realm = realm
+        self.target_url = authuri
+        return self.user, self.password
+
+
+class OpenerDirectorTests(unittest.TestCase):
+
+    def test_add_non_handler(self):
+        class NonHandler(object):
+            pass
+        self.assertRaises(TypeError,
+                          OpenerDirector().add_handler, NonHandler())
+
+    def test_badly_named_methods(self):
+        # test work-around for three methods that accidentally follow the
+        # naming conventions for handler methods
+        # (*_open() / *_request() / *_response())
+
+        # These used to call the accidentally-named methods, causing a
+        # TypeError in real code; here, returning self from these mock
+        # methods would either cause no exception, or AttributeError.
+
+        from urllib2 import URLError
+
+        o = OpenerDirector()
+        meth_spec = [
+            [("do_open", "return self"), ("proxy_open", "return self")],
+            [("redirect_request", "return self")],
+            ]
+        handlers = add_ordered_mock_handlers(o, meth_spec)
+        o.add_handler(urllib2.UnknownHandler())
+        for scheme in "do", "proxy", "redirect":
+            self.assertRaises(URLError, o.open, scheme+"://example.com/")
+
+    def test_handled(self):
+        # handler returning non-None means no more handlers will be called
+        o = OpenerDirector()
+        meth_spec = [
+            ["http_open", "ftp_open", "http_error_302"],
+            ["ftp_open"],
+            [("http_open", "return self")],
+            [("http_open", "return self")],
+            ]
+        handlers = add_ordered_mock_handlers(o, meth_spec)
+
+        req = Request("http://example.com/")
+        r = o.open(req)
+        # Second .http_open() gets called, third doesn't, since second returned
+        # non-None.  Handlers without .http_open() never get any methods called
+        # on them.
+        # In fact, second mock handler defining .http_open() returns self
+        # (instead of response), which becomes the OpenerDirector's return
+        # value.
+        self.assertEqual(r, handlers[2])
+        calls = [(handlers[0], "http_open"), (handlers[2], "http_open")]
+        for expected, got in zip(calls, o.calls):
+            handler, name, args, kwds = got
+            self.assertEqual((handler, name), expected)
+            self.assertEqual(args, (req,))
+
+    def test_handler_order(self):
+        o = OpenerDirector()
+        handlers = []
+        for meths, handler_order in [
+            ([("http_open", "return self")], 500),
+            (["http_open"], 0),
+            ]:
+            class MockHandlerSubclass(MockHandler): pass
+            h = MockHandlerSubclass(meths)
+            h.handler_order = handler_order
+            handlers.append(h)
+            o.add_handler(h)
+
+        r = o.open("http://example.com/")
+        # handlers called in reverse order, thanks to their sort order
+        self.assertEqual(o.calls[0][0], handlers[1])
+        self.assertEqual(o.calls[1][0], handlers[0])
+
+    def test_raise(self):
+        # raising URLError stops processing of request
+        o = OpenerDirector()
+        meth_spec = [
+            [("http_open", "raise")],
+            [("http_open", "return self")],
+            ]
+        handlers = add_ordered_mock_handlers(o, meth_spec)
+
+        req = Request("http://example.com/")
+        self.assertRaises(urllib2.URLError, o.open, req)
+        self.assertEqual(o.calls, [(handlers[0], "http_open", (req,), {})])
+
+##     def test_error(self):
+##         # XXX this doesn't actually seem to be used in standard library,
+##         #  but should really be tested anyway...
+
+    def test_http_error(self):
+        # XXX http_error_default
+        # http errors are a special case
+        o = OpenerDirector()
+        meth_spec = [
+            [("http_open", "error 302")],
+            [("http_error_400", "raise"), "http_open"],
+            [("http_error_302", "return response"), "http_error_303",
+             "http_error"],
+            [("http_error_302")],
+            ]
+        handlers = add_ordered_mock_handlers(o, meth_spec)
+
+        class Unknown:
+            def __eq__(self, other): return True
+
+        req = Request("http://example.com/")
+        r = o.open(req)
+        assert len(o.calls) == 2
+        calls = [(handlers[0], "http_open", (req,)),
+                 (handlers[2], "http_error_302",
+                  (req, Unknown(), 302, "", {}))]
+        for expected, got in zip(calls, o.calls):
+            handler, method_name, args = expected
+            self.assertEqual((handler, method_name), got[:2])
+            self.assertEqual(args, got[2])
+
+    def test_processors(self):
+        # *_request / *_response methods get called appropriately
+        o = OpenerDirector()
+        meth_spec = [
+            [("http_request", "return request"),
+             ("http_response", "return response")],
+            [("http_request", "return request"),
+             ("http_response", "return response")],
+            ]
+        handlers = add_ordered_mock_handlers(o, meth_spec)
+
+        req = Request("http://example.com/")
+        r = o.open(req)
+        # processor methods are called on *all* handlers that define them,
+        # not just the first handler that handles the request
+        calls = [
+            (handlers[0], "http_request"), (handlers[1], "http_request"),
+            (handlers[0], "http_response"), (handlers[1], "http_response")]
+
+        for i, (handler, name, args, kwds) in enumerate(o.calls):
+            if i < 2:
+                # *_request
+                self.assertEqual((handler, name), calls[i])
+                self.assertEqual(len(args), 1)
+                self.assert_(isinstance(args[0], Request))
+            else:
+                # *_response
+                self.assertEqual((handler, name), calls[i])
+                self.assertEqual(len(args), 2)
+                self.assert_(isinstance(args[0], Request))
+                # response from opener.open is None, because there's no
+                # handler that defines http_open to handle it
+                self.assert_(args[1] is None or
+                             isinstance(args[1], MockResponse))
+
+
+def sanepathname2url(path):
+    import urllib
+    urlpath = urllib.pathname2url(path)
+    if os.name == "nt" and urlpath.startswith("///"):
+        urlpath = urlpath[2:]
+    # XXX don't ask me about the mac...
+    return urlpath
+
+class HandlerTests(unittest.TestCase):
+
+    def test_ftp(self):
+        class MockFTPWrapper:
+            def __init__(self, data): self.data = data
+            def retrfile(self, filename, filetype):
+                self.filename, self.filetype = filename, filetype
+                return StringIO.StringIO(self.data), len(self.data)
+
+        class NullFTPHandler(urllib2.FTPHandler):
+            def __init__(self, data): self.data = data
+            def connect_ftp(self, user, passwd, host, port, dirs, timeout=None):
+                self.user, self.passwd = user, passwd
+                self.host, self.port = host, port
+                self.dirs = dirs
+                self.ftpwrapper = MockFTPWrapper(self.data)
+                return self.ftpwrapper
+
+        import ftplib, socket
+        data = "rheum rhaponicum"
+        h = NullFTPHandler(data)
+        o = h.parent = MockOpener()
+
+        for url, host, port, type_, dirs, filename, mimetype in [
+            ("ftp://localhost/foo/bar/baz.html",
+             "localhost", ftplib.FTP_PORT, "I",
+             ["foo", "bar"], "baz.html", "text/html"),
+            ("ftp://localhost:80/foo/bar/",
+             "localhost", 80, "D",
+             ["foo", "bar"], "", None),
+            ("ftp://localhost/baz.gif;type=a",
+             "localhost", ftplib.FTP_PORT, "A",
+             [], "baz.gif", None),  # XXX really this should guess image/gif
+            ]:
+            req = Request(url)
+            req.timeout = None
+            r = h.ftp_open(req)
+            # ftp authentication not yet implemented by FTPHandler
+            self.assert_(h.user == h.passwd == "")
+            self.assertEqual(h.host, socket.gethostbyname(host))
+            self.assertEqual(h.port, port)
+            self.assertEqual(h.dirs, dirs)
+            self.assertEqual(h.ftpwrapper.filename, filename)
+            self.assertEqual(h.ftpwrapper.filetype, type_)
+            headers = r.info()
+            self.assertEqual(headers.get("Content-type"), mimetype)
+            self.assertEqual(int(headers["Content-length"]), len(data))
+
+    def test_file(self):
+        import time, rfc822, socket
+        h = urllib2.FileHandler()
+        o = h.parent = MockOpener()
+
+        TESTFN = test_support.TESTFN
+        urlpath = sanepathname2url(os.path.abspath(TESTFN))
+        towrite = "hello, world\n"
+        urls = [
+            "file://localhost%s" % urlpath,
+            "file://%s" % urlpath,
+            "file://%s%s" % (socket.gethostbyname('localhost'), urlpath),
+            ]
+        try:
+            localaddr = socket.gethostbyname(socket.gethostname())
+        except socket.gaierror:
+            localaddr = ''
+        if localaddr:
+            urls.append("file://%s%s" % (localaddr, urlpath))
+
+        for url in urls:
+            f = open(TESTFN, "wb")
+            try:
+                try:
+                    f.write(towrite)
+                finally:
+                    f.close()
+
+                r = h.file_open(Request(url))
+                try:
+                    data = r.read()
+                    headers = r.info()
+                    newurl = r.geturl()
+                finally:
+                    r.close()
+                stats = os.stat(TESTFN)
+                modified = rfc822.formatdate(stats.st_mtime)
+            finally:
+                os.remove(TESTFN)
+            self.assertEqual(data, towrite)
+            self.assertEqual(headers["Content-type"], "text/plain")
+            self.assertEqual(headers["Content-length"], "13")
+            self.assertEqual(headers["Last-modified"], modified)
+
+        for url in [
+            "file://localhost:80%s" % urlpath,
+            "file:///file_does_not_exist.txt",
+            "file://%s:80%s/%s" % (socket.gethostbyname('localhost'),
+                                   os.getcwd(), TESTFN),
+            "file://somerandomhost.ontheinternet.com%s/%s" %
+            (os.getcwd(), TESTFN),
+            ]:
+            try:
+                f = open(TESTFN, "wb")
+                try:
+                    f.write(towrite)
+                finally:
+                    f.close()
+
+                self.assertRaises(urllib2.URLError,
+                                  h.file_open, Request(url))
+            finally:
+                os.remove(TESTFN)
+
+        h = urllib2.FileHandler()
+        o = h.parent = MockOpener()
+        # XXXX why does // mean ftp (and /// mean not ftp!), and where
+        #  is file: scheme specified?  I think this is really a bug, and
+        #  what was intended was to distinguish between URLs like:
+        # file:/blah.txt (a file)
+        # file://localhost/blah.txt (a file)
+        # file:///blah.txt (a file)
+        # file://ftp.example.com/blah.txt (an ftp URL)
+        for url, ftp in [
+            ("file://ftp.example.com//foo.txt", True),
+            ("file://ftp.example.com///foo.txt", False),
+# XXXX bug: fails with OSError, should be URLError
+            ("file://ftp.example.com/foo.txt", False),
+            ]:
+            req = Request(url)
+            try:
+                h.file_open(req)
+            # XXXX remove OSError when bug fixed
+            except (urllib2.URLError, OSError):
+                self.assert_(not ftp)
+            else:
+                self.assert_(o.req is req)
+                self.assertEqual(req.type, "ftp")
+
+    def test_http(self):
+        class MockHTTPResponse:
+            def __init__(self, fp, msg, status, reason):
+                self.fp = fp
+                self.msg = msg
+                self.status = status
+                self.reason = reason
+            def read(self):
+                return ''
+        class MockHTTPClass:
+            def __init__(self):
+                self.req_headers = []
+                self.data = None
+                self.raise_on_endheaders = False
+            def __call__(self, host, timeout=None):
+                self.host = host
+                self.timeout = timeout
+                return self
+            def set_debuglevel(self, level):
+                self.level = level
+            def request(self, method, url, body=None, headers={}):
+                self.method = method
+                self.selector = url
+                self.req_headers += headers.items()
+                self.req_headers.sort()
+                if body:
+                    self.data = body
+                if self.raise_on_endheaders:
+                    import socket
+                    raise socket.error()
+            def getresponse(self):
+                return MockHTTPResponse(MockFile(), {}, 200, "OK")
+
+        h = urllib2.AbstractHTTPHandler()
+        o = h.parent = MockOpener()
+
+        url = "http://example.com/"
+        for method, data in [("GET", None), ("POST", "blah")]:
+            req = Request(url, data, {"Foo": "bar"})
+            req.timeout = None
+            req.add_unredirected_header("Spam", "eggs")
+            http = MockHTTPClass()
+            r = h.do_open(http, req)
+
+            # result attributes
+            r.read; r.readline  # wrapped MockFile methods
+            r.info; r.geturl  # addinfourl methods
+            self.assertEqual((r.code, r.msg), (200, "OK"))  # set from getresponse()
+            hdrs = r.info()
+            hdrs.get; hdrs.has_key  # r.info() gives dict from .getreply()
+            self.assertEqual(r.geturl(), url)
+
+            self.assertEqual(http.host, "example.com")
+            self.assertEqual(http.level, 0)
+            self.assertEqual(http.method, method)
+            self.assertEqual(http.selector, "/")
+            self.assertEqual(http.req_headers,
+                             [("Connection", "close"),
+                              ("Foo", "bar"), ("Spam", "eggs")])
+            self.assertEqual(http.data, data)
+
+        # check socket.error converted to URLError
+        http.raise_on_endheaders = True
+        self.assertRaises(urllib2.URLError, h.do_open, http, req)
+
+        # check adding of standard headers
+        o.addheaders = [("Spam", "eggs")]
+        for data in "", None:  # POST, GET
+            req = Request("http://example.com/", data)
+            r = MockResponse(200, "OK", {}, "")
+            newreq = h.do_request_(req)
+            if data is None:  # GET
+                self.assert_("Content-length" not in req.unredirected_hdrs)
+                self.assert_("Content-type" not in req.unredirected_hdrs)
+            else:  # POST
+                self.assertEqual(req.unredirected_hdrs["Content-length"], "0")
+                self.assertEqual(req.unredirected_hdrs["Content-type"],
+                             "application/x-www-form-urlencoded")
+            # XXX the details of Host could be better tested
+            self.assertEqual(req.unredirected_hdrs["Host"], "example.com")
+            self.assertEqual(req.unredirected_hdrs["Spam"], "eggs")
+
+            # don't clobber existing headers
+            req.add_unredirected_header("Content-length", "foo")
+            req.add_unredirected_header("Content-type", "bar")
+            req.add_unredirected_header("Host", "baz")
+            req.add_unredirected_header("Spam", "foo")
+            newreq = h.do_request_(req)
+            self.assertEqual(req.unredirected_hdrs["Content-length"], "foo")
+            self.assertEqual(req.unredirected_hdrs["Content-type"], "bar")
+            self.assertEqual(req.unredirected_hdrs["Host"], "baz")
+            self.assertEqual(req.unredirected_hdrs["Spam"], "foo")
+
+    def test_errors(self):
+        h = urllib2.HTTPErrorProcessor()
+        o = h.parent = MockOpener()
+
+        url = "http://example.com/"
+        req = Request(url)
+        # all 2xx are passed through
+        r = MockResponse(200, "OK", {}, "", url)
+        newr = h.http_response(req, r)
+        self.assert_(r is newr)
+        self.assert_(not hasattr(o, "proto"))  # o.error not called
+        r = MockResponse(202, "Accepted", {}, "", url)
+        newr = h.http_response(req, r)
+        self.assert_(r is newr)
+        self.assert_(not hasattr(o, "proto"))  # o.error not called
+        r = MockResponse(206, "Partial content", {}, "", url)
+        newr = h.http_response(req, r)
+        self.assert_(r is newr)
+        self.assert_(not hasattr(o, "proto"))  # o.error not called
+        # anything else calls o.error (and MockOpener returns None, here)
+        r = MockResponse(502, "Bad gateway", {}, "", url)
+        self.assert_(h.http_response(req, r) is None)
+        self.assertEqual(o.proto, "http")  # o.error called
+        self.assertEqual(o.args, (req, r, 502, "Bad gateway", {}))
+
+    def test_cookies(self):
+        cj = MockCookieJar()
+        h = urllib2.HTTPCookieProcessor(cj)
+        o = h.parent = MockOpener()
+
+        req = Request("http://example.com/")
+        r = MockResponse(200, "OK", {}, "")
+        newreq = h.http_request(req)
+        self.assert_(cj.ach_req is req is newreq)
+        self.assertEquals(req.get_origin_req_host(), "example.com")
+        self.assert_(not req.is_unverifiable())
+        newr = h.http_response(req, r)
+        self.assert_(cj.ec_req is req)
+        self.assert_(cj.ec_r is r is newr)
+
+    def test_redirect(self):
+        from_url = "http://example.com/a.html"
+        to_url = "http://example.com/b.html"
+        h = urllib2.HTTPRedirectHandler()
+        o = h.parent = MockOpener()
+
+        # ordinary redirect behaviour
+        for code in 301, 302, 303, 307:
+            for data in None, "blah\nblah\n":
+                method = getattr(h, "http_error_%s" % code)
+                req = Request(from_url, data)
+                req.add_header("Nonsense", "viking=withhold")
+                req.add_unredirected_header("Spam", "spam")
+                try:
+                    method(req, MockFile(), code, "Blah",
+                           MockHeaders({"location": to_url}))
+                except urllib2.HTTPError:
+                    # 307 in response to POST requires user OK
+                    self.assert_(code == 307 and data is not None)
+                self.assertEqual(o.req.get_full_url(), to_url)
+                try:
+                    self.assertEqual(o.req.get_method(), "GET")
+                except AttributeError:
+                    self.assert_(not o.req.has_data())
+                self.assertEqual(o.req.headers["Nonsense"],
+                                 "viking=withhold")
+                self.assert_("Spam" not in o.req.headers)
+                self.assert_("Spam" not in o.req.unredirected_hdrs)
+
+        # loop detection
+        req = Request(from_url)
+        def redirect(h, req, url=to_url):
+            h.http_error_302(req, MockFile(), 302, "Blah",
+                             MockHeaders({"location": url}))
+        # Note that the *original* request shares the same record of
+        # redirections with the sub-requests caused by the redirections.
+
+        # detect infinite loop redirect of a URL to itself
+        req = Request(from_url, origin_req_host="example.com")
+        count = 0
+        try:
+            while 1:
+                redirect(h, req, "http://example.com/")
+                count = count + 1
+        except urllib2.HTTPError:
+            # don't stop until max_repeats, because cookies may introduce state
+            self.assertEqual(count, urllib2.HTTPRedirectHandler.max_repeats)
+
+        # detect endless non-repeating chain of redirects
+        req = Request(from_url, origin_req_host="example.com")
+        count = 0
+        try:
+            while 1:
+                redirect(h, req, "http://example.com/%d" % count)
+                count = count + 1
+        except urllib2.HTTPError:
+            self.assertEqual(count,
+                             urllib2.HTTPRedirectHandler.max_redirections)
+        # test the redirection cache implemented for 301 (permanent) redirects
+        def cached_redirect(h, req, url=to_url):
+            h.http_error_301(req, MockFile(), 301, "Blah",
+                    MockHeaders({"location":url}))
+        req = Request(from_url, origin_req_host="example.com")
+        count = 0
+        try:
+            while 1:
+                cached_redirect(h, req, "http://example.com/")
+                count = count + 1
+                if count > 1:
+                    # Check for the presence of the cache dictionary.  Its
+                    # contents are not checked, since the mock opener's
+                    # error() returns None.  Ordinary 301 redirection is
+                    # tested independently above, however.
+                    self.assert_(h.cache)
+        except urllib2.HTTPError:
+            self.assertEqual(count, urllib2.HTTPRedirectHandler.max_repeats)
+
+    def test_cookie_redirect(self):
+        # cookies shouldn't leak into redirected requests
+        from cookielib import CookieJar
+
+        from test.test_cookielib import interact_netscape
+
+        cj = CookieJar()
+        interact_netscape(cj, "http://www.example.com/", "spam=eggs")
+        hh = MockHTTPHandler(302, "Location: http://www.cracker.com/\r\n\r\n")
+        hdeh = urllib2.HTTPDefaultErrorHandler()
+        hrh = urllib2.HTTPRedirectHandler()
+        cp = urllib2.HTTPCookieProcessor(cj)
+        o = build_test_opener(hh, hdeh, hrh, cp)
+        o.open("http://www.example.com/")
+        self.assert_(not hh.req.has_header("Cookie"))
+
+    def test_proxy(self):
+        o = OpenerDirector()
+        ph = urllib2.ProxyHandler(dict(http="proxy.example.com:3128"))
+        o.add_handler(ph)
+        meth_spec = [
+            [("http_open", "return response")]
+            ]
+        handlers = add_ordered_mock_handlers(o, meth_spec)
+
+        req = Request("http://acme.example.com/")
+        self.assertEqual(req.get_host(), "acme.example.com")
+        r = o.open(req)
+        self.assertEqual(req.get_host(), "proxy.example.com:3128")
+
+        self.assertEqual([(handlers[0], "http_open")],
+                         [tup[0:2] for tup in o.calls])
+
+    def test_basic_auth(self):
+        opener = OpenerDirector()
+        password_manager = MockPasswordManager()
+        auth_handler = urllib2.HTTPBasicAuthHandler(password_manager)
+        realm = "ACME Widget Store"
+        http_handler = MockHTTPHandler(
+            401, 'WWW-Authenticate: Basic realm="%s"\r\n\r\n' % realm)
+        opener.add_handler(auth_handler)
+        opener.add_handler(http_handler)
+        self._test_basic_auth(opener, auth_handler, "Authorization",
+                              realm, http_handler, password_manager,
+                              "http://acme.example.com/protected",
+                              "http://acme.example.com/protected",
+                              )
+
+    def test_proxy_basic_auth(self):
+        opener = OpenerDirector()
+        ph = urllib2.ProxyHandler(dict(http="proxy.example.com:3128"))
+        opener.add_handler(ph)
+        password_manager = MockPasswordManager()
+        auth_handler = urllib2.ProxyBasicAuthHandler(password_manager)
+        realm = "ACME Networks"
+        http_handler = MockHTTPHandler(
+            407, 'Proxy-Authenticate: Basic realm="%s"\r\n\r\n' % realm)
+        opener.add_handler(auth_handler)
+        opener.add_handler(http_handler)
+        self._test_basic_auth(opener, auth_handler, "Proxy-authorization",
+                              realm, http_handler, password_manager,
+                              "http://acme.example.com:3128/protected",
+                              "proxy.example.com:3128",
+                              )
+
+    def test_basic_and_digest_auth_handlers(self):
+        # HTTPDigestAuthHandler threw an exception if it couldn't handle a 40*
+        # response (http://python.org/sf/1479302), where it should instead
+        # return None to allow another handler (especially
+        # HTTPBasicAuthHandler) to handle the response.
+
+        # Also (http://python.org/sf/14797027, RFC 2617 section 1.2), we must
+        # try digest first (since it's the strongest auth scheme), so we record
+        # order of calls here to check digest comes first:
+        class RecordingOpenerDirector(OpenerDirector):
+            def __init__(self):
+                OpenerDirector.__init__(self)
+                self.recorded = []
+            def record(self, info):
+                self.recorded.append(info)
+        class TestDigestAuthHandler(urllib2.HTTPDigestAuthHandler):
+            def http_error_401(self, *args, **kwds):
+                self.parent.record("digest")
+                urllib2.HTTPDigestAuthHandler.http_error_401(self,
+                                                             *args, **kwds)
+        class TestBasicAuthHandler(urllib2.HTTPBasicAuthHandler):
+            def http_error_401(self, *args, **kwds):
+                self.parent.record("basic")
+                urllib2.HTTPBasicAuthHandler.http_error_401(self,
+                                                            *args, **kwds)
+
+        opener = RecordingOpenerDirector()
+        password_manager = MockPasswordManager()
+        digest_handler = TestDigestAuthHandler(password_manager)
+        basic_handler = TestBasicAuthHandler(password_manager)
+        realm = "ACME Networks"
+        http_handler = MockHTTPHandler(
+            401, 'WWW-Authenticate: Basic realm="%s"\r\n\r\n' % realm)
+        opener.add_handler(basic_handler)
+        opener.add_handler(digest_handler)
+        opener.add_handler(http_handler)
+
+        # check basic auth isn't blocked by digest handler failing
+        self._test_basic_auth(opener, basic_handler, "Authorization",
+                              realm, http_handler, password_manager,
+                              "http://acme.example.com/protected",
+                              "http://acme.example.com/protected",
+                              )
+        # check digest was tried before basic (twice, because
+        # _test_basic_auth called .open() twice)
+        self.assertEqual(opener.recorded, ["digest", "basic"]*2)
+
+    def _test_basic_auth(self, opener, auth_handler, auth_header,
+                         realm, http_handler, password_manager,
+                         request_url, protected_url):
+        import base64
+        user, password = "wile", "coyote"
+
+        # .add_password() fed through to password manager
+        auth_handler.add_password(realm, request_url, user, password)
+        self.assertEqual(realm, password_manager.realm)
+        self.assertEqual(request_url, password_manager.url)
+        self.assertEqual(user, password_manager.user)
+        self.assertEqual(password, password_manager.password)
+
+        r = opener.open(request_url)
+
+        # should have asked the password manager for the username/password
+        self.assertEqual(password_manager.target_realm, realm)
+        self.assertEqual(password_manager.target_url, protected_url)
+
+        # expect one request without authorization, then one with
+        self.assertEqual(len(http_handler.requests), 2)
+        self.assertFalse(http_handler.requests[0].has_header(auth_header))
+        userpass = '%s:%s' % (user, password)
+        auth_hdr_value = 'Basic '+base64.encodestring(userpass).strip()
+        self.assertEqual(http_handler.requests[1].get_header(auth_header),
+                         auth_hdr_value)
+
+        # if the password manager can't find a password, the handler won't
+        # handle the HTTP auth error
+        password_manager.user = password_manager.password = None
+        http_handler.reset()
+        r = opener.open(request_url)
+        self.assertEqual(len(http_handler.requests), 1)
+        self.assertFalse(http_handler.requests[0].has_header(auth_header))
+
+
+class MiscTests(unittest.TestCase):
+
+    def test_build_opener(self):
+        class MyHTTPHandler(urllib2.HTTPHandler): pass
+        class FooHandler(urllib2.BaseHandler):
+            def foo_open(self): pass
+        class BarHandler(urllib2.BaseHandler):
+            def bar_open(self): pass
+
+        build_opener = urllib2.build_opener
+
+        o = build_opener(FooHandler, BarHandler)
+        self.opener_has_handler(o, FooHandler)
+        self.opener_has_handler(o, BarHandler)
+
+        # can take a mix of classes and instances
+        o = build_opener(FooHandler, BarHandler())
+        self.opener_has_handler(o, FooHandler)
+        self.opener_has_handler(o, BarHandler)
+
+        # subclasses of default handlers override default handlers
+        o = build_opener(MyHTTPHandler)
+        self.opener_has_handler(o, MyHTTPHandler)
+
+        # a particular case of overriding: default handlers can be passed
+        # in explicitly
+        o = build_opener()
+        self.opener_has_handler(o, urllib2.HTTPHandler)
+        o = build_opener(urllib2.HTTPHandler)
+        self.opener_has_handler(o, urllib2.HTTPHandler)
+        o = build_opener(urllib2.HTTPHandler())
+        self.opener_has_handler(o, urllib2.HTTPHandler)
+
+    def opener_has_handler(self, opener, handler_class):
+        for h in opener.handlers:
+            if h.__class__ == handler_class:
+                break
+        else:
+            self.assert_(False)
+
+
+def test_main(verbose=None):
+    from test import test_urllib2
+    test_support.run_doctest(test_urllib2, verbose)
+    test_support.run_doctest(urllib2, verbose)
+    tests = (TrivialTests,
+             OpenerDirectorTests,
+             HandlerTests,
+             MiscTests)
+    test_support.run_unittest(*tests)
+
+if __name__ == "__main__":
+    test_main(verbose=True)

Added: sandbox/trunk/urilib/urllib2.py
==============================================================================
--- (empty file)
+++ sandbox/trunk/urilib/urllib2.py	Sun Aug  5 23:42:33 2007
@@ -0,0 +1,1371 @@
+"""An extensible library for opening URLs using a variety of protocols
+
+The simplest way to use this module is to call the urlopen function,
+which accepts a string containing a URL or a Request object (described
+below).  It opens the URL and returns the result as a file-like
+object; the returned object has some extra methods described below.
+
+The OpenerDirector manages a collection of Handler objects that do
+all the actual work.  Each Handler implements a particular protocol or
+option.  The OpenerDirector is a composite object that invokes the
+Handlers needed to open the requested URL.  For example, the
+HTTPHandler performs HTTP GET and POST requests and deals with
+non-error returns.  The HTTPRedirectHandler automatically deals with
+HTTP 301, 302, 303 and 307 redirect errors, and the HTTPDigestAuthHandler
+deals with digest authentication.
+
+urlopen(url, data=None) -- Basic usage is the same as for the original
+urllib.  Pass the url and optionally data to post to an HTTP URL, and
+get a file-like object back.  One difference is that you can also pass
+a Request instance instead of a URL.  Raises a URLError (a subclass of
+IOError); for HTTP errors, raises an HTTPError, which can also be
+treated as a valid response.
+
+build_opener -- Function that creates a new OpenerDirector instance.
+Will install the default handlers.  Accepts one or more Handlers as
+arguments, either instances or Handler classes that it will
+instantiate.  If one of the arguments is a subclass of a default
+handler, it will be installed instead of the default.
+
+install_opener -- Installs a new opener as the default opener.
+
+objects of interest:
+OpenerDirector -- Manages a collection of Handler objects that do the
+actual work of opening URLs; see the description above.
+
+Request -- An object that encapsulates the state of a request.  The
+state can be as simple as the URL.  It can also include extra HTTP
+headers, e.g. a User-Agent.
+
+BaseHandler -- The base class for all Handler objects; a handler gains
+access to its OpenerDirector through the parent attribute.
+
+exceptions:
+URLError -- A subclass of IOError; individual protocols have their
+own specific subclasses.
+
+HTTPError -- Also a valid HTTP response, so you can treat an HTTP error
+as an exceptional event or as a valid response.
+
+internals:
+BaseHandler and parent
+_call_chain conventions
+
+Example usage:
+
+import urllib2
+
+# set up authentication info
+authinfo = urllib2.HTTPBasicAuthHandler()
+authinfo.add_password(realm='PDQ Application',
+                      uri='https://mahler:8092/site-updates.py',
+                      user='klem',
+                      passwd='geheim$parole')
+
+proxy_support = urllib2.ProxyHandler({"http" : "http://ahad-haam:3128"})
+
+# build a new opener that adds authentication and caching FTP handlers
+opener = urllib2.build_opener(proxy_support, authinfo, urllib2.CacheFTPHandler)
+
+# install it
+urllib2.install_opener(opener)
+
+f = urllib2.urlopen('http://www.python.org/')
+
+
+"""
+
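+# A minimal error-handling sketch (the URL below is illustrative).  It
+# shows the dual nature of HTTPError mentioned above: it is both an
+# exception and a valid response, so it can be caught and then read
+# like the object urlopen() would have returned.
+#
+#     import urllib2
+#     try:
+#         f = urllib2.urlopen('http://www.python.org/not-there')
+#     except urllib2.HTTPError, e:
+#         print e.code, e.msg     # e.g. 404 Not Found
+#         body = e.read()         # the error page, if any
+#     except urllib2.URLError, e:
+#         print e.reason          # network-level failure
+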
+# XXX issues:
+# If an authentication error handler tries to perform authentication
+# but fails for some reason, how should the error be signalled?  The
+# client needs to know the HTTP error code.  But if the handler knows
+# what the problem was, e.g., that it did not support the hash
+# algorithm requested in the challenge, it would be good to pass that
+# information along to the client, too.
+# ftp errors aren't handled cleanly
+# check digest against correct (i.e. non-apache) implementation
+
+# Possible extensions:
+# complex proxies  XXX not sure what exactly was meant by this
+# abstract factory for opener
+
+import base64
+import hashlib
+import httplib
+import mimetools
+import os
+import posixpath
+import random
+import re
+import socket
+import sys
+import time
+import urlparse
+import bisect
+
+try:
+    from cStringIO import StringIO
+except ImportError:
+    from StringIO import StringIO
+
+from urllib import (unwrap, unquote, splittype, splithost, quote,
+     addinfourl, splitport, splitquery,
+     splitattr, ftpwrapper, noheaders, splituser, splitpasswd, splitvalue)
+
+# support for FileHandler, proxies via environment variables
+from urllib import localhost, url2pathname, getproxies
+
+# used in User-Agent header sent
+__version__ = sys.version[:3]
+
+_opener = None
+def urlopen(url, data=None, timeout=None):
+    global _opener
+    if _opener is None:
+        _opener = build_opener()
+    return _opener.open(url, data, timeout)
+
+def install_opener(opener):
+    global _opener
+    _opener = opener
+
+# do these error classes make sense?
+# make sure all of the IOError stuff is overridden.  we just want to be
+# subtypes.
+
+class URLError(IOError):
+    # URLError is a sub-type of IOError, but it doesn't share any of
+    # the implementation.  need to override __init__ and __str__.
+    # It sets self.args for compatibility with other EnvironmentError
+    # subclasses, but args doesn't have the typical format with errno in
+    # slot 0 and strerror in slot 1.  This may be better than nothing.
+    def __init__(self, reason):
+        self.args = reason,
+        self.reason = reason
+
+    def __str__(self):
+        return '<urlopen error %s>' % self.reason
+
+class HTTPError(URLError, addinfourl):
+    """Raised when HTTP error occurs, but also acts like non-error return"""
+    __super_init = addinfourl.__init__
+
+    def __init__(self, url, code, msg, hdrs, fp):
+        self.code = code
+        self.msg = msg
+        self.hdrs = hdrs
+        self.fp = fp
+        self.filename = url
+        # The addinfourl classes depend on fp being a valid file
+        # object.  In some cases, the HTTPError may not have a valid
+        # file object.  If this happens, the simplest workaround is to
+        # not initialize the base classes.
+        if fp is not None:
+            self.__super_init(fp, hdrs, url)
+
+    def __str__(self):
+        return 'HTTP Error %s: %s' % (self.code, self.msg)
+
+# copied from cookielib.py
+_cut_port_re = re.compile(r":\d+$")
+def request_host(request):
+    """Return request-host, as defined by RFC 2965.
+
+    Variation from RFC: returned value is lowercased, for convenient
+    comparison.
+
+    """
+    url = request.get_full_url()
+    host = urlparse.urlparse(url)[1]
+    if host == "":
+        host = request.get_header("Host", "")
+
+    # remove port, if present
+    host = _cut_port_re.sub("", host, 1)
+    return host.lower()
+
+class Request:
+
+    def __init__(self, url, data=None, headers={},
+                 origin_req_host=None, unverifiable=False):
+        # unwrap('<URL:type://host/path>') --> 'type://host/path'
+        self.__original = unwrap(url)
+        self.type = None
+        # self.__r_type is what's left after doing the splittype
+        self.host = None
+        self.port = None
+        self.data = data
+        self.headers = {}
+        for key, value in headers.items():
+            self.add_header(key, value)
+        self.unredirected_hdrs = {}
+        if origin_req_host is None:
+            origin_req_host = request_host(self)
+        self.origin_req_host = origin_req_host
+        self.unverifiable = unverifiable
+
+    def __getattr__(self, attr):
+        # XXX this is a fallback mechanism to guard against these
+        # methods getting called in a non-standard order.  this may be
+        # too complicated and/or unnecessary.
+        # XXX should the __r_XXX attributes be public?
+        if attr[:12] == '_Request__r_':
+            name = attr[12:]
+            if hasattr(Request, 'get_' + name):
+                getattr(self, 'get_' + name)()
+                return getattr(self, attr)
+        raise AttributeError, attr
+
+    def get_method(self):
+        if self.has_data():
+            return "POST"
+        else:
+            return "GET"
+
+    # XXX these helper methods are lame
+
+    def add_data(self, data):
+        self.data = data
+
+    def has_data(self):
+        return self.data is not None
+
+    def get_data(self):
+        return self.data
+
+    def get_full_url(self):
+        return self.__original
+
+    def get_type(self):
+        if self.type is None:
+            self.type, self.__r_type = splittype(self.__original)
+            if self.type is None:
+                raise ValueError, "unknown url type: %s" % self.__original
+        return self.type
+
+    def get_host(self):
+        if self.host is None:
+            self.host, self.__r_host = splithost(self.__r_type)
+            if self.host:
+                self.host = unquote(self.host)
+        return self.host
+
+    def get_selector(self):
+        return self.__r_host
+
+    def set_proxy(self, host, type):
+        self.host, self.type = host, type
+        self.__r_host = self.__original
+
+    def get_origin_req_host(self):
+        return self.origin_req_host
+
+    def is_unverifiable(self):
+        return self.unverifiable
+
+    def add_header(self, key, val):
+        # useful for something like authentication
+        self.headers[key.capitalize()] = val
+
+    def add_unredirected_header(self, key, val):
+        # will not be added to a redirected request
+        self.unredirected_hdrs[key.capitalize()] = val
+
+    def has_header(self, header_name):
+        return (header_name in self.headers or
+                header_name in self.unredirected_hdrs)
+
+    def get_header(self, header_name, default=None):
+        return self.headers.get(
+            header_name,
+            self.unredirected_hdrs.get(header_name, default))
+
+    def header_items(self):
+        hdrs = self.unredirected_hdrs.copy()
+        hdrs.update(self.headers)
+        return hdrs.items()
+
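+# Hedged usage sketch for Request (the URL and header values are
+# illustrative).  Note that add_header() normalizes the key with
+# str.capitalize(), so 'user-agent' is stored as 'User-agent', and that
+# add_unredirected_header() stores headers which are dropped if the
+# request is redirected:
+#
+#     req = Request('http://www.example.com/', data='x=1',
+#                   headers={'User-agent': 'MyClient/1.0'})
+#     req.add_unredirected_header('Authorization', 'Basic ...')
+#     req.get_method()                # 'POST', because data is present
+#     req.has_header('User-agent')    # True
+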
+class OpenerDirector:
+    def __init__(self):
+        client_version = "Python-urllib/%s" % __version__
+        self.addheaders = [('User-agent', client_version)]
+        # manage the individual handlers
+        self.handlers = []
+        self.handle_open = {}
+        self.handle_error = {}
+        self.process_response = {}
+        self.process_request = {}
+
+    def add_handler(self, handler):
+        if not hasattr(handler, "add_parent"):
+            raise TypeError("expected BaseHandler instance, got %r" %
+                            type(handler))
+
+        added = False
+        for meth in dir(handler):
+            if meth in ["redirect_request", "do_open", "proxy_open"]:
+                # oops, coincidental match
+                continue
+
+            i = meth.find("_")
+            protocol = meth[:i]
+            condition = meth[i+1:]
+
+            if condition.startswith("error"):
+                j = condition.find("_") + i + 1
+                kind = meth[j+1:]
+                try:
+                    kind = int(kind)
+                except ValueError:
+                    pass
+                lookup = self.handle_error.get(protocol, {})
+                self.handle_error[protocol] = lookup
+            elif condition == "open":
+                kind = protocol
+                lookup = self.handle_open
+            elif condition == "response":
+                kind = protocol
+                lookup = self.process_response
+            elif condition == "request":
+                kind = protocol
+                lookup = self.process_request
+            else:
+                continue
+
+            handlers = lookup.setdefault(kind, [])
+            if handlers:
+                bisect.insort(handlers, handler)
+            else:
+                handlers.append(handler)
+            added = True
+
+        if added:
+            # handlers must be invoked in a specific order; the order
+            # is given by each handler's handler_order attribute
+            bisect.insort(self.handlers, handler)
+            handler.add_parent(self)
+
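+    # Sketch of the naming convention add_handler() relies on (the
+    # EchoHandler class below is illustrative, not part of the module):
+    # a method named <protocol>_open registers an opener,
+    # <protocol>_request / <protocol>_response register pre-/post-
+    # processors, and http_error_<code> registers an error handler.
+    #
+    #     class EchoHandler(BaseHandler):
+    #         def echo_open(self, req):       # handles echo: URLs
+    #             return StringIO(req.get_full_url())
+    #
+    #     director = OpenerDirector()
+    #     director.add_handler(EchoHandler())
+    #     director.open('echo://hello').read()   # 'echo://hello'
+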
+    def close(self):
+        # Only exists for backwards compatibility.
+        pass
+
+    def _call_chain(self, chain, kind, meth_name, *args):
+        # Handlers raise an exception if no one else should try to handle
+        # the request, or return None if they can't but another handler
+        # could.  Otherwise, they return the response.
+        handlers = chain.get(kind, ())
+        for handler in handlers:
+            func = getattr(handler, meth_name)
+
+            result = func(*args)
+            if result is not None:
+                return result
+
+    def open(self, fullurl, data=None, timeout=None):
+        # accept a URL or a Request object
+        if isinstance(fullurl, basestring):
+            req = Request(fullurl, data)
+        else:
+            req = fullurl
+            if data is not None:
+                req.add_data(data)
+
+        req.timeout = timeout
+        protocol = req.get_type()
+
+        # pre-process request
+        meth_name = protocol+"_request"
+        for processor in self.process_request.get(protocol, []):
+            meth = getattr(processor, meth_name)
+            req = meth(req)
+
+        response = self._open(req, data)
+
+        # post-process response
+        meth_name = protocol+"_response"
+        for processor in self.process_response.get(protocol, []):
+            meth = getattr(processor, meth_name)
+            response = meth(req, response)
+
+        return response
+
+    def _open(self, req, data=None):
+        result = self._call_chain(self.handle_open, 'default',
+                                  'default_open', req)
+        if result:
+            return result
+
+        protocol = req.get_type()
+        result = self._call_chain(self.handle_open, protocol, protocol +
+                                  '_open', req)
+        if result:
+            return result
+
+        return self._call_chain(self.handle_open, 'unknown',
+                                'unknown_open', req)
+
+    def error(self, proto, *args):
+        if proto in ('http', 'https'):
+            # XXX http[s] protocols are special-cased
+            dict = self.handle_error['http'] # https is not different from http
+            proto = args[2]  # YUCK!
+            meth_name = 'http_error_%s' % proto
+            http_err = 1
+            orig_args = args
+        else:
+            dict = self.handle_error
+            meth_name = proto + '_error'
+            http_err = 0
+        args = (dict, proto, meth_name) + args
+        result = self._call_chain(*args)
+        if result:
+            return result
+
+        if http_err:
+            args = (dict, 'default', 'http_error_default') + orig_args
+            return self._call_chain(*args)
+
+# XXX probably also want an abstract factory that knows when it makes
+# sense to skip a superclass in favor of a subclass and when it might
+# make sense to include both
+
+def build_opener(*handlers):
+    """Create an opener object from a list of handlers.
+
+    The opener will use several default handlers, including support
+    for HTTP and FTP.
+
+    If any of the handlers passed as arguments are subclasses of the
+    default handlers, the default handlers will not be used.
+    """
+    import types
+    def isclass(obj):
+        return isinstance(obj, types.ClassType) or hasattr(obj, "__bases__")
+
+    opener = OpenerDirector()
+    default_classes = [ProxyHandler, UnknownHandler, HTTPHandler,
+                       HTTPDefaultErrorHandler, HTTPRedirectHandler,
+                       FTPHandler, FileHandler, HTTPErrorProcessor]
+    if hasattr(httplib, 'HTTPS'):
+        default_classes.append(HTTPSHandler)
+    skip = []
+    for klass in default_classes:
+        for check in handlers:
+            if isclass(check):
+                if issubclass(check, klass):
+                    skip.append(klass)
+            elif isinstance(check, klass):
+                skip.append(klass)
+    for klass in skip:
+        default_classes.remove(klass)
+
+    for klass in default_classes:
+        opener.add_handler(klass())
+
+    for h in handlers:
+        if isclass(h):
+            h = h()
+        opener.add_handler(h)
+    return opener
+
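+# Hedged sketch of the override behaviour described in the docstring
+# above (MyHTTPHandler is illustrative): passing a subclass of a
+# default handler replaces that default, so only the subclass handles
+# its protocol.
+#
+#     class MyHTTPHandler(HTTPHandler):
+#         pass
+#
+#     opener = build_opener(MyHTTPHandler)  # no plain HTTPHandler installed
+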
+class BaseHandler:
+    handler_order = 500
+
+    def add_parent(self, parent):
+        self.parent = parent
+
+    def close(self):
+        # Only exists for backwards compatibility
+        pass
+
+    def __lt__(self, other):
+        if not hasattr(other, "handler_order"):
+            # Try to preserve the old behavior of having custom classes
+            # inserted after default ones (works only for custom user
+            # classes which are not aware of handler_order).
+            return True
+        return self.handler_order < other.handler_order
+
+
+class HTTPErrorProcessor(BaseHandler):
+    """Process HTTP error responses."""
+    handler_order = 1000  # after all other processing
+
+    def http_response(self, request, response):
+        code, msg, hdrs = response.code, response.msg, response.info()
+
+        # According to RFC 2616, "2xx" code indicates that the client's
+        # request was successfully received, understood, and accepted.
+        if not (200 <= code < 300):
+            response = self.parent.error(
+                'http', request, response, code, msg, hdrs)
+
+        return response
+
+    https_response = http_response
+
+class HTTPDefaultErrorHandler(BaseHandler):
+    def http_error_default(self, req, fp, code, msg, hdrs):
+        raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
+
+class HTTPRedirectHandler(BaseHandler):
+    # maximum number of redirections to any single URL
+    # this is needed because of the state that cookies introduce
+    max_repeats = 4
+    # maximum total number of redirections (regardless of URL) before
+    # assuming we're in a loop
+    max_redirections = 10
+
+    def __init__(self):
+        self.cache = {}
+
+    def redirect_request(self, req, fp, code, msg, headers, newurl):
+        """Return a Request or None in response to a redirect.
+
+        This is called by the http_error_30x methods when a
+        redirection response is received.  If a redirection should
+        take place, return a new Request to allow http_error_30x to
+        perform the redirect.  Otherwise, raise HTTPError if no-one
+        else should try to handle this url.  Return None if you can't
+        but another Handler might.
+        """
+        m = req.get_method()
+        if (code in (301, 302, 303, 307) and m in ("GET", "HEAD")
+            or code in (301, 302, 303) and m == "POST"):
+            # Strictly (according to RFC 2616), 301 or 302 in response
+            # to a POST MUST NOT cause a redirection without confirmation
+            # from the user (of urllib2, in this case).  In practice,
+            # essentially all clients do redirect in this case, so we
+            # do the same.
+            # be lenient with URIs containing a space
+            newurl = newurl.replace(' ', '%20')
+            return Request(newurl,
+                           headers=req.headers,
+                           origin_req_host=req.get_origin_req_host(),
+                           unverifiable=True)
+        else:
+            raise HTTPError(req.get_full_url(), code, msg, headers, fp)
+
+    # Implementation note: To avoid the server sending us into an
+    # infinite loop, the request object needs to track what URLs we
+    # have already seen.  Do this by adding a handler-specific
+    # attribute to the Request object.
+
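+    # http_error_301 additionally caches the result of following a
+    # permanent redirect, keyed by the Request instance: the first 301
+    # for a given Request is resolved through http_error_302 and stored
+    # in self.cache, and later 301s for the same Request replay the
+    # cached result (still subject to the loop-detection limits).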
+    def http_error_301(self, req, fp, code, msg, headers):
+        if req in self.cache:
+            if 'location' in headers:
+                newurl = headers.getheaders('location')[0]
+            elif 'uri' in headers:
+                newurl = headers.getheaders('uri')[0]
+            else:
+                return
+            if hasattr(req, 'redirect_dict'):
+                visited = req.redirect_dict
+                if (visited.get(newurl, 0) > (self.max_repeats - 1) or
+                        len(visited) >= (self.max_redirections - 1)):
+                    raise HTTPError(req.get_full_url(), code, self.inf_msg +
+                            msg, headers, fp)
+            else:
+                visited = req.redirect_dict = {}
+            visited[newurl] = visited.get(newurl, 0) + 1
+            return self.cache[req]
+        self.cache[req] = self.http_error_302(req, fp, code, msg, headers)
+        return self.cache[req]
+
+    def http_error_302(self, req, fp, code, msg, headers):
+        # Some servers (incorrectly) return multiple Location headers
+        # (so probably same goes for URI).  Use first header.
+        if 'location' in headers:
+            newurl = headers.getheaders('location')[0]
+        elif 'uri' in headers:
+            newurl = headers.getheaders('uri')[0]
+        else:
+            return
+        newurl = urlparse.urljoin(req.get_full_url(), newurl)
+
+        # XXX Probably want to forget about the state of the current
+        # request, although that might interact poorly with other
+        # handlers that also use handler-specific request attributes
+        new = self.redirect_request(req, fp, code, msg, headers, newurl)
+        if new is None:
+            return
+
+        # loop detection
+        # .redirect_dict has a key url if url was previously visited.
+        if hasattr(req, 'redirect_dict'):
+            visited = new.redirect_dict = req.redirect_dict
+            if (visited.get(newurl, 0) >= self.max_repeats or
+                len(visited) >= self.max_redirections):
+                raise HTTPError(req.get_full_url(), code,
+                                self.inf_msg + msg, headers, fp)
+        else:
+            visited = new.redirect_dict = req.redirect_dict = {}
+        visited[newurl] = visited.get(newurl, 0) + 1
+
+        # Don't close the fp until we are sure that we won't use it
+        # with HTTPError.
+        fp.read()
+        fp.close()
+
+        return self.parent.open(new)
+
+    http_error_303 = http_error_307 = http_error_302
+
+    inf_msg = "The HTTP server returned a redirect error that would " \
+              "lead to an infinite loop.\n" \
+              "The last 30x error message was:\n"
+
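+# Hedged sketch (the class name is illustrative, not part of the
+# module): redirect_request() is the intended customization point.  A
+# subclass that refuses to follow any redirect can re-raise, so the
+# HTTPError propagates to the caller instead of being followed:
+#
+#     class NoRedirectHandler(HTTPRedirectHandler):
+#         def redirect_request(self, req, fp, code, msg, headers, newurl):
+#             raise HTTPError(req.get_full_url(), code, msg, headers, fp)
+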
+
+def _parse_proxy(proxy):
+    """Return (scheme, user, password, host/port) given a URL or an authority.
+
+    If a URL is supplied, it must have an authority (host:port) component.
+    According to RFC 3986, having an authority component means the URL must
+    have two slashes after the scheme:
+
+    >>> _parse_proxy('file:/ftp.example.com/')
+    Traceback (most recent call last):
+    ValueError: proxy URL with no authority: 'file:/ftp.example.com/'
+
+    The first three items of the returned tuple may be None.
+
+    Examples of authority parsing:
+
+    >>> _parse_proxy('proxy.example.com')
+    (None, None, None, 'proxy.example.com')
+    >>> _parse_proxy('proxy.example.com:3128')
+    (None, None, None, 'proxy.example.com:3128')
+
+    The authority component may optionally include userinfo (assumed to be
+    username:password):
+
+    >>> _parse_proxy('joe:password at proxy.example.com')
+    (None, 'joe', 'password', 'proxy.example.com')
+    >>> _parse_proxy('joe:password at proxy.example.com:3128')
+    (None, 'joe', 'password', 'proxy.example.com:3128')
+
+    Same examples, but with URLs instead:
+
+    >>> _parse_proxy('http://proxy.example.com/')
+    ('http', None, None, 'proxy.example.com')
+    >>> _parse_proxy('http://proxy.example.com:3128/')
+    ('http', None, None, 'proxy.example.com:3128')
+    >>> _parse_proxy('http://joe:password@proxy.example.com/')
+    ('http', 'joe', 'password', 'proxy.example.com')
+    >>> _parse_proxy('http://joe:password@proxy.example.com:3128')
+    ('http', 'joe', 'password', 'proxy.example.com:3128')
+
+    Everything after the authority is ignored:
+
+    >>> _parse_proxy('ftp://joe:password@proxy.example.com/rubbish:3128')
+    ('ftp', 'joe', 'password', 'proxy.example.com')
+
+    Test for no trailing '/' case:
+
+    >>> _parse_proxy('http://joe:password@proxy.example.com')
+    ('http', 'joe', 'password', 'proxy.example.com')
+
+    """
+    scheme, r_scheme = splittype(proxy)
+    if not r_scheme.startswith("/"):
+        # authority
+        scheme = None
+        authority = proxy
+    else:
+        # URL
+        if not r_scheme.startswith("//"):
+            raise ValueError("proxy URL with no authority: %r" % proxy)
+        # We have an authority, so for RFC 3986-compliant URLs (by
+        # sections 3 and 3.3), the path is empty or starts with '/'
+        end = r_scheme.find("/", 2)
+        if end == -1:
+            end = None
+        authority = r_scheme[2:end]
+    userinfo, hostport = splituser(authority)
+    if userinfo is not None:
+        user, password = splitpasswd(userinfo)
+    else:
+        user = password = None
+    return scheme, user, password, hostport
+
+class ProxyHandler(BaseHandler):
+    # Proxies must be in front
+    handler_order = 100
+
+    def __init__(self, proxies=None):
+        if proxies is None:
+            proxies = getproxies()
+        assert hasattr(proxies, 'has_key'), "proxies must be a mapping"
+        self.proxies = proxies
+        for type, url in proxies.items():
+            setattr(self, '%s_open' % type,
+                    lambda r, proxy=url, type=type, meth=self.proxy_open: \
+                    meth(r, proxy, type))
+
+    def proxy_open(self, req, proxy, type):
+        orig_type = req.get_type()
+        proxy_type, user, password, hostport = _parse_proxy(proxy)
+        if proxy_type is None:
+            proxy_type = orig_type
+        if user and password:
+            user_pass = '%s:%s' % (unquote(user), unquote(password))
+            creds = base64.b64encode(user_pass).strip()
+            req.add_header('Proxy-authorization', 'Basic ' + creds)
+        hostport = unquote(hostport)
+        req.set_proxy(hostport, proxy_type)
+        if orig_type == proxy_type:
+            # let other handlers take care of it
+            return None
+        else:
+            # need to start over, because the other handlers don't
+            # grok the proxy's URL type
+            # e.g. if we have a constructor arg proxies like so:
+            # {'http': 'ftp://proxy.example.com'}, we may end up turning
+            # a request for http://acme.example.com/a into one for
+            # ftp://proxy.example.com/a
+            return self.parent.open(req)
+
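+# Hedged ProxyHandler usage sketch (host names are illustrative).  The
+# mapping passed to the constructor goes from URL scheme to a proxy URL
+# or authority; with no argument, proxies are read from the environment
+# via urllib's getproxies():
+#
+#     ph = ProxyHandler({'http': 'http://joe:password@proxy.example.com:3128/'})
+#     opener = build_opener(ph)
+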
+class HTTPPasswordMgr:
+
+    def __init__(self):
+        self.passwd = {}
+
+    def add_password(self, realm, uri, user, passwd):
+        # uri could be a single URI or a sequence
+        if isinstance(uri, basestring):
+            uri = [uri]
+        if realm not in self.passwd:
+            self.passwd[realm] = {}
+        for default_port in True, False:
+            reduced_uri = tuple(
+                [self.reduce_uri(u, default_port) for u in uri])
+            self.passwd[realm][reduced_uri] = (user, passwd)
+
+    def find_user_password(self, realm, authuri):
+        domains = self.passwd.get(realm, {})
+        for default_port in True, False:
+            reduced_authuri = self.reduce_uri(authuri, default_port)
+            for uris, authinfo in domains.iteritems():
+                for uri in uris:
+                    if self.is_suburi(uri, reduced_authuri):
+                        return authinfo
+        return None, None
+
+    def reduce_uri(self, uri, default_port=True):
+        """Accept authority or URI and extract only the authority and path."""
+        # note HTTP URLs do not have a userinfo component
+        parts = urlparse.urlsplit(uri)
+        if parts[1]:
+            # URI
+            scheme = parts[0]
+            authority = parts[1]
+            path = parts[2] or '/'
+        else:
+            # host or host:port
+            scheme = None
+            authority = uri
+            path = '/'
+        host, port = splitport(authority)
+        if default_port and port is None and scheme is not None:
+            dport = {"http": 80,
+                     "https": 443,
+                     }.get(scheme)
+            if dport is not None:
+                authority = "%s:%d" % (host, dport)
+        return authority, path
+
+    def is_suburi(self, base, test):
+        """Check if test is below base in a URI tree
+
+        Both args must be URIs in reduced form.
+        """
+        if base == test:
+            return True
+        if base[0] != test[0]:
+            return False
+        common = posixpath.commonprefix((base[1], test[1]))
+        if len(common) == len(base[1]):
+            return True
+        return False
+
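+# Illustrative values for the reduced form used above (the default port
+# is filled in when the scheme is known):
+#
+#     mgr = HTTPPasswordMgr()
+#     mgr.reduce_uri('http://example.com/a/b.html')
+#     # -> ('example.com:80', '/a/b.html')
+#     mgr.is_suburi(('example.com:80', '/a/'),
+#                   ('example.com:80', '/a/b.html'))   # True
+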
+
+class HTTPPasswordMgrWithDefaultRealm(HTTPPasswordMgr):
+
+    def find_user_password(self, realm, authuri):
+        user, password = HTTPPasswordMgr.find_user_password(self, realm,
+                                                            authuri)
+        if user is not None:
+            return user, password
+        return HTTPPasswordMgr.find_user_password(self, None, authuri)
+
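+# Hedged sketch: a password added under realm None acts as a fallback
+# for any realm at that URI (user and password values are illustrative):
+#
+#     mgr = HTTPPasswordMgrWithDefaultRealm()
+#     mgr.add_password(None, 'http://example.com/', 'joe', 'secret')
+#     mgr.find_user_password('Some Realm', 'http://example.com/x')
+#     # -> ('joe', 'secret')
+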
+
+class AbstractBasicAuthHandler:
+
+    # XXX this allows for multiple auth-schemes, but will stupidly pick
+    # the last one with a realm specified.
+
+    rx = re.compile('(?:.*,)*[ \t]*([^ \t]+)[ \t]+realm="([^"]*)"', re.I)
+
+    # XXX could pre-emptively send auth info already accepted (RFC 2617,
+    # end of section 2, and section 1.2 immediately after "credentials"
+    # production).
+
+    def __init__(self, password_mgr=None):
+        if password_mgr is None:
+            password_mgr = HTTPPasswordMgr()
+        self.passwd = password_mgr
+        self.add_password = self.passwd.add_password
+
+    def http_error_auth_reqed(self, authreq, host, req, headers):
+        # host may be an authority (without userinfo) or a URL with an
+        # authority
+        # XXX could be multiple headers
+        authreq = headers.get(authreq, None)
+        if authreq:
+            mo = AbstractBasicAuthHandler.rx.search(authreq)
+            if mo:
+                scheme, realm = mo.groups()
+                if scheme.lower() == 'basic':
+                    return self.retry_http_basic_auth(host, req, realm)
+
+    def retry_http_basic_auth(self, host, req, realm):
+        user, pw = self.passwd.find_user_password(realm, host)
+        if pw is not None:
+            raw = "%s:%s" % (user, pw)
+            auth = 'Basic %s' % base64.b64encode(raw).strip()
+            if req.headers.get(self.auth_header, None) == auth:
+                return None
+            req.add_header(self.auth_header, auth)
+            return self.parent.open(req)
+        else:
+            return None
+
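+# The header built above is the standard Basic scheme (RFC 2617 s. 2):
+# the credentials are just 'user:password' base64-encoded, e.g.
+#
+#     base64.b64encode('joe:secret')    # 'am9lOnNlY3JldA=='
+#     # -> Authorization: Basic am9lOnNlY3JldA==
+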
+
+class HTTPBasicAuthHandler(AbstractBasicAuthHandler, BaseHandler):
+
+    auth_header = 'Authorization'
+
+    def http_error_401(self, req, fp, code, msg, headers):
+        url = req.get_full_url()
+        return self.http_error_auth_reqed('www-authenticate',
+                                          url, req, headers)
+
+
+class ProxyBasicAuthHandler(AbstractBasicAuthHandler, BaseHandler):
+
+    auth_header = 'Proxy-authorization'
+
+    def http_error_407(self, req, fp, code, msg, headers):
+        # http_error_auth_reqed requires that there is no userinfo component in
+        # authority.  Assume there isn't one, since urllib2 does not (and
+        # should not, RFC 3986 s. 3.2.1) support requests for URLs containing
+        # userinfo.
+        authority = req.get_host()
+        return self.http_error_auth_reqed('proxy-authenticate',
+                                          authority, req, headers)
+
+
+def randombytes(n):
+    """Return n random bytes."""
+    # Use /dev/urandom if it is available.  Fall back to random module
+    # if not.  It might be worthwhile to extend this function to use
+    # other platform-specific mechanisms for getting random bytes.
+    if os.path.exists("/dev/urandom"):
+        f = open("/dev/urandom")
+        s = f.read(n)
+        f.close()
+        return s
+    else:
+        L = [chr(random.randrange(0, 256)) for i in range(n)]
+        return "".join(L)
+
+class AbstractDigestAuthHandler:
+    # Digest authentication is specified in RFC 2617.
+
+    # XXX The client does not inspect the Authentication-Info header
+    # in a successful response.
+
+    # XXX It should be possible to test this implementation against
+    # a mock server that just generates a static set of challenges.
+
+    # XXX qop="auth-int" supports is shaky
+
+    def __init__(self, passwd=None):
+        if passwd is None:
+            passwd = HTTPPasswordMgr()
+        self.passwd = passwd
+        self.add_password = self.passwd.add_password
+        self.retried = 0
+        self.nonce_count = 0
+
+    def reset_retry_count(self):
+        self.retried = 0
+
+    def http_error_auth_reqed(self, auth_header, host, req, headers):
+        authreq = headers.get(auth_header, None)
+        if self.retried > 5:
+            # Don't fail endlessly - if we failed once, we'll probably
+            # fail a second time. Hm. Unless the Password Manager is
+            # prompting for the information. Crap. This isn't great
+            # but it's better than the current 'repeat until recursion
+            # depth exceeded' approach <wink>
+            raise HTTPError(req.get_full_url(), 401, "digest auth failed",
+                            headers, None)
+        else:
+            self.retried += 1
+        if authreq:
+            scheme = authreq.split()[0]
+            if scheme.lower() == 'digest':
+                return self.retry_http_digest_auth(req, authreq)
+
+    def retry_http_digest_auth(self, req, auth):
+        token, challenge = auth.split(' ', 1)
+        chal = parse_keqv_list(parse_http_list(challenge))
+        auth = self.get_authorization(req, chal)
+        if auth:
+            auth_val = 'Digest %s' % auth
+            if req.headers.get(self.auth_header, None) == auth_val:
+                return None
+            req.add_unredirected_header(self.auth_header, auth_val)
+            resp = self.parent.open(req)
+            return resp
+
+    def get_cnonce(self, nonce):
+        # The cnonce-value is an opaque
+        # quoted string value provided by the client and used by both client
+        # and server to avoid chosen plaintext attacks, to provide mutual
+        # authentication, and to provide some message integrity protection.
+        # This isn't a fabulous effort, but it's probably Good Enough.
+        dig = hashlib.sha1("%s:%s:%s:%s" % (self.nonce_count, nonce, time.ctime(),
+                                            randombytes(8))).hexdigest()
+        return dig[:16]
+
+    def get_authorization(self, req, chal):
+        try:
+            realm = chal['realm']
+            nonce = chal['nonce']
+            qop = chal.get('qop')
+            algorithm = chal.get('algorithm', 'MD5')
+            # mod_digest doesn't send an opaque, even though it isn't
+            # supposed to be optional
+            opaque = chal.get('opaque', None)
+        except KeyError:
+            return None
+
+        H, KD = self.get_algorithm_impls(algorithm)
+        if H is None:
+            return None
+
+        user, pw = self.passwd.find_user_password(realm, req.get_full_url())
+        if user is None:
+            return None
+
+        # XXX not implemented yet
+        if req.has_data():
+            entdig = self.get_entity_digest(req.get_data(), chal)
+        else:
+            entdig = None
+
+        A1 = "%s:%s:%s" % (user, realm, pw)
+        A2 = "%s:%s" % (req.get_method(),
+                        # XXX selector: what about proxies and full urls
+                        req.get_selector())
+        if qop == 'auth':
+            self.nonce_count += 1
+            ncvalue = '%08x' % self.nonce_count
+            cnonce = self.get_cnonce(nonce)
+            noncebit = "%s:%s:%s:%s:%s" % (nonce, ncvalue, cnonce, qop, H(A2))
+            respdig = KD(H(A1), noncebit)
+        elif qop is None:
+            respdig = KD(H(A1), "%s:%s" % (nonce, H(A2)))
+        else:
+            # XXX handle auth-int.
+            raise URLError("qop '%s' is not supported." % qop)
+
+        # XXX should the partial digests be encoded too?
+
+        base = 'username="%s", realm="%s", nonce="%s", uri="%s", ' \
+               'response="%s"' % (user, realm, nonce, req.get_selector(),
+                                  respdig)
+        if opaque:
+            base += ', opaque="%s"' % opaque
+        if entdig:
+            base += ', digest="%s"' % entdig
+        base += ', algorithm="%s"' % algorithm
+        if qop:
+            base += ', qop=auth, nc=%s, cnonce="%s"' % (ncvalue, cnonce)
+        return base
+
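+    # Summary of the computation in get_authorization() above
+    # (RFC 2617 s. 3.2.2.1), where H is the hash function and
+    # KD(secret, data) = H(secret ":" data):
+    #
+    #     A1 = user ":" realm ":" password
+    #     A2 = method ":" request-uri
+    #     with qop=auth:  response = KD(H(A1), nonce ":" nc ":" cnonce
+    #                                           ":" qop ":" H(A2))
+    #     without qop:    response = KD(H(A1), nonce ":" H(A2))
+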
+    def get_algorithm_impls(self, algorithm):
+        # lambdas assume digest modules are imported at the top level
+        if algorithm == 'MD5':
+            H = lambda x: hashlib.md5(x).hexdigest()
+        elif algorithm == 'SHA':
+            H = lambda x: hashlib.sha1(x).hexdigest()
+        else:
+            # unknown algorithm: get_authorization() checks for H is
+            # None and gives up, rather than raising a NameError here
+            H = None
+        # XXX MD5-sess
+        KD = lambda s, d: H("%s:%s" % (s, d))
+        return H, KD
+
+    def get_entity_digest(self, data, chal):
+        # XXX not implemented yet
+        return None
+
+
+class HTTPDigestAuthHandler(BaseHandler, AbstractDigestAuthHandler):
+    """An authentication protocol defined by RFC 2069
+
+    Digest authentication improves on basic authentication because it
+    does not transmit passwords in the clear.
+    """
+
+    auth_header = 'Authorization'
+    handler_order = 490  # before Basic auth
+
+    def http_error_401(self, req, fp, code, msg, headers):
+        host = urlparse.urlparse(req.get_full_url())[1]
+        retry = self.http_error_auth_reqed('www-authenticate',
+                                           host, req, headers)
+        self.reset_retry_count()
+        return retry
+
+
+class ProxyDigestAuthHandler(BaseHandler, AbstractDigestAuthHandler):
+
+    auth_header = 'Proxy-Authorization'
+    handler_order = 490  # before Basic auth
+
+    def http_error_407(self, req, fp, code, msg, headers):
+        host = req.get_host()
+        retry = self.http_error_auth_reqed('proxy-authenticate',
+                                           host, req, headers)
+        self.reset_retry_count()
+        return retry
+
+class AbstractHTTPHandler(BaseHandler):
+
+    def __init__(self, debuglevel=0):
+        self._debuglevel = debuglevel
+
+    def set_http_debuglevel(self, level):
+        self._debuglevel = level
+
+    def do_request_(self, request):
+        host = request.get_host()
+        if not host:
+            raise URLError('no host given')
+
+        if request.has_data():  # POST
+            data = request.get_data()
+            if not request.has_header('Content-type'):
+                request.add_unredirected_header(
+                    'Content-type',
+                    'application/x-www-form-urlencoded')
+            if not request.has_header('Content-length'):
+                request.add_unredirected_header(
+                    'Content-length', '%d' % len(data))
+
+        scheme, sel = splittype(request.get_selector())
+        sel_host, sel_path = splithost(sel)
+        if not request.has_header('Host'):
+            request.add_unredirected_header('Host', sel_host or host)
+        for name, value in self.parent.addheaders:
+            name = name.capitalize()
+            if not request.has_header(name):
+                request.add_unredirected_header(name, value)
+
+        return request
+
+    def do_open(self, http_class, req):
+        """Return an addinfourl object for the request, using http_class.
+
+        http_class must implement the HTTPConnection API from httplib.
+        The addinfourl return value is a file-like object.  It also
+        has methods and attributes including:
+            - info(): return a mimetools.Message object for the headers
+            - geturl(): return the original request URL
+            - code: HTTP status code
+        """
+        host = req.get_host()
+        if not host:
+            raise URLError('no host given')
+
+        h = http_class(host, timeout=req.timeout) # will parse host:port
+        h.set_debuglevel(self._debuglevel)
+
+        headers = dict(req.headers)
+        headers.update(req.unredirected_hdrs)
+        # We want to make an HTTP/1.1 request, but the addinfourl
+        # class isn't prepared to deal with a persistent connection.
+        # It will try to read all remaining data from the socket,
+        # which will block while the server waits for the next request.
+        # So make sure the connection gets closed after the (only)
+        # request.
+        headers["Connection"] = "close"
+        headers = dict(
+            (name.title(), val) for name, val in headers.items())
+        try:
+            h.request(req.get_method(), req.get_selector(), req.data, headers)
+            r = h.getresponse()
+        except socket.error, err: # XXX what error?
+            raise URLError(err)
+
+        # Pick apart the HTTPResponse object to get the addinfourl
+        # object initialized properly.
+
+        # Wrap the HTTPResponse object in socket's file object adapter
+        # for Windows.  That adapter calls recv(), so delegate recv()
+        # to read().  This weird wrapping allows the returned object to
+        # have readline() and readlines() methods.
+
+        # XXX It might be better to extract the read buffering code
+        # out of socket._fileobject() and into a base class.
+
+        r.recv = r.read
+        fp = socket._fileobject(r, close=True)
+
+        resp = addinfourl(fp, r.msg, req.get_full_url())
+        resp.code = r.status
+        resp.msg = r.reason
+        return resp
+
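+# Illustrative use of the addinfourl object described in do_open() (the
+# URL and values are examples):
+#
+#     resp = urlopen('http://www.example.com/')
+#     resp.code                                 # e.g. 200
+#     resp.geturl()                             # 'http://www.example.com/'
+#     resp.info().getheader('Content-Type')     # e.g. 'text/html'
+#     body = resp.read()
+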
+
+class HTTPHandler(AbstractHTTPHandler):
+
+    def http_open(self, req):
+        return self.do_open(httplib.HTTPConnection, req)
+
+    http_request = AbstractHTTPHandler.do_request_
+
+if hasattr(httplib, 'HTTPS'):
+    class HTTPSHandler(AbstractHTTPHandler):
+
+        def https_open(self, req):
+            return self.do_open(httplib.HTTPSConnection, req)
+
+        https_request = AbstractHTTPHandler.do_request_
+
+class HTTPCookieProcessor(BaseHandler):
+    def __init__(self, cookiejar=None):
+        import cookielib
+        if cookiejar is None:
+            cookiejar = cookielib.CookieJar()
+        self.cookiejar = cookiejar
+
+    def http_request(self, request):
+        self.cookiejar.add_cookie_header(request)
+        return request
+
+    def http_response(self, request, response):
+        self.cookiejar.extract_cookies(response, request)
+        return response
+
+    https_request = http_request
+    https_response = http_response
+
+class UnknownHandler(BaseHandler):
+    def unknown_open(self, req):
+        type = req.get_type()
+        raise URLError('unknown url type: %s' % type)
+
+def parse_keqv_list(l):
+    """Parse list of key=value strings where keys are not duplicated."""
+    parsed = {}
+    for elt in l:
+        k, v = elt.split('=', 1)
+        if v[0] == '"' and v[-1] == '"':
+            v = v[1:-1]
+        parsed[k] = v
+    return parsed
+
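+# Illustrative input/output for parse_keqv_list():
+#
+#     parse_keqv_list(['realm="example"', 'qop=auth'])
+#     # -> {'realm': 'example', 'qop': 'auth'}
+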
+def parse_http_list(s):
+    """Parse lists as described by RFC 2068 Section 2.
+
+    In particular, parse comma-separated lists where the elements of
+    the list may include quoted-strings.  A quoted-string could
+    contain a comma.  A non-quoted string could have quotes in the
+    middle.  Neither commas nor quotes count if they are escaped.
+    Only double-quotes count, not single-quotes.
+    """
+    res = []
+    part = ''
+
+    escape = quote = False
+    for cur in s:
+        if escape:
+            part += cur
+            escape = False
+            continue
+        if quote:
+            if cur == '\\':
+                escape = True
+                continue
+            elif cur == '"':
+                quote = False
+            part += cur
+            continue
+
+        if cur == ',':
+            res.append(part)
+            part = ''
+            continue
+
+        if cur == '"':
+            quote = True
+
+        part += cur
+
+    # append last part
+    if part:
+        res.append(part)
+
+    return [part.strip() for part in res]
+
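+# Illustrative input/output for parse_http_list() (quotes are kept, and
+# commas inside a quoted-string do not split the list):
+#
+#     parse_http_list('a, b, "c, d", e')
+#     # -> ['a', 'b', '"c, d"', 'e']
+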
+class FileHandler(BaseHandler):
+    # Use local file or FTP depending on form of URL
+    def file_open(self, req):
+        url = req.get_selector()
+        if url[:2] == '//' and url[2:3] != '/':
+            req.type = 'ftp'
+            return self.parent.open(req)
+        else:
+            return self.open_local_file(req)
+
+    # names for the localhost
+    names = None
+    def get_names(self):
+        if FileHandler.names is None:
+            try:
+                FileHandler.names = (socket.gethostbyname('localhost'),
+                                    socket.gethostbyname(socket.gethostname()))
+            except socket.gaierror:
+                FileHandler.names = (socket.gethostbyname('localhost'),)
+        return FileHandler.names
+
+    # not entirely sure what the rules are here
+    def open_local_file(self, req):
+        import email.utils
+        import mimetypes
+        host = req.get_host()
+        file = req.get_selector()
+        localfile = url2pathname(file)
+        try:
+            stats = os.stat(localfile)
+            size = stats.st_size
+            modified = email.utils.formatdate(stats.st_mtime, usegmt=True)
+            mtype = mimetypes.guess_type(file)[0]
+            headers = mimetools.Message(StringIO(
+                'Content-type: %s\nContent-length: %d\nLast-modified: %s\n' %
+                (mtype or 'text/plain', size, modified)))
+            if host:
+                host, port = splitport(host)
+            if not host or \
+                (not port and socket.gethostbyname(host) in self.get_names()):
+                return addinfourl(open(localfile, 'rb'),
+                                  headers, 'file:'+file)
+        except OSError, msg:
+            # urllib2 users shouldn't expect OSErrors coming from urlopen()
+            raise URLError(msg)
+        raise URLError('file not on local host')
+
+class FTPHandler(BaseHandler):
+    def ftp_open(self, req):
+        import ftplib
+        import mimetypes
+        host = req.get_host()
+        if not host:
+            raise IOError, ('ftp error', 'no host given')
+        host, port = splitport(host)
+        if port is None:
+            port = ftplib.FTP_PORT
+        else:
+            port = int(port)
+
+        # username/password handling
+        user, host = splituser(host)
+        if user:
+            user, passwd = splitpasswd(user)
+        else:
+            passwd = None
+        host = unquote(host)
+        user = unquote(user or '')
+        passwd = unquote(passwd or '')
+
+        try:
+            host = socket.gethostbyname(host)
+        except socket.error, msg:
+            raise URLError(msg)
+        path, attrs = splitattr(req.get_selector())
+        dirs = path.split('/')
+        dirs = map(unquote, dirs)
+        dirs, file = dirs[:-1], dirs[-1]
+        if dirs and not dirs[0]:
+            dirs = dirs[1:]
+        try:
+            fw = self.connect_ftp(user, passwd, host, port, dirs, req.timeout)
+            type = file and 'I' or 'D'
+            for attr in attrs:
+                attr, value = splitvalue(attr)
+                if attr.lower() == 'type' and \
+                   value in ('a', 'A', 'i', 'I', 'd', 'D'):
+                    type = value.upper()
+            fp, retrlen = fw.retrfile(file, type)
+            headers = ""
+            mtype = mimetypes.guess_type(req.get_full_url())[0]
+            if mtype:
+                headers += "Content-type: %s\n" % mtype
+            if retrlen is not None and retrlen >= 0:
+                headers += "Content-length: %d\n" % retrlen
+            sf = StringIO(headers)
+            headers = mimetools.Message(sf)
+            return addinfourl(fp, headers, req.get_full_url())
+        except ftplib.all_errors, msg:
+            raise IOError, ('ftp error', msg), sys.exc_info()[2]
+
+    def connect_ftp(self, user, passwd, host, port, dirs, timeout):
+        fw = ftpwrapper(user, passwd, host, port, dirs, timeout)
+##        fw.ftp.set_debuglevel(1)
+        return fw
+
+class CacheFTPHandler(FTPHandler):
+    # XXX would be nice to have pluggable cache strategies
+    # XXX this stuff is definitely not thread safe
+    def __init__(self):
+        self.cache = {}
+        self.timeout = {}
+        self.soonest = 0
+        self.delay = 60
+        self.max_conns = 16
+
+    def setTimeout(self, t):
+        self.delay = t
+
+    def setMaxConns(self, m):
+        self.max_conns = m
+
+    def connect_ftp(self, user, passwd, host, port, dirs, timeout):
+        key = user, host, port, '/'.join(dirs), timeout
+        if key in self.cache:
+            self.timeout[key] = time.time() + self.delay
+        else:
+            self.cache[key] = ftpwrapper(user, passwd, host, port, dirs, timeout)
+            self.timeout[key] = time.time() + self.delay
+        self.check_cache()
+        return self.cache[key]
+
+    def check_cache(self):
+        # first check for old ones
+        t = time.time()
+        if self.soonest <= t:
+            for k, v in self.timeout.items():
+                if v < t:
+                    self.cache[k].close()
+                    del self.cache[k]
+                    del self.timeout[k]
+        # the timeout dict may now be empty; guard the min() call
+        self.soonest = min(self.timeout.values() or [0])
+
+        # then check the size
+        if len(self.cache) == self.max_conns:
+            for k, v in self.timeout.items():
+                if v == self.soonest:
+                    del self.cache[k]
+                    del self.timeout[k]
+                    break
+            self.soonest = min(self.timeout.values() or [0])
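+
+# Hedged CacheFTPHandler usage sketch (the values are illustrative):
+# connections are keyed by (user, host, port, path, timeout) and expire
+# after `delay` seconds; tune the cache before building the opener.
+#
+#     cache_ftp = CacheFTPHandler()
+#     cache_ftp.setTimeout(30)      # keep idle connections for 30s
+#     cache_ftp.setMaxConns(4)
+#     opener = build_opener(cache_ftp)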

