[Tutor] urllib and urllib2 - better documentation for the stupid?

Mon, 8 Jul 2002 16:21:52 -0400

Urllib and urllib2 seem to have an abundance of goodies for handling nasty
things like proxies. In fact, when I look at what dir() lists for each module:

>>> dir(urllib)
['FancyURLopener', 'MAXFTPCACHE', 'URLopener', '__all__', '__builtins__',
'__doc__', '__file__', '__name__', '__version__', '_fast_quote',
'_fast_safe', '_fast_safe_test', '_ftperrors', '_hostprog', '_localhost',
'_noheaders', '_nportprog', '_passwdprog', '_portprog', '_queryprog',
'_tagprog', '_thishost', '_typeprog', '_urlopener', '_userprog',
'_valueprog', 'addbase', 'addclosehook', 'addinfo', 'addinfourl',
'always_safe', 'basejoin', 'ftpcache', 'ftperrors', 'ftpwrapper',
'getproxies', 'getproxies_environment', 'getproxies_registry', 'localhost',
'main', 'noheaders', 'os', 'pathname2url', 'proxy_bypass', 'quote',
'quote_plus', 'reporthook', 'socket', 'splitattr', 'splitgophertype',
'splithost', 'splitnport', 'splitpasswd', 'splitport', 'splitquery',
'splittag', 'splittype', 'splituser', 'splitvalue', 'stat', 'string', 'sys',
'test', 'test1', 'thishost', 'time', 'toBytes', 'types', 'unquote',
'unquote_plus', 'unwrap', 'url2pathname', 'urlcleanup', 'urlencode',
'urlopen', 'urlretrieve']
>>> dir(urllib2)
['AbstractBasicAuthHandler', 'AbstractDigestAuthHandler',
'AbstractHTTPHandler', 'BaseHandler', 'CacheFTPHandler', 'CustomProxy',
'CustomProxyHandler', 'FTPHandler', 'FileHandler', 'GopherError',
'GopherHandler', 'HTTPBasicAuthHandler', 'HTTPDefaultErrorHandler',
'HTTPDigestAuthHandler', 'HTTPError', 'HTTPHandler', 'HTTPPasswordMgr',
'HTTPPasswordMgrWithDefaultRealm', 'HTTPRedirectHandler', 'OpenerDirector',
'OpenerFactory', 'ProxyBasicAuthHandler', 'ProxyDigestAuthHandler',
'ProxyHandler', 'Request', 'StringIO', 'URLError', 'UnknownHandler',
'__builtins__', '__doc__', '__file__', '__name__', '__version__', '_opener',
'addinfourl', 'base64', 'build_opener', 'encode_digest', 'ftplib',
'ftpwrapper', 'getproxies', 'gopherlib', 'httplib', 'inspect',
'install_opener', 'localhost', 'md5', 'mimetools', 'mimetypes', 'noheaders',
'os', 'parse_http_list', 'parse_keqv_list', 'posixpath', 're', 'rfc822',
'sha', 'socket', 'splitattr', 'splitgophertype', 'splithost', 'splitport',
'splitquery', 'splittype', 'stat', 'sys', 'time', 'types', 'unquote',
'unwrap', 'url2pathname', 'urlopen', 'urlparse']

...everything I could possibly want (with the exception of SSL?) seems to be
there. The only problem is that I can't find documentation or examples that
cover more than a few of the options. I read Fredrik Lundh's entry on
urllib in Python Standard Library and found the basic examples for reading
a page and returning each element of it as a list, which is in itself very
handy.
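
For reference, the kind of basic example I mean looks roughly like the
sketch below. The helper name page_lines is my own, not something taken from
the book, and the fallback import is only there on the assumption that later
Python versions expose urlopen as urllib.request.urlopen rather than
urllib.urlopen.

```python
try:
    from urllib import urlopen            # Python 2.x, as used in this post
except ImportError:
    from urllib.request import urlopen    # assumed location in later Pythons


def page_lines(url):
    """Fetch `url` and return its content as a list of lines."""
    f = urlopen(url)
    try:
        # readlines() keeps the trailing newline on each line
        return f.readlines()
    finally:
        f.close()
```

Calling page_lines('http://www.python.org/') should then give back one list
item per line of the page.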

Does anyone have a good source for step-by-step usage of all the
functionality provided by these modules?
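
To show the sort of thing I am after, here is my guess at the proxy support,
going only by the names ProxyHandler and build_opener in the urllib2 listing.
This is a sketch, not something I found documented step by step: the helper
name make_proxy_opener and the proxy address are made up for illustration,
and the fallback import assumes later Pythons keep these names under
urllib.request.

```python
try:
    import urllib2 as request             # Python 2.x, as used in this post
except ImportError:
    import urllib.request as request      # assumed location in later Pythons


def make_proxy_opener(proxy_url):
    """Build an opener that routes http requests through `proxy_url`."""
    # ProxyHandler takes a mapping of URL scheme -> proxy URL
    handler = request.ProxyHandler({"http": proxy_url})
    # build_opener adds the default handlers around the one we pass in
    return request.build_opener(handler)


# Hypothetical proxy address, for illustration only; nothing is fetched here.
opener = make_proxy_opener("http://proxy.example.com:3128")
```

Presumably opener.open(some_url) would then fetch through the proxy, and
install_opener(opener) would make plain urlopen() calls do the same.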

Thanks!

d