should urlparse return user and pass in separate components?

John J. Lee jjl at pobox.com
Fri Sep 8 14:16:23 EDT 2006


"metaperl" <metaperl at gmail.com> writes:

> The urlparse with Python 2.4.3 includes the user and pass in the site
> aspect of its parse:
> 
> >>> scheme, site, path, parms, query, fid = urlparse.urlparse("http://bill:james@docs.python.org/lib/module-urlparse.html")
> 
> >>> site
> 'bill:james@docs.python.org'
> 
> 
> I personally would prefer that it be broken down a bit further. What
> are existing opinions on this?

Module urlparse should be deprecated in Python 2.6, to be replaced
with a new module (or modules) that implements the relevant parts of
RFCs 3986 and 3987 (read the python-dev archives for discussion and
several people's first cuts at implementation).

Splitting out "userinfo" (the bit before the '@' in
user:password@host.com) should be a separate function, mostly because
RFC 3986 talks a lot about the 5-tuple (scheme, authority, path,
query, fragment) into which ANY URL can be split, and that splitting
process doesn't involve splitting out userinfo.  So it makes sense to
have one function do the splitting into RFC 3986 5-tuples, and another
split out the userinfo.  Also, though, the userinfo syntax is
deprecated, because people use it for semantic spoofing attacks:
people don't understand (or don't notice) that

http://microsoft.com&rhubarb=custard&confound=confabulate@192.168.0.1/more/stuff.htm

is not a microsoft.com URL.  Note that userinfo has always been
illegal in HTTP URLs, and is no longer supported by newer browsers.
So relegating it to a separate function is a good thing, IMO.
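
A rough sketch of the two-function split described above, not a
proposal for the real API.  It assumes Python 3's
urllib.parse.urlsplit (the successor to the urlparse module) for the
5-way split; split_userinfo is a hypothetical helper name made up for
this example, not an existing library function.

```python
from urllib.parse import urlsplit


def split_userinfo(authority):
    """Split an authority string into (userinfo, hostport).

    Hypothetical helper for illustration only.  userinfo is None
    when the authority contains no '@'.
    """
    # Everything before the last '@' is userinfo; the rest is host[:port].
    userinfo, sep, hostport = authority.rpartition("@")
    if not sep:
        return None, authority
    return userinfo, hostport


# Step 1: the generic 5-way split (scheme, authority, path, query, fragment).
url = "http://bill:james@docs.python.org/lib/module-urlparse.html"
scheme, authority, path, query, fragment = urlsplit(url)
# authority is still 'bill:james@docs.python.org' at this point.

# Step 2: a separate function pulls userinfo out of the authority.
userinfo, hostport = split_userinfo(authority)
# userinfo == 'bill:james', hostport == 'docs.python.org'

# The spoofing URL from above: the real host is whatever follows the '@'.
spoof = ("http://microsoft.com&rhubarb=custard&confound=confabulate"
         "@192.168.0.1/more/stuff.htm")
spoof_userinfo, spoof_host = split_userinfo(urlsplit(spoof).netloc)
# spoof_host == '192.168.0.1'
```

Keeping step 2 separate means callers who never want userinfo can
ignore it, and callers who do can reject or flag any URL where
split_userinfo returns something other than None.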


John