[Web-SIG] urlparse method behaviour when handling abs/rel urls

O.R.Senthil Kumaran orsenthil at gmail.com
Fri Jun 27 20:31:58 CEST 2008


At http://bugs.python.org/issue754016 there is a discussion about what happens
when a URL is passed to urlparse without a scheme or a leading '//' (e.g.
urlparse('www.python.org')): it is parsed as a path rather than as the net_loc
component, which is how browsers commonly treat it.
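The behaviour is easy to reproduce. The following is a minimal sketch that
assumes the Python 3 spelling, urllib.parse.urlparse, rather than the urlparse
module discussed in this thread; the variable names are only illustrative.

```python
from urllib.parse import urlparse  # assumption: Python 3 home of urlparse

# No scheme and no leading '//': the whole string ends up in .path
bare = urlparse('www.python.org')
print(repr(bare.netloc), repr(bare.path))

# With a leading '//', the host is recognised as the network location
slashed = urlparse('//www.python.org')
print(repr(slashed.netloc), repr(slashed.path))
```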

The urlparse module tries to follow RFC 1808, which specifies:

<quote_rfc1808>
2.4.3.  Parsing the Network Location/Login

   If the parse string begins with a double-slash "//", then the
   substring of characters after the double-slash and up to, but not
   including, the next slash "/" character is the network location/login
   (<net_loc>) of the URL.  

</quote_rfc1808>

As for treating the URL as a path, the RFC specifies that once the scheme,
net_loc, parameters and query have been parsed, whatever is left is the path.

<quote_rfc1808>
2.4.6.  Parsing the Path

   After the above steps, all that is left of the parse string is the
   URL <path> and the slash "/" that may precede it. 
</quote_rfc1808>

So, since 'www.python.org' is not a scheme, net_loc (as per the RFC), parameter
or query, it is a path. This looks absurd for 'www.python.org', but it is
exactly right for parsing relative URLs such as just 'a'. Moreover, it makes
sense when we have relative URLs with parameters and query, e.g. 'g:h', '?x'.
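As a rough illustration of why the same rule is right for relative URLs, the
sketch below parses a few relative references. It again assumes Python 3's
urllib.parse.urlparse, and the inputs are chosen only for illustration.

```python
from urllib.parse import urlparse  # assumption: Python 3 home of urlparse

# A bare relative reference is, correctly, nothing but a path
just_a = urlparse('a')
print(repr(just_a.path))

# A query-only reference has an empty path and a query component
query_only = urlparse('?x')
print(repr(query_only.path), repr(query_only.query))

# A relative path carrying both parameters and a query
with_params = urlparse('a;p?x')
print(repr(with_params.path), repr(with_params.params), repr(with_params.query))
```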

Now the question is: how do we inform users that if they want the net_loc of
the URL, they have to put '//' in front of it?

My suggestion is to do it through the docs and the help message.

There is also a suggestion in that discussion to raise an exception when the
URL does not start with '//'.

As the urlparse module is used for handling both absolute and relative URLs,
this suggestion would, IMHO, break urlparse's handling of all relative URLs,
for example the cases listed in RFC 1808 (Section 5.1, Normal Examples).
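To make the concern concrete, here is a small sketch that resolves a few
relative references against the base URL used in the RFC's examples. It
assumes Python 3's urllib.parse.urljoin; none of these references starts with
a scheme, and only one starts with '//', so a parser that rejected such input
could not handle them.

```python
from urllib.parse import urljoin  # assumption: Python 3 home of urljoin

BASE = 'http://a/b/c/d;p?q#f'  # base URL from the RFC's examples

# Relative references mapped to the results urljoin produces
resolved = {ref: urljoin(BASE, ref) for ref in ('g', '/g', '//g', '../g')}

for ref, url in resolved.items():
    print(ref, '->', url)
```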

Another way to resolve this would be to split urlparse into two methods:
urlparse.absparse()
urlparse.relparse()
and let the user decide which one they want.
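One possible shape for such a split is sketched below. Neither function exists
in the standard library; the behaviour shown for absparse() (retrying with a
'//' prefix when neither a scheme nor a net_loc is found) is purely an
assumption made for illustration, and the sketch uses Python 3's
urllib.parse.urlparse underneath.

```python
from urllib.parse import urlparse  # assumption: Python 3 home of urlparse


def relparse(url):
    """Hypothetical: parse strictly per the RFC, i.e. the current behaviour."""
    return urlparse(url)


def absparse(url):
    """Hypothetical: treat input with no scheme and no '//' as host-first."""
    parts = urlparse(url)
    if not parts.scheme and not parts.netloc:
        # Assumed policy: retry with '//' so the leading segment is the net_loc
        parts = urlparse('//' + url)
    return parts


print(relparse('www.python.org/doc'))
print(absparse('www.python.org/doc'))
```

With this sketch, the caller states the intent explicitly: relparse() keeps
'www.python.org/doc' as a path, while absparse() reads 'www.python.org' as the
net_loc and '/doc' as the path.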

Please provide your suggestions on this.
- Is the current method okay?
- Do we feel the need for absparse() and relparse()?


Thanks.
Senthil
-- 
O.R.Senthil Kumaran
http://uthcode.sarovar.org