is there a bug in urlunparse/urlunsplit

Rob Williscroft rtw at freenet.co.uk
Sun May 18 16:37:19 EDT 2008


Alex wrote in news:09764c57-03ce-4ccb-a26d-e851899dcc7c at a23g2000hsc.googlegroups.com
in comp.lang.python:

> Hi all.
> 
> Is there a bug in the urlunparse/urlunsplit functions?
> Look at this fragment (I know is quite silly):
> 
> urlunparse(urlparse('www.example.org','http'))
> ---> 'http:///www.example.org'
>            ^^^^^

Try these 3:

  urlparse('www.example.org','http')
  urlparse('http://www.example.org','http')
  urlparse('//www.example.org','http')

The 1st returns www.example.org as the path part;
with the other 2 it's the network location (domain) part.
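
A minimal sketch of that comparison, assuming Python 3, where these
functions live in urllib.parse (in the Python 2 of this thread they were
in the urlparse module). The variable names are only for illustration:

```python
from urllib.parse import urlparse

# No "//" before the host, so the whole string is treated as a path.
bare = urlparse('www.example.org', 'http')

# A leading "//" (with or without an explicit scheme) marks what follows
# as the network location.
full = urlparse('http://www.example.org', 'http')
relative = urlparse('//www.example.org', 'http')

for result in (bare, full, relative):
    print(result.scheme, repr(result.netloc), repr(result.path))
```

Only the first call leaves netloc empty, which is why un-parsing it cannot
give back a URL with www.example.org as the host.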

Although it may not be immediately obvious that the result
is correct, consider the following HTML fragment:

  <img src="aaa.gif">
  <img src="http://anothersite.com/bbb.gif">

If you were to use urlparse to parse the src attributes
you would want:

  ( '', '', 'aaa.gif', '','','' )
  ( 'http', 'anothersite.com', '/bbb.gif', '','','' )


Which AIUI is what urlparse does.
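
This can be checked directly. A small sketch, again assuming Python 3's
urllib.parse; the variable names are made up for the example:

```python
from urllib.parse import urlparse

# The two src values from the HTML fragment above.
relative_src = urlparse('aaa.gif')
absolute_src = urlparse('http://anothersite.com/bbb.gif')

# The result is a named tuple, so it converts to a plain 6-tuple of
# (scheme, netloc, path, params, query, fragment).
print(tuple(relative_src))
print(tuple(absolute_src))
```

A relative reference such as aaa.gif has no scheme or host of its own, so
treating it as a bare path is the useful answer here.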


Rob.
-- 
http://www.victim-prime.dsl.pipex.com/


