urllib interpretation of URL with ".."

Sat Jun 23 03:14:39 EDT 2007

John Nagle wrote:
> Here's a URL, found in a link, which gives us trouble
> when we try to follow the link:
> 
>     http://sportsbra.co.uk/../acatalog/shop.html
> 
> Browsers immediately turn this into
> 
>     http://sportsbra.co.uk/acatalog/shop.html
> 
> and go from there, but urllib tries to open it explicitly, which
> results in an HTTP error 400.
> 
> Is "urllib" wrong?

I can't see how. HTTP/1.1 says that the parameter to the GET
request should be an abs_path; RFC 2396 says that
/../acatalog/shop.html is indeed an abs_path, as .. is a valid
segment. That RFC also has a section on relative identifiers
and normalization; it defines what .. means *in a relative path*.
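To see the distinction in practice, here is a small sketch using current Python 3's urllib (not the 2007 urllib under discussion). It assumes Python 3.5 or later, where urljoin's dot-segment handling behaves as commented. urljoin resolves ".." when it appears in a relative path reference. When it is handed the absolute URL from the link, it returns the URL untouched, and a Request object built from that URL keeps "/../acatalog/shop.html" as its selector, i.e. the target that would go on the GET line. No network access happens; the variable names are illustrative only.

```python
from urllib.parse import urljoin
from urllib.request import Request

base = "http://sportsbra.co.uk/"
link = "http://sportsbra.co.uk/../acatalog/shop.html"

# A relative path reference: '..' is resolved against the base URL.
relative = urljoin(base, "../acatalog/shop.html")
print(relative)   # http://sportsbra.co.uk/acatalog/shop.html

# An absolute URL: urljoin hands it back unchanged, '..' included.
absolute = urljoin(base, link)
print(absolute)   # http://sportsbra.co.uk/../acatalog/shop.html

# Building a Request does not contact the server; its selector is the
# abs_path that urllib would put on the GET request line.
selector = Request(link).selector
print(selector)   # /../acatalog/shop.html
```

So urllib treats ".." as special only while resolving a relative path, which matches the reading of RFC 2396 given here.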

Section 4 is explicit about .. in absolute URIs:
# The syntax for relative URI is a shortened form of that for absolute
# URI, where some prefix of the URI is missing and certain path
# components ("." and "..") have a special meaning when, and only when,
# interpreting a relative path.

Notice the "and only when": browsers that modify the above
URL before sending it appear to be in clear violation of
RFC 2396.
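If a client wants to copy the browsers anyway, the modification amounts to collapsing "." and ".." segments in the path of the absolute URL before sending it. The following is a minimal sketch under that assumption. browser_style_url is a hypothetical helper, not part of urllib, and it deliberately departs from the RFC 2396 reading above. A ".." that would climb above the root is simply dropped, which is what turns the example link into /acatalog/shop.html.

```python
from urllib.parse import urlsplit, urlunsplit


def browser_style_url(url: str) -> str:
    """Hypothetical helper: collapse '.' and '..' in the path of an
    absolute URL, mimicking the browser behaviour described above."""
    parts = urlsplit(url)
    if not parts.path.startswith("/"):
        return url  # this sketch only rewrites absolute paths

    segments = parts.path.split("/")[1:]  # drop the empty segment before '/'
    resolved = []
    for seg in segments:
        if seg == "..":
            if resolved:          # '..' above the root is discarded
                resolved.pop()
        elif seg != ".":
            resolved.append(seg)

    # A trailing '.' or '..' names a directory, so keep a trailing slash.
    if segments[-1] in (".", ".."):
        resolved.append("")

    path = "/" + "/".join(resolved)
    return urlunsplit((parts.scheme, parts.netloc, path,
                       parts.query, parts.fragment))


print(browser_style_url("http://sportsbra.co.uk/../acatalog/shop.html"))
```

Usage: call it on a scraped link before passing it to urllib, accepting that the request then differs from what the link literally says.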

Regards,
Martin