urllib interpretation of URL with ".."

Sat Jun 23 19:41:48 EDT 2007

Martin v. Löwis wrote:
> John Nagle wrote:
> 
>>Here's a URL, found in a link, which gives us trouble
>>when we try to follow the link:
>>
>>    http://sportsbra.co.uk/../acatalog/shop.html
>>
>>Browsers immediately turn this into
>>
>>    http://sportsbra.co.uk/acatalog/shop.html
>>
>>and go from there, but urllib tries to open it explicitly, which
>>results in an HTTP error 400.
>>
>>Is "urllib" wrong?
> 
> 
> I can't see how. HTTP 1.1 says that the parameter to the GET
> request should be an abs_path; RFC 2396 says that
> /../acatalog/shop.html is indeed an abs_path, as .. is a valid
> segment. That RFC also has a section on relative identifiers
> and normalization; it defines what .. means *in a relative path*.
> 
> Section 4 is explicit about .. in absolute URIs:
> # The syntax for relative URI is a shortened form of that for absolute
> # URI, where some prefix of the URI is missing and certain path
> # components ("." and "..") have a special meaning when, and only when,
> # interpreting a relative path.
> 
> Notice the "and only when": browsers that modify the above
> URL before sending it seem to be in clear violation of
> RFC 2396.
> 
> Regards,
> Martin

    I think you're right.  The problem is that there is apparently a de facto
standard among browsers: any number of "../" sequences at the beginning of
the path part of a URL has no effect.  Even Google seems to use that
interpretation; not only does it follow that link, it also lists it in its
search results without the "..".
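
    For anyone who needs to follow such links anyway, here is a minimal
sketch of that browser-style cleanup, applied before the URL is handed to
urllib.  It assumes Python 3's urllib.parse; the helper name
strip_leading_dotdot is made up for illustration and is not part of urllib.
It only drops ".." segments at the very start of the path and leaves any
later ".." alone, so it is not a full normalizer.

```python
from urllib.parse import urlsplit, urlunsplit


def strip_leading_dotdot(url):
    """Drop ".." segments at the start of the URL path.

    Hypothetical helper, not a urllib function. It mimics the
    browser behaviour described above and nothing more.
    """
    parts = urlsplit(url)
    if not parts.path.startswith("/"):
        # Empty or non-absolute path: leave the URL untouched.
        return url

    # "/../acatalog/shop.html" splits to ['', '..', 'acatalog', 'shop.html'];
    # segments[0] is the empty string before the leading slash.
    segments = parts.path.split("/")
    i = 1
    while i < len(segments) and segments[i] == "..":
        i += 1

    path = "/" + "/".join(segments[i:])
    return urlunsplit(
        (parts.scheme, parts.netloc, path, parts.query, parts.fragment)
    )


print(strip_leading_dotdot("http://sportsbra.co.uk/../acatalog/shop.html"))
```

Whether to apply this at all is a judgment call: as Martin points out, the
unmodified URL is what RFC 2396 says should be sent, so a crawler might
try the URL as given first and fall back to the stripped form only after
an error.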

					John Nagle