urllib interpretation of URL with ".."

Gabriel Genellina gagsl-py2 at yahoo.com.ar
Tue Jun 26 21:19:25 EDT 2007


On Tue, 26 Jun 2007 17:26:06 -0300, sergio <sergio at sergiomb.no-ip.org>
wrote:

> John Nagle wrote:
>
>>  In Python, of course, "urlparse.urlparse", which is
>> the main function used to disassemble a URL, has no idea whether it's
>> being used by a client or a server, so it, reasonably enough, takes
>> option 1.
>
> >>> import urlparse
> >>> base="http://somesite.com/level1/"
> >>> path="../page.html"
> >>> urlparse.urljoin(base,path)
> 'http://somesite.com/page.html'
> >>> base="http://somesite.com/"
> >>> urlparse.urljoin(base,path)
> 'http://somesite.com/../page.html'
>
> For me this is a bug, and a very annoying one, because I can't simply
> strip ../ from path, since base could have a level.

I'd say it's an annoyance, not a bug. Write your own urljoin function with
the exact behavior you want. Since urljoin should already have processed
every "meaningful" .. and ., any that remain can simply be dropped, so a
simple url = url.replace("/../","/").replace("/./","/") may be enough.
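A slightly more careful variant of that idea, as a sketch only: a single
replace() pass misses back-to-back segments such as "/../../", and it also
touches the query string. The version below drops leftover "." and ".."
segments from the path component alone. The names strip_dot_segments and
urljoin_clamped are made up for illustration, and the import assumes
Python 3 (in Python 2 the same functions live in the urlparse module).

```python
from urllib.parse import urljoin, urlsplit, urlunsplit


def strip_dot_segments(url):
    """Drop any "." or ".." path segments still present in url.

    Assumes the URL has already been through urljoin, so whatever dot
    segments remain could not be resolved and are safe to discard.
    """
    parts = urlsplit(url)
    # Only the path is rewritten; query and fragment are left untouched.
    segments = [s for s in parts.path.split("/") if s not in (".", "..")]
    return urlunsplit(parts._replace(path="/".join(segments)))


def urljoin_clamped(base, path):
    """Hypothetical wrapper: urljoin, then discard leftover dot segments."""
    return strip_dot_segments(urljoin(base, path))


if __name__ == "__main__":
    print(urljoin_clamped("http://somesite.com/level1/", "../page.html"))
    print(urljoin_clamped("http://somesite.com/", "../page.html"))
```

With this, both of the bases from the quoted session should give the same
result for "../page.html", whether or not base has a level.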

-- 
Gabriel Genellina


