urllib interpretation of URL with ".."

sergio sergio at sergiomb.no-ip.org
Wed Jun 27 09:10:10 EDT 2007


Gabriel Genellina wrote:

> On Tue, 26 Jun 2007 17:26:06 -0300, sergio <sergio at sergiomb.no-ip.org>
> wrote:
> 
>> John Nagle wrote:
>>
>>>  In Python, of course, "urlparse.urlparse", which is
>>> the main function used to disassemble a URL, has no idea whether it's
>>> being used by a client or a server, so it, reasonably enough, takes
>>> option 1.
>>
>> >>> import urlparse
>> >>> base="http://somesite.com/level1/"
>> >>> path="../page.html"
>> >>> urlparse.urljoin(base,path)
>> 'http://somesite.com/page.html'
>> >>> base="http://somesite.com/"
>> >>> urlparse.urljoin(base,path)
>> 'http://somesite.com/../page.html'
>>
>> For me this is a bug, and a very annoying one, because I can't simply strip
>> ../ from path: base could itself have a level.
> 
> I'd say it's an annoyance, not a bug. Write your own urljoin function with
> your exact desired behavior - since all "meaningful" .. and . should have
> been already processed by urljoin, a simple url =
> url.replace("/../","/").replace("/./","/") may be enough.
> 

I had exactly the same thought; the solution is simply this:

urlparse.urljoin(base,path).replace("/../","/")
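
One caveat with the one-liner: str.replace does not rescan what it has
already replaced, so a path that climbs two levels above the root still
leaves one "/../" behind. A rough sketch of a slightly more careful variant
follows. The helper name urljoin_clamped is made up for illustration, and the
try/except import is only an assumption so that the same code also runs on
Python 3, where these functions live in urllib.parse (newer releases of
urljoin may already discard the excess ".." on their own).

```python
try:
    from urllib.parse import urljoin, urlsplit, urlunsplit  # Python 3
except ImportError:
    from urlparse import urljoin, urlsplit, urlunsplit  # Python 2, as in this thread


def urljoin_clamped(base, path):
    """Hypothetical helper: urljoin, then drop any '..' segments it left over."""
    parts = urlsplit(urljoin(base, path))
    # urljoin has already resolved every '..' that had a parent directory,
    # so whatever remains would point above the root and is simply dropped.
    kept = [seg for seg in parts.path.split("/") if seg != ".."]
    # If nothing but '..' was in the path, fall back to the original leading "/".
    new_path = "/".join(kept) or parts.path[:1]
    return urlunsplit(parts._replace(path=new_path))


# Why a single replace is not quite enough for two levels:
leftover = "http://somesite.com/../../page.html".replace("/../", "/")
# leftover == 'http://somesite.com/../page.html'
```

With this, urljoin_clamped("http://somesite.com/", "../../page.html") gives
'http://somesite.com/page.html', and the query string, if any, is left alone
because only the path component is touched.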


Many thanks,
--
Sérgio M. B. 


