Absolute to relative URL?

Thomas Guettler pan-newsreader at thomas-guettler.de
Thu Apr 4 16:54:54 EST 2002


On Thu, 04 Apr 2002 21:52:57 +0200, Jeff Shannon wrote:

> In article <3CAC1665.8040209 at mxm.dk>, maxm at mxm.dk says...
>> Thomas Guettler wrote:
>> 
>> 
>> > Is there already a function in the standard Python modules which
>> > returns a relative URL given two absolute URLs?
>> > 
>> > Example:
>> > 
>> > relative_url("http://foo/a/b/c", "http://foo/a/d")
>> >  --> return: "../d"
>> 
>> I hope you mean that the result should be "../../d" ??
> 
> Actually, that's not quite correct (and neither are your code examples
> below, for the same reason).
> 
> In the URL 'http://foo/a/b/c', we could presume that 'foo' is a domain
> name, 'a' and 'b' are directories, and 'c' is a filename. Thus, the
> 'current directory' ('.') for this URL is //foo/a/b, and the parent
> directory ('..') is //foo/a.  Therefore, in this case, the O.P.'s
> initial relative URL ('../d') is correct.
> 
> The issue is complicated somewhat in that '//foo/a/b/c' can *also* be
> shorthand for '//foo/a/b/c/index.html' (or whatever the server's
> default-filename is).  If (and *only* if) the original URL is
> representing a default document, then your result would be the correct
> one.

If you try to access //foo/a and "a" is a directory, the webserver
redirects you to //foo/a/:

--guettli at sonne:~/python$ telnet localhost 80
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
GET /~guettli/test
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<HTML><HEAD>
<TITLE>301 Moved Permanently</TITLE>
</HEAD><BODY>
<H1>Moved Permanently</H1>
The document has moved <A HREF="http://sonne.heaven/~guettli/test/">here</A>.<P>
</BODY></HTML>
Connection closed by foreign host.
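
The effect of the trailing slash on relative references can be checked
with urljoin. This is only a sketch, and it assumes Python 3, where the
urlparse module used in this post lives on as urllib.parse; the host
"foo" and the paths are the placeholder names from the thread.

```python
from urllib.parse import urljoin

# No trailing slash: "c" is treated as a document inside /a/b/,
# so ".." climbs from /a/b/ to /a/.
print(urljoin("http://foo/a/b/c", "../d"))      # http://foo/a/d

# Trailing slash: /a/b/c/ is itself the current directory,
# so ".." only climbs to /a/b/.
print(urljoin("http://foo/a/b/c/", "../d"))     # http://foo/a/b/d

# From the directory form, two levels up are needed to reach /a/d.
print(urljoin("http://foo/a/b/c/", "../../d"))  # http://foo/a/d
```

So "../d" is right for http://foo/a/b/c, and "../../d" is right only
for the redirected form http://foo/a/b/c/.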

Since there doesn't seem to be a function, I did it like this:


import urlparse
import re
import string

def relative_url(source, target):
	su=urlparse.urlparse(source)
	tu=urlparse.urlparse(target)
	junk=tu[3:]
	if su[0]!=tu[0] or su[1]!=tu[1]:
		#scheme (http) or netloc (www.heise.de) differ:
		#return the absolute URL of the target
		return target
	su=re.split("/", su[2])
	tu=re.split("/", tu[2])
	su.reverse()
	tu.reverse()

	#remove parts which are equal   (['a', 'b'] ['a', 'c'] --> ['c'])
	while len(su)>0 and len(tu)>0 and su[-1]==tu[-1]:
		su.pop()
		last_pop=tu.pop()
	if len(su)==0 and len(tu)==0:
		#Special case: link to itself (http://foo/a http://foo/a -> a)
		tu.append(last_pop)
	if len(su)==1 and su[0]=="" and len(tu)==0:
		#Special case: (http://foo/a/ http://foo/a -> ../a)
		su.append(last_pop)
		tu.append(last_pop)
	tu.reverse()
	#one ".." for each directory level left in the source path
	up_parts=[]
	for i in range(len(su)-1): 
		up_parts.append("..")
	rel_url=string.join(up_parts + tu, "/")
	rel_url=urlparse.urlunparse(["", "", rel_url, junk[0], junk[1], junk[2]])
	return rel_url

def test_relative_url(source, target, result):
	res=relative_url(source, target)
	if res!=result:
		print "Test FAILED: result is:", res, "should be:", result, \
			  "source:", source, "target:", target
	else:
		print "Test ok: result is:", res, "source:", source, "target:", target
		
	
test_relative_url("http://foo/a/b", "http://foo/c", "../c")
test_relative_url("http://foo/a/b", "http://foo/c/d", "../c/d")
test_relative_url("http://foo/a/b", "ftp://foo/c", "ftp://foo/c")
test_relative_url("http://foo/a", "http://foo/b", "b")
test_relative_url("http://foo/a/", "http://foo/b", "../b")
test_relative_url("http://foo:80/a/", "http://foo/b", "http://foo/b")
test_relative_url("http://foo:8080/a/", "http://foo/b", "http://foo/b")
test_relative_url("http://foo/a", "http://foo/a", "a")
test_relative_url("http://foo/a/", "http://foo/a", "../a")
test_relative_url("http://foo/a", "http://foo/b/c", "b/c")
test_relative_url("http://foo/a/", "http://foo/b/c", "../b/c")
test_relative_url("http://foo/a/b", "http://foo/c/d", "../c/d")
test_relative_url("http://foo/a/b/", "http://foo/c/d", "../../c/d")
test_relative_url("http://foo/a", "http://foo/b", "b")
test_relative_url("http://foo/a;para?query#frag", "http://foo/a", "a")
test_relative_url("http://foo/a", "http://foo/a;para?query#frag",
				  "a;para?query#frag")

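For readers on Python 3, the same algorithm could be sketched roughly as
below. This is a port made under assumptions, not part of the original
post: urlparse and urlunparse are taken from urllib.parse, named
attributes replace the tuple indexing, and the local names src, tgt and
ups are introduced here purely for illustration.

```python
from urllib.parse import urljoin, urlparse, urlunparse


def relative_url(source, target):
    """Return target as a URL relative to source (sketch of the 2002 code)."""
    su = urlparse(source)
    tu = urlparse(target)
    if su.scheme != tu.scheme or su.netloc != tu.netloc:
        # Different scheme or host: only the absolute URL will do.
        return target

    # Path segments, reversed so the leading segments sit at the end
    # of each list and can be popped off cheaply.
    src = su.path.split("/")[::-1]
    tgt = tu.path.split("/")[::-1]

    # Drop the common leading segments (['a', 'b'] ['a', 'c'] --> ['c']).
    last_pop = ""
    while src and tgt and src[-1] == tgt[-1]:
        src.pop()
        last_pop = tgt.pop()

    if not src and not tgt:
        # Link to itself (http://foo/a http://foo/a -> a).
        tgt.append(last_pop)
    if len(src) == 1 and src[0] == "" and not tgt:
        # Source is the directory form of the target
        # (http://foo/a/ http://foo/a -> ../a).
        src.append(last_pop)
        tgt.append(last_pop)

    tgt.reverse()
    # One ".." per directory level left in the source path; the final
    # source segment is the document itself and does not count.
    ups = [".."] * (len(src) - 1)
    rel_path = "/".join(ups + tgt)
    return urlunparse(("", "", rel_path, tu.params, tu.query, tu.fragment))


if __name__ == "__main__":
    rel = relative_url("http://foo/a/b/", "http://foo/c/d")
    print(rel)                              # ../../c/d
    print(urljoin("http://foo/a/b/", rel))  # http://foo/c/d
```

One case the tests in this post do not cover: for a target that is a
parent directory written with a trailing slash (source http://foo/a/b,
target http://foo/a/), this logic yields an empty string, which a
browser resolves to the source document itself rather than to the
directory.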

Feedback welcome
 thomas


-- 
Thomas Guettler <guettli at thomas-guettler.de>
http://www.thomas-guettler.de


