regex for url parameter

Robert Brewer fumanchu at amor.org
Tue Dec 7 17:49:39 EST 2004


Andreas Volz wrote:
> I'm trying to extract an HTTP target from a URL that is given as a
> parameter. urlparse couldn't really help me. I tried it like this:
> 
> url="http://www.example.com/example.html?url=http://www.example.org/example.html"
> 
> p = re.compile( '.*url=')
> url = p.sub( '', url)
> print url
> > http://www.example.org/example.html
> 
> This works, but if there're more parameters it doesn't work:
> 
> url2="http://www.example.com/example.html?url=http://www.example.org/example.html&param=1"
> 
> p = re.compile( '.*url=')
> url2 = p.sub( '', url2)
> print url2
> > http://www.example.org/example.html&param=1
> 
> I played with regex to find one that also matches the second case with
> multiple parameters. I think it's easy, but I don't know how
> to do it. Can you help me?

I'd go back to urlparse if I were you.

>>> import urlparse
>>> url="http://www.example.com/example.html?url=http://www.example.org/example.html"
>>> urlparse.urlparse(url)
('http', 'www.example.com', '/example.html', '', 'url=http://www.example.org/example.html', '')
>>> query = urlparse.urlparse(url)[4]
>>> params = [p.split("=", 1) for p in query.split("&")]
>>> params
[['url', 'http://www.example.org/example.html']]
>>> urlparse.urlparse(params[0][1])
('http', 'www.example.org', '/example.html', '', '', '')
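
For the multi-parameter case from the question, the same idea can be sketched with a standard-library query parser instead of splitting by hand. This is only a minimal sketch, assuming Python 3, where the urlparse module used above is available as urllib.parse. The helper name extract_target and its name argument are made up for illustration.

```python
from urllib.parse import urlsplit, parse_qs


def extract_target(url, name="url"):
    """Return the first value of query parameter `name`, or None if absent.

    `extract_target` and `name` are hypothetical names used for this sketch.
    """
    # Pull out only the query string, e.g. "url=http://...&param=1".
    query = urlsplit(url).query
    # parse_qs maps each parameter name to a list of its values.
    values = parse_qs(query).get(name)
    return values[0] if values else None


url2 = "http://www.example.com/example.html?url=http://www.example.org/example.html&param=1"
target = extract_target(url2)
print(target)                    # http://www.example.org/example.html
print(urlsplit(target).netloc)   # www.example.org
```

This only works as long as the embedded URL carries no unescaped "&" of its own. Otherwise the query parser treats the remainder as a separate parameter, so the inner URL should be percent-encoded by whoever builds the link.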


Robert Brewer
MIS
Amor Ministries
fumanchu at amor.org


