regex for url parameter
Robert Brewer
fumanchu at amor.org
Tue Dec 7 17:49:39 EST 2004
Andreas Volz wrote:
> I'm trying to extract an HTTP target from a URL that is given as a
> parameter. urlparse couldn't really help me. I tried it like this:
>
> url="http://www.example.com/example.html?url=http://www.example.org/example.html"
>
> p = re.compile( '.*url=')
> url = p.sub( '', url)
> print url
> > http://www.example.org/example.html
>
> This works, but if there're more parameters it doesn't work:
>
> url2="http://www.example.com/example.html?url=http://www.example.org/example.html&param=1"
>
> p = re.compile( '.*url=')
> url2 = p.sub( '', url2)
> print url2
> > http://www.example.org/example.html&param=1
>
> I played with regex to find one that also matches the second case with
> multiple parameters. I think it's easy, but I don't know how
> to do it. Can you help me?
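If a regex is still wanted, one possible sketch is to capture the value
instead of deleting the prefix, and to stop at the next "&". The helper
name extract_url_param is made up for illustration and is not from the
original post. It assumes the embedded URL contains no unescaped "&" of
its own:

```python
import re

# Hypothetical helper (the name is an assumption, not from the thread).
def extract_url_param(url):
    # Find "url=" directly after "?" or "&", then capture everything
    # up to the next "&" or "#" (or the end of the string).
    match = re.search(r"[?&]url=([^&#]*)", url)
    return match.group(1) if match else None

url2 = "http://www.example.com/example.html?url=http://www.example.org/example.html&param=1"
print(extract_url_param(url2))
```

That said: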
I'd go back to urlparse if I were you.
>>> import urlparse
>>> url = "http://www.example.com/example.html?url=http://www.example.org/example.html"
>>> urlparse.urlparse(url)
('http', 'www.example.com', '/example.html', '',
'url=http://www.example.org/example.html', '')
>>> query = urlparse.urlparse(url)[4]
>>> params = [p.split("=", 1) for p in query.split("&")]
>>> params
[['url', 'http://www.example.org/example.html']]
>>> urlparse.urlparse(params[0][1])
('http', 'www.example.org', '/example.html', '', '', '')
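The same approach covers the case with more than one parameter, because
the query string is split on "&" before the value is read. Here is a
rough sketch assuming a current Python 3, where the urlparse module lives
in urllib.parse and parse_qs does the splitting. The function name
target_from_query and its name argument are made up for illustration:

```python
from urllib.parse import urlparse, parse_qs  # Python 3 location of urlparse

# Hypothetical helper (the name and signature are assumptions).
def target_from_query(url, name="url"):
    # Take only the query part of the outer URL.
    query = urlparse(url).query
    # parse_qs maps each parameter name to a list of its values.
    values = parse_qs(query).get(name)
    return values[0] if values else None

url2 = "http://www.example.com/example.html?url=http://www.example.org/example.html&param=1"
target = target_from_query(url2)
print(target)
# The extracted value can be parsed again as a URL in its own right.
print(urlparse(target).netloc)
```

Note that parse_qs also percent-decodes the values, which helps if the
inner URL was escaped before being placed in the query string.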
Robert Brewer
MIS
Amor Ministries
fumanchu at amor.org