Better way to sift parts of URL . . .
Paul McGuire
ptmcg at austin.rr._bogus_.com
Tue Apr 18 18:05:55 EDT 2006
"Ben Wilson" <dausha at gmail.com> wrote in message
news:1145392079.720766.122310 at z34g2000cwc.googlegroups.com...
> I am working on a script that splits a URL into a base URL and a page.
> The examples below are the inputs I expect a user to pass to the
> script. In all cases, "http://www.example.org/test/" is the URL, and
> the page consists of the path parts that contain upper-case letters
> (note: 5 and 6 are the same as earlier examples, minus the 'test').
>
> 1. http://www.example.org/test/Main/AnotherPage (page = Main/AnotherPage)
> 2. http://www.example.org/test/Main (page = Main + '/' + default_page)
> 3. http://www.example.org/test (page = default_group + '/' + default_page)
> 4. http://www.example.org/test/ (page = default_group + '/' + default_page)
> 5. http://www.example.org/ (page = default_group + '/' + default_page)
> 6. http://www.example.org/Main/AnotherPage (page = Main/AnotherPage)
>
> Right now, I'm doing a simple split based on condition 1:
>
> page = '.'.join(in_url.split('/')[-2:])
> url = '/'.join(in_url.split('/')[:-2]) + '/'
>
> Before I start winding my way down a complex path, I wanted to see if
> anybody had an elegant approach to this problem.
>
> Thanks in advance.
> Ben
>
The standard library includes the urlparse module. Might that help?
-- Paul
import urlparse

urls = [
    "http://www.example.org/test/Main/AnotherPage", # (page = Main/AnotherPage)
    "http://www.example.org/test/Main",  # (page = Main + '/' + default_page)
    "http://www.example.org/test",       # (page = default_group + '/' + default_page)
    "http://www.example.org/test/",      # (page = default_group + '/' + default_page)
    "http://www.example.org/",           # (page = default_group + '/' + default_page)
    "http://www.example.org",
    "http://www.example.org/Main/AnotherPage",  # (page = Main/AnotherPage)
    ]

for u in urls:
    print u
    # urlparse returns a 6-tuple: (scheme, netloc, path, params, query, fragment)
    parts = urlparse.urlparse(u)
    print parts
    scheme, netloc, path, params, query, frag = parts
    # [1:] skips the empty string produced by the leading "/"
    print path.split("/")[1:]
    print
prints:
http://www.example.org/test/Main/AnotherPage
('http', 'www.example.org', '/test/Main/AnotherPage', '', '', '')
['test', 'Main', 'AnotherPage']

http://www.example.org/test/Main
('http', 'www.example.org', '/test/Main', '', '', '')
['test', 'Main']

http://www.example.org/test
('http', 'www.example.org', '/test', '', '', '')
['test']

http://www.example.org/test/
('http', 'www.example.org', '/test/', '', '', '')
['test', '']

http://www.example.org/
('http', 'www.example.org', '/', '', '', '')
['']

http://www.example.org
('http', 'www.example.org', '', '', '', '')
[]

http://www.example.org/Main/AnotherPage
('http', 'www.example.org', '/Main/AnotherPage', '', '', '')
['Main', 'AnotherPage']
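Taking those path segments one step further, here is a minimal sketch of the full split into a base URL and a page. It is written for Python 3, where the urlparse module became urllib.parse. The sketch makes several assumptions. The page starts at the first path segment containing an upper-case letter. The query string and fragment are ignored. The names split_url, DEFAULT_GROUP and DEFAULT_PAGE are hypothetical, and the default values are placeholders, because the original post does not say what default_group and default_page are.

```python
from urllib.parse import urlsplit

# Hypothetical placeholder defaults; the post names default_group and
# default_page but does not give their values.
DEFAULT_GROUP = "DefaultGroup"
DEFAULT_PAGE = "DefaultPage"


def split_url(url, default_group=DEFAULT_GROUP, default_page=DEFAULT_PAGE):
    """Return (base_url, page) for url.

    Assumed rule: path segments from the first one containing an
    upper-case letter onward form the page; earlier segments stay in
    the base URL. Query string and fragment are ignored.
    """
    parts = urlsplit(url)
    # Drop the empty strings produced by leading/trailing slashes.
    segments = [s for s in parts.path.split("/") if s]

    # Index of the first segment with an upper-case letter (or the end).
    first_page = next(
        (i for i, s in enumerate(segments) if any(c.isupper() for c in s)),
        len(segments),
    )
    base_segments = segments[:first_page]
    page_segments = segments[first_page:]

    base = f"{parts.scheme}://{parts.netloc}/"
    if base_segments:
        base += "/".join(base_segments) + "/"

    if not page_segments:            # cases 3, 4, 5: no page given
        page = default_group + "/" + default_page
    elif len(page_segments) == 1:    # case 2: group only
        page = page_segments[0] + "/" + default_page
    else:                            # cases 1, 6: group and page
        page = "/".join(page_segments)
    return base, page


if __name__ == "__main__":
    for u in [
        "http://www.example.org/test/Main/AnotherPage",
        "http://www.example.org/test/Main",
        "http://www.example.org/test",
        "http://www.example.org/test/",
        "http://www.example.org/",
        "http://www.example.org",
        "http://www.example.org/Main/AnotherPage",
    ]:
        print(u, "->", split_url(u))
```

Because the split depends on upper-case letters rather than on position, cases 5 and 6 (no 'test' prefix) need no special handling.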