Better way to sift parts of URL . . .

Tue Apr 18 18:05:55 EDT 2006

"Ben Wilson" <dausha at gmail.com> wrote in message
news:1145392079.720766.122310 at z34g2000cwc.googlegroups.com...
> I am working on a script that splits a URL into a page and a url. The
> examples below are the conditions I expect a user to pass to the
> script. In all cases, "http://www.example.org/test/" is the URL, and
> the page comprises parts that have upper case letters (note, 5 & 6 are
> the same as earlier examples, sans the 'test').
>
>    1. http://www.example.org/test/Main/AnotherPage (page =
> Main/AnotherPage)
>    2. http://www.example.org/test/Main (page = Main + '/' +
> default_page)
>    3. http://www.example.org/test (page = default_group + '/' +
> default_page)
>    4. http://www.example.org/test/  (page = default_group + '/' +
> default_page)
>    5. http://www.example.org/  (page = default_group + '/' +
> default_page)
>    6. http://www.example.org/Main/AnotherPage (page = Main/AnotherPage)
>
> Right now, I'm doing a simple split off condition 1:
>
>   page = '.'.join(url_in.split('/')[-2:])
>   url = '/'.join(url_in.split('/')[:-2]) + '/'
>
> Before I start winding my way down a complex path, I wanted to see if
> anybody had an elegant approach to this problem.
>
> Thanks in advance.
> Ben
>

The Python standard library includes the urlparse module.  Might that help?

-- Paul

import urlparse

urls = [
    "http://www.example.org/test/Main/AnotherPage", # (page = Main/AnotherPage)
    "http://www.example.org/test/Main", # (page = Main + '/' + default_page)
    "http://www.example.org/test", # (page = default_group + '/' + default_page)
    "http://www.example.org/test/", # (page = default_group + '/' + default_page)
    "http://www.example.org/", # (page = default_group + '/' + default_page)
    "http://www.example.org", # bare host, no trailing slash
    "http://www.example.org/Main/AnotherPage", # (page = Main/AnotherPage)
    ]

for u in urls:
    print u
    parts = urlparse.urlparse(u)
    print parts
    scheme,netloc,path,params,query,frag = parts
    print path.split("/")[1:]
    print

prints:
http://www.example.org/test/Main/AnotherPage
('http', 'www.example.org', '/test/Main/AnotherPage', '', '', '')
['test', 'Main', 'AnotherPage']

http://www.example.org/test/Main
('http', 'www.example.org', '/test/Main', '', '', '')
['test', 'Main']

http://www.example.org/test
('http', 'www.example.org', '/test', '', '', '')
['test']

http://www.example.org/test/
('http', 'www.example.org', '/test/', '', '', '')
['test', '']

http://www.example.org/
('http', 'www.example.org', '/', '', '', '')
['']

http://www.example.org
('http', 'www.example.org', '', '', '', '')
[]

http://www.example.org/Main/AnotherPage
('http', 'www.example.org', '/Main/AnotherPage', '', '', '')
['Main', 'AnotherPage']
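
Building on the path lists printed above, here is a rough sketch of how the
url/page split itself might be done. It is only one possible reading of the
requirements: it assumes a "page" segment is one whose first character is an
upper case letter, and the names split_url, DEFAULT_GROUP and DEFAULT_PAGE
(with the values "Main" and "HomePage") are made up for illustration. It uses
urlsplit/urlunsplit, which live in urlparse on Python 2 and in urllib.parse
on Python 3.

```python
try:
    from urllib.parse import urlsplit, urlunsplit  # Python 3
except ImportError:
    from urlparse import urlsplit, urlunsplit      # Python 2

# Hypothetical defaults; the original post does not say what they are.
DEFAULT_GROUP = "Main"
DEFAULT_PAGE = "HomePage"


def split_url(u, default_group=DEFAULT_GROUP, default_page=DEFAULT_PAGE):
    """Return (base_url, page) for the six cases listed in the question."""
    scheme, netloc, path, query, frag = urlsplit(u)

    # Drop the empty strings produced by leading/trailing slashes.
    segments = [s for s in path.split("/") if s]

    # Assumption: trailing segments that start with an upper case letter
    # belong to the page; everything before them belongs to the URL.
    cut = len(segments)
    while cut > 0 and segments[cut - 1][:1].isupper():
        cut -= 1
    base, page_parts = segments[:cut], segments[cut:]

    # Fill in whichever page parts are missing.
    if len(page_parts) == 0:
        page_parts = [default_group, default_page]
    elif len(page_parts) == 1:
        page_parts = [page_parts[0], default_page]

    # Rebuild the base URL so that it always ends with a slash.
    base_path = "/" + "/".join(base + [""]) if base else "/"
    base_url = urlunsplit((scheme, netloc, base_path, "", ""))
    return base_url, "/".join(page_parts)
```

For example, split_url("http://www.example.org/test/Main") gives the base URL
"http://www.example.org/test/" and a page made of "Main" plus the default
page. Paths with more than two capitalized segments are not covered by the
original examples; this sketch simply keeps all of them in the page.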