Seeking wisdom on URI path parameters.

Alan Kennedy alanmk at hotmail.com
Tue May 27 16:20:46 EDT 2003


Greetings all,

Since this relates to URI parsing, for which Python supplies several functions,
I thought this would be as good a place as any to ask:

Background: URIs (used to) consist of a number of parts. They are:

1. Scheme, e.g. "HTTP"
2. Authority, e.g. "www.python.org:80"
3. Path, the resource to be retrieved/constructed, e.g. "/index.html"
4. Path params, e.g. ";SessionID=123ABC456DEF"
5. Query, a set of name=value parameters, e.g. "?name=value"
6. FragId, a fragment identifier, e.g. "#toc"

When you use urlparse.urlparse to parse a URL, it returns a 6-tuple
containing the items above.

A newer function, urlparse.urlsplit, introduced in Python 2.2, returns
only a 5-tuple: items 3 and 4 have been combined (or rather, never split
in the first place).
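For reference: in Python 3 the urlparse module became urllib.parse, but the two functions keep the same 6-tuple versus 5-tuple split. A quick sketch, using an illustrative URL assembled from the example parts listed above (the URL itself is made up):

```python
from urllib.parse import urlparse, urlsplit

# Illustrative URL built from the example parts listed above.
url = "http://www.python.org:80/index.html;SessionID=123ABC456DEF?name=value#toc"

parsed = urlparse(url)  # 6 fields: params are split off the last path segment
split = urlsplit(url)   # 5 fields: params are left inside the path

print(len(parsed), parsed.path, parsed.params)  # 6 /index.html SessionID=123ABC456DEF
print(len(split), split.path)                   # 5 /index.html;SessionID=123ABC456DEF
```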

Presumably, this is in recognition of the fact that RFC 2396 states that
each segment of the path can have its own parameters. The relevant text is:

"""
   The path may consist of a sequence of path segments separated by a
   single slash "/" character.  Within a path segment, the characters
   "/", ";", "=", and "?" are reserved.  Each path segment may include a
   sequence of parameters, indicated by the semicolon ";" character.
   The parameters are not significant to the parsing of relative
   references.
"""

So the following URL should be legal:

http://python.org/dir2;date=20030527/subdir;version=MIDDAY/index.html

It might not resolve to anything on python.org (maybe wrongly? see
below), but it's a syntactically legal URL, AFAICT from RFC 2396.
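For comparison, here is what urlparse (urllib.parse.urlparse in Python 3) makes of that URL. As far as I can tell it only looks for ";" after the last "/", so the parameters on the earlier segments stay embedded in the path and the params field comes back empty:

```python
from urllib.parse import urlparse

url = "http://python.org/dir2;date=20030527/subdir;version=MIDDAY/index.html"
parts = urlparse(url)

# Only the last segment ("index.html") is checked for ";params", and it has none,
# so every segment parameter is still part of parts.path.
print(repr(parts.path))    # '/dir2;date=20030527/subdir;version=MIDDAY/index.html'
print(repr(parts.params))  # ''
```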

That means that to fully and correctly parse URLs, I would have to take
the output of urlparse.urlsplit further and split out the individual
path segments and their parameters.
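A minimal sketch of that extra step, assuming the RFC 2396 reading that every segment may carry its own ";"-separated parameters. The helper name split_path_params is hypothetical, and the parameters are kept as raw strings because the RFC doesn't define their internal syntax (name=value is just a convention); percent-decoding is left to the caller:

```python
from urllib.parse import urlsplit


def split_path_params(path):
    """Hypothetical helper: split a URI path into (segment, [params]) pairs.

    Every "/"-separated segment is split on ";"; the first piece is the
    segment name and the rest are its parameters, kept as raw strings.
    The leading empty segment of an absolute path is kept, so the
    original path can be rebuilt losslessly.
    """
    result = []
    for segment in path.split("/"):
        name, *params = segment.split(";")
        result.append((name, params))
    return result


path = urlsplit(
    "http://python.org/dir2;date=20030527/subdir;version=MIDDAY/index.html"
).path
for name, params in split_path_params(path):
    print(repr(name), params)
```

Keeping the parameters as plain strings means the parser doesn't have to guess at a name=value structure that the RFC never mandates.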

I know it wouldn't be hard to parse; I'm just wondering how many
real-world uses of these kinds of parameters are out there.

I'm writing some URL fetching stuff right now, for automated web
testing, which I hope will be as generic as possible. Should I
attempt to model these path parameters in my application?

Any examples of usage out there? Are they used in anything like WebDAV?
SOAP? Any experimental web/application servers?

I know that some application servers use "the path parameter" to contain
the user session id (I think J2EE mandates it as a fallback when cookies
aren't enabled? And doesn't Webware do something similar?). But this is
the old-style usage of path parameters, where there was only one set of
parameters, and which is compatible with the 6-tuple
urlparse.urlparse().

Lastly, since RFC 2396 says "The parameters are not significant to the
parsing of relative references", does this mean that, to be compliant
with RFC 2396, I have to parse out these parameters before I can
correctly resolve URLs? E.g., when parsing a returned resource, do I
have to turn

/dir;im_ignoring=this/subdir;and=this/index.html

into 

/dir/subdir/index.html

before I can resolve URLs relative to index.html? Or should I just treat
the "/dir;im_ignoring=this/subdir;and=this/" as the base directory?
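To make the two readings concrete, here is a sketch comparing them. The host example.com and the strip_path_params helper are assumptions for illustration only. For what it's worth, Python 3.5 and later update urljoin to follow RFC 3986 (which replaced RFC 2396 in 2005), and it behaves like the second reading: the parameters simply stay part of each segment.

```python
from urllib.parse import urljoin


def strip_path_params(path):
    """Hypothetical helper: drop the ';...' parameters from every segment."""
    return "/".join(segment.split(";", 1)[0] for segment in path.split("/"))


path = "/dir;im_ignoring=this/subdir;and=this/index.html"
base = "http://example.com" + path  # example.com is a placeholder host

# Reading 1: strip the parameters first, then resolve against the result.
stripped_base = "http://example.com" + strip_path_params(path)
print(stripped_base)                           # http://example.com/dir/subdir/index.html
print(urljoin(stripped_base, "other.html"))    # http://example.com/dir/subdir/other.html

# Reading 2: keep "/dir;im_ignoring=this/subdir;and=this/" as the base directory.
print(urljoin(base, "other.html"))     # http://example.com/dir;im_ignoring=this/subdir;and=this/other.html
print(urljoin(base, "../other.html"))  # http://example.com/dir;im_ignoring=this/other.html
```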

regards,

-- 
alan kennedy
-----------------------------------------------------
check http headers here: http://xhaus.com/headers
email alan:              http://xhaus.com/mailto/alan




More information about the Python-list mailing list