[Python-Dev] urlparse.urlunsplit should be smarter about +

Stephen J. Turnbull stephen at xemacs.org
Mon May 10 08:51:36 CEST 2010


David Abrahams writes:
 > At Sat, 08 May 2010 11:04:47 -0500,
 > John Arbash Meinel wrote:

 > > Don't you need to register the "git+file:///" url for urlparse to
 > > properly split it?
 > 
 > Yes.  But the question is whether urlparse should really be so fragile
 > that every hierarchical scheme needs to be explicitly registered.

Exactly.  And the answer is "no".  The RFCs are quite clear that
hierarchical schemes are expected to be extremely common, and provide
several requirements for how they should be parsed, even by
nonvalidating parsers.

It's pretty clear to me that

    urlunsplit(urlsplit('git+file:///foo/bar/baz'))

should be the identity.  The remaining question is, "Should

    urlunsplit(urlsplit('git+file:/foo/bar/baz'))

be the identity?"  I would argue that if git+file is *not* registered,
it should be the identity, while there should be an optional
registry of schemes which may (or perhaps should) be canonicalized
(i.e., a *missing* authority would be unsplit as an *empty* authority).
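A minimal sketch of that behaviour, assuming a parser built on the generic-syntax regular expression from RFC 3986, Appendix B, and the recomposition algorithm in RFC 3986, section 5.3. The names split_uri, unsplit_uri, URIParts and the canonical_schemes parameter are hypothetical and not part of urlparse. The essential point is that a *missing* authority is kept as None and an *empty* one as '', so both spellings round-trip unless the caller explicitly asks for canonicalization.

```python
import re
from typing import NamedTuple, Optional

# Generic URI regex from RFC 3986, Appendix B. Every group is optional,
# so it matches (at least the empty prefix of) any string.
_URI_RE = re.compile(r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?")


class URIParts(NamedTuple):
    scheme: Optional[str]
    authority: Optional[str]  # None = no "//" at all; "" = "//" with empty authority
    path: str
    query: Optional[str]
    fragment: Optional[str]


def split_uri(uri: str) -> URIParts:
    m = _URI_RE.match(uri)
    return URIParts(m.group(2), m.group(4), m.group(5), m.group(7), m.group(9))


def unsplit_uri(parts: URIParts, canonical_schemes: frozenset = frozenset()) -> str:
    """Recompose per RFC 3986, section 5.3.

    Hypothetical opt-in registry: only if the scheme is listed in
    canonical_schemes is a missing authority written out as an empty one.
    """
    authority = parts.authority
    if authority is None and parts.scheme in canonical_schemes:
        authority = ""
    result = ""
    if parts.scheme is not None:
        result += parts.scheme + ":"
    if authority is not None:
        result += "//" + authority
    result += parts.path
    if parts.query is not None:
        result += "?" + parts.query
    if parts.fragment is not None:
        result += "#" + parts.fragment
    return result


# Both spellings round-trip unchanged when nothing is registered.
for uri in ("git+file:///foo/bar/baz", "git+file:/foo/bar/baz"):
    assert unsplit_uri(split_uri(uri)) == uri

# Canonicalization happens only when explicitly requested.
print(unsplit_uri(split_uri("git+file:/foo/bar/baz"), frozenset({"git+file"})))
# -> git+file:///foo/bar/baz
```

Keeping None distinct from '' is what lets the unsplit step be the identity without any registry at all; the registry is consulted only to change output, never to parse.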

 > Surely ending with "+file" should be sufficient to have it recognized
 > as a file-based scheme

What's a "file-based scheme"?  If you mean an RFC 3986 hierarchical
scheme, that is recognized by the presence of the authority component,
which is syntactically marked by "//" immediately following the ":"
that terminates the scheme.  No need for any implicit magic.
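To illustrate that neither a registry nor a "+file" suffix heuristic is needed, here is a small sketch with a hypothetical helper, has_authority, that decides purely from syntax. It assumes the scheme grammar of RFC 3986, section 3.1 (a letter followed by letters, digits, "+", "-" or "."); a string without a scheme is treated as a relative reference, where a leading "//" likewise introduces an authority.

```python
import re

# RFC 3986, section 3.1: scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )
_SCHEME_RE = re.compile(r"^[A-Za-z][A-Za-z0-9+.-]*:")


def has_authority(uri: str) -> bool:
    """True if "//" immediately follows "scheme:" (or starts a relative reference)."""
    m = _SCHEME_RE.match(uri)
    rest = uri[m.end():] if m else uri
    return rest.startswith("//")


print(has_authority("git+file:///foo/bar/baz"))  # True: empty authority
print(has_authority("git+file:/foo/bar/baz"))    # False: no authority
print(has_authority("svn+ssh://host/repo"))      # True
```

The scheme name itself is never inspected, which is the point: the "//" alone carries the information.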

In general, EIBTI applies here.  If a registry as described above is
implemented, I would argue that canonicalization should not happen
implicitly.  Nor should validation (e.g., an error or warning on a URI with
a scheme registered as hierarchical but lacking an authority, or vice
versa).  The API should require an explicit statement from the user to
invoke those functionalities.  It might be useful for the OOWTDI API
to canonicalize/validate by default (especially given XSS attacks and
the like), but it should be simple for consenting adults to turn that
feature off.
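One way such opt-in validation might look, again only a sketch: check_authority, its registry argument (assumed to map lowercase scheme names to "expects an authority") and the strict flag are all hypothetical. Nothing is checked for unregistered schemes, a mismatch produces a warning by default, and the caller can ask for an exception instead. A convenience API that validates by default could call this, with a flag that lets consenting adults skip the call entirely.

```python
import re
import warnings
from typing import Mapping

# Scheme and authority part of the RFC 3986, Appendix B regex.
_PREFIX_RE = re.compile(r"^(([^:/?#]+):)?(//([^/?#]*))?")


def check_authority(uri: str, registry: Mapping[str, bool], *, strict: bool = False) -> bool:
    """Return True if uri is consistent with registry (or its scheme is unregistered).

    On a mismatch, warn by default, or raise ValueError when strict=True.
    """
    m = _PREFIX_RE.match(uri)
    scheme, authority = m.group(2), m.group(4)
    if scheme is None or scheme.lower() not in registry:
        return True  # unregistered: no opinion, no magic
    expects = registry[scheme.lower()]
    if expects == (authority is not None):
        return True
    kind = "hierarchical" if expects else "non-hierarchical"
    problem = "lacks" if expects else "has"
    msg = f"{uri!r}: scheme {scheme!r} is registered as {kind} but the URI {problem} an authority"
    if strict:
        raise ValueError(msg)
    warnings.warn(msg, stacklevel=2)
    return False
```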


