[Python-ideas] URLs/URIs + pathlib.Path + literal syntax = ?

Stephen J. Turnbull stephen at xemacs.org
Wed Mar 30 00:06:07 EDT 2016


Chris Angelico writes:

 > 1) Path("./http://www.example.com")
 > 2) Path("http:/www.example.com")
 > 3) Path("file://http://www.example.com")
 > 
 > For scripts that need 100% dependable parsing, the third option will
 > be guaranteed to work.

No, the third should crap out with a syntax error on the colon: see
[1], which does not allow a port spec at all, and RFC 3986, which
doesn't allow a colon in the host name ([1] references RFC 3986 for the
syntax of the host name).  Specifying a host in a "file:" URI gives
locally-defined behavior (e.g., a Windows share), but it is legal in
[1], the most recent attempt to deal with exactly these issues.

The correct syntaxes per [1] and RFC 3986 are:

4)  Path("file:///http://www.example.com")
5)  Path("file://localhost/http://www.example.com")
6)  Path("file://127.0.0.1/http://www.example.com")
7)  Path("file://[::1]/http://www.example.com")

As far as I can tell the colon in "http:" is RFC 3986-legal, since it
has no URI syntactic meaning in the path component.  Getting this
right isn't as easy as it looks (which is why people are trying to
delegate it to something they think of as "simple").
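To make the authority/path split concrete, here is a rough sketch using
the stdlib urllib.parse.urlsplit.  The helper name authority_and_path is
made up for illustration, and note that urlsplit is a lenient splitter,
not a validator for [1] or RFC 3986: it happily accepts form 3 instead of
raising a syntax error, but it does show where the pieces end up.

```python
from urllib.parse import urlsplit


def authority_and_path(uri):
    """Return the (authority, path) pair that urlsplit extracts from a URI.

    Hypothetical helper for illustration only; it splits, it does not validate.
    """
    parts = urlsplit(uri)
    return parts.netloc, parts.path


# Form 3: "http:" lands in the authority slot (a host plus an empty port),
# which is what the file-scheme draft does not permit.
print(authority_and_path("file://http://www.example.com"))
# ('http:', '//www.example.com')

# Forms 4, 5 and 7: the whole "http://..." string stays in the path component,
# where the colon carries no URI syntactic meaning.
print(authority_and_path("file:///http://www.example.com"))
# ('', '/http://www.example.com')
print(authority_and_path("file://localhost/http://www.example.com"))
# ('localhost', '/http://www.example.com')
print(authority_and_path("file://[::1]/http://www.example.com"))
# ('[::1]', '/http://www.example.com')
```

A strict parser built for scripts that need "100% dependable parsing"
would have to reject the first case rather than split it.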

There's an additional problem with trying to cram URIs and Path
together, which is that in a file system, "/a/b/symlink/../c" may
refer to any file system object, depending on symlink's target (which
is unknown), while as a URI path it refers to whatever "/a/b/c" refers
to, and nothing else.  (This is the semantic glitch I was thinking of
earlier.)

This means that URIs can be canonicalized syntactically, while doing
so with file system paths is risky.
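A small experiment shows the difference.  This is only a sketch, assuming
a POSIX system where os.symlink is permitted; the directory names
("elsewhere", "target") and the function name compare_resolutions are
invented for the demo.  posixpath.normpath stands in for the purely
syntactic dot-segment removal that RFC 3986 prescribes (the two differ in
some edge cases), while os.path.realpath asks the file system.

```python
import os
import posixpath
import tempfile


def compare_resolutions():
    """Resolve 'a/b/symlink/../c' syntactically and via the file system.

    Returns both results relative to a throwaway root directory.
    """
    with tempfile.TemporaryDirectory() as tmp:
        root = os.path.realpath(tmp)
        os.makedirs(os.path.join(root, "a", "b"))
        os.makedirs(os.path.join(root, "elsewhere", "target"))
        # a/b/symlink points at a directory in a different subtree.
        os.symlink(os.path.join(root, "elsewhere", "target"),
                   os.path.join(root, "a", "b", "symlink"))

        path = os.path.join(root, "a", "b", "symlink", "..", "c")
        syntactic = posixpath.normpath(path)  # text-only: drops "symlink/.."
        actual = os.path.realpath(path)       # follows the link, then applies ".."
        return (os.path.relpath(syntactic, root),
                os.path.relpath(actual, root))


syntactic, actual = compare_resolutions()
print(syntactic)  # a/b/c
print(actual)     # elsewhere/c
```

The syntactic answer is what a URI consumer is entitled to assume; the
second answer is what the operating system would actually open.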

Footnotes: 
[1]  https://tools.ietf.org/html/draft-ietf-appsawg-file-scheme-06


