[XML-SIG] file urls in urllib

Martin v. Loewis martin@loewis.home.cs.tu-berlin.de
Wed, 7 Mar 2001 23:15:17 +0100


> rfc 1738 states:
> 
>    A file URL takes the form:
>        file://<host>/<path>
>    where <host> is the fully qualified domain name of the system on
>    which the <path> is accessible, and <path> is a hierarchical
>    directory path of the form <directory>/<directory>/.../<name>.
>    [...]
>    As a special case, <host> can be the string "localhost" or the empty
>    string; this is interpreted as `the machine from which the URL is
>    being interpreted'.
> 
> 
> So this would mean that if localhost is implied, all file urls should have (at least) three slashes.
> Assuming that the rfc means that the "/" is purely syntactic, what you should expect to work is:
>    file:////etc/passwd        (4 slashes, because of the leading "/")
>    file:///c:\autoexec.bat
>    file:///\\drv\autoexec.bat
>    file://///drv/autoexec.bat       (5 slashes, since forward slashes work on win32 too)

That clearly is not the intention of the RFC. It "essentially" says
that <path> is a slash-separated list of directories, forming a
hierarchy; i.e., the intention is that it does not start with a
slash. So /etc/passwd clearly is

file:///etc/passwd
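
As a minimal sketch of this reading (not code from urllib): the URL is
assembled as file://<host>/<path>, where <path> is the list of
directories joined by "/" with no leading slash of its own. The function
name file_url and its parameters are hypothetical, and no percent-encoding
is attempted.

```python
def file_url(segments, host=""):
    """Build file://<host>/<path> from a list of path segments.

    Assumption: <path> is the segments joined by "/", so the slash
    after <host> is a pure separator and not part of <path>.
    """
    path = "/".join(segments)
    return "file://" + host + "/" + path


# /etc/passwd on the local machine (empty host): three slashes in total.
print(file_url(["etc", "passwd"]))  # file:///etc/passwd

# The RFC's VMS example, with an explicit host.
print(file_url(["disk$user", "my", "notes", "note12345.txt"],
               host="vms.host.edu"))
```

Under this reading an empty host always yields exactly three slashes,
regardless of how the local system spells its file names.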

It then gives the example of a VMS file name
DISK$USER:[MY.NOTES]NOTE123456.TXT, saying that it might become (*)
file://vms.host.edu/disk$user/my/notes/note12345.txt. So the intention
clearly is that the hierarchy is represented using /. Apparently,
translation between a file name and a <path> is meant to be performed
in a system-dependent manner, but many systems failed to define a
procedure for doing so. Considering that one needs to distinguish the
procedure for doing so. Considering that one needs to distinguish the
drv case, the logical form would be

file://C:/autoexec.bat
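
To see how a generic URL splitter divides the forms discussed here into
<host> and <path>, the following sketch may help. It assumes Python 3's
urllib.parse.urlsplit, which is not the urllib of this thread's era; the
helper name host_and_path is hypothetical.

```python
from urllib.parse import urlsplit


def host_and_path(url):
    """Return the (netloc, path) pair that urlsplit reports for a URL."""
    parts = urlsplit(url)
    return parts.netloc, parts.path


for url in ("file:///etc/passwd",
            "file:////etc/passwd",
            "file://C:/autoexec.bat"):
    print(url, host_and_path(url))
```

Note that this splitter reports "C:" in the host position for the last
form, so any code consuming such URLs would have to treat that case
specially.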

Regards,
Martin

(*) The 'might' probably refers to the fact that the URL introduces
vms.host.edu, which was not mentioned before.