[XML-SIG] file urls in urllib

Martin v. Loewis martin@loewis.home.cs.tu-berlin.de
Thu, 8 Mar 2001 08:43:35 +0100


> suppose we agree that file:///c:/autoexec.bat should work (this is
> the case of a collapsed localhost).  then the processing model is
> that if a url starts with file:/// then remove that prefix, and
> consider the remainder (because /c:/autoexec.bat is not a proper
> local file).

Perhaps. Processing of file: URLs happens in a system-dependent
manner, so it could use one procedure on one system and another
procedure on another.

> ok, now do that to file:///etc/passwd and you get etc/passwd.

Sure. And that <path> denotes the file /etc/passwd, on Unix.

> so that means a parser has to look at c:/autoexec.bat and etc/passwd
> and conclude that because the first segment looks like a drive
> letter, it is ok, while etc/passwd needs a leading slash.

A different parser is used on Windows and Unix, so file:///etc/passwd
could mean different things on Windows and Unix. On Windows, it might
be ill-formed: for an absolute path, you need a drive letter (or else
you need to learn the current drive based on some magic processing
context); or it could mean \\etc\passwd (i.e. etc being the topmost
hierarchy level, if you allow file: URLs to denote UNC names). On
Unix, it clearly means /etc/passwd.
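
To illustrate the system-dependent reading, here is a minimal sketch in
Python. The helper file_url_to_path and its "system" argument are
hypothetical, not part of urllib, and the Windows branch picks just one
of the readings above (no drive letter means ill-formed); the UNC
interpretation is deliberately left out.

```python
import re
from urllib.parse import urlsplit, unquote


def file_url_to_path(url, system):
    """Map a file: URL to a local path string (hypothetical helper).

    `system` is "unix" or "windows"; each gets its own procedure.
    """
    parts = urlsplit(url)
    if parts.scheme != "file":
        raise ValueError("not a file: URL: %r" % url)
    path = unquote(parts.path)

    if system == "unix":
        # The <path> part already denotes the local file.
        return path

    if system == "windows":
        # Assumption: an absolute path needs a drive letter.
        m = re.fullmatch(r"/([A-Za-z]):(/.*)?", path)
        if m is None:
            raise ValueError("no drive letter in %r" % url)
        drive, rest = m.group(1), m.group(2) or "/"
        return drive + ":" + rest.replace("/", "\\")

    raise ValueError("unknown system: %r" % system)
```

With this sketch, file:///etc/passwd maps to /etc/passwd for "unix" and
is rejected for "windows", while file:///c:/autoexec.bat maps to
c:\autoexec.bat for "windows".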

> i think it is fair to say that rfc1738 is ambiguous since they only
> give an mvs example.  but nfs urls are defined clearly to match my
> "cleaner" notion of purely lexical url processing,

Yes, but that is for the nfs: scheme; it says nothing about
the file: scheme.

> as per http://www.faqs.org/rfcs/rfc2396.html :
>    Note that the initial "/" that introduces the <url-path> of an NFS
>    URL must not be passed to the server for multi-component lookup since
>    the pathname is to be evaluated relative to the public filehandle
>    directory.  For example, if the public filehandle is associated with
>    the server's directory "/a/b/c" then the URL:
>         nfs://server/d/e/f
>    will be evaluated with a multi-component lookup of the path
>    "d/e/f" relative to the server's directory

That means something non-obvious: WebNFS (RFC 2054) has the notion of
a "public filehandle", which is an all-null file handle in NFSv2, and a
zero-length file handle in NFSv3; the directory associated with the
public filehandle is a matter of server configuration. So a "relative
path" starts at the directory associated with the public filehandle;
an "absolute path" starts with the directory associated with / on the
server. That does not readily translate to the file: scheme.
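
A rough sketch of that distinction, in Python. The function nfs_lookup
and the public_dir parameter are hypothetical; in particular, treating a
second leading slash (nfs://server//a/b/c) as the marker for an absolute
path is my reading of the NFS URL scheme and should be taken as an
assumption here, not as something stated above.

```python
import posixpath
from urllib.parse import urlsplit


def nfs_lookup(url, public_dir):
    """Return (kind, server_path) for an nfs: URL (hypothetical helper).

    `public_dir` stands for the directory the server associates with
    the public filehandle; that is server configuration, not URL data.
    """
    parts = urlsplit(url)
    if parts.scheme != "nfs":
        raise ValueError("not an nfs: URL: %r" % url)

    # The initial "/" only introduces the url-path; it is not looked up.
    url_path = parts.path[1:] if parts.path.startswith("/") else parts.path

    if url_path.startswith("/"):
        # Assumption: a remaining leading "/" means "start at the server's /".
        return ("absolute", url_path)

    # Otherwise the lookup starts at the public filehandle's directory.
    return ("relative", posixpath.join(public_dir, url_path))
```

Note that the result depends on public_dir, which the client cannot
derive from the URL text; there is no comparable notion for file: URLs.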

> we'd like certain functions to "just work" and handle either a url
> or a local host path -- this is certainly what we'd like when we
> specify an xml source on a command line.

Well, Guido argues that file names and URLs should not be mixed in XML
processing; that there should be separate APIs for putting in file
names and URLs. That is currently not the case, but it probably should
be. Then it is up to the application to decide whether a string it
has is a file name or a URL.
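
What separate entry points could look like, as a minimal sketch in
Python. The names source_from_url and source_from_filename are made up
for illustration and are not an existing PyXML or urllib API; the only
library calls used are os.path.abspath and pathlib's Path.as_uri.

```python
import os
from pathlib import Path


def source_from_url(url):
    """The caller asserts that `url` already is a URL; pass it through."""
    return url


def source_from_filename(filename):
    """The caller asserts that `filename` is a local file name.

    The conversion to a file: URL happens here, using the rules of the
    system we are running on, so no guessing is needed later.
    """
    return Path(os.path.abspath(filename)).as_uri()
```

A command-line tool would then call source_from_filename on its
arguments, while an entity resolver would call source_from_url.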

> so in that case, if a processor sees etc/passwd, it should *not* add
> a leading slash, since it is relative to either current working
> directory or the current url base, whichever you like

It should be clear from the context whether a relative thing is a
relative file name or a relative URL; e.g. when it is passed by the
user, it is normally a relative file name; if it comes from an entity
definition, it is a relative URL.
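
As a small illustration in Python: the same string etc/passwd resolves
differently depending on which kind of thing the context says it is. The
function resolve_relative and its "kind" argument are hypothetical, and
posixpath is used so that the file name case follows Unix rules
regardless of where the sketch is run.

```python
import posixpath
from urllib.parse import urljoin


def resolve_relative(ref, kind, cwd, base_url):
    """Resolve `ref` according to what the context says it is.

    kind == "filename": relative to the working directory `cwd`.
    kind == "url":      relative to the base URL `base_url`.
    """
    if kind == "filename":
        return posixpath.normpath(posixpath.join(cwd, ref))
    if kind == "url":
        return urljoin(base_url, ref)
    raise ValueError("unknown kind: %r" % kind)
```

For example, with cwd /home/user and base URL
http://example.org/docs/main.xml, etc/passwd becomes
/home/user/etc/passwd as a file name and
http://example.org/docs/etc/passwd as a URL.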

> It should be a purely lexical operation.

That is clearly not the intention of the RFC; the conversion in the
VMS example shows that knowledge about the local file system is
required to process a file: URL.

Regards,
Martin