[XML-SIG] prepare_input_source and relative path

Mike Brown mike at skew.org
Tue Feb 8 12:30:53 CET 2005


Uche Ogbuji wrote:
> Bleah.  I guess that's why Mike Brown has had to create fixed versions
> of all the Python stdlib URI functions for 4Suite :-)

Yes. All of the URL functions in stdlib are either undocumented and for use
within stdlib only, or are about 8 years out of date. Or both.

I'm using Ft.Lib.Uri as proving grounds for APIs that I'll eventually propose
for inclusion in urllib2. I don't anticipate making any headway on such
proposals for a while, though.

In Ft.Lib.Uri everything is RFC 3986 compliant (I was tracking development
of the RFC), except for the percent-encoding APIs, which, like every other
percent-encoding API, are fraught with various gotchas that I wouldn't want
to have to explain to anyone in any more detail than "everything you know is
wrong" :) I hope to have those looking better "soon" but it involves some
serious brain twisting.

Relevant to this discussion, the API for resolution of URI references to
absolute form -- Ft.Lib.Uri.Absolutize() -- is stable, and the algorithm it
implements is well-defined by the RFC. The algorithm does not change for
different URI schemes; it works the same for 'file' as for 'http'.
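
As a rough illustration of that scheme independence, here is a minimal sketch
that uses urllib.parse.urljoin from the modern Python 3 standard library as a
stand-in. It is not Ft.Lib.Uri.Absolutize(), and the example.org URIs and file
names are made up. Note that urljoin only resolves against schemes on its own
internal list of hierarchical ones, so it is a weaker demonstration than the
RFC algorithm itself.

```python
from urllib.parse import urljoin

def resolve_all(bases, ref):
    """Resolve one relative reference against several base URIs."""
    return [urljoin(base, ref) for base in bases]

# Hypothetical base URIs: same path, different schemes.
bases = ['http://example.org/a/b/c.xml', 'file:///a/b/c.xml']
results = resolve_all(bases, '../d.xml')

for base, result in zip(bases, results):
    print(base, '->', result)
```

The path arithmetic is identical in both cases; only the scheme and authority
carried over from the base differ.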

It would not be too hard to copy Absolutize() and BaseJoin() from Ft.Lib.Uri
over into PyXML as a temporary workaround until urllib2 is knocked into shape.
I would just make them and their dependent functions semi-private, and change
the exceptions to ValueErrors.

> > > Then there's the matter of a base URI given as 
> > > 
> > > /base
> > > 
> > > in 4Suite we require all base URIs to be proper base URIs (so they must
> > > at least have a scheme).  I think this is a reasonable restriction based
> > > on RFC requirements.  Is there a valid use case where there would not
> > > be a proper base URI, anyway?
> > 
> > always having proper URI as base sounds like a reasonable restriction to
> > me too, and I can't see a use case where it would not. But we may have
> > backward compat problem here if we decide to care about it. Maybe
> > InputSource.setSystemId could check for scheme presence, and if not add
> > a file: and issue a deprecation warning ?

Adding 'file:' blindly can cause difficulties or unexpected results.
These are all very different things:

'xyz'         - relative URI reference (relative path)
'/xyz'        - relative URI reference (absolute path)
'file:xyz'    - absolute URI (undef authority, non-hierarchical path)
'file:/xyz'   - absolute URI (undef authority, absolute path; dubious usage)
'file://xyz'  - absolute URI (authority xyz; no path)
'file:///xyz' - absolute URI (empty authority, absolute path)

And then there's what happens when you start throwing in dot segments
('file:./xyz')... and people guessing at how to convert an OS path into a
URI reference... it gets ugly.
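
To see how differently the six forms above come apart, here is a small sketch
that splits each one with urllib.parse.urlsplit from the Python 3 standard
library. The helper name components() is made up for this example. One caveat:
urlsplit reports both an undefined authority and an empty authority as '', so
it cannot tell 'file:/xyz' from 'file:///xyz', which is one of the distinctions
that matters here.

```python
from urllib.parse import urlsplit

FORMS = ['xyz', '/xyz', 'file:xyz', 'file:/xyz', 'file://xyz', 'file:///xyz']

def components(ref):
    """Return (scheme, authority, path) as urlsplit sees them."""
    parts = urlsplit(ref)
    return (parts.scheme, parts.netloc, parts.path)

for ref in FORMS:
    # One line per form: the reference, then its three components.
    print('%-13r %r' % (ref, components(ref)))
```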

It is better to just check for the presence of a scheme and reject the
base if it doesn't have one. Or, if you can tolerate receiving a result
that has no scheme, prepend a dummy scheme, apply the proper resolution
algorithm, and strip the scheme from the result. Again this may not give
the results that the user expected, but IMHO there's no need to give the
user what they expect when what they expect is wrong :)
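
Roughly, the two options look like the sketch below. It is not the Ft.Lib.Uri
code: resolve() is my own compact rendering of the RFC 3986 section 5.2
algorithm (using the Appendix B regex), and the names absolutize_strict(),
absolutize_lenient() and the 'dummy:' scheme are invented for illustration.
The scheme test is deliberately naive, so something like 'C:/dir' would pass
as having a scheme.

```python
import re

# RFC 3986 Appendix B: scheme, authority, path, query, fragment.
# Undefined components come back as None (the path is never None).
_SPLIT = re.compile(r'^(?:([^:/?#]+):)?(?://([^/?#]*))?([^?#]*)'
                    r'(?:\?([^#]*))?(?:#(.*))?$', re.S)
_SCHEME = re.compile(r'^[A-Za-z][A-Za-z0-9+.\-]*:')
DUMMY = 'dummy:'  # hypothetical placeholder scheme

def remove_dot_segments(path):
    """RFC 3986 section 5.2.4."""
    out = []
    while path:
        if path.startswith('../'):
            path = path[3:]
        elif path.startswith('./') or path.startswith('/./'):
            path = path[2:]
        elif path == '/.':
            path = '/'
        elif path.startswith('/../') or path == '/..':
            path = '/' + path[4:]
            if out:
                out.pop()
        elif path in ('.', '..'):
            path = ''
        else:
            # Move the first segment (with its leading '/') to the output.
            i = path.find('/', 1)
            if i == -1:
                i = len(path)
            out.append(path[:i])
            path = path[i:]
    return ''.join(out)

def resolve(base, ref):
    """RFC 3986 section 5.2.2 (strict) plus recomposition (5.3)."""
    bs, ba, bp, bq, _ = _SPLIT.match(base).groups()
    rs, ra, rp, rq, rf = _SPLIT.match(ref).groups()
    if rs is not None:
        s, a, p, q = rs, ra, remove_dot_segments(rp), rq
    elif ra is not None:
        s, a, p, q = bs, ra, remove_dot_segments(rp), rq
    elif rp == '':
        s, a, p, q = bs, ba, bp, (rq if rq is not None else bq)
    else:
        if rp.startswith('/'):
            merged = rp
        elif ba is not None and bp == '':
            merged = '/' + rp
        else:
            merged = bp[:bp.rfind('/') + 1] + rp
        s, a, p, q = bs, ba, remove_dot_segments(merged), rq
    out = '' if s is None else s + ':'
    if a is not None:
        out += '//' + a
    out += p
    if q is not None:
        out += '?' + q
    if rf is not None:
        out += '#' + rf
    return out

def absolutize_strict(base, ref):
    """Option 1: reject a base that has no scheme."""
    if not _SCHEME.match(base):
        raise ValueError('base URI has no scheme: %r' % (base,))
    return resolve(base, ref)

def absolutize_lenient(base, ref):
    """Option 2: borrow a dummy scheme, resolve, then strip it again."""
    if _SCHEME.match(base):
        return resolve(base, ref)
    result = resolve(DUMMY + base, ref)
    # If ref carried its own scheme, the dummy never appears in the result.
    return result[len(DUMMY):] if result.startswith(DUMMY) else result
```

With this, absolutize_lenient('/base', 'other') gives '/other', while
absolutize_strict('/base', 'other') raises ValueError.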

-Mike

