[XML-SIG] prepare_input_source and relative path

Sylvain Thénault Sylvain.Thenault at logilab.fr
Mon Feb 7 18:17:36 CET 2005


On Monday 07 February à 10:04, Uche Ogbuji wrote:
> On Mon, 2005-02-07 at 12:18 +0100, Sylvain Thénault wrote:
> > Hey,
> > 
> > I've been heating a bug which is already registered as #616431 in the
> > bug tracker. I find it very annoying and I've patched the function to
> > make it work before noticing a patch was already available. Is there any
> > reason to still wait to apply it ?
> > Anyway I've joined to this mail my version of the fix, which fix the
> > following cases:
> > 
> > - prepare_input_source('relative.xml', '/base') -> /base/relative.xml
> >   the sf submitted patch fix this one to.
> > 
> > - prepare_input_source('file:relative.xml', '/base') ->
> >   file:/base/relative.xml
> > 
> > 
> > this allow to have a xml file containing relative system identifiers
> > such as:
> > 
> >   <!ENTITY  plans SYSTEM "file:plans.xml">
> >   <!ENTITY  chatbot SYSTEM "chatbot.xml">
> > 
> > where parse(open('path to my xml file')) should not fail as it currently
> > does.
> > 
> > If this patch sounds good to you, I can check it in.
> 
> Wow.  I'm always amazed at some of bugs that have lived on for so long
> in PyXML.

isn't it...
 
> Your patch seems fine to me, but there is one area that is probably
> worth discussion.  I hope Mike Brown has a moment to chip in because
> he's an expert at such matters.
> 
> For the case of the file: URL scheme (BTW, you might want to consider
> replacing your variable name "proto" with "scheme"), it's probably OK to
> have 

thanks for fixing my url's vocabulary :)
 
> file:///base + file:relative.xml -> file:///base/relative.xml
> 
> Since the file scheme's semantics are so wooly.  But this wouldn't make
> sense if you replaced "file" with "http".

yep. But notice my patch doesn't change anything in that case, which
will so behave according to urlparse.urljoin's behaviour:

>>> urlparse.urljoin('file:///base', 'file:relative.xml')
'file:///relative.xml'
>>> urlparse.urljoin('file:///base', 'http:relative.xml')
'http:relative.xml'
 
> Then there's the matter of a base URI given as 
> 
> /base
> 
> in 4Suite we require all base URIs to be proper base URIs (so they must
> at least have a scheme).  I think this is a reasonable restriction based
> on RFC requirements.  Is there a valid user case where there would not
> be a proper base URI, anyway?

always having proper URI as base sounds like a reasonable restriction to
me too, and I can't see user case where it would not. But we may have
backward compat problem here if decide to care about it. Maybe
InputSource.setSystemId could check for scheme presence, and if not add
a file: and issue a deprecation warning ?

-- 
Sylvain Thénault                               LOGILAB, Paris (France).

http://www.logilab.com   http://www.logilab.fr  http://www.logilab.org



More information about the XML-SIG mailing list