[XML-SIG] prepare_input_source and relative path

Wed Feb 9 15:39:38 CET 2005

On Tuesday 08 February at 19:01, Mike Brown wrote:
> Sylvain Thénault wrote:
> > I guess you're right. I wrote this patch because it fixed my
> > problem. Now, if it doesn't take too much time to get every case
> > handled correctly by implementing RFC 3986, I may take some time to do
> > so or to help get it done. And if part of the job is already done in
> > 4suite, that's great. However, what's in 4suite and what's not and still
> > needs to be implemented is not yet clear to me.
> 
> The current version of Ft.Lib.Uri is here:
> http://cvs.4suite.org/viewcvs/4Suite/Ft/Lib/Uri.py?view=markup [1]
> 
> If you see "rfc2396bis" in the doc strings, you may safely interpret
> them to mean "RFC 3986".
> 
> 
> The functions that you should look at are the following:
> 
> MakeUrllibSafe(uriRef)
> ======================
> This exists in order to convert a proper URI reference into one that
> can be handled by urllib.urlopen(). It does the following:
> 1. If the reference contains an Internationalized Domain Name,
>    recodes it so that it is resolvable. (Py 2.3+ only)
> 2. Strips the fragment component, if any. 
> 3. Ensures that the reference is a byte string, not unicode.
> 4. On Windows, assumes that the first ':' appearing in the path
>    component is part of a drivespec, and converts it to '|'.
> 
> If you port this function, the reference to PercentDecode() may be replaced 
> with urllib.unquote(), but you must move the byte string check (#3, above) to 
> occur before calling unquote. The references to the functions SplitUriRef and 
> UnsplitUriRef can be replaced with urlsplit() and urlunsplit() from the 
> urlparse module.
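
For illustration, the four steps listed above might be ported roughly as
follows. This is only a sketch under assumptions, not the 4Suite code: it
targets Python 3 (where urlsplit/urlunsplit live in urllib.parse), the
name make_urllib_safe and its windows parameter are made up for the
example, and step 3 is left out because Python 3's urlopen takes str.

```python
import sys
from urllib.parse import urlsplit, urlunsplit


def make_urllib_safe(uri_ref, windows=None):
    """Hypothetical sketch of the steps above; not the 4Suite code."""
    if windows is None:
        windows = sys.platform.startswith("win")
    scheme, netloc, path, query, _fragment = urlsplit(uri_ref)

    # Step 1: recode a non-ASCII host name to its ASCII (IDNA) form,
    # leaving any userinfo and port untouched.
    if not netloc.isascii():
        userinfo, at, hostport = netloc.rpartition("@")
        host, colon, port = hostport.partition(":")
        ascii_host = host.encode("idna").decode("ascii")
        netloc = userinfo + at + ascii_host + colon + port

    # Step 4: on Windows, assume the first ':' in the path belongs to a
    # drivespec and convert it to '|'.
    if windows:
        path = path.replace(":", "|", 1)

    # Step 2: drop the fragment by reassembling the reference without it.
    return urlunsplit((scheme, netloc, path, query, ""))
```

Passing windows explicitly makes the drivespec rule testable on any
platform; by default it follows sys.platform.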
> 
> 
> Absolutize(uriRef, baseUri)
> ===========================
> This does strict merging of a URI reference and a base URI. The base URI 
> *must* be absolute (must have a scheme). If you port this function, the
> UriException may be replaced with a ValueError, and SplitUriRef &
> UnsplitUriRef may be replaced with their urlparse equivalents, as
> mentioned above. The RemoveDotSegments function must also be ported and
> should be made semi-private because it is not for general use. I've
> implemented it using two segment stacks, as alluded to in the spec,
> rather than the explicit string-walking algorithm, which would be too
> inefficient.
> 
> 
> BaseJoin(base, UriRef)
> ======================
> This does lenient merging of a base URI and a URI reference (note that
> the argument order differs from that of Absolutize). It allows the base
> URI to be a relative reference. In such cases, we use a dummy scheme
> (we don't say "assume 'file:'" because the spec says all schemes must be
> resolved the same way), run it through Absolutize, and then remove the scheme
> from the result. If you port this function, you will need to port the
> IsAbsolute function, which just checks to see if the URI has a scheme.
> I prefer to use a regex for this, as it is fast and accurate (':' can
> appear in more than one place in a URI reference, so it is not safe to
> assume that its presence means there is a scheme).
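
A regex-based scheme check along the lines described above could look
like this. It is a sketch, not the 4Suite IsAbsolute: the pattern follows
the RFC 3986 scheme grammar (a letter, then letters, digits, '+', '-' or
'.', then ':'), and the names _SCHEME_RE and is_absolute are chosen for
the example.

```python
import re

# scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." ), followed by ':'
_SCHEME_RE = re.compile(r"[A-Za-z][A-Za-z0-9+.\-]*:")


def is_absolute(uri_ref):
    """Return True if uri_ref starts with a scheme (hypothetical sketch)."""
    # match() anchors at the start, so a ':' later in the reference
    # (in a path segment or query) is not mistaken for a scheme.
    return _SCHEME_RE.match(uri_ref) is not None
```

Note that a bare Windows path such as 'C:\dir' also matches, since 'C'
is a syntactically valid scheme; callers that accept OS paths would need
to convert them to URIs first.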

Thanks a lot. Actually, almost all the work is already done right there.
Here is what I've worked on. Once we reach a consensus, I'll add it
to pyxml. I've attached to this mail:

- a light version of 4Suite Uri.py including the following functions:
  SplitUriRef, UnsplitUriRef (it was really less annoying to use those
  two functions than the standard library equivalents), Absolutize,
  MakeUrllibSafe, _RemoveDotSegments, BaseJoin, GetScheme and
  IsAbsolute. With the presented solution, the last three are not used
  and could be removed, but I've kept them in for now. All the
  Absolutize tests from 4suite still pass.

- a modified version of saxutils, expecting the Uri module above to be
  in the _xmlplus directory (i.e. importable as xml.Uri). I've refactored
  prepare_input_source to make the URI merging easier to test.

- a unittest file, which includes some test cases for the URI merging
  function. Please take a look at the existing test cases to check that
  everything looks fine to you. If you have other cases to add, please let
  me know (or maybe I can add this file to CVS first). Note that
  to run the tests, you need a "quotes.xml" file in the same
  directory as the test file (there is one in the test directory of
  pyxml). As a bonus, I've converted the escape function test from
  test_utils into a unittest in the same file.

Anyway, having SplitUriRef/UnsplitUriRef replace
urlparse.urlsplit/urlunsplit and Absolutize or BaseJoin replace
urlparse.urljoin would definitely be the right thing.

-- 
Sylvain Thénault                               LOGILAB, Paris (France).

http://www.logilab.com   http://www.logilab.fr  http://www.logilab.org

-------------- next part --------------
A non-text attachment was scrubbed...
Name: test_saxutils.py
Type: text/x-python
Size: 2062 bytes
Desc: not available
Url : http://mail.python.org/pipermail/xml-sig/attachments/20050209/631a585c/test_saxutils-0001.py
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Uri.py
Type: text/x-python
Size: 16423 bytes
Desc: not available
Url : http://mail.python.org/pipermail/xml-sig/attachments/20050209/631a585c/Uri-0001.py
-------------- next part --------------
A non-text attachment was scrubbed...
Name: saxutils.py
Type: text/x-python
Size: 24925 bytes
Desc: not available
Url : http://mail.python.org/pipermail/xml-sig/attachments/20050209/631a585c/saxutils-0001.py