[XML-SIG] Ideas for web/ package

Thomas B. Passin tpassin@home.com
Sat, 16 Feb 2002 11:08:31 -0500


[M.-A. Lemburg]


> "Thomas B. Passin" wrote:
> >
> > > Andrew Kuchling writes:
> > >  > As part of the RELAX NG stuff, I've discovered that urlparse() is
> > >  > really lenient in its parsing.  For example, the fragment value is ''
> > >  > if no fragment is supplied, so you can't distinguish between
> > >  > http://www.amk.ca and http://www.amk.ca# .  Unfortunately this can't
> > >
> > > It's not clear that the distinction is meaningful in the RFC, as best
> > > as I can recall (it's been a couple of months since I looked at it).
> > >
> > But it can make a ***huge*** difference in RDF.
>
> Why is that ? I find it a bit awkward to add semantics to
> weakly defined corner cases.
>
There have been several major threads on this recently on the RDF-interest
list.  Basically, they don't use uris the way most other people think they
were intended to be used, although they probably meet the letter of the law.
Strictly speaking, they use uri references, which can include fragment
identifiers.

The "#" has no real syntactic significance in RDF and also does not indicate
an xpointer expression.  All classes, resources, and data types (when they
finally work them out) are denoted by uri references that generally include
the "#" character (but don't have to).  In some cases, the "#" character is
present with no following characters, but that is still significant because
RDF requires a character-by-character match with the ***whole*** string,
including any fragment identifier.  (I believe the match is done after
unescaping according to url encoding rules.)
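A minimal Python sketch of why that bare "#" matters, assuming RDF identity is plain string equality (and ignoring the unescaping question).  same_rdf_resource is a hypothetical helper, not part of any library; urlsplit() and urlunsplit() are the standard urllib.parse functions, which, at least with default arguments, still behave like the urlparse() Andrew describes:

```python
from urllib.parse import urlsplit, urlunsplit


def same_rdf_resource(a, b):
    # Hypothetical helper: two uri references name the same RDF resource
    # only if the whole strings match character by character, fragment
    # (and any bare "#") included.  No unescaping is done here.
    return a == b


plain = "http://www.amk.ca"
bare_hash = "http://www.amk.ca#"

print(same_rdf_resource(plain, bare_hash))  # False

# urlsplit() reports an empty fragment for both strings...
print(urlsplit(plain).fragment == urlsplit(bare_hash).fragment)  # True

# ...so a parse/unparse round trip silently drops the bare "#".
print(urlunsplit(urlsplit(bare_hash)))  # http://www.amk.ca
```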

Note that there is a difference between a "uri" and a "uri reference", and
RDF uses the latter.  A uri itself does not include a fragment identifier,
while a uri reference may include one.
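In Python terms, the standard urllib.parse.urldefrag() splits a uri reference into the uri and the fragment identifier.  The xsd:integer reference below is just an illustration of the kind of identifier RDF uses:

```python
from urllib.parse import urldefrag

# A uri reference: a uri plus an optional "#fragment".
ref = "http://www.w3.org/2001/XMLSchema#integer"

uri, fragment = urldefrag(ref)
print(uri)       # http://www.w3.org/2001/XMLSchema
print(fragment)  # integer
```

RDF itself, though, never splits the string like this; it compares the whole thing.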

I don't think we should treat RDF as a minor corner case since a large part
of the W3C and DAML Semantic Web stuff builds on it.

There is some controversy as to whether anything should be retrievable from
such a uri reference.  If, for example, the uri reference is for an
xsd:integer data type, should it be possible to dereference anything (like a
definition of the data type)?  Currently, of course, you cannot.

It may be that no support is needed in any new url parsing package.  These
uri references appear as attribute values in almost all cases, and they
aren't supposed to be parsed for use.  However, sometimes it may be
convenient to separate out the uri from the fragment identifier so a new
identifier can be constructed.  So perhaps the only call really needed (if
it is in fact needed at all) would be

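def get_uri_before_fragment(uri_reference, include_fragment_separator=1):
    # Return the part of the string up to and, by default,
    # INCLUDING the FIRST "#" character.  With no "#", return it unchanged.
    head, sep, _ = uri_reference.partition("#")
    return head + sep if include_fragment_separator else head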

For RDF purposes there could still be some ambiguity, because you can't be
sure whether the base upon which a series of related identifiers is built
actually includes the "#".  But in most actual cases it does.

Just to be clear about this, the RDF rules for creating a new identifier do
NOT say "take the base uri, construct a unique string for the new resource,
and make that a fragment identifier added onto the base uri using the rfc
rules for fragment identifiers".

Instead, they say in effect "take the string for the base uri reference,
construct a unique string for the new resource, and concatenate the two
strings."
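A small Python sketch contrasting the two rules.  The base under http://example.org/ and the local name "Dog" are made up, and rdf_style and fragment_style are illustrative names, not part of any RDF toolkit; fragment_style uses the standard urljoin() to resolve a "#name" reference against the base:

```python
from urllib.parse import urljoin


def rdf_style(base, local_name):
    # The RDF rule as described above: plain string concatenation.
    return base + local_name


def fragment_style(base, local_name):
    # The rule RDF does NOT use: resolve "#local_name" against the base,
    # which replaces any existing fragment with the new one.
    return urljoin(base, "#" + local_name)


for base in ("http://example.org/ns#", "http://example.org/ns"):
    print(rdf_style(base, "Dog"), fragment_style(base, "Dog"))
```

With the first base both give http://example.org/ns#Dog; with the second, concatenation gives http://example.org/nsDog while the fragment rule gives http://example.org/ns#Dog.  They only agree when the base already ends in "#", which is why it matters whether the base includes it.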

Cheers,

Tom P