[XML-SIG] Proposed XBEL DTD

Fred L. Drake, Jr. <fdrake@acm.org>
Fri, 2 Oct 1998 12:51:32 -0400 (EDT)


--UK/cpViHvX
Content-Type: text/plain; charset=us-ascii
Content-Description: message body text
Content-Transfer-Encoding: 7bit


Marc van Grootel writes:
 > Just before the weekend I've got a few possible issues for the XBEL
 > DTD. I say possible because it's not my intention to upset the current
 > design. After reading some stuff about metadata I can see that it's a

  Ha, you're too late!  He, he, he.... oh, well.  Andrew, please
ignore the public text I sent for XBEL this morning.  ;-)
  I'll attach an updated DTD below for continued discussion.

 > A while ago I suggested adding the ISO Latin 1 entities (like HTML
 > does) was that ruled out? It would keep XBEL more readable.

  Do we want just latin-1, or all of the standard ISO entities that
have been defined for XML?  This would be closer to what HTML uses.
I propose to include all those listed at
<http://www.schema.net/entities/> if we're going to allow more than a
minimal set.  (Note that the five entities in the previous version of
the DTD are defined in the "Numeric and Special Graphic" entity set,
not latin-1.)

 > Folders
 > 
 > The name for the folder element is directly derived from common
 > bookmark files. In some ways the 'folder' is like the 'sect' in the
 > DocBook DTD. One interpretation of a folder is 'a grouped set of
 > nodes' the fact that this is rendered as a real folder in a bookmark
 > file is a presentational aspect. I see a real advantage when I can use
 > a folder element as just a group of nodes inside the current folder
 > without always being rendered as a separate folder. Such a
...

  I'm not sure I see the value in what's been defined largely as an
interchange format.  Most applications would not understand or care
about the distinction (assuming I understand it).  Perhaps "groups"
can be defined using application-specific metadata?

        <bookmark href="http://xxx.lanl.gov/hypertex/"
                  added="1998-01-27T13:09:45-05:00"
                  visited="1998-03-24T10:12:10-05:00">
          <title>HyperTeX FAQ</title>
	  <info>
	    <metadata scheme="application::GroupHandler">
	      <meta name="group">hypertext resources</meta>
	    </metadata>
	  </info>
	</bookmark>
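
  A minimal sketch of how an application might read such a group back
out of the metadata, assuming Python's standard xml.etree.ElementTree
module.  The scheme name is the illustrative one from the example
above, and bookmark_group() is a hypothetical helper, not part of any
XBEL tool:

```python
import xml.etree.ElementTree as ET

# Illustrative scheme name copied from the example; not a registered value.
GROUP_SCHEME = "application::GroupHandler"


def bookmark_group(bookmark):
    """Return the 'group' value stored under GROUP_SCHEME, or None."""
    for metadata in bookmark.findall("info/metadata"):
        if metadata.get("scheme") != GROUP_SCHEME:
            continue  # another application's data; leave it alone
        for meta in metadata.findall("meta"):
            if meta.get("name") == "group":
                return meta.text
    return None


SAMPLE = """\
<bookmark href="http://xxx.lanl.gov/hypertex/"
          added="1998-01-27T13:09:45-05:00"
          visited="1998-03-24T10:12:10-05:00">
  <title>HyperTeX FAQ</title>
  <info>
    <metadata scheme="application::GroupHandler">
      <meta name="group">hypertext resources</meta>
    </metadata>
  </info>
</bookmark>
"""

bookmark = ET.fromstring(SAMPLE)
print(bookmark_group(bookmark))
```

  Applications that do not know the scheme simply skip the <metadata>
block, which is the point of keeping grouping out of the element
structure.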

 > shortcuts). However there's an asymmetry: an 'url' alias does not
 > point to an element with info, a 'folder' alias does. So resolving the
 > 'url' alias (including the info) is now different from resolving a
 > 'folder' alias.  This is a result from our decision to accept 'bare'
 > url's. This moved the id attribute from bookmark to the url element.

  This will probably be a non-issue for many bookmark-specific
applications, but might create problems for general XML tools.  In my
current implementation, the internal node created for a <bookmark> is
the same as that created for a <url>, and all the right stuff happens
by magic.  Since the location of attributes is currently (and
appropriately) constrained, this doesn't present any real issues.
Also, note that an <alias> can refer to a <bookmark> that doesn't have 
an <info> child at all, so the argument doesn't appear compelling.
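
  To illustrate, here is a rough sketch of alias resolution that does
not care whether the target carries an <info> child.  It assumes
Python's standard xml.etree.ElementTree module and the attribute
layout of the attached DTD ("id" on <bookmark> and <folder>, "ref" on
<alias>); build_id_index() and resolve_alias() are hypothetical names:

```python
import xml.etree.ElementTree as ET

SAMPLE = """\
<xbel version="1.0">
  <bookmark id="b1" href="http://xxx.lanl.gov/hypertex/">
    <title>HyperTeX FAQ</title>
  </bookmark>
  <folder id="f1">
    <title>Shortcuts</title>
    <alias ref="b1"/>
  </folder>
</xbel>
"""


def build_id_index(root):
    """Map each "id" attribute value to the element that carries it."""
    return {el.get("id"): el for el in root.iter() if el.get("id") is not None}


def resolve_alias(alias, index):
    """Return the element an <alias> refers to, or None if it dangles."""
    return index.get(alias.get("ref"))


root = ET.fromstring(SAMPLE)
index = build_id_index(root)
alias = root.find("folder/alias")
target = resolve_alias(alias, index)

# The target is found by id alone; a missing <info> just shows up as None.
print(target.tag, target.findtext("title"), target.find("info"))
```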

 > Now both 'bookmark' and 'url' get %common.attrs;. When they are really
 > being used this will automatically raise the question: Where to put
 > the value for a common attribute? On a 'bookmark' or on the contained
 > 'url'? Previously we removed the id attribute from bookmark to avoid
 > this.
 > 
 > It seems that the 'bare' url is causing subtle problems. Maybe this
 > was a bad decision. Should we undo that and merge url with bookmark?
 > It doesn't cause a big upset since most of bookmarks content is

  My current implementation attempts to "minimize" the generated output 
by using a bare <url> if it doesn't cause a loss of information.  What 
I find by looking at the output for my general bookmarks is twofold:
(1) most entries turn into <url> elements, which are substantially
more compact for viewing by a human, and (2) I've lost a lot of
descriptions by testing older versions of Grail bookmark code on
"live" data.  ;-(
  I think (hope?) my point is that brevity of markup is pretty
valuable for this application.  Perhaps <url> should be retained for
this reason, and perhaps not.  There is good thinking in putting "id"
on the <bookmark> element, however, and using just one element would
simplify processing somewhat.
  But let's take a look at the size difference anyway, since I've
brought it up.  This uses the previous version of XBEL:

        <url href="http://xxx.lanl.gov/hypertex/"
             added="1998-01-27T13:09:45-05:00"
             visited="1998-03-24T10:12:10-05:00"
          >HyperTeX FAQ</url>

  This is the same bookmark, without <url>:

        <bookmark href="http://xxx.lanl.gov/hypertex/"
                  added="1998-01-27T13:09:45-05:00"
                  visited="1998-03-24T10:12:10-05:00">
          <title>HyperTeX FAQ</title>
	</bookmark>

  Well, that doesn't look so bad.  I'll go ahead and adjust the DTD.
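
  The comparison can also be made mechanically.  The sketch below
assumes Python's standard xml.etree.ElementTree module; it converts
the bare <url> form into the <bookmark> form and measures both after
serialization (which drops the pretty-printing whitespace, so it
counts markup overhead only).  url_to_bookmark() is a hypothetical
helper, not the actual Grail code:

```python
import xml.etree.ElementTree as ET

URL_FORM = """\
<url href="http://xxx.lanl.gov/hypertex/"
     added="1998-01-27T13:09:45-05:00"
     visited="1998-03-24T10:12:10-05:00"
  >HyperTeX FAQ</url>"""


def url_to_bookmark(url):
    """Build the <bookmark> equivalent of a bare <url> element."""
    bookmark = ET.Element("bookmark", dict(url.attrib))
    title = ET.SubElement(bookmark, "title")
    title.text = (url.text or "").strip()
    return bookmark


url = ET.fromstring(URL_FORM)
bookmark = url_to_bookmark(url)

# ET.tostring() returns bytes, so len() gives the serialized size.
url_size = len(ET.tostring(url))
bookmark_size = len(ET.tostring(bookmark))
print(url_size, bookmark_size, bookmark_size - url_size)
```

  The overhead per entry is fixed: the longer element name (twice)
plus the <title> start and end tags.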


 > Metadata
 > 
 > I can follow the reasons for removing ID from metadata. But the
 > ability to reference a block of metadata is now lost. I wonder of this

  I see two real options: put "id" attributes only on things that it
makes immediate sense to refer to via <alias>, or put it in
common.attrs.  The only place to link to anything other than folders
and bookmarks (assuming that linking to a folder makes sense) is from
outside the document.  I would expect this to happen rarely, if ever.
From this perspective, I'll vote for simplicity.

 > It is a little more complex but more powerful to be able to
 > reference an info-nugget (it could be done by copying via an entity
 > reference though). Not all info follows the folder hierarchy. On the

  Perhaps the need for linking will be made clear if you can describe
an application of it?

 > Scheme
 > 
 > What is the content of the scheme attribute?  Should it be an URL (or
 > URN) or can it be any CDATA string?  Since an xbel probably uses only

  CDATA seems appropriate; there's no catalog of metadata schemes.
Since <metadata> is overloaded with application-specific schemes, we
cannot presume to predict the range of possible values.  I just took a 
look at the <META> element from HTML 4.0, and it looks like there's a
slightly different approach described there:  <HEAD> has a "profile"
attribute which names what I've been calling the scheme, and the
<META> attribute "scheme" is used to describe the notation of the
metadata value.  Perhaps the <metadata> "scheme" should be renamed
"profile", and an HTML 4.0-style "scheme" added to <meta>, primarily to
allow applications which collect information from HTML pages to store
it without losing fidelity.

 > ID/IDREF pair (or CDATA link attributes) and adding a <scheme
 > name="a-long-formal-id" id="s1"/>) somewhere near the top of the
 > document. This clearly documents which info schemes are being
 > used. Others may exist inside the document but the ones mentioned can
 > be used by reference (this could cut down on the file size in the case
 > of formal scheme names, which tend to be quite long).

  I'd avoid this since I don't know of any formal registries for
metadata schemes/profiles/whatever.  This can also be accomplished
using general entities:

      <!DOCTYPE xbel ... [
        <!ENTITY my-scheme "...long identifier...">
      ]>
      <xbel>
        <bookmark href="http://xxx.lanl.gov/hypertex/"
                  added="1998-01-27T13:09:45-05:00"
                  visited="1998-03-24T10:12:10-05:00">
          <title>HyperTeX FAQ</title>
          <info>
            <metadata scheme="&my-scheme;">
              ...
            </metadata>
          </info>
        </bookmark>
      </xbel>
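
  As a quick check that the entity approach works with an ordinary
parser, the following sketch assumes Python's standard
xml.etree.ElementTree module (built on expat, which expands entities
declared in the internal subset).  The identifier
"urn:example:my-long-scheme-identifier" is a made-up placeholder, and
the DOCTYPE here omits any external identifier:

```python
import xml.etree.ElementTree as ET

# The entity value below is a placeholder, not a real scheme name.
DOC = """\
<!DOCTYPE xbel [
  <!ENTITY my-scheme "urn:example:my-long-scheme-identifier">
]>
<xbel>
  <bookmark href="http://xxx.lanl.gov/hypertex/">
    <title>HyperTeX FAQ</title>
    <info>
      <metadata scheme="&my-scheme;"/>
    </info>
  </bookmark>
</xbel>
"""

root = ET.fromstring(DOC)
metadata = root.find("bookmark/info/metadata")

# The parser has already replaced the entity reference with its value.
print(metadata.get("scheme"))
```

  The application never sees the entity reference, only the expanded
attribute value, so no scheme-specific lookup code is needed.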


 > Other linking issues (maybe consider these for next version)
 > 
 > Are there other linking issues? What about a way to make xref's? Link
 > to external xbel documents and/or external metadata
 > information-nuggets? 

  There comes a point at which we punt and require people to use
XPointer.  Or wait for specific issues to crop up and make a new
version.  ;-)


  -Fred

--
Fred L. Drake, Jr.	     <fdrake@acm.org>
Corporation for National Research Initiatives
1895 Preston White Dr.	    Reston, VA  20191



--UK/cpViHvX
Content-Type: text/xml
Content-Description: YAX (Yet Another XBEL)
Content-Disposition: inline;
	filename="xbel.dtd"
Content-Transfer-Encoding: 7bit

<!-- This is the XML Bookmark Exchange Language, version 1.0.  It should
     be used with the formal public identifier:

	-//IDN python.org//DTD XML Bookmark Exchange Language 1.0//EN//XML

     One valid system identifier at which this DTD will remain
     available is:

	http://www.python.org/topics/xml/dtds/xbel-1.0.dtd

     More information on the DTD, including reference
     documentation, is available at:

	http://www.python.org/topics/xml/xbel/

    Attributes which take date/time values should encode the value
    according to the W3C NOTE on date/time formats:

	http://www.w3.org/TR/NOTE-datetime
  -->

<!ENTITY % ISOlat1
         PUBLIC "ISO 8879:1986//ENTITIES Added Latin 1//EN//XML"
                "http://www.schema.net/public-text/ISOlat1.pen">
%ISOlat1;
<!ENTITY % ISOlat2
         PUBLIC "ISO 8879:1986//ENTITIES Added Latin 2//EN//XML"
                "http://www.schema.net/public-text/ISOlat2.pen">
%ISOlat2;
<!ENTITY % ISOnum
         PUBLIC "ISO 8879:1986//ENTITIES Numeric and Special Graphic//EN//XML"
                "http://www.schema.net/public-text/ISOnum.pen">
%ISOnum;
<!ENTITY % ISOpub
         PUBLIC "ISO 8879:1986//ENTITIES Publishing//EN//XML"
                "http://www.schema.net/public-text/ISOpub.pen">
%ISOpub;
<!ENTITY % ISOtech
         PUBLIC "ISO 8879:1986//ENTITIES General Technical//EN//XML"
                "http://www.schema.net/public-text/ISOtech.pen">
%ISOtech;
<!ENTITY % ISOdia
         PUBLIC "ISO 8879:1986//ENTITIES Diacritical Marks//EN//XML"
                "http://www.schema.net/public-text/ISOdia.pen">
%ISOdia;
<!ENTITY % ISOgrk1
         PUBLIC "ISO 9573-15:1993//ENTITIES Greek Letters//EN//XML"
                "http://www.schema.net/public-text/ISOgrk1.pen">
%ISOgrk1;
<!ENTITY % ISOgrk2
         PUBLIC "ISO 9573-15:1993//ENTITIES Monotoniko Greek//EN//XML"
                "http://www.schema.net/public-text/ISOgrk2.pen">
%ISOgrk2;
<!ENTITY % ISOgrk3
         PUBLIC "ISO 8879:1986//ENTITIES Greek Symbols//EN//XML"
                "http://www.schema.net/public-text/ISOgrk3.pen">
%ISOgrk3;
<!ENTITY % ISOgrk4
         PUBLIC "ISO 8879:1986//ENTITIES Alternative Greek Symbols//EN//XML"
                "http://www.schema.net/public-text/ISOgrk4.pen">
%ISOgrk4;



<!ENTITY % common.attrs	"">
<!ENTITY % node.attrs	"id	  ID	#IMPLIED
			 added	  CDATA	#IMPLIED">
<!ENTITY % url.attrs	"href	  CDATA	#REQUIRED
                         visited  CDATA	#IMPLIED
                         modified CDATA	#IMPLIED">

<!ENTITY % nodes	"bookmark|folder|alias|separator">


<!ELEMENT xbel (title?, info?, desc?, (%nodes;)*)>
<!ATTLIST xbel
            version  CDATA	#FIXED "1.0"
>
<!ELEMENT title	     (#PCDATA)>
<!ATTLIST title
	    %common.attrs;
>

<!--=================== Info ======================================-->

<!ELEMENT info (metadata*)>
<!ATTLIST info
	    %common.attrs;
>

<!ELEMENT metadata (meta*)>
<!ATTLIST metadata
	    %common.attrs;
	    scheme   CDATA	#IMPLIED
>
<!ELEMENT meta (#PCDATA)>
<!ATTLIST meta
	    %common.attrs;
	    name     CDATA	#REQUIRED
>

<!--=================== Folder ====================================-->

<!ELEMENT folder   (title?, info?, desc?,(%nodes;)*)>
<!ATTLIST folder
	    %common.attrs;
            %node.attrs;
            folded   (yes|no)	'yes'   
>

<!--=================== Bookmark ==================================-->

<!ELEMENT bookmark (title, info?, desc?)>
<!ATTLIST bookmark
	    %common.attrs;
	    %node.attrs;
            %url.attrs;
>

<!ELEMENT desc       (#PCDATA)>
<!ATTLIST desc
	    %common.attrs;
>

<!--=================== Separator =================================-->

<!ELEMENT separator EMPTY>
<!ATTLIST separator
	    %common.attrs;
>

<!--=================== Alias =====================================-->

<!ELEMENT alias EMPTY>
<!ATTLIST alias
	    %common.attrs;
            ref       IDREF	#REQUIRED
>

--UK/cpViHvX--