[XML-SIG] Proposed XBEL DTD

Greg Stein gstein@lyra.org
Fri, 02 Oct 1998 15:54:43 -0700


Okay guys... here comes another swipe at complexity :-)

Fred L. Drake wrote:
> ...
>  > A while ago I suggested adding the ISO Latin 1 entities (like HTML
>  > does); was that ruled out? It would keep XBEL more readable.
> 
>   Do we want just latin-1, or all of the standard ISO entities that
> have been defined for XML?  This would be closer to what HTML uses.
> I propose to include all those listed at
> <http://www.schema.net/entities/> if we're going to allow more than a
> minimal set.  (Note that the five entities in the previous version of
> the DTD are defined in the "Numeric and Special Graphic" entity set, not
> latin-1.)

I think you guys are making this overly complicated. I don't see any DTD
out there that specifies additional entities. Instead, standard
character encoding is used.

I seem to recall reading somewhere that part of the purpose of XML was
to get rid of all the random entities that HTML had (which were
generally not universally recognized anyhow), and to limit the entities
down to just a handful. That handful (lt, gt, amp, quot, apos) is built
into XML parsers, which recognize them whether or not a DTD declares
them.

Punting this whole entity thing seems to make a lot of sense to me. If
machines are the primary users of XBEL, then you won't be using entities
anyhow; you'll be using standard character encoding instead.
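As a rough illustration, here is a minimal sketch using Python's standard `xml.etree.ElementTree` (the parser choice is only an example, not something XBEL specifies). The five predefined entities and numeric character references parse with no DTD at all, while an HTML-style named entity such as `&eacute;` is rejected unless a DTD declares it:

```python
import xml.etree.ElementTree as ET

# The five predefined entities (lt, gt, amp, quot, apos) need no DTD.
title = ET.fromstring("<title>Tom &amp; Jerry &lt;3 &quot;cartoons&quot;</title>")
print(title.text)  # Tom & Jerry <3 "cartoons"

# A Latin-1 character can be written directly in the document's encoding...
direct = ET.fromstring("<title>Café</title>")
# ...or as a numeric character reference, still without any DTD.
numeric = ET.fromstring("<title>Caf&#233;</title>")
print(direct.text == numeric.text)  # True

# An HTML-style named entity is undeclared here, so parsing fails.
try:
    ET.fromstring("<title>Caf&eacute;</title>")
    rejected = False
except ET.ParseError as err:
    rejected = True
    print("rejected:", err)
```

Either of the first two forms covers what the Latin-1 entity set would have provided, without adding anything to the DTD.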

>  > Folders
>  >
>  > The name for the folder element is directly derived from common
>  > bookmark files. In some ways the 'folder' is like the 'sect' in the
>  > DocBook DTD. One interpretation of a folder is 'a grouped set of
>  > nodes'; the fact that this is rendered as a real folder in a bookmark
>  > file is a presentational aspect. I see a real advantage when I can use
>  > a folder element as just a group of nodes inside the current folder
>  > without always being rendered as a separate folder. Such a
> ..
> 
>   I'm not sure I see the value in what's been defined largely as an
> interchange format.  Most applications would not understand or care
> about the distinction (assuming I understand it).  Perhaps "groups"
> can be defined using application-specific metadata?
> 
>         <bookmark href="http://xxx.lanl.gov/hypertex/"
>                   added="1998-01-27T13:09:45-05:00"
>                   visited="1998-03-24T10:12:10-05:00">
>           <title>HyperTeX FAQ</title>
>           <info>
>             <metadata scheme="application::GroupHandler">
>               <meta name="group">hypertext resources</meta>
>             </metadata>
>           </info>
>         </bookmark>

Can we name or define an application that needs this before adding it?
Why add complexity if you don't have a demonstrated need?

> ...
>  > Now both 'bookmark' and 'url' get %common.attrs;. When they are really
>  > being used this will automatically raise the question: Where to put
>  > the value for a common attribute? On a 'bookmark' or on the contained
>  > 'url'? Previously we removed the id attribute from bookmark to avoid
>  > this.

I believe the whole %common.attrs; thing is bogus. Its only apparent
purpose in the DTD is to make it more complicated. It doesn't define any
common attributes, and I bet nobody can come up with one that is common
across ALL elements in there (it HAS been applied to all elements, after
all).

What's the purpose? Torch it.

Don't say "for the future" ... add it in the future if you find you need
it.

>  >
>  > It seems that the 'bare' url is causing subtle problems. Maybe this
>  > was a bad decision. Should we undo that and merge url with bookmark?
>  > It doesn't cause a big upset since most of bookmark's content is
> ...
>   Well, that doesn't look so bad.  I'll go ahead and adjust the DTD.

Woo hoo! Simplification :-)

>  > Metadata
>  >
>  > I can follow the reasons for removing ID from metadata. But the
>  > ability to reference a block of metadata is now lost. I wonder if this
> 
>   I see two real options: put "id" attributes only on things that it
> makes immediate sense to refer to via <alias>, or put it in
> common.attrs.  The only place to link to anything other than folders
> and bookmarks (assuming that linking to a folder makes sense), is from
> outside the document.  I would expect this to happen rarely, if ever.
> From this perspective, I'll vote for simplicity.

I believe I missed the whole rationale for <alias> to begin with. What
is its intended purpose?

Maybe some comments could be inserted into the DTD to describe the
general usage/purpose of the different elements?

>...
>  > Scheme
>  >
>  > What is the content of the scheme attribute?  Should it be a URL (or
>  > URN) or can it be any CDATA string?  Since an xbel probably uses only
> 
>   CDATA seems appropriate; there's no catalog of metadata schemes.
> Since <metadata> is overloaded with application-specific schemes, we
> cannot presume to predict the range of possible values.  I just took a

Ah, but I bet you can say that whatever it is, it must be unique so that
an application can differentiate its schemes/profiles from another. I'd
argue for the simplicity and uniqueness of a URL. If you make it plain
CDATA, then people need to ask "what is the typical format? how do I
ensure uniqueness?" And they'll just stick a URL in there anyhow.
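To make the uniqueness argument concrete, here is a minimal Python sketch (standard `xml.etree.ElementTree`; the XBEL fragment and the `http://example.com/xbel/group-handler` scheme URL are hypothetical) of an application that reads only the `<metadata>` blocks whose `scheme` is its own URL and leaves every other scheme alone:

```python
import xml.etree.ElementTree as ET

# Hypothetical XBEL fragment: each <metadata> block names its scheme by URL.
DOC = """<xbel>
  <bookmark href="http://xxx.lanl.gov/hypertex/">
    <title>HyperTeX FAQ</title>
    <info>
      <metadata scheme="http://example.com/xbel/group-handler">
        <meta name="group">hypertext resources</meta>
      </metadata>
      <metadata scheme="http://www.lyra.org/greg/stuff">
        <meta name="hello">bar</meta>
      </metadata>
    </info>
  </bookmark>
</xbel>"""

# The scheme URL this hypothetical application owns. URLs are globally
# unique, so it cannot collide with another application's scheme.
MY_SCHEME = "http://example.com/xbel/group-handler"


def my_metadata(root: ET.Element) -> dict:
    """Collect name/value pairs from <metadata> blocks in MY_SCHEME only."""
    found = {}
    for block in root.iter("metadata"):
        if block.get("scheme") != MY_SCHEME:
            continue  # someone else's scheme: skip it untouched
        for meta in block.findall("meta"):
            found[meta.get("name")] = meta.text
    return found


print(my_metadata(ET.fromstring(DOC)))  # {'group': 'hypertext resources'}
```

The match is a plain string comparison, so no registry or naming convention is needed beyond "use a URL you control".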

> ...
>  > ID/IDREF pair (or CDATA link attributes) and adding a <scheme
>  > name="a-long-formal-id" id="s1"/>) somewhere near the top of the
>  > document. This clearly documents which info schemes are being
>  > used. Others may exist inside the document but the ones mentioned can
>  > be used by reference (this could cut down on the file size in the case
>  > of formal scheme names, which tend to be quite long).
> 
>   I'd avoid this since I don't know of any formal registries for
> metadata schemes/profiles/whatever.  This can also be accomplished
> using general entities:
> 
>       <!DOCTYPE xbel ... [
>         <!ENTITY my-scheme "...long identifier...">
>       ]>
>       <xbel>
>         <bookmark href="http://xxx.lanl.gov/hypertex/"
>                   added="1998-01-27T13:09:45-05:00"
>                   visited="1998-03-24T10:12:10-05:00">
>           <title>HyperTeX FAQ</title>
>           <info>
>             <metadata scheme="&my-scheme;">
>               ...
>             </metadata>
>           </info>
>         </bookmark>
>       </xbel>

Actually, this raises the whole question: what the heck are metadata
elements doing in there anyhow? Why not use arbitrary XML elements (with
their potential for a namespace)? Your example above looks suspiciously
like namespaces. Here is an example:

<?xml:namespace ns="http://www.lyra.org/greg/stuff" prefix="gjs" ?>
<xbel>
...
      <info>
        <gjs:hello foo="bar"/>
        <gjs:another />
      </info>
</xbel>

I'd say punt the whole metadata thing and rely on applications to define
their own XML elements and place those into the <info> area.
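For comparison, here is a rough Python sketch of the same idea using the standard `xml.etree.ElementTree`. It assumes the `xmlns:gjs="..."` attribute form of namespace declaration from the W3C "Namespaces in XML" Recommendation rather than the `<?xml:namespace ...?>` processing instruction in the example above, which comes from an earlier draft; the bookmark content is borrowed from Fred's examples and is illustrative only:

```python
import xml.etree.ElementTree as ET

# Application-defined elements placed directly in <info>, declared with
# the xmlns:prefix attribute form. The namespace URL is from the example.
DOC = """<xbel xmlns:gjs="http://www.lyra.org/greg/stuff">
  <bookmark href="http://xxx.lanl.gov/hypertex/">
    <title>HyperTeX FAQ</title>
    <info>
      <gjs:hello foo="bar"/>
      <gjs:another/>
    </info>
  </bookmark>
</xbel>"""

GJS = "http://www.lyra.org/greg/stuff"

root = ET.fromstring(DOC)
info = root.find("bookmark/info")

# ElementTree reports namespaced tags as "{namespace-url}localname", so an
# application picks out its own elements by URL and ignores the rest.
mine = [child for child in info if child.tag.startswith("{" + GJS + "}")]
for child in mine:
    local_name = child.tag.split("}", 1)[1]
    print(local_name, child.attrib)
# hello {'foo': 'bar'}
# another {}
```

No <metadata> or <meta> elements are involved: the namespace URL plays the role the scheme attribute would have played.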

> 
>  > Other linking issues (maybe consider these for next version)
>  >
>  > Are there other linking issues? What about a way to make xref's? Link
>  > to external xbel documents and/or external metadata
>  > information-nuggets?
> 
>   There comes a point at which we punt and require people to use
> XPointer.  Or wait for specific issues to crop up and make a new
> version.  ;-)

yah yah! :-)

Complexity is generally the reciprocal of usefulness. Some of the
complexity that is bulking it up (e.g. %common.attrs;) almost looks like
peacock feathers. :-)  Keep it simple... people will use it.

Cheers,
-g

--
Greg Stein (gstein@lyra.org)