From lisarein@finetuning.com  Tue Sep  1 17:47:52 1998
From: lisarein@finetuning.com (Lisa Rein)
Date: Tue, 01 Sep 1998 09:47:52 -0700
Subject: [XML-SIG] Re: XML-SIG digest, Vol 1 #86 - 1 msg
References: <199809011600.MAA14717@python.org>
Message-ID: <35EC2538.F78FC6E@finetuning.com>

Actually, you guys, I'm doing a story on these programs and I gotta ask
you -- is CVS really any good?  Are there specific needs you've found
that it can't address?  Just trying to get a reality check on what's out
there.  Thanks,

lisa rein

http://www.finetuning.com/collect.html

xml-sig-admin@python.org wrote:
> Subject: Re: [XML-SIG] Could we use a public CVS tree?
> Date: Mon, 31 Aug 1998 17:26:46 -0400 (EDT)
> From: "Andrew M. Kuchling" <akuchlin@cnri.reston.va.us>
> To: xml-sig@python.org
> References: <akuchlin@cnri.reston.va.us>
>      <13798.47324.921033.764667@amarok.cnri.reston.va.us>
>      <UTC199808310906.LAA01384.jack@snelboot.cwi.nl>
> 
> Jack Jansen writes:
> >I think it definitely would help. However, write access may pose a problem,
> >especially if we don't want everyone to be able to change every bit of the
> >tree. What may be easier is semi-automatic updates, with a human (i.e. you:-)
> >in the loop. Developers would mail diffs to you in an easy to recognize way (a
> >different mail alias would probably be easiest), and after a quick check you
> >would just feed the mail messages into patch and do the commit.
> 
>         I agree that write-access is less vital; people are working on
> separate components of the package, so it's quite simple for me to
> drop in the latest sgmlop.c or saxlib or whatever, and commit the
> resulting changes.
> 
>         I'll be working on setting this up, and hope to have it
> operational later this week.
> 
> --
> A.M. Kuchling                   http://starship.skyport.net/crew/amk/
> You'll have to leave my meals on a tray outside the door because I'll be
> working pretty late on the secret of making myself invisible, which may take
> me almost until eleven o'clock.
>     -- S.J. Perelman, "Captain Future, Block That Kick!"


From rherath@cs.monash.edu.au  Wed Sep  2 03:05:45 1998
From: rherath@cs.monash.edu.au (Ravindra N Herath)
Date: Wed, 2 Sep 1998 12:05:45 +1000 (EST)
Subject: [XML-SIG] XML sample files
Message-ID: <Pine.ULT.4.02.9809021203470.17607-100000@ip4.cs.monash.edu.au>

I am new to the list and am learning XML.  Could someone help me out and
post a simple sample file, or direct me to one, so that I can get a basic
understanding of XML?

Thanks,

Ravi


From fredrik@pythonware.com  Wed Sep  2 09:16:43 1998
From: fredrik@pythonware.com (Fredrik Lundh)
Date: Wed, 2 Sep 1998 09:16:43 +0100
Subject: [XML-SIG] XML sample files
Message-ID: <016c01bdd64b$9a458320$f29b12c2@pythonware.com>

>I am new to the list and am learning XML, could someone help me out and
>post a sample file that is not complex or direct me to one, so that I have
>a basic understanding of XML.

Check the XML topic guide:
http://www.python.org/topics/xml/

you may wish to start with:
http://www.stud.ifi.uio.no/~larsga/download/xml/xml_eng.html

for further study, here's some link collections:
http://www.sil.org/sgml/xml.html
http://www.ucc.ie/xml/

Cheers /F
fredrik@pythonware.com
http://www.pythonware.com


From jprewitt@justintime.com  Thu Sep  3 01:55:27 1998
From: jprewitt@justintime.com (Johnny Prewitt)
Date: Wed, 02 Sep 1998 17:55:27 -0700
Subject: [XML-SIG] XML Parsing, SF
Message-ID: <35EDE8FF.E251D73B@justintime.com>

Just in Time Solutions is a serious, pre-I.P.O. product development
organization.

Just in Time Solutions is developing and deploying Internet bill
presentment software.  We are leaders in Internet billing and are well
positioned to capitalize on this emerging market.

We are seeking a Senior Engineer to support our development by analyzing
existing parser routines and tools, then selecting and implementing the
appropriate tool.  This effort will involve Java, C++, XML and development
of APIs.  Upon implementation of a parser solution, the Engineer will be
assigned other development tasks.

As a young, 60-employee organization based in San Francisco, Just in Time
Solutions offers a relaxed, casual work environment and generous
compensation including stock options.  Our development environment is
primarily Java/CORBA.

If you know of someone who may be interested, give us a call, or pass the
word along.  We offer a "bounty" for referrals that come to work with us.

Johnny Prewitt, Recruiting Manager
Just In Time Solutions
444 De Haro St. Suite 100
San Francisco, CA. 94107

Tel. 415-553-6481 or 888-652-0864 x6481
Fax 415-553-6496
www.justintime.com


From MHammond@skippinet.com.au  Thu Sep  3 05:58:37 1998
From: MHammond@skippinet.com.au (Mark Hammond)
Date: Thu, 3 Sep 1998 14:58:37 +1000
Subject: [XML-SIG] Useless fun thing for XML - comments or helpers?
Message-ID: <012601bdd6f7$d3daf410$1301a8c0@bobcat.skippinet.com.au>

For no good reason at all, I am toying with the idea of the following little
mini-project.  If you don't use MSWindows, and/or don't use MSIE, then read no
further!

MSIE has the concept of "Favorites" (Bookmarks in Netscape speak) built in
at the Operating System level.  It is really quite trivial - a special
folder (directory) called "Favorites" exists, and this is filled with normal
Windows95 "shortcuts".  If this folder contains sub-folders, then these are
shown as sub-menus on the favorites menu.

It has bothered me for a while that this makes it quite hard to "publish"
(or even archive) the Favourites.

So my idea for a mini project is:
* Python code can locate and traverse this "favorites" directory.  It can
use the Windows "shortcuts" API to determine the underlying URL, and other
attributes (such as the time the link was last updated, etc).
* The above code can generate XML - the attributes for each shortcut can
appear in the XML.
* Code can be written to format the XML into pretty HTML, so people could
publish their favorites, as seemed to be common a while ago
* Later code could be written to parse an existing XML file, and update the
favorites themselves.  This would allow me to send my favorites to someone
else, and have them imported locally, for example.
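
The third step in the list above (formatting the XML as HTML) can be sketched
with the standard library alone.  This is only an illustration, not part of
the original proposal: it is written for present-day Python 3, and the element
names node, bookmark, name and url are placeholders assumed for the example,
as is the node_to_html helper.

```python
import xml.etree.ElementTree as ET
from html import escape


def node_to_html(node):
    """Render one <node> element, and anything nested in it, as an HTML list."""
    parts = []
    name = node.findtext("name")
    if name:
        parts.append("<h3>%s</h3>" % escape(name))
    parts.append("<ul>")
    for child in node:
        if child.tag == "bookmark":
            title = child.findtext("name", default="")
            url = child.findtext("url", default="")
            parts.append('<li><a href="%s">%s</a></li>'
                         % (escape(url, quote=True), escape(title)))
        elif child.tag == "node":
            # A nested folder becomes a nested list.
            parts.append("<li>%s</li>" % node_to_html(child))
    parts.append("</ul>")
    return "".join(parts)


# Invented sample data: an unnamed top node holding one named folder.
SAMPLE = """<node>
  <node>
    <name>Python</name>
    <bookmark>
      <name>Python Language Website</name>
      <url>http://www.python.org/</url>
    </bookmark>
  </node>
</node>"""

html_text = node_to_html(ET.fromstring(SAMPLE))
print(html_text)
```

Folders map to nested lists, so the recursion follows the folder structure
directly.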

As I said, a fairly useless little tool, but does appear to me to be a
reasonable starting point to get me going with XML (and at the same time
beef up Python's Win95 shell integration features :-)  The main benefit is
some direct XML experience.

Anyone care to help with this?  I'm happy to do all the Windows-specific
stuff, but the XML stuff will no doubt cause me to struggle somewhat... Two
minds are better than 1, even if they are both clueless :-)

Mark.


From larsga@ifi.uio.no  Thu Sep  3 08:22:38 1998
From: larsga@ifi.uio.no (Lars Marius Garshol)
Date: Thu, 03 Sep 1998 09:22:38 +0200
Subject: [XML-SIG] Useless fun thing for XML - comments or helpers?
In-Reply-To: <012601bdd6f7$d3daf410$1301a8c0@bobcat.skippinet.com.au>
Message-ID: <3.0.1.32.19980903092238.00686650@ifi.uio.no>

* Mark Hammond
>
>As I said, a fairly useless little tool, but does appear to me to be a
>reasonable starting point to get me going with XML (and at the same time
>beef up Python's Win95 shell integration features :-)  The main benefit is
>some direct XML experience.

How about calling it XML Bookmark Exchange Language (XBEL) and adding
conversion routines to and from Netscape bookmarks and Opera bookmarks?
It could still do what you suggested, but would actually be useful as
well... :-)

>Anyone care to help with this?  Im happy to do all the Windows specific
>stuff, but the XML stuff will no doubt cause me to struggle somewhat...Two
>minds are better than 1, even if they are both clueless :-)

Why not make a stab at the XML stuff and post it here for comments? 184 minds
should be even better than 2. :-)

--Lars M.


From digitome@iol.ie  Thu Sep  3 09:06:35 1998
From: digitome@iol.ie (Sean Mc grath)
Date: Thu, 03 Sep 1998 09:06:35 +0100
Subject: [XML-SIG] Useless fun thing for XML - comments or helpers?
Message-ID: <1.5.4.32.19980903080635.008f7348@gpo.iol.ie>

Mark,

I am very glad to help on the XML side. What is more, if you wish,
I can include the XML application as a Python/XML example in my next book
which I am currently working on. Why don't we do it all here on the
XML-SIG?

Sean Mc Grath
http://www.digitome.com/sean.htm
+353 96 47391

"Imagine a world without hypothetical situations..."


From fredrik@pythonware.com  Thu Sep  3 11:28:02 1998
From: fredrik@pythonware.com (Fredrik Lundh)
Date: Thu, 3 Sep 1998 11:28:02 +0100
Subject: [XML-SIG] Useless fun thing for XML - comments or helpers?
Message-ID: <013601bdd725$8ab0db50$f29b12c2@pythonware.com>

>MSIE has the concept of "Favorites" (Bookmarks in Netscape speak) built in
>at the Operating System level.  It is really quite trivial - a special
>folder (directory) called "Favorites" exists, and this is filled with normal
>Windows95 "shortcuts".  If this folder contains sub-folders, then these are
>shown as sub-menus on the favorites menu.
>
>It has bothered me for a while that this makes it quite hard to "publish"
>(or even archive) the Favourites.
>
>So my idea for a mini project is:
>* Python code can locate and traverse this "favorites" directory.  It can
>use the Windows "shortcuts" API to determine the underlying URL, and other
>attributes (such as the time the link was last updated, etc).
>* The above code can generate XML - the attributes for each shortcut can
>appear in the XML.
>* Code can be written to format the XML into pretty HTML, so people could
>publish their favorites, as seemed to be common a while ago
>* Later code could be written to parse an existing XML file, and update the
>favorites themselves.  This would allow me to send my favorites to someone
>else, and have them imported locally, for example.

Here's a first stab.  This is tested with MSIE 5.0 on a Swedish NT installation
(so you definitely need to change the directory to run it -- a production version
should of course use the registry to find out where the directory is located).

Don't know if earlier versions used shell shortcuts; if that's the case, the
"geturl" stuff needs to be rewritten.

Cheers /F

#
# convert "favourites" directory to an XML file
#

import os, string
from cgi import escape

DIR = "Favoriter" # swedish version

class Node:

    def __init__(self, name):
        self.name = name
        self.data = []

    def append(self, item):
        self.data.append(item)

    def dump(self, level=0):
        
        if not level:
            print "<?xml version='1.0' standalone='yes'?>"

        prefix = level * " "

        print prefix + "<node>"
        if self.name:
            print prefix, "<name>" + escape(self.name) + "</name>"

        for item in self.data:
            if isinstance(item, Node):
                item.dump(level+1)
            else:
                name, url = item
                print prefix, "<bookmark>"
                print prefix, " <name>" + escape(name) + "</name>"
                print prefix, " <url>" + escape(url) + "</url>"
                print prefix, "</bookmark>"

        print prefix + "</node>"


class Bookmarks:

    def dump(self):
        self.root.dump()


class MSIE(Bookmarks):
    # internet explorer

    def __init__(self):
        # FIXME: use registry for this!

        self.root = Node(None)
        self.path = os.path.join(os.environ["USERPROFILE"], DIR)

        self.__walk(self.root)

    def __walk(self, this, subpath=[]):
        # traverse favourites folder
        path = os.path.join(self.path, string.join(subpath, os.sep))
        for file in os.listdir(path):
            fullname = os.path.join(path, file)
            if os.path.isdir(fullname):
                node = Node(file)
                this.append(node)
                self.__walk(node, subpath + [file])
            else:
                url = self.__geturl(fullname)
                if url:
                    this.append((os.path.splitext(file)[0], url))

    def __geturl(self, file):
        try:
            fp = open(file)
            if fp.readline() != "[InternetShortcut]\n":
                return None
            while 1:
                s = fp.readline()
                if not s:
                    break
                if s[:4] == "URL=":
                    return s[4:-1]
        except IOError:
            pass
        return None


bookmarks = MSIE()
bookmarks.dump()
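
The shortcut-reading step can also be sketched in present-day Python 3 with
the standard configparser module, assuming a .url file is an INI-style text
file with an [InternetShortcut] section and a URL= key, as the code above
expects.  The helper name url_from_shortcut is made up for this example.

```python
import configparser


def url_from_shortcut(text):
    """Return the URL stored in the text of a .url shortcut, or None."""
    # interpolation=None keeps "%" characters in URLs from being interpreted.
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep key names case-sensitive ("URL", not "url")
    try:
        parser.read_string(text)
    except configparser.Error:
        return None  # not an INI-style file at all
    return parser.get("InternetShortcut", "URL", fallback=None)


# Invented sample shortcut contents.
SAMPLE = "[InternetShortcut]\nURL=http://www.python.org/\n"
print(url_from_shortcut(SAMPLE))
```

Anything that does not parse as an INI file, or lacks the section or key, is
reported as None, which matches how the class above skips such files.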


From MHammond@skippinet.com.au  Thu Sep  3 14:49:21 1998
From: MHammond@skippinet.com.au (Mark Hammond)
Date: Thu, 3 Sep 1998 23:49:21 +1000
Subject: [XML-SIG] Useless fun thing for XML - comments or helpers?
Message-ID: <004501bdd741$ac2899f0$1301a8c0@bobcat.skippinet.com.au>

Thanks Lars and Sean! And Fredrik doesn't mess around - Thanks! :-)

OK - Here is a simple DTD I've come up with based on Fredrik's code.  If you
haven't run it, it will generate something like:
 <node>
  <name>Python</name>
  <bookmark>
   <name>Aussie Mirror - Python Language Website</name>
   <url>http://mirror.aarnet.edu.au/pub/python/www.python.org/</url>
  </bookmark>
...

Here is my simple DTD, using Lars' "XBEL" :-).  Any comments? It's all way too
simple :-)  Then I'll have to knock up a tool to parse these back to a file
structure on disk, and also something to generate a .html representation of
the tree...

Thanks,

Mark.

<!-- DTD for XBEL - XML Bookmark Exchange Language -->

<!ELEMENT XBEL     (INFO, NODE+)>
<!ELEMENT NODE     (NAME, BOOKMARK+)>

<!ELEMENT BOOKMARK (NAME, URL)>

<!ELEMENT INFO     (OWNER, MACHINE, VERSION, DATE?)>

<!ELEMENT OWNER    (#PCDATA)>
<!ELEMENT MACHINE  (#PCDATA)>
<!ELEMENT VERSION  (#PCDATA)>
<!ELEMENT DATE     (#PCDATA)>


<!ELEMENT NAME     (#PCDATA)>
<!ELEMENT URL      (#PCDATA)>


From larsga@ifi.uio.no  Thu Sep  3 14:57:50 1998
From: larsga@ifi.uio.no (Lars Marius Garshol)
Date: Thu, 03 Sep 1998 15:57:50 +0200
Subject: [XML-SIG] Useless fun thing for XML - comments or helpers?
In-Reply-To: <004501bdd741$ac2899f0$1301a8c0@bobcat.skippinet.com.au>
Message-ID: <3.0.1.32.19980903155750.0076bdf8@ifi.uio.no>

* Mark Hammond
>
>Then I'll have to knock up a tool to parse these back to a file
>structure on disk, and also something to generate a .html representation of
>the tree...

Sounds like a suitable project to learn SAX... :)

><!-- DTD for XBEL - XML Bookmark Exchange Language -->
>
><!ELEMENT XBEL     (INFO, NODE+)>
><!ELEMENT NODE     (NAME, BOOKMARK+)>

Maybe NODE should be called FOLDER? It took me a while to figure out
that that was what it was meant to be.

><!ELEMENT INFO     (OWNER, MACHINE, VERSION, DATE?)>

What are MACHINE and VERSION meant to contain?

Other than that it looks good to me. I'll be giving a course this
weekend and can probably make an Opera-to-XBEL converter (and vice versa) 
while my students do their exercises. If I do I'll post it when it's done.

--Lars M.


From grove@infotek.no  Thu Sep  3 15:21:17 1998
From: grove@infotek.no (Geir Ove Gronmo)
Date: Thu, 03 Sep 1998 16:21:17 +0200
Subject: [XML-SIG] Useless fun thing for XML - comments or helpers?
In-Reply-To: <004501bdd741$ac2899f0$1301a8c0@bobcat.skippinet.com.au>
Message-ID: <199809031428.QAA32554@mail.infotek.no>

At 23:49 03.09.98 +1000, Mark Hammond wrote:
>Thanks Lars and Sean! And Fredrik doesn't mess around - Thanks! :-)
>
>OK - Here is a simple DTD I've come up with based on Fredrik's code.  If you
>haven't run it, it will generate something like:
> <node>
>  <name>Python</name>
>  <bookmark>
>   <name>Aussie Mirror - Python Language Website</name>
>   <url>http://mirror.aarnet.edu.au/pub/python/www.python.org/</url>
>  </bookmark>
>..
>
>Here is my simple DTD, using Lars' "XBEL" :-).  Any comments? It's all way too
>simple :-)  Then I'll have to knock up a tool to parse these back to a file
>structure on disk, and also something to generate a .html representation of
>the tree...

Notice that names in XML are case sensitive. You'll have to use the same
case in the instance as in the DTD. :-)
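
A quick way to see the case sensitivity in practice, sketched in present-day
Python 3 with the standard xml.etree.ElementTree module (the sample document
is invented for the example):

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring("<node><name>Python</name></node>")

# Element names are compared exactly, so only the lowercase lookup matches.
found_lower = doc.find("name") is not None
found_upper = doc.find("NAME") is not None
print(found_lower, found_upper)

# A start tag and end tag that differ only in case do not match each other,
# so the document is not even well-formed.
try:
    ET.fromstring("<node></NODE>")
    mismatch_accepted = True
except ET.ParseError:
    mismatch_accepted = False
print(mismatch_accepted)
```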

Geir O.

 ==================  Geir Ove Grønmo  ==================
|  STEP Infotek as, Gjerdrumsvei 12, 0486 Oslo, Norway  |
|        grove@infotek.no http://www.infotek.no/        |
 -------------------------------------------------------


From fredrik@pythonware.com  Thu Sep  3 16:44:56 1998
From: fredrik@pythonware.com (Fredrik Lundh)
Date: Thu, 3 Sep 1998 16:44:56 +0100
Subject: [XML-SIG] Useless fun thing for XML - comments or helpers?
Message-ID: <002d01bdd751$ce512870$f29b12c2@pythonware.com>

><!-- DTD for XBEL - XML Bookmark Exchange Language -->
>
><!ELEMENT XBEL     (INFO, NODE+)>
><!ELEMENT NODE     (NAME, BOOKMARK+)>
>
><!ELEMENT BOOKMARK (NAME, URL)>
>
><!ELEMENT INFO     (OWNER, MACHINE, VERSION, DATE?)>
>
><!ELEMENT OWNER    (#PCDATA)>
><!ELEMENT MACHINE  (#PCDATA)>
><!ELEMENT VERSION  (#PCDATA)>
><!ELEMENT DATE     (#PCDATA)>
>
><!ELEMENT NAME     (#PCDATA)>
><!ELEMENT URL      (#PCDATA)>

Note that nodes can contain other nodes (bookmarks and nodes are 
mixed in the order they are found), and the top node doesn't have
a name element.  Let's see...  Is the following valid syntax?

    <!ELEMENT NODE     (NAME?, (BOOKMARK|NODE)+)>

I'm not that happy about the name "node" either...  anyone have a
better idea?

...

And yes, MSIE also uses a timestamp for each bookmark:

[InternetShortcut]
URL=http://www.secretlabs.com/
Modified=107CD6B43F8ABD019D

(haven't figured out how to decipher that one yet)

Netscape uses at least three: ADD_DATE, LAST_VISIT, and
LAST_MODIFIED (standard time_t's).

How about:

<!ELEMENT BOOKMARK (NAME, URL, ADDED?, VISITED?, MODIFIED?)>

(where dates are stored according to http://www.w3.org/TR/NOTE-datetime
or RFC1766 or something -- is there a "defacto standard" for dates in XML?)
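
For the Netscape-style time_t values, a conversion to the W3C NOTE-datetime
profile might look like the following sketch in present-day Python 3.  The
helper name time_t_to_w3c is hypothetical, and the sketch assumes the
timestamps are seconds since 1970 in UTC.  It does not attempt to decode the
MSIE Modified= value.

```python
from datetime import datetime, timezone


def time_t_to_w3c(seconds):
    """Format a time_t (seconds since 1970-01-01 UTC) as YYYY-MM-DDThh:mm:ssZ."""
    moment = datetime.fromtimestamp(seconds, tz=timezone.utc)
    return moment.strftime("%Y-%m-%dT%H:%M:%SZ")


print(time_t_to_w3c(0))
```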

Cheers /F
fredrik@pythonware.com
http://www.pythonware.com


From digitome@iol.ie  Thu Sep  3 15:41:03 1998
From: digitome@iol.ie (Sean Mc Grath)
Date: Thu, 3 Sep 1998 15:41:03 +0100
Subject: [XML-SIG] Useless fun thing for XML - comments or helpers?
Message-ID: <199809031441.PAA21101@GPO.iol.ie>

Mark,
A couple of things.

1) XML is case sensitive. How about lowercase or CamelCase for the element
type names?

2) The characters "<" and "&" are special in XML and must be escaped if they
occur as part of the content (in URLs, for example, you can have "&").
This does not affect your DTD, but needs to be borne in mind when generating
XBEL files. "&" -> "&amp;". "<" -> "&lt;"

3) In XML there are no standard ways of specifying lexical structure in
PCDATA (yet). Attributes give better (but still unsatisfactory) control.
I am thinking primarily of the date element type.

<date yyyy = "2005" mm = "12" dd = "01"/>

is more checkable than

        <date>2005/12/01</date>

4) There are many, many ways to go from XBEL to HTML and other formats:-

  DSSSL Stylesheet (James Clark's Jade)
  XSL StyleSheet (James Clark's XT via JPython)
  Custom Python Translator
  ...

5) There is a lot of stuff going on in XML at the moment that will all
impact on XBEL as it develops:-

 a) Rendering via XSL
 b) Hypertext linking via XLink
 c) Namespaces (making the vocabulary of XBEL formally public and documented
via a DTD)
 d) DCD - A proposal for a more powerful schema language for XML than DTDs

I suggest we keep it all very simple for now! Even as it stands a tool like
sgrep - structured grepping - really shows up the advantage of XBEL.
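
Point 2 can be illustrated with a short sketch in present-day Python 3.  It
uses xml.sax.saxutils.escape from the standard library; the sample URL is
invented.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

raw_url = "http://www.example.com/search?lang=python&topic=xml"

# escape() replaces "&", "<" and ">" with their entity references.
escaped_url = escape(raw_url)
element_text = "<url>%s</url>" % escaped_url
print(element_text)

# A parser turns the entity back into the original character.
round_tripped = ET.fromstring(element_text).text
print(round_tripped == raw_url)
```

Writing the raw URL into the element without escaping would produce a file
that an XML parser rejects, because "&topic" looks like the start of an
entity reference.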


[Mark Hammond]
><!-- DTD for XBEL - XML Bookmark Exchange Language -->
>
><!ELEMENT XBEL     (INFO, NODE+)>
><!ELEMENT NODE     (NAME, BOOKMARK+)>
>
><!ELEMENT BOOKMARK (NAME, URL)>
>
><!ELEMENT INFO     (OWNER, MACHINE, VERSION, DATE?)>
>
><!ELEMENT OWNER    (#PCDATA)>
><!ELEMENT MACHINE  (#PCDATA)>
><!ELEMENT VERSION  (#PCDATA)>
><!ELEMENT DATE     (#PCDATA)>
>
>
><!ELEMENT NAME     (#PCDATA)>
><!ELEMENT URL      (#PCDATA)>
>
>
</Sean>

Sean Mc Grath - http://www.digitome.com/sean.htm
XML by Example:Building E-Commerce Applications 
	(http://www.amazon.com/exec/obidos/ISBN=0139601627/digitomeelectronA/)
ParseMe.1st - SGML for Software Developers
	(http://www.amazon.com/exec/obidos/ISBN=0134889673/digitomeelectronA/)


From larsga@ifi.uio.no  Thu Sep  3 15:57:41 1998
From: larsga@ifi.uio.no (Lars Marius Garshol)
Date: Thu, 03 Sep 1998 16:57:41 +0200
Subject: [XML-SIG] Useless fun thing for XML - comments or helpers?
In-Reply-To: <002d01bdd751$ce512870$f29b12c2@pythonware.com>
Message-ID: <3.0.1.32.19980903165741.0073f310@ifi.uio.no>

* Fredrik Lundh
>
>Note that nodes can contain other nodes (bookmarks and nodes are 
>mixed in the order they are found), and the top node doesn't have
>a name element.  Let's see...  Is the following valid syntax?
>
>    <!ELEMENT NODE     (NAME?, (BOOKMARK|NODE)+)>

Yes, but I'm a bit uneasy about making NAME optional. Maybe we should
have a separate element for the top NODE?

>How about:
>
><!ELEMENT BOOKMARK (NAME, URL, ADDED?, VISITED?, MODIFIED?)>

Looks good to me; Opera has CREATED and VISITED.

>(where dates are stored according to http://www.w3.org/TR/NOTE-datetime
>or RFC1766 or something -- is there a "defacto standard" for dates in XML?)

Not at present, but ISO 8601 looks like a likely candidate. I think the
19980902 variant of ISO 8601 is the best one.

--Lars M.


From fredrik@pythonware.com  Thu Sep  3 17:20:24 1998
From: fredrik@pythonware.com (Fredrik Lundh)
Date: Thu, 3 Sep 1998 17:20:24 +0100
Subject: [XML-SIG] Useless fun thing for XML - comments or helpers?
Message-ID: <002901bdd756$c28f9170$f29b12c2@pythonware.com>

>1) XML is case sensitive. How about lowercase or CamelCase for the element
>type names?

CamelCase!?  Look what the CamelFolks have to say about that:

    Also, instead of writing your variables with leading caps for
    all of the words like this: MyVariableForLoop. We should instead
    use the underscore and write it like this: my_variable_for_loop.
    [tchrist] gave several good reasons, including the fact that Perl
    is now a global language and it can be hard for those who speak
    English as a second language to read the variables. Also, we are
    used to having spaces in words so it makes it more readable for
    us too.
    (http://www.perl.com/pace/pub/perldocs/1998/08/show/day4.html)

On the other hand, they also say:

    ...if we need comments in our code, then we didn't write it
    properly...

which only shows that one might as well ignore them...

>2) The characters "<" and "&" are special in XML and must be escaped if they
>occur as part of the content (in URLs, for example, you can have "&").
>This does not affect your DTD, but needs to be borne in mind when generating
>XBEL files. "&" -> "&amp;". "<" -> "&lt;"

Hey, my class did that...

>3) In XML there are no standard ways of specifying lexical structure in
>PCDATA (yet). Attributes give better (but still unsatisfactory) control.
>I am thinking primarily of the date element type.
>
><date yyyy = "2005" mm = "12" dd = "01"/>

Ouch! ;-)

>I suggest we keep it all very simple for now! Even as it stands a tool like
>sgrep - structured grepping - really shows up the advantage of XBEL.

sgrep?  Is this an existing utility?  Where do I find it?

Cheers /F
fredrik@pythonware.com
http://www.pythonware.com


From Jack.Jansen@cwi.nl  Thu Sep  3 16:17:10 1998
From: Jack.Jansen@cwi.nl (Jack Jansen)
Date: Thu, 03 Sep 1998 17:17:10 +0200
Subject: [XML-SIG] Useless fun thing for XML - comments or helpers?
In-Reply-To: Message by Lars Marius Garshol <larsga@ifi.uio.no> ,
 Thu, 03 Sep 1998 15:57:50 +0200 , <3.0.1.32.19980903155750.0076bdf8@ifi.uio.no>
Message-ID: <UTC199809031517.RAA09557.jack@snelboot.cwi.nl>

Looking at Mark's DTD (and the code it is based upon) I noticed that I
would have done things differently: I would have used elements only
for the BOOKMARK and NODE items, and used attributes for the rest.

Can anyone enlighten me which method is best, and why?
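
For comparison, here are the two shapes side by side, sketched in present-day
Python 3 with xml.etree.ElementTree.  The bookmark, name and url names are
assumed for illustration.  One structural difference is certain: an attribute
can appear only once per element and cannot contain child elements, while
element content can repeat and nest.  Which one is "best" is left open here.

```python
import xml.etree.ElementTree as ET

# Element style: every property is a child element.
as_elements = ET.Element("bookmark")
ET.SubElement(as_elements, "name").text = "Python"
ET.SubElement(as_elements, "url").text = "http://www.python.org/"

# Attribute style: the same properties as attributes of one empty element.
as_attributes = ET.Element(
    "bookmark", {"name": "Python", "url": "http://www.python.org/"}
)

print(ET.tostring(as_elements, encoding="unicode"))
print(ET.tostring(as_attributes, encoding="unicode"))

# Reading the value back differs accordingly.
url_from_element = as_elements.findtext("url")
url_from_attribute = as_attributes.get("url")
```
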
--
Jack Jansen             | ++++ stop the execution of Mumia Abu-Jamal ++++
Jack.Jansen@cwi.nl      | ++++ if you agree copy these lines to your sig ++++
http://www.cwi.nl/~jack | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm 


From fleck@informatik.uni-bonn.de  Thu Sep  3 17:19:26 1998
From: fleck@informatik.uni-bonn.de (Markus Fleck)
Date: Thu, 03 Sep 1998 18:19:26 +0200
Subject: [XML-SIG] Useless fun thing for XML - comments or helpers?
References: <002901bdd756$c28f9170$f29b12c2@pythonware.com>
Message-ID: <35EEC18E.5C8A@informatik.uni-bonn.de>

Fredrik Lundh wrote:
> >I suggest we keep it all very simple for now! Even as it stands a tool like
> >sgrep - structured grepping - really shows up the advantage of XBEL.
> 
> sgrep?  Is this an existing utility?  Where do I find it?

"sgrep" allows for the definition of "regions" of text, which may
then be searched selectively. For example, I use the following
"sgrep" macros to split the text-only version of the Python FAQ
(the version that gets posted to comp.lang.python.announce) into
sections:

--- CUT ---
define(FAQ1, (("1" in "\n\n1. ") .. ( "-\n\n" in "-\n\n2. ") ))
define(FAQ2, (("2" in "\n\n2. ") .. ( "-\n\n" in "-\n\n3. ") ))
define(FAQ3, (("3" in "\n\n3. ") .. ( "-\n\n" in "-\n\n4. ") ))
define(FAQ4, (("4" in "\n\n4. ") .. ( "-\n\n" in "-\n\n5. ") ))
define(FAQ5, (("5" in "\n\n5. ") .. ( "-\n\n" in "-\n\n6. ") ))
define(FAQ6, (("6" in "\n\n6. ") .. ( "-\n\n" in "-\n\n7. ") ))
define(FAQ7, (("7" in "\n\n7. ") .. ( "-\n\n" in "-\n\n8. ") ))
define(FAQ8, (("8" in "\n\n8. ") .. (("-\n\n" in "-\n\n1. ") or end) ))
--- CUT ---

Invoking "sgrep FAQ2 FAQ.txt" would then spit out section 2 of the FAQ.

You can also use "sgrep" to search only "Subject: " fields in mail
headers of a mailbox file, or only <H1>-tagged text in an HTML file.

"sgrep" can be found at <http://www.cs.helsinki.fi/~jjaakkol/sgrep.html>.

Yours,
Markus.

-- 
////////////////////////////////////////////////////////////////////////////
   Markus B Fleck - University of Bonn - CS Department IV - fleck@isoc.de
         UNIX Administrator - comp.lang.python.announce Moderator
    PINN Open Source Internet Groupware Project - http://cscw.net/pinn/
\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\


From fermigie@math.jussieu.fr  Thu Sep  3 17:27:24 1998
From: fermigie@math.jussieu.fr (Stefane Fermigier)
Date: Thu, 3 Sep 1998 18:27:24 +0200
Subject: [XML-SIG] Useless fun thing for XML - comments or helpers?
In-Reply-To: <013601bdd725$8ab0db50$f29b12c2@pythonware.com>; from Fredrik Lundh on Thu, Sep 03, 1998 at 11:28:02AM +0100
References: <013601bdd725$8ab0db50$f29b12c2@pythonware.com>
Message-ID: <19980903182724.A19852@riemann.math.jussieu.fr>

On Thu, Sep 03, 1998 at 11:28:02AM +0100, Fredrik Lundh wrote:
> 
> Here's a first stab.  This is tested with MSIE 5.0 on a Swedish NT installation
> (so you definitely need to change the directory to run it -- a production version
> should of course use the registry to find out where the directory is located).

Similar program using pydom:


import os, string

from xml.dom.writer import XmlWriter
from xml.dom.core import *

#
ROOT_DIR = 'favorites' # Fix this on your machine, I don't have NT.
dom_factory = DOMFactory()

class Name(Element):
  def __init__(self, name):
    Element.__init__(self, 'Name')
    self.appendChild(dom_factory.createTextNode(name))

class Url(Element):
  def __init__(self, url):
    Element.__init__(self, 'Url')
    self.appendChild(dom_factory.createTextNode(url))

class Folder(Element):
  def __init__(self, folder_name):
    Element.__init__(self, 'Folder')
    self.appendChild(Name(folder_name))

class Bookmark(Element):
  def __init__(self, name, url):
    Element.__init__(self, 'Bookmark')
    self.appendChild(Name(name))
    self.appendChild(Url(url))

# One could also use the factory everywhere instead of defining these 
# classes.


# This class is almost untouched from Fredrik's version.
class MSIE:
  def __init__(self):
    self.root = Folder('')
    self.path = ROOT_DIR # Fix this if you're on Windows.
    self.__walk(self.root)

  def __walk(self, this, subpath=[]):
    # traverse favourites folder
    path = os.path.join(self.path, string.join(subpath, os.sep))
    for file_name in os.listdir(path):
      fullname = os.path.join(path, file_name)
      if os.path.isdir(fullname):
        node = Folder(file_name)
        this.appendChild(node)
        self.__walk(node, subpath + [file_name])
      else:
        url = self.__geturl(fullname)
        if url:
          this.appendChild(Bookmark(os.path.splitext(file_name)[0], url))

  def __geturl(self, file):
    try:
      fp = open(file)
      if fp.readline() != "[InternetShortcut]\n":
        return None
      while 1:
        s = fp.readline()
        if not s:
          break
        if s[:4] == "URL=":
          return s[4:-1]
    except IOError:
      pass
    return None


bookmarks = MSIE()
writer = XmlWriter()
writer.newline_after_start = ['Folder', 'Bookmark']
writer.newline_after_end = ['Name', 'Url', 'Folder', 'Bookmark']
writer.write(bookmarks.root)
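
For readers without the pydom package, roughly the same tree can be built with
xml.dom.minidom from the present-day Python 3 standard library.  This is a
sketch under that assumption, not a port of the program above: the helpers
add_text_child and add_bookmark are made up for the example, and the
directory walk is left out.

```python
from xml.dom.minidom import getDOMImplementation

# Create a document whose root element is <Folder>.
doc = getDOMImplementation().createDocument(None, "Folder", None)


def add_text_child(parent, tag, text):
    """Append <tag>text</tag> to parent and return the new element."""
    child = doc.createElement(tag)
    child.appendChild(doc.createTextNode(text))
    parent.appendChild(child)
    return child


def add_bookmark(folder, name, url):
    """Append a <Bookmark> with <Name> and <Url> children to folder."""
    bookmark = doc.createElement("Bookmark")
    add_text_child(bookmark, "Name", name)
    add_text_child(bookmark, "Url", url)
    folder.appendChild(bookmark)
    return bookmark


root = doc.documentElement
add_text_child(root, "Name", "Python")
add_bookmark(root, "Python Language Website", "http://www.python.org/?a=1&b=2")

xml_text = doc.toprettyxml(indent=" ")
print(xml_text)
```

The serializer escapes the "&" in the URL on output, so no manual escaping is
needed when text goes in through createTextNode.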


Cheers,

	S.

-- 
Stéfane Fermigier, MdC à l'Université Paris 7. Tel: 01.44.27.61.01 (Bureau).
<www.math.jussieu.fr/~fermigie/>, <www.aful.org>, <www.linux-center.org>. 
"Python is so much easier to write and experiment with that I write it 
in Python first, then translate to Java if necessary - despite being 
the author of a Java book!" Gordon McMillan


From wunder@infoseek.com  Thu Sep  3 17:53:58 1998
From: wunder@infoseek.com (Walter Underwood)
Date: Thu, 03 Sep 1998 09:53:58 -0700
Subject: [XML-SIG] Useless fun thing for XML - comments or helpers?
In-Reply-To: <002901bdd756$c28f9170$f29b12c2@pythonware.com>
Message-ID: <3.0.5.32.19980903095358.00bf7290@corp>

>>3) In XML there are no standard ways of specifying lexical structure in
>>PCDATA (yet). Attributes give better (but still unsatisfactory) control.
>>I am thinking primarily of the date element type.
>>
>><date yyyy = "2005" mm = "12" dd = "01"/>

On the other hand, there are times to specify structure without
using XML. The web profile of the ISO 8601 date format works
fine in this case. See http://www.w3.org/TR/NOTE-datetime for 
the details. Here are some versions of the above using ISO 8601:

  <date>2005-12-01</date>
  <date value="2005-12-01"/>
  <date scheme="ISO 8601">2005-12-01</date>
  <date scheme="ISO 8601" value="2005-12-01"/>

and so on.
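
As a rough sketch of how an application might read these forms, the following uses current Python's standard library (xml.etree.ElementTree and datetime). The function name read_date is invented here, and only the complete-date form YYYY-MM-DD of the W3C profile is handled.

```python
import datetime
import xml.etree.ElementTree as ET


def read_date(xml_text):
    """Return a date from <date>YYYY-MM-DD</date> or <date value="YYYY-MM-DD"/>."""
    elem = ET.fromstring(xml_text)
    # Prefer the attribute form, fall back to the element content.
    raw = elem.get("value") or (elem.text or "")
    return datetime.datetime.strptime(raw.strip(), "%Y-%m-%d").date()


print(read_date('<date>2005-12-01</date>'))
print(read_date('<date scheme="ISO 8601" value="2005-12-01"/>'))
```

The scheme attribute is simply ignored in this sketch; a stricter reader could refuse values whose scheme it does not recognise.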

By the way, thanks for all the work on XML parsing. We're using 
this to add XML support in future versions of Ultraseek Server,
our Python-based search engine.

wunder

Walter R. Underwood
wunder@infoseek.com
wunder@best.com (home)
http://www.best.com/~wunder/
1-408-543-6946


From mss@transas.com  Thu Sep  3 19:48:14 1998
From: mss@transas.com (Michael Sobolev)
Date: Thu, 3 Sep 1998 22:48:14 +0400
Subject: [XML-SIG] DTDs..
Message-ID: <19980903224814.A14927@transas.com>

Hi,

I am trying to figure out how the processed DTD is stored.  I took xvcmd.py
program that comes with python-xml (debian) distribution and parsed the
document.  Then I executed parser's get_dtd method (this, I guess, contains the
DTD).  How can I reverse engineer the DTD for my document?  Or, to be more
precise, how am I supposed to walk through content_model information of
an element?

TIA,

--
Mike


From jtauber@jtauber.com  Fri Sep  4 06:33:04 1998
From: jtauber@jtauber.com (James Tauber)
Date: Fri, 4 Sep 1998 13:33:04 +0800
Subject: [XML-SIG] Useless fun thing for XML - comments or helpers?
Message-ID: <006a01bdd7c5$bcd13bc0$bc6118cb@caleb>

>>>3) In XML there are no standard ways of specifying lexical structure in
>>>PCDATA (yet).

That's not *entirely* true. You can use notation attributes.

>  <date>2005-12-01</date>
>  <date value="2005-12-01"/>
>  <date scheme="ISO 8601">2005-12-01</date>
>  <date scheme="ISO 8601" value="2005-12-01"/>

The best would be something similar to the third one:

<date scheme="iso-8601">2005-12-01</date>

where scheme is a notation attribute and iso-8601 is a notation referencing
the ISO standard.
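
A minimal sketch of what such a declaration could look like, and how a parser reports it, using the expat binding in current Python. The public identifier "ISO 8601:1988" is only a placeholder chosen for this example, not an agreed identifier.

```python
import xml.parsers.expat

# Internal subset declaring the notation and a NOTATION-typed attribute.
DOC = b"""<?xml version="1.0"?>
<!DOCTYPE date [
  <!NOTATION iso-8601 PUBLIC "ISO 8601:1988">
  <!ELEMENT date (#PCDATA)>
  <!ATTLIST date scheme NOTATION (iso-8601) #REQUIRED>
]>
<date scheme="iso-8601">2005-12-01</date>
"""

notations = {}
attributes = {}


def on_notation(name, base, system_id, public_id):
    # Called once per <!NOTATION ...> declaration.
    notations[name] = public_id


def on_start(name, attrs):
    attributes.update(attrs)


parser = xml.parsers.expat.ParserCreate()
parser.NotationDeclHandler = on_notation
parser.StartElementHandler = on_start
parser.Parse(DOC, True)

print(notations, attributes)
```

An application can then look up the value of scheme among the declared notations to decide how to interpret the element content.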

James
--
James Tauber / jtauber@jtauber.com      http://www.jtauber.com/
Lecturer and Associate Researcher
Electronic Commerce Network             ( http://www.xmlinfo.com/
Curtin Business School                  ( http://www.xmlsoftware.com/
Perth, Western Australia                ( http://www.schema.net/


From larsga@ifi.uio.no  Fri Sep  4 16:16:49 1998
From: larsga@ifi.uio.no (Lars Marius Garshol)
Date: Fri, 04 Sep 1998 17:16:49 +0200
Subject: [XML-SIG] DTDs..
In-Reply-To: <19980903224814.A14927@transas.com>
Message-ID: <3.0.5.32.19980904171649.007b0520@ifi.uio.no>

* Michael Sobolev
>
>I am trying to figure out how the processed DTD is stored.  I took xvcmd.py
>program that comes with python-xml (debian) distribution and parsed the
>document.  Then I executed parser's get_dtd method (this, I guess, contains
>the DTD). 

It returns an object that contains the DTD information, yes.

>How can I reverse engineer the DTD for my document? Or, to be more precise, 
>how am I supposed to walk through content_model information of an element?

The content model of elements is parsed into a parse tree, converted to a
non-deterministic finite automaton and then converted from there to a
deterministic finite automaton. The original parse tree is then discarded,
which means that you basically don't have any means of getting back to the
original content model.
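
To illustrate the idea (this is not xmlproc's actual data structure), here is a hand-built deterministic automaton for the content model (Name, Url), written in current Python. The table layout and the name is_valid are assumptions made for this sketch.

```python
# Hand-built DFA for the content model (Name, Url).
# State 0 is the start state; state 2 is the only accepting state.
TRANSITIONS = {
    (0, "Name"): 1,
    (1, "Url"): 2,
}
ACCEPTING = {2}


def is_valid(children):
    """Check a sequence of child element names against the automaton."""
    state = 0
    for name in children:
        state = TRANSITIONS.get((state, name))
        if state is None:  # no transition: the sequence is invalid
            return False
    return state in ACCEPTING


print(is_valid(["Name", "Url"]))
print(is_valid(["Url", "Name"]))
```

It also shows why the source syntax is lost: models such as (a, a*) and (a+) accept exactly the same sequences, so an automaton built from either one cannot say which of the two was written in the DTD.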

However, if you can tell me what it is you need I may add it to the next
version. The current DTD interface is just what I needed to implement
validation, and may not be optimal for other kinds of uses.

--Lars M.


From mss@transas.com  Fri Sep  4 18:54:56 1998
From: mss@transas.com (Michael Sobolev)
Date: Fri, 4 Sep 1998 21:54:56 +0400
Subject: [XML-SIG] DTDs..
In-Reply-To: <3.0.5.32.19980904171649.007b0520@ifi.uio.no>; from Lars Marius Garshol on Fri, Sep 04, 1998 at 05:16:49PM +0200
References: <19980903224814.A14927@transas.com> <3.0.5.32.19980904171649.007b0520@ifi.uio.no>
Message-ID: <19980904215456.A17805@transas.com>

On Fri, Sep 04, 1998 at 05:16:49PM +0200, Lars Marius Garshol wrote:
> The content model of elements is parsed into a parse tree, converted to a
> non-deterministic finite automaton and then converted from there to a
> deterministic finite automaton. The original parse tree is then discarded,
> which means that you basically don't have any means of getting back to the
> original content model.
You meant that I likely to get an equivalent form?

> However, if you can tell me what it is you need I may add it to the next
> version. The current DTD interface is just what I needed to implement
> validation, and may not be optimal for other kinds of uses.
Basically, I need more documentation.  It is not obvious how to get all
defined elements, for example.  And more examples, if possible.  :)
What I want to know is how:

    to obtain the list of public identifiers from catalog;
    to parse a specific DTD (using its public or system id);
    to get DTD information for a given document.

Under DTD information I understand the list of elements (with their attributes)
and a way for figuring out how the elements may follow one another.  Having
written my previous message, I understood what the content_model is, and how
to make use of it.  I am only afraid that since it is not documented (and,
therefore, is not fixed) it may easily be changed should you find a different
way for validating XML files against DTD.

Regards,

--
Mike


From larsga@ifi.uio.no  Fri Sep  4 21:19:21 1998
From: larsga@ifi.uio.no (Lars Marius Garshol)
Date: Fri, 04 Sep 1998 22:19:21 +0200
Subject: [XML-SIG] DTDs..
In-Reply-To: <19980904215456.A17805@transas.com>
References: <3.0.5.32.19980904171649.007b0520@ifi.uio.no>
 <19980903224814.A14927@transas.com>
 <3.0.5.32.19980904171649.007b0520@ifi.uio.no>
Message-ID: <3.0.5.32.19980904221921.007b2e60@ifi.uio.no>

* Lars Marius Garshol
>
> The original parse tree is then discarded,
> which means that you basically don't have any means of getting back to the
> original content model.

* Michael Sobolev
>
> You meant that I likely to get an equivalent form?

(I assume there's a 'not' missing in that sentence.)

Correct. If this is important to you I may consider adding a way to preserve
the original content model structure. I've thought about doing so, but since
nobody seemed to use the DTD interface I haven't bothered so far.

* Michael Sobolev
>
>Basically, I need more documentation.  It is not obvious how to get all
>defined elements, for example. 

Not so strange, since you can't. :) Another thing I've been thinking about,
but haven't yet added. It's just two or three lines, so I'll put it in in a
couple of days. Expect it in the next release.
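
Until then, and only as a sketch with a different tool, the expat binding in current Python reports every element declaration together with a content model tree. This is not the xmlproc DTD interface; the names render and declared are invented for the example, and the small DTD is made up.

```python
import xml.parsers.expat
from xml.parsers.expat import model as m

DOC = b"""<?xml version="1.0"?>
<!DOCTYPE Folder [
  <!ELEMENT Folder (Name, (Folder | Bookmark)*)>
  <!ELEMENT Bookmark (Name, Url)>
  <!ELEMENT Name (#PCDATA)>
  <!ELEMENT Url (#PCDATA)>
]>
<Folder><Name>root</Name></Folder>
"""

QUANT = {m.XML_CQUANT_NONE: "", m.XML_CQUANT_OPT: "?",
         m.XML_CQUANT_REP: "*", m.XML_CQUANT_PLUS: "+"}


def render(content_model):
    """Turn expat's (type, quantifier, name, children) tuple into DTD syntax."""
    ctype, quant, name, children = content_model
    if ctype == m.XML_CTYPE_EMPTY:
        return "EMPTY"
    if ctype == m.XML_CTYPE_ANY:
        return "ANY"
    if ctype == m.XML_CTYPE_NAME:
        return name + QUANT[quant]
    if ctype == m.XML_CTYPE_MIXED:
        parts = ["#PCDATA"] + [render(child) for child in children]
        return "(" + " | ".join(parts) + ")" + QUANT[quant]
    separator = ", " if ctype == m.XML_CTYPE_SEQ else " | "
    return "(" + separator.join(render(c) for c in children) + ")" + QUANT[quant]


declared = {}


def on_element_decl(name, content_model):
    declared[name] = render(content_model)


parser = xml.parsers.expat.ParserCreate()
parser.ElementDeclHandler = on_element_decl
parser.Parse(DOC, True)

for name in sorted(declared):
    print(name, declared[name])
```

Because expat hands over the parsed tree rather than an automaton, the declaration syntax can be rebuilt, which is the kind of information asked about in this thread.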

>And more examples, if possible.  :)

Maybe I can add an example program that does something interesting with
DTD information.

>What I want to know is how:
>
>    to obtain the list of public identifiers from catalog;

Currently you can't. I'll add this.

>    to parse a specific DTD (using its public or system id);

Hmmm. You can do this now by using the DTDParser class in the xmlproc
module. Give it a DTDConsumer (see the DTD API doco) to receive events.
I want to move the DTDParser and clean up the interface a little, so I
haven't documented this yet, but the DTDParser understands the same
methods as XMLProcessor, except that you set the DTD handler with
'set_dtd_consumer'. Note that this will break in a future version.

>    to get DTD information for a given document.

Hmmm. Since you already know about the get_dtd method, I'm not
sure what more you want.

>Under DTD information I understand the list of elements (with their attributes)
>and a way for figuring out how the elements may follow one another.  Having
>written my previous message, I understood what the content_model is, and how
>to make use of it.  I am only afraid that since it is not documented (and,
>therefore, is not fixed) it may easily be changed should you find a different
>way for validating XML files against DTD.

It's not likely to change since the current method seems to work pretty well,
but, yes, you do run that risk. This isn't a finished product and I want to
keep my options open here... :)

However, if there's some specific information about the content models
you want I'll see what I can do. Do you want to be able to reconstruct the
original syntax of the declarations, or is there something else you want?

--Lars M.


From mss@transas.com  Fri Sep  4 22:01:23 1998
From: mss@transas.com (Michael Sobolev)
Date: Sat, 5 Sep 1998 01:01:23 +0400
Subject: [XML-SIG] DTDs..
In-Reply-To: <3.0.5.32.19980904221921.007b2e60@ifi.uio.no>; from Lars Marius Garshol on Fri, Sep 04, 1998 at 10:19:21PM +0200
References: <3.0.5.32.19980904171649.007b0520@ifi.uio.no> <19980903224814.A14927@transas.com> <3.0.5.32.19980904171649.007b0520@ifi.uio.no> <19980904215456.A17805@transas.com> <3.0.5.32.19980904221921.007b2e60@ifi.uio.no>
Message-ID: <19980905010123.A22973@transas.com>

On Fri, Sep 04, 1998 at 10:19:21PM +0200, Lars Marius Garshol wrote:
> >Basically, I need more documentation.  It is not obvious how to get all
> >defined elements, for example. 
> Not so strange, since you can't. :) Another thing I've been thinking about,
> but haven't yet added. It's just two or three lines, so I'll put it in in a
> couple of days. Expect it in the next release.
Well, with current version I can easily get all elements that are used for
defining root element, can't I?  For the most cases, it's sufficient.

> >And more examples, if possible.  :)
> Maybe I can add an example program that does something interesting with
> DTD information.
Yes, please.

> >    to obtain the list of public identifiers from catalog;
> Currently you can't. I'll add this.
This would be nice.

> >    to parse a specific DTD (using its public or system id);
> 
> Hmmm. You can do this now by using the DTDParser class in the xmlproc
> module. Give it a DTDConsumer (see the DTD API doco) to receive events.
> I want to move the DTDParser and clean up the interface a little, so I
> haven't documented this yet, but the DTDParser understands the same
> methods as XMLProcessor, except that you set the DTD handler with
> 'set_dtd_consumer'. Note that this will break in a future version.
What I mean here is rather an example than functionality. :)

Regards,

--
Mike


From akuchlin@cnri.reston.va.us  Fri Sep  4 22:06:18 1998
From: akuchlin@cnri.reston.va.us (Andrew Kuchling)
Date: Fri,  4 Sep 1998 17:06:18 -0400 (EDT)
Subject: [XML-SIG] Useless fun thing for XML - comments or helpers?
In-Reply-To: <3.0.5.32.19980903095358.00bf7290@corp>
References: <002901bdd756$c28f9170$f29b12c2@pythonware.com>
 <3.0.5.32.19980903095358.00bf7290@corp>
Message-ID: <13808.21923.568032.333624@newcnri.cnri.reston.va.us>

Walter Underwood writes:
>By the way, thanks for all the work on XML parsing. We're using 
>this to add XML support in future versions of Ultraseek Server,
>our Python-based search engine.

	That's very interesting.  Can you say anything about the level
of the API you're using?  That is, are you using xmllib.py, xmllib.py
+ sgmlop.c, the PyExpat module, or something higher-level such as SAX?

-- 
A.M. Kuchling			http://starship.skyport.net/crew/amk/
Given time, you'll spin a yarn of what we saw in the ocean. Given time I'll
tell the tale of the handsome cabin boy. But given enough time and the right
audience, the darkest of secrets scum over into mere curiosities.
    -- Hob Gadling, in SANDMAN #53: "Hob's Leviathan"


From wunder@infoseek.com  Fri Sep  4 22:36:09 1998
From: wunder@infoseek.com (Walter Underwood)
Date: Fri, 04 Sep 1998 14:36:09 -0700
Subject: [XML-SIG] Useless fun thing for XML - comments or helpers?
In-Reply-To: <13808.21923.568032.333624@newcnri.cnri.reston.va.us>
References: <3.0.5.32.19980903095358.00bf7290@corp>
 <002901bdd756$c28f9170$f29b12c2@pythonware.com>
 <3.0.5.32.19980903095358.00bf7290@corp>
Message-ID: <3.0.5.32.19980904143609.00c2b8e0@corp>

At 05:06 PM 9/4/98 -0400, Andrew Kuchling wrote:
>Walter Underwood writes:
>>By the way, thanks for all the work on XML parsing. We're using 
>>this to add XML support in future versions of Ultraseek Server,
>>our Python-based search engine.
>
>	That's very interesting.  Can you say anything about the level
>of the API you're using?  That is, are you using xmllib.py, xmllib.py
>+ sgmlop.c, the PyExpat module, or something higher-level such as SAX?

Still on xmllib.py (version 0.1), since the work was first done
back in May. I'm planning on moving to SAX, and dropping in a
faster parser, probably via sgmlop support.

We're using XML in another part of the engine, but that is not
speed-sensitive.

The search engine only requires that the XML be well-formed, since it
doesn't really need to know about the DTD, just the text that remains
after parsing. Well, we do pay attention to one tag -- the first <title>
or <TITLE> tag is considered to be the title of the document for 
purposes of displaying search hits.
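
A rough sketch of that approach, written against the SAX API in current Python rather than the xmllib.py code actually used in the product. The class name TextAndTitle and the function index_document are invented for this example.

```python
import xml.sax


class TextAndTitle(xml.sax.ContentHandler):
    """Collect all character data and remember the first title element."""

    def __init__(self):
        super().__init__()
        self.text = []
        self.title = None
        self._in_title = False
        self._title_parts = []

    def startElement(self, name, attrs):
        # Accept <title> and <TITLE>; only the first one counts.
        if name.lower() == "title" and self.title is None:
            self._in_title = True

    def characters(self, content):
        self.text.append(content)
        if self._in_title:
            self._title_parts.append(content)

    def endElement(self, name):
        if self._in_title and name.lower() == "title":
            self.title = "".join(self._title_parts)
            self._in_title = False


def index_document(xml_bytes):
    """Return (title, text); raises SAXParseException if not well-formed."""
    handler = TextAndTitle()
    xml.sax.parseString(xml_bytes, handler)
    return handler.title, " ".join("".join(handler.text).split())


DOC = b"""<doc>
  <TITLE>First</TITLE>
  <p>Some <b>bold</b> text</p>
  <title>Second</title>
</doc>"""

print(index_document(DOC))
```

No DTD is consulted; a document that is not well-formed is rejected by the parser before any text reaches the index.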

If people don't mind a commercial announcement, I'll let the list know
when we release the XML-savvy version.

wunder

Walter R. Underwood
wunder@infoseek.com
wunder@best.com (home)
http://www.best.com/~wunder/
1-408-543-6946


From larsga@ifi.uio.no  Sat Sep  5 07:38:16 1998
From: larsga@ifi.uio.no (Lars Marius Garshol)
Date: Sat, 05 Sep 1998 08:38:16 +0200
Subject: [XML-SIG] Useless fun thing for XML - comments or helpers?
In-Reply-To: <3.0.5.32.19980904143609.00c2b8e0@corp>
References: <13808.21923.568032.333624@newcnri.cnri.reston.va.us>
 <3.0.5.32.19980903095358.00bf7290@corp>
 <002901bdd756$c28f9170$f29b12c2@pythonware.com>
 <3.0.5.32.19980903095358.00bf7290@corp>
Message-ID: <3.0.5.32.19980905083816.007b62f0@ifi.uio.no>

* Walter Underwood
>
>The search engine only requires that the XML be well-formed, since it
>doesn't really need to know about the DTD, just the text that remains
>after parsing. Well, we do pay attention to one tag -- the first <title>
>or <TITLE> tag is considered to be the title of the document for 
>purposes of displaying search hits.

Hmmm. Have you considered using architectural forms to give page authors
more freedom, but still allow you to discover which elements are the
equivalents of 'TITLE' and 'AUTHOR' etc?

--Lars M.


From larsga@ifi.uio.no  Sat Sep  5 15:37:12 1998
From: larsga@ifi.uio.no (Lars Marius Garshol)
Date: Sat, 05 Sep 1998 16:37:12 +0200
Subject: [XML-SIG] Bookmark parsers
Message-ID: <3.0.5.32.19980905163712.0079e8f0@ifi.uio.no>

Here are some scripts to convert from MSIE, Opera and Netscape bookmarks
to Opera, Netscape and XBEL. There's hardly any support for created, visited
and modified. Fredrik's code has been looted to get the MSIE support.

Testing has been minimal so far.

(adr_parse.py)

"""
Small utility to parse Opera bookmark files.
"""

import string,bookmark

# --- Constants

short_months={"Jan":"01","Feb":"02","Mar":"03","Apr":"04","May":"05",
              "Jun":"06","Jul":"07","Aug":"08","Sep":"09","Oct":"10",
              "Nov":"11","Dec":"12"}

# --- Parsing exception

class OperaParseException(Exception):
    pass

# --- Methods
        
def readfield(infile,fieldname):
    line=infile.readline()
    pos=string.find(line,fieldname+"=")
    if pos==-1:
        raise OperaParseException("Field '%s' missing" % fieldname)

    return line[pos+len(fieldname)+1:-1]

def swallow_rest(infile):
    "Reads input until first blank line."
    while 1:
        line=infile.readline()
        if line=="" or line=="\n": break

def parse_date(date):
    # CREATED=904923783 (Fri Sep 04 17:43:03 1998)
    # VISITED=0 (?)
    lp=string.find(date,"(")
    rp=string.find(date,")")
    if lp==-1 or rp==-1:
        raise OperaParseException("Date without parentheses")

    if date[lp:rp+1]=="(?)":
        return None

    month=short_months[date[lp+5:lp+8]]
    day=date[lp+9:lp+11]
    year=date[rp-4:rp]

    return "%s%s%s" % (year,month,day)

def parse_adr(filename):
    bms=bookmark.Bookmarks()
    
    infile=open(filename)
    version=infile.readline()

    while 1:
        line=infile.readline()
        if line=="": break
        
        if line[:-1]=="#FOLDER":
            name=readfield(infile,"NAME")
            created=parse_date(readfield(infile,"CREATED"))
            visited=parse_date(readfield(infile,"VISITED"))
            order=readfield(infile,"ORDER")
            swallow_rest(infile)

            bms.add_folder(name,created,visited)
        elif line[:-1]=="#URL":
            name=readfield(infile,"NAME")
            url=readfield(infile,"URL")
            created=parse_date(readfield(infile,"CREATED"))
            visited=parse_date(readfield(infile,"VISITED"))
            order=readfield(infile,"ORDER")
            swallow_rest(infile)

            bms.add_bookmark(name,created,visited,url)
        elif line[:-1]=="-":
            bms.leave_folder()

    return bms

# --- Test-program

bms=parse_adr(r"c:\programfiler\opera\opera3.adr")
bms.dump_netscape()
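
An alternative to slicing the human-readable part of the date field is to use the number in front of it, which looks like seconds since the Unix epoch. That reading is an assumption based on the sample line in parse_date; the sketch is in current Python and the name parse_opera_date is made up.

```python
import datetime


def parse_opera_date(field):
    """Convert '904923783 (Fri Sep 04 17:43:03 1998)' to 'YYYYMMDD'.

    Returns None for the '0 (?)' value Opera writes for unknown dates.
    """
    seconds = int(field.split()[0])
    if seconds == 0:
        return None
    moment = datetime.datetime.fromtimestamp(seconds, tz=datetime.timezone.utc)
    return moment.strftime("%Y%m%d")


print(parse_opera_date("904923783 (Fri Sep 04 17:43:03 1998)"))
print(parse_opera_date("0 (?)"))
```

The result is a UTC date, so close to midnight it can differ by one day from the local time shown in parentheses.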

(msie_parse.py)

"""
Small utility to convert MSIE favourites to an object structure.

Originally written by Fredrik Lundh.
"""

import bookmark,os,string

DIR = "Favoritter" # Norwegian version

#USRDIR = os.environ["USERPROFILE"] # NT version
USRDIR = r"c:\windows" # 95 version

class MSIE:
    # internet explorer

    def __init__(self,bookmarks):
        # FIXME: use registry for this!

        self.bms=bookmarks
        self.root = None
        self.path = os.path.join(USRDIR, DIR)

        self.__walk()

    def __walk(self, subpath=[]):
        # traverse favourites folder
        path = os.path.join(self.path, string.join(subpath, os.sep))
        for file in os.listdir(path):
            fullname = os.path.join(path, file)
            if os.path.isdir(fullname):
                self.bms.add_folder(file,None,None)
                self.__walk(subpath + [file])
                # Close the folder again so later entries become siblings.
                self.bms.leave_folder()
            else:
                url = self.__geturl(fullname)
                if url:
                    self.bms.add_bookmark(os.path.splitext(file)[0],None,
                                          None,url)

    def __geturl(self, file):
        try:
            fp = open(file)
            if fp.readline() != "[InternetShortcut]\n":
                return None
            while 1:
                s = fp.readline()
                if not s:
                    break
                if s[:4] == "URL=":
                    return s[4:-1]
        except IOError:
            pass
        return None

# --- Testprogram
    
msie=MSIE(bookmark.Bookmarks())
msie.bms.dump_xbel()
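
Since the .url files are laid out like INI files, the lookup done by __geturl could also be sketched with the configparser module of current Python. This is an alternative technique, not what the script above does, and the function name url_from_shortcut is invented here.

```python
import configparser


def url_from_shortcut(text):
    """Return the URL= value of an [InternetShortcut] section, or None."""
    # interpolation=None keeps '%' characters in URLs untouched.
    parser = configparser.ConfigParser(interpolation=None)
    try:
        parser.read_string(text)
    except configparser.Error:
        return None  # not an INI-style file at all
    return parser.get("InternetShortcut", "URL", fallback=None)


SAMPLE = "[InternetShortcut]\nURL=http://www.python.org/?q=100%25\nModified=0\n"
print(url_from_shortcut(SAMPLE))
```

Unlike the line-by-line version, this does not require [InternetShortcut] to be the very first line of the file.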

(ns_parse.py)

"""
Small utility that parses Netscape bookmarks.
"""

from xml.sax import saxexts,saxlib
import bookmark

# --- SAX handler for Netscape bookmarks

class NetscapeHandler(saxlib.HandlerBase):

    def __init__(self):
        self.bms=bookmark.Bookmarks()
        self.cur_elem=None
        self.added=None
        self.url=None
        self.visited=None
        self.last_modified=None

    def startElement(self,name,attrs):
        if name=="h3":
            self.cur_elem="h3"
            self.added=attrs["add_date"]
        elif name=="a":
            self.cur_elem="a"
            self.added=attrs["add_date"]
            self.url=attrs["href"]
            self.visited=attrs["last_visit"]
            self.last_modified=attrs["last_modified"]            

    def characters(self,data,start,length):
        if self.cur_elem=="h3":
            self.bms.add_folder(data[start:start+length],None,None)
        elif self.cur_elem=="a":
            self.bms.add_bookmark(data[start:start+length],None,None,self.url)
            
    def endElement(self,name):
        if name=="h3":
            self.cur_elem=None
        elif name=="dl":
            self.bms.leave_folder()
        elif name=="a":
            self.cur_elem=None

# --- Main program

ns_handler=NetscapeHandler()

p=saxexts.SGMLParserFactory.make_parser()
p.setDocumentHandler(ns_handler)
p.parseFile(open(r"h:/internet/netscape/bookmark.htm"))

ns_handler.bms.dump_netscape()

(bookmark.py)

"""
Classes to store bookmarks and dump them to XBEL.
"""

import sys,string

# --- Class for bookmark container

class Bookmarks:

    def __init__(self):
        self.folders=[]
        self.folder_stack=[]

    def add_folder(self,name,created,visited):
        nf=Folder(name,created,visited)
        if self.folder_stack==[]:
            self.folders.append(nf)
        else:
            self.folder_stack[-1].add_child(nf)

        self.folder_stack.append(nf)

    def add_bookmark(self,name,created,visited,url):
        nb=Bookmark(name,created,visited,url)

        if self.folder_stack!=[]:
            self.folder_stack[-1].add_child(nb)
        else:
            self.folders.append(nb)
        
    def leave_folder(self):
        if self.folder_stack!=[]:
            del self.folder_stack[-1]

    def dump_xbel(self,out=sys.stdout):
        out.write("<XBEL>\n")
        for folder in self.folders:
            folder.dump_xbel(out)
        out.write("</XBEL>")

    def dump_adr(self,out=sys.stdout):
        out.write("Opera Hotlist version 2.0\n\n")
        for folder in self.folders:
            folder.dump_adr(out)

    def dump_netscape(self,out=sys.stdout):
        out.write("<!DOCTYPE NETSCAPE-Bookmark-file-1>\n")
        out.write("<!-- This is an automatically generated file.\n")
        out.write("It will be read and overwritten.\n")
        out.write("Do Not Edit! -->\n")
        out.write("<TITLE>Skriv HELE NAVNET her's Bookmarks</TITLE>\n")
        out.write("<H1>Skriv HELE NAVNET her's Bookmarks</H1>\n\n")

        out.write("<DL><p>\n")
        for folder in self.folders:
            folder.dump_netscape(out)
        out.write("</DL><p>\n")

# --- Superclass for folder and bookmarks
        
class Node:

    def __init__(self,name,created,visited):
        self.name=name
        self.created=created
        self.visited=visited

# --- Class for folders
    
class Folder(Node):

    def __init__(self,name,created,visited):
        Node.__init__(self,name,created,visited)
        self.children=[]

    def add_child(self,child):
        self.children.append(child)

    def dump_xbel(self,out):
        out.write("  <NODE>\n")
        out.write("    <NAME>%s</NAME>\n" % self.name)
        for child in self.children:
            child.dump_xbel(out)
        out.write("  </NODE>\n")

    def dump_adr(self,out):
        out.write("#FOLDER\n")
        out.write("\tNAME=%s\n" % self.name)
        out.write("\tCREATED=%s\n" % "0 (?)")
        out.write("\tVISITED=%s\n" % "0 (?)")
        out.write("\tORDER=-1\n")
        out.write("\n")

        for child in self.children:
            child.dump_adr(out)

        out.write("\n")
        out.write("-\n")

    def dump_netscape(self,out):
        out.write("  <DT><H3 FOLDED>%s</H3>\n" % self.name)
        out.write("  <DL><p>\n")

        for child in self.children:
            child.dump_netscape(out)

        out.write("  </DL><p>\n")

# --- Class for bookmarks
        
class Bookmark(Node):

    def __init__(self,name,created,visited,url):
        Node.__init__(self,name,created,visited)
        self.url=url

    def dump_xbel(self,out):
        out.write("  <BOOKMARK>\n")
        out.write("    <NAME>%s</NAME>\n" % self.name)
        out.write("    <URL>%s</URL>\n" % self.url)

        if self.created!=None:
            out.write("    <ADDED>%s</ADDED>\n" % self.created)

        if self.visited!=None:
            out.write("    <VISITED>%s</VISITED>\n" % self.visited)
            
        out.write("  </BOOKMARK>\n")

    def dump_adr(self,out):
        out.write("#URL\n")
        out.write("\tNAME=%s\n" % self.name)
        out.write("\tURL=%s\n" % self.url)
        out.write("\tCREATED=%s\n" % "0 (?)")
        out.write("\tVISITED=%s\n" % "0 (?)")
        out.write("\tORDER=-1\n")
        out.write("\n")

    def dump_netscape(self,out):
        out.write("    <DT><A HREF=\"%s\">%s</A>\n" % (self.url,self.name))
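
One thing the dump methods above do not do is escape markup characters in names and URLs. A minimal sketch of how that could look, using xml.sax.saxutils.escape from current Python; the function xbel_bookmark is a stand-in for Bookmark.dump_xbel, not part of the module above.

```python
from xml.sax.saxutils import escape


def xbel_bookmark(name, url):
    """Format one bookmark, escaping &, < and > in the text content."""
    return ("  <BOOKMARK>\n"
            "    <NAME>%s</NAME>\n"
            "    <URL>%s</URL>\n"
            "  </BOOKMARK>\n") % (escape(name), escape(url))


print(xbel_bookmark("Q&A <beta>", "http://example.org/?a=1&b=2"))
```

URLs with query strings nearly always contain '&', so without this step the output would often not be well-formed.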

--Lars M.


From lisarein@finetuning.com  Sun Sep  6 00:04:55 1998
From: lisarein@finetuning.com (Lisa Rein)
Date: Sat, 05 Sep 1998 16:04:55 -0700
Subject: [XML-SIG] Re: XML-SIG digest, Vol 1 #90 - 7 msgs
References: <199809051600.MAA10334@python.org>
Message-ID: <35F1C397.317347AF@finetuning.com>

Walter R. Underwood said:
> 
> The search engine only requires that the XML be well-formed, since it
> doesn't really need to know about the DTD, just the text that remains
> after parsing. Well, we do pay attention to one tag -- the first <title>
> or <TITLE> tag is considered to be the title of the document for
> purposes of displaying search hits.
> 

Hello Walter:

I am very curious how exactly XML is being utilized in the search engine
if the only tag  being taken into account is the (first) TITLE tag (just
like a search engine would use during a "bag of words" approach) and not
using a DTD -- making any semantic associations impossible.  

If you're not going to deal with the text until after it's parsed, why
are you using XML?  Are you doing some kind of indexing or another
variation I haven't heard of?  Do tell ;-)

Thanks,

lisa rein

http://www.finetuning.com/editor.html


From stuart.hungerford@webone.com.au  Sun Sep  6 07:24:02 1998
From: stuart.hungerford@webone.com.au (Stuart Hungerford)
Date: Sun, 6 Sep 1998 16:24:02 +1000
Subject: [XML-SIG] Status of XML python stuff on Win32?
Message-ID: <000101bdd95e$f3718be0$0b2c08d2@alderman>

Folks,

Can someone tell me what the status of the 
Python XML software tools is for the Win32
platform?

I believe xmlproc should work "out of the 
box", but the collection of tools would
need extra work?


From akuchlin@cnri.reston.va.us  Sun Sep  6 15:43:26 1998
From: akuchlin@cnri.reston.va.us (A.M. Kuchling)
Date: Sun, 6 Sep 1998 10:43:26 -0400
Subject: [XML-SIG] Marshalling to XML, again
Message-ID: <199809061443.KAA00545@207-172-112-146.s146.tnt4.ann.erols.com>

Here's another version of the xml.marshal module.  (That name will
have to be changed now, though, since xml.marshal uses Python's
original marshal to handle code objects.  Any suggestions?)

For example, take the recursive list produced by this code:

recursive_list = [None, 1, pow(3,65L), '<fake tag>', 1+5j]
recursive_list.append( recursive_list )

Here's the marshalled version (pretty-printed; the module just
produces one long line):

<?xml version="1.0"?>
<!DOCTYPE marshal SYSTEM "marshal.dtd">
<marshal>
  <list id="i135737736">
    <none/>
    <integer>1</integer>
    <long>10301051460877537453973547267843</long>
    <string>&lt;fake tag&gt;</string>
    <complex>
      <float>1.0</float>
      <float>5.0</float>
    </complex>
    <reference id="i135737736"/>
  </list>
</marshal>
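
The id/reference mechanism can be shown in a few lines. The following is a cut-down sketch in current Python covering only None, int, float, str and list; it is not the module posted below, and the function name to_xml is invented.

```python
from xml.sax.saxutils import escape


def to_xml(value, seen=None):
    """Marshal a value; a list met a second time becomes a reference."""
    seen = set() if seen is None else seen
    key = "i%d" % id(value)
    if key in seen:
        return '<reference id="%s"/>' % key
    if value is None:
        return "<none/>"
    if type(value) is int:
        return "<integer>%d</integer>" % value
    if type(value) is float:
        return "<float>%r</float>" % value
    if type(value) is str:
        return "<string>%s</string>" % escape(value)
    if type(value) is list:
        seen.add(key)  # register before descending, so cycles terminate
        items = "".join(to_xml(item, seen) for item in value)
        return '<list id="%s">%s</list>' % (key, items)
    raise TypeError("cannot marshal %r" % type(value))


recursive_list = [None, 1, "<fake tag>"]
recursive_list.append(recursive_list)
print(to_xml(recursive_list))
```

Only containers are registered in seen, because they are the only values that can contain themselves.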

The DTD for the marshalling format is available as the __dtd__
attribute of the module; does this seem like a useful convention for
future modules?  Comments on the code, DTD, etc. are welcome.  

There's been some discussion of marshalling scripting language data
types on the Casbah list and on the Perl-XML list recently; Dave
Winer's suggestion for XML-RPC bears some relation to this.  It would
be very useful if some common DTD was agreed upon, which would allow
painlessly exchanging data between Python and Perl, Frontier, or
whatever.  (However, I lack the time to read all the relevant mailing
lists and agitate for a specification.)

If no such common DTD arises, is this module still useful, and should
it be included?

-- 
A.M. Kuchling			http://starship.skyport.net/crew/amk/
In a modern university if you ask for knowledge they will provide it in almost
any form -- though if you ask for out-of-fashion things they may say, like the
people in shops, "Sorry, there's no call for it."
    -- Robertson Davies, _The Rebel Angels_


# xml.marshal : Marshals simple Python data types into an XML-based
# format.  The interface is the same as the built-in module of the
# same name, with four functions: 
#   dump(value, file), load(file)
#   dumps(value), loads(string)

from types import *
import string

__dtd__ = """
<!ELEMENT marshal (integer | string | float | long | complex | code | none 
                     | tuple | list | dictionary)>
<!ELEMENT none EMPTY>
<!ELEMENT reference EMPTY>
<!ELEMENT integer (#PCDATA)>
<!ELEMENT string (#PCDATA)>
<!ELEMENT float (#PCDATA)>
<!ELEMENT long (#PCDATA)>
<!ELEMENT code (#PCDATA)>

<!ELEMENT complex (float, float)>
<!ELEMENT tuple (integer | string | float | long | complex | code | none
                     | tuple | list | dictionary | reference)*>
<!ELEMENT list (integer | string | float | long | complex | code | none
                     | tuple | list | dictionary | reference)*>
<!ELEMENT dictionary ( 
  (integer | string | long | float | complex | code | tuple | reference ),
  (integer | string | float | long | complex | code | none
                     | tuple | list | dictionary | reference) )* >

<!ATTLIST list id ID #REQUIRED>
<!ATTLIST dictionary id ID #REQUIRED>
<!ATTLIST reference id IDREF #REQUIRED>
"""

# Dictionary mapping some of the simple types to the corresponding tag
_mapping = {StringType:'string', IntType:'integer', 
	   FloatType:'float'}

# XML version and DOCTYPE declaration
PROLOGUE = """<?xml version="1.0"?>
<!DOCTYPE marshal SYSTEM "marshal.dtd">
"""

def _marshal(value, dict):
    L = []
    t = type(value) ; i = str( id(value) )
    if dict.has_key( i ):
        # This object has already been marshalled, so
        # emit a reference element.
        L.append( '<reference id="i%s"/>' % (i, ) )            

    elif _mapping.has_key( t ):
        # Some simple type: integer, string, or float
	name = _mapping[t]
        L.append( '<'+name + '>')
        s = str(value)
        if '&' in s or '>' in s or '<' in s:
            s = string.replace(s, '&', '&amp;')
            s = string.replace(s, '<', '&lt;')
            s = string.replace(s, '>', '&gt;')
	L.append( s )
	L.append( '</' + name + '>')
        
    elif t == LongType:
	L.append('<long>%s</long>' % (str(value)[:-1],) )

    elif t == TupleType:
	L.append( '<tuple>')
	for elem in value:
            L = L + _marshal(elem, dict)
	L.append( '</tuple>')

    elif t == ListType:
        dict[ i ] = 1
	L.append( '<list id="i%s">' %(i,) )
	for elem in value:
            L = L + _marshal(elem, dict)
	L.append( '</list>')

    elif t == DictType:
        dict[ i ] = 1
	L.append( '<dictionary id="i%s">' %(i,) )
	for key, v in value.items():
	    L = L + _marshal(key, dict)
	    L = L + _marshal(v, dict)
	L.append( '</dictionary>')

    elif t == NoneType:
	L.append( '<none/>')

    elif t == ComplexType:
        # XXX should it be <complex><real>...</real><imag>...</imag></complex>?
        L.append( '<complex><float>' )

	L.append( str(value.real) )
        L.append( '</float><float>' )
	L.append( str(value.imag) )
        L.append( '</float>' )

        L.append( '</complex>' )

    elif t == CodeType:
        # The full information about code objects is only available
        # from the C level, so we'll use the built-in marshal module
        # to convert the code object into a string, base64-encode it,
        # and include it in the XML.
	import marshal, base64
	L.append( '<code>' )
        s = marshal.dumps(value)
        s = base64.encodestring(s)
	L.append( s )
	L.append( '</code>' )
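        # Code objects are not recorded in dict: <code> carries no id
        # attribute, so a later <reference> to one could not be resolved.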

    return L

from xml.sax import saxlib
DICT = 'dict' ; LIST = 'list' ; TUPLE='tuple'

class _unmarshalHandler(saxlib.HandlerBase):
    def __init__(self):
        saxlib.HandlerBase.__init__(self)
        
    def startElement(self, name, attrs):
        if name == 'marshal':
            self.dict = {}
            self.data_stack = []
            return
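        elif name == 'reference':
            assert attrs.has_key('id')
            id = attrs['id']
            assert self.dict.has_key(id)
            self.data_stack.append( self.dict[id] )
            # Nothing more to push for a reference; without this return
            # the code below would also push a spurious empty list.
            return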
        
        if name=='dictionary':
            self.data_stack.append(DICT)
            d = {}
            id = attrs[ 'id']
            self.dict[ id ] = d
            self.data_stack.append( d )
        elif name=='list':
            self.data_stack.append(LIST)
            L = []
            id = attrs[ 'id']
            self.dict[ id ] = L
            self.data_stack.append( L )
        elif name=='tuple':
            self.data_stack.append(TUPLE)
        else:
            self.data_stack.append( [] )

    def characters(self, ch, start, length):
        self.data_stack[-1].append(ch[start:start+length])

    def endElement(self, name):
        ds = self.data_stack
        if name == 'string':
            ds[-1] = string.join(ds[-1], "")
        elif name == 'integer':
            ds[-1] = string.join(ds[-1], "")
            ds[-1] = string.atoi( ds[-1] )
        elif name == 'long':
            ds[-1] = string.join(ds[-1], "")
            ds[-1] = string.atol( ds[-1] )
        elif name == 'float':
            ds[-1] = string.join(ds[-1], "")
            ds[-1] = string.atof( ds[-1] )
        elif name == 'none':
            ds[-1] = None
        elif name == 'complex':
            c = ds[-2] + ds[-1]*1j
            ds[-3:] = [c]
        elif name == 'code':
            import marshal, base64
            s = string.join(ds[-1], "")
            s = base64.decodestring( s )
            ds[-1] = marshal.loads(s)
        elif name == 'dictionary':
            for index in range(len(ds)-1, -1, -1):
                if ds[index] is DICT: break
            assert index!=-1
            d = ds[index+1]
            for i in range(index+2, len(ds), 2):
                key = ds[i] ; value =ds[i+1]
                d[key] = value
            ds[index:] = [ d ]
            
        elif name == 'list':
            for index in range(len(ds)-1, -1, -1):
                if ds[index] is LIST: break
            assert index!=-1
            L = ds[index+1]
            
            L[:] = ds[index+2 : len(ds)]
            ds[index:] = [ L ]
        elif name == 'tuple':
            for index in range(len(ds)-1, -1, -1):
                if ds[index] is TUPLE: break
            assert index!=-1
            t = tuple( ds[index+1 : len(ds)] )
            ds[index:] = [ t ]
            
            
def dump(value, file):
    "Write the value on the open file"
    L = _marshal(value, {} )
    L = [PROLOGUE + '<marshal>'] + L + ['</marshal>']
    file.write( string.join(L, "") )

def load(file):
    "Read one value from the open file"
    h = _unmarshalHandler()
    from xml.sax import saxexts
    p=saxexts.make_parser()
    p.setDocumentHandler(h)
    p.parseFile(file)
    return h.data_stack[0]
    
def dumps(value):
    "Marshal value, returning the resulting string"
    L = _marshal(value, {} )
    L = [PROLOGUE + '<marshal>'] + L + ['</marshal>']
    return string.join(L, "")

def loads(string):
    "Read one value from the string"
    import StringIO
    file = StringIO.StringIO(string)
    return load(file)

if __name__ == '__main__':
    print "Testing XML marshalling..."
    L=[None, 1, pow(2,123L), 19.72, 1+5j, 
       "here is a string & a <fake tag> ",
       (1,2,3), 
       ['alpha', 'beta', 'gamma'],
       {'key':'value', 1:2}, 
       dumps.func_code ]

    # Try all the above bits of data
    import StringIO

    for item in L + [ L ]:
        s = dumps(item)
        print s
        output = loads(s)
        # Try it from a file
        file = StringIO.StringIO()
        dump(item, file)
        file.seek(0)
        output2 = load(file)

        print repr(item), s
        assert item==output and item==output2 and output==output2

    recursive_list = [None, 1, pow(3,65L), '<fake tag>', 1+5j]
    recursive_list.append( recursive_list )
    s = dumps(recursive_list)
    print s
    output = loads(s)
    print repr(output)
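
The id/reference idea used by the module above can be sketched in a few
lines of current Python 3. This is only an illustration, not the module
itself: it assumes xml.etree.ElementTree, handles just None, integers,
strings and lists, and the helper names to_element and from_element are
made up for the example. The element names follow the module above.

```python
import xml.etree.ElementTree as ET

def to_element(value, seen=None):
    """Build an element for None, int, str, or a (possibly recursive) list."""
    if seen is None:
        seen = set()
    if isinstance(value, list):
        key = "i%d" % id(value)
        if key in seen:
            # Already emitted: refer back to it instead of recursing forever.
            return ET.Element("reference", id=key)
        seen.add(key)
        elem = ET.Element("list", id=key)
        for item in value:
            elem.append(to_element(item, seen))
        return elem
    if value is None:
        return ET.Element("none")
    elem = ET.Element({int: "integer", str: "string"}[type(value)])
    elem.text = str(value)
    return elem

def from_element(elem, by_id=None):
    """Rebuild the value, resolving <reference> elements through by_id."""
    if by_id is None:
        by_id = {}
    if elem.tag == "reference":
        return by_id[elem.get("id")]
    if elem.tag == "list":
        result = []
        by_id[elem.get("id")] = result   # register before filling it
        for child in elem:
            result.append(from_element(child, by_id))
        return result
    if elem.tag == "none":
        return None
    if elem.tag == "integer":
        return int(elem.text)
    return elem.text or ""

recursive_list = [None, 1, "<fake tag>"]
recursive_list.append(recursive_list)
xml_text = ET.tostring(to_element(recursive_list), encoding="unicode")
restored = from_element(ET.fromstring(xml_text))
print(xml_text)
```

The list is registered under its id before its children are read, which
is what lets a reference inside the list point back at the list itself.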

   
From mss@transas.com  Sun Sep  6 18:12:21 1998
From: mss@transas.com (Michael Sobolev)
Date: Sun, 6 Sep 1998 21:12:21 +0400
Subject: [XML-SIG] a small question
Message-ID: <19980906211221.A9066@transas.com>

Is this declaration valid?

    <!ENTITY % lang.params "lang   CDATA #REQUIRED">
    <!ELEMENT comment (#PCDATA)>
    <!ATTLIST comment
        %lang.params;>

If not, what exactly is incorrect?  If so, why does xmlproc not process
it properly? :)

TIA,

--
Mike


From colds@nwlink.com  Sun Sep  6 20:06:07 1998
From: colds@nwlink.com (Chris Olds)
Date: Sun, 06 Sep 1998 12:06:07 -0700
Subject: [XML-SIG] a small question
References: <19980906211221.A9066@transas.com>
Message-ID: <35F2DD1F.E5C2D166@nwlink.com>

This is legal unless it is in the internal subset, i.e. the part of the
DTD in the document instance.  In the document instance, parameter
entities can only appear where an entity, element or attribute list
declaration can appear, and must yield a complete declaration.

As for why xmlproc rejects it, I haven't tried it on this yet (a complete
document example and an explanation of what you mean by not processing
properly would help).
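
The internal-subset restriction can be observed with a non-validating
parser. A minimal sketch, assuming a current Python 3 and its standard
xml.parsers.expat module (this is expat, not xmlproc); the helper name
check is made up for the example. The declarations from the question are
placed in the internal subset, where expat should reject the
%lang.params; reference inside the ATTLIST declaration.

```python
import xml.parsers.expat

# The declarations from the question, moved into the internal subset.
DOC = b"""<?xml version="1.0"?>
<!DOCTYPE comment [
  <!ENTITY % lang.params "lang   CDATA #REQUIRED">
  <!ELEMENT comment (#PCDATA)>
  <!ATTLIST comment
      %lang.params;>
]>
<comment lang="en">hello</comment>
"""

def check(document):
    """Parse the document and say whether expat accepted it."""
    parser = xml.parsers.expat.ParserCreate()
    try:
        parser.Parse(document, True)
    except xml.parsers.expat.ExpatError as err:
        return "rejected: %s" % err
    return "accepted"

result = check(DOC)
print(result)
```

The same declarations in an external DTD file are a different case, which
this sketch does not exercise.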

Michael Sobolev wrote:
> 
> Is this declaration valid?
> 
>     <!ENTITY % lang.params "lang   CDATA #REQUIRED">
>     <!ELEMENT comment (#PCDATA)>
>     <!ATTLIST comment
>         %lang.params;>
> 
> If not, what exactly is incorrect?  If so, why does xmlproc not process
> it properly? :)

	/cco


From mss@transas.com  Sun Sep  6 20:46:28 1998
From: mss@transas.com (Michael Sobolev)
Date: Sun, 6 Sep 1998 23:46:28 +0400
Subject: [XML-SIG] a small question
In-Reply-To: <35F2DD1F.E5C2D166@nwlink.com>; from Chris Olds on Sun, Sep 06, 1998 at 12:06:07PM -0700
References: <19980906211221.A9066@transas.com> <35F2DD1F.E5C2D166@nwlink.com>
Message-ID: <19980906234628.A6858@transas.com>

OK.  My document looks like:

<?xml version="1.0"?>
<!DOCTYPE info SYSTEM "my.dtd">
<info>
    ...
</info>

my.dtd:

<!ENTITY % common.decl SYSTEM "common.mod">

<!-- other similar declarations -->

%common.decl;

common.mod:

    <!ENTITY % lang.params "lang   CDATA #REQUIRED">
    <!ELEMENT comment (#PCDATA)>
    <!ATTLIST comment
        %lang.params;>

What I get:

/home/mss/xml/common.mod:8:5: Didn't match [A-Za-z_:][\-A-Za-z_:.0-9]*
/home/mss/xml/common.mod:8:5: Whitespace expected here
/home/mss/xml/common.mod:8:5: Expected type or alternative list
/home/mss/xml/common.mod:16:5: Didn't match [A-Za-z_:][\-A-Za-z_:.0-9]*
/home/mss/xml/common.mod:16:5: Whitespace expected here
/home/mss/xml/common.mod:16:5: Expected type or alternative list
/home/mss/xml/common.mod:22:5: Didn't match [A-Za-z_:][\-A-Za-z_:.0-9]*
/home/mss/xml/common.mod:22:5: Whitespace expected here
/home/mss/xml/common.mod:22:5: Expected type or alternative list
/home/mss/xml/other.mod:15:5: Didn't match [A-Za-z_:][\-A-Za-z_:.0-9]*
/home/mss/xml/other.mod:15:5: Whitespace expected here
/home/mss/xml/other.mod:15:5: Expected type or alternative list
info.xml:7:22: Unknown attribute 'lang'

Here common.mod:8:5 is the position of the first %lang.params; reference.

Is this correct usage?

--
Mike


From Jack.Jansen@cwi.nl  Sun Sep  6 22:29:31 1998
From: Jack.Jansen@cwi.nl (Jack Jansen)
Date: Sun, 06 Sep 1998 23:29:31 +0200
Subject: [XML-SIG] Marshalling to XML, again
In-Reply-To: Message by "A.M. Kuchling" <amk1@erols.com> ,
 Sun, 6 Sep 1998 10:43:26 -0400 , <199809061443.KAA00545@207-172-112-146.s146.tnt4.ann.erols.com>
Message-ID: <UTC199809062129.XAA15618.jack@snelboot.cwi.nl>

Recently, "A.M. Kuchling" <amk1@erols.com> said:
> There's been some discussion of marshalling scripting language data
> types on the Casbah list and on the Perl-XML list recently; Dave
> Winer's suggestion for XML-RPC bears some relation to this.  It would
> be very useful if some common DTD was agreed upon, which would allow
> painlessly exchanging data between Python and Perl, Frontier, or
> whatever.

This used to be my view, but after a bit more thinking I think that
what we want is not a common DTD but a number of easily convertible
DTDs, possibly with a common subset. The various object types in the
various languages each have their idiosyncrasies, and it may be
important to keep these. Unless you have to convert the data
structures to your language of choice, in which case you want to read
the objects in the most logical but still representable form, of
course.

The question then is whether it is possible, upon reading an XML
representation of a yet-unknown language, to automatically convert the 
objects to the nearest representation of your language.
--
Jack Jansen             | ++++ stop the execution of Mumia Abu-Jamal ++++
Jack.Jansen@cwi.nl      | ++++ if you agree copy these lines to your sig ++++
http://www.cwi.nl/~jack | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm 


From gstein@lyra.org  Sun Sep  6 22:25:50 1998
From: gstein@lyra.org (Greg Stein)
Date: Sun, 06 Sep 1998 14:25:50 -0700
Subject: [XML-SIG] Marshalling to XML, again
References: <199809061443.KAA00545@207-172-112-146.s146.tnt4.ann.erols.com>
Message-ID: <35F2FDDE.3C979D78@lyra.org>

A.M. Kuchling wrote:
> 
> Here's another version of the xml.marshal module.  (That name will
> have to be changed now, though, since xml.marshal uses Python's
> original marshal to handle code objects.  Any suggestions?)

Why does it need to change?

> For example, take the recursive list produced by this code:
> ...
>
> There's been some discussion of marshalling scripting language data
> types on the Casbah list and on the Perl-XML list recently; Dave
> Winer's suggestion for XML-RPC bears some relation to this.  It would
> be very useful if some common DTD was agreed upon, which would allow
> painlessly exchanging data between Python and Perl, Frontier, or
> whatever.  (However, I lack the time to read all the relevant mailing
> lists and agitate for a specification.)

Dave, MSFT, and another company are defining an XML-based RPC thing,
which they're calling SOAP (Simple Object Access Protocol). They haven't
released a spec yet, but the intent is to provide a low-level RPC that
would slide in underneath the various Distributed Object systems. This
would allow, say, a Windows-based system to use DCOM to call an object
on a Linux system, where the calls and parameters are marshalled in XML.

The data types that they would use will follow what the IE5 version of
MSXML can do for data typing. It is detailed here:

http://www.microsoft.com/xml/authoring/dataTypes/dataTypes.htm

I'm not on those lists -- are there web archives somewhere? I'd be
interested in reading those threads.

> If no such common DTD arises, is this module still useful, and should
> it be included?

Yes, definitely. When SOAP is completed, I'd like to hook up the Linux
end of it :-) (and the marshalling will be needed). Sure, it could
change or whatever, but for any type of RPC system, the XML-based
marshalling done by this module will be cool.

Note: a Python client talking to a Python server could recognize that
fact, marshal using the builtin, and then embed the data into a PCDATA
element (or maybe encode using base64 for simplicity). It would fall
back to the above marshalling for unknown targets. MSFT will similarly
try to use a faster marshalling between its own platforms ("interoperable,
but it works better if you use Windows all around" is always their motto
:-)
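
The Python-to-Python shortcut described in the note above could look
roughly like this. It is a sketch only, assuming a current Python 3; the
<native> element, its encoding attribute, and the helper names
wrap_native and unwrap_native are invented for the example and are not
part of any agreed format.

```python
import base64
import marshal
import xml.etree.ElementTree as ET

def wrap_native(value):
    """Marshal with the builtin module and embed the result as base64 text."""
    payload = base64.b64encode(marshal.dumps(value)).decode("ascii")
    elem = ET.Element("native", encoding="base64")   # hypothetical element
    elem.text = payload
    return ET.tostring(elem, encoding="unicode")

def unwrap_native(xml_text):
    """Reverse wrap_native; only safe for data from a trusted peer."""
    elem = ET.fromstring(xml_text)
    return marshal.loads(base64.b64decode(elem.text))

data = {"key": "value", 1: 2, "items": [1.5, None, (1, 2, 3)]}
round_tripped = unwrap_native(wrap_native(data))
print(round_tripped == data)
```

A peer that does not recognise the element would need the plain XML
marshalling as the fallback, as the note says.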

-g

--
Greg Stein (gstein@lyra.org)


From MHammond@skippinet.com.au  Sun Sep  6 14:29:24 1998
From: MHammond@skippinet.com.au (Mark Hammond)
Date: Sun, 6 Sep 1998 23:29:24 +1000
Subject: [XML-SIG] XBEL DTD
Message-ID: <01e101bdd9ed$22d0fcc0$1301a8c0@bobcat.skippinet.com.au>

Fredrik and Jack both hit exactly on 2 questions I had.  I would really like
some comments on them.  Jack asked:

> I would have used elements only
> for the BOOKMARK and NODE items,
> and used attributes for the rest.
>Can anyone enlighten me which method is best, and why?

Any comments?  The best I came up with is that attributes require quotes,
and elements dont??  But logically I agree many of these things are actually
attributes.  Should they be attributes instead of elements?

And Fredrik asked:
>a name element.  Let's see...  Is the following valid syntax?
>
>    <!ELEMENT NODE     (NAME?, (BOOKMARK|NODE)+)>

I have no idea, and I could not find an answer myself.  Im glad you noticed!
I am running with it for now :-)

Sean asked about the CaseOfTheTags??  No one seemed to go with that idea?  I
kinda like it.

And lastly, the discussion on dates seemed to settle with James indicating
the XML would look like:
<date scheme="iso-8601">2005-12-01</date>
But I am unsure what this means to the DTD??

So the new DTD (only a few mods) now looks like:

<!-- DTD for XBEL - XML Bookmark Exchange Language -->

<!ELEMENT XBEL     (INFO, FOLDER+)>
<!ELEMENT FOLDER   (NAME?, (BOOKMARK|FOLDER)+)>

<!ELEMENT BOOKMARK (NAME, URL, ADDED?, VISITED?, MODIFIED?)>

<!ELEMENT INFO     (OWNER, DATE?, MACHINENAME?)>

<!ELEMENT OWNER    (#PCDATA)>
<!ELEMENT MACHINENAME (#PCDATA)>
<!ELEMENT DATE     (#PCDATA)>

<!ELEMENT NAME     (#PCDATA)>
<!ELEMENT URL      (#PCDATA)>
<!ELEMENT ADDED    (#PCDATA)>
<!ELEMENT VISITED  (#PCDATA)>
<!ELEMENT MODIFIED (#PCDATA)>


Mark.


From digitome@iol.ie  Mon Sep  7 08:20:54 1998
From: digitome@iol.ie (Sean Mc grath)
Date: Mon, 07 Sep 1998 08:20:54 +0100
Subject: [XML-SIG] XBEL DTD
Message-ID: <1.5.4.32.19980907072054.0094dfdc@gpo.iol.ie>

[Mark Hammond]
>Fredrik and Jack both hit exactly on 2 questions I had.  I would really like
>some comments on them.  Jack asked:
>
>> I would have used elements only
>> for the BOOKMARK and NODE items,
>> and used attributes for the rest.
>>Can anyone enlighten me which method is best, and why?
>
>Any comments?  The best I came up with is that attributes require quotes,
>and elements dont??  But logically I agree many of these things are actually
>attributes.  Should they be attributes instead of elements?

The attribute versus element debate is one of the nuggets of the SGML/XML
world. You can express some extra (though rather minor) validity constraints
for attribute values in that they can be one of a pre-defined set of types.

Attribute values can also be layered on at parse-time rather than
added to the document itself. This has a number of very useful applications
culminating in the powerful notion of a document architecture. Let's
leave document architectures alone for now...

Some argue that attributes should only be used for content that is not
logically part of the document. I.e. if it should not disappear when you
strip tags, don't put it in an attribute. Others argue that attributes
are redundant and should be used sparingly if at all. Me? I throw
a small drop of 10 year old Irish Whiskey over my left shoulder
whilst standing on one leg. If one of the little people appears,
I use an attribute, otherwise PCDATA.
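
The "strip the tags" test is easy to try. A small sketch, assuming a
current Python 3 with xml.etree.ElementTree; the bookmark markup and the
helper name strip_tags are made up for the illustration.

```python
import xml.etree.ElementTree as ET

# The same bookmark, once with the URL as an attribute, once as an element.
with_attribute = ET.fromstring(
    '<bookmark url="http://www.python.org/">Python</bookmark>')
with_element = ET.fromstring(
    '<bookmark><name>Python</name> <url>http://www.python.org/</url></bookmark>')

def strip_tags(elem):
    """Keep only the character data, as if all markup were deleted."""
    return "".join(elem.itertext())

stripped_attribute = strip_tags(with_attribute)
stripped_element = strip_tags(with_element)
print(repr(stripped_attribute))
print(repr(stripped_element))
```

The URL survives stripping only in the element form, which is the
distinction the rule of thumb relies on.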

>
>And Fredrik asked:
>>a name element.  Let's see...  Is the following valid syntax?
>>
>>    <!ELEMENT NODE     (NAME?, (BOOKMARK|NODE)+)>
>
>I have no idea, and I could not find an answer myself.  Im glad you noticed!
>I am running with it for now :-)
Perfectly valid syntax.

>
>Sean asked about the CaseOfTheTags??  No one seemed to go with that idea?  I
>kinda like it.
SoDoI. XBEL documents are gonna LOOK REALLY LOUD. All lowercase is, I think,
preferable to all uppercase, whatever about camelcase...

>
>And lastly, the discussion on dates seemed to settle with James indicating
>the XML would look like:
><date scheme="iso-8601">2005-12-01</date>
>But I am unsure what this means to the DTD??

XML parsers do not know anything about dates. You need
to layer on a program that knows about iso-8601. The above is fine
XML markup but you do not get the implied semantic check from XML.

Having said that, this stuff is on the way really soon now. The first
salvo was a Tim Bray proposal for data typing in XML. Then came a formal
submission to the W3C called XML-Data. The latest state of play is
a joint Microsoft/IBM/Tim Bray proposal called DCD. Full info to be
found on W3C.ORG. Me? I use (abuse?) fixed attributes and Python:-

<!ATTLIST date
        value CDATA #REQUIRED
        python-value CDATA #FIXED "Is8601Date">

I have a Python program that kicks in immediately after a parse and hunts
for attributes of the form "python-X". This attribute value is treated
as a Python predicate function and passed the real value of attribute X.
You get the idea. I am not saying this is the way to go. I think DCD
syntax is the way to go because DCD will be built into a bunch of tools
including the Python ones. What I am saying, is that right now,
we have to roll our own validation code for dates.
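
A rough sketch of that post-parse hunt, assuming a current Python 3 with
xml.etree.ElementTree. The python-value attribute is written directly in
the sample document here rather than supplied from a #FIXED default, the
PREDICATES table and failed_checks are names invented for the example,
and this Is8601Date only checks the YYYY-MM-DD shape.

```python
import re
import xml.etree.ElementTree as ET

def Is8601Date(text):
    """Shape check only: YYYY-MM-DD, without validating the calendar."""
    return re.fullmatch(r"\d{4}-\d{2}-\d{2}", text) is not None

# Predicate names allowed in python-X attributes (hypothetical registry).
PREDICATES = {"Is8601Date": Is8601Date}

def failed_checks(root):
    """Return (tag, attribute, value) for every python-X check that fails."""
    failures = []
    for elem in root.iter():
        for name, predicate_name in elem.attrib.items():
            if not name.startswith("python-"):
                continue
            target = name[len("python-"):]
            predicate = PREDICATES[predicate_name]
            if not predicate(elem.get(target, "")):
                failures.append((elem.tag, target, elem.get(target)))
    return failures

doc = ET.fromstring(
    '<dates>'
    '<date value="2005-12-01" python-value="Is8601Date"/>'
    '<date value="next Tuesday" python-value="Is8601Date"/>'
    '</dates>')
problems = failed_checks(doc)
print(problems)
```

Looking predicates up in a fixed table, rather than evaluating the
attribute value as code, keeps a document from naming arbitrary functions.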


Sean Mc Grath
http://www.digitome.com/sean.htm
+353 96 47391

"Imagine a world without hypothetical situations..."


From gstein@lyra.org  Mon Sep  7 09:38:34 1998
From: gstein@lyra.org (Greg Stein)
Date: Mon, 07 Sep 1998 01:38:34 -0700
Subject: [XML-SIG] XBEL DTD
References: <1.5.4.32.19980907072054.0094dfdc@gpo.iol.ie>
Message-ID: <35F39B8A.579081E0@lyra.org>

Sean Mc grath wrote:
> ...
> Having said that, this stuff is on the way really soon now. The first
> salvo was a Tim Bray proposal for data typing in XML. Then came a formal
> submission to the W3C called XML-Data. The latest state of play is
> a joint Microsoft/IBM/Tim Bray proposal called DCD. Full info to be
> found on W3C.ORG. Me? I use (abuse?) fixed attributes and Python:-
> ...

Euh... unless I'm horribly mistaken, XML-Data is an XML DTD that
describes a schema for describing schemas :-) i.e. rather than using
that specialized DTD syntax, you can describe the schema in XML. Of
course, this implies that you can start using a host of XML tools for
manipulating the actual schema.

Sure, there are some parts of XML-Data that are used to define the
constraints (and type) of an attribute, but it isn't very complete.

-g

--
Greg Stein (gstein@lyra.org)


From digitome@iol.ie  Mon Sep  7 09:51:16 1998
From: digitome@iol.ie (Sean Mc Grath)
Date: Mon, 7 Sep 1998 09:51:16 +0100
Subject: [XML-SIG] XBEL DTD
Message-ID: <199809070851.JAA18542@GPO.iol.ie>

>Sean Mc grath wrote:
>> ...
>> Having said that, this stuff is on the way really soon now. The first
>> salvo was a Tim Bray proposal for data typing in XML. Then came a formal
>> submission to the W3C called XML-Data. The latest state of play is
>> a joint Microsoft/IBM/Tim Bray proposal called DCD. Full info to be
>> found on W3C.ORG. Me? I use (abuse?) fixed attributes and Python:-
>> ...
>
[Greg Stein]
>Euh... unless I'm horribly mistaken, XML-Data is an XML DTD that
>describes a schema for describing schemas :-)
Yes. A common and powerful technique in the SGML/XML world. Meta-DTDs.

>i.e. rather than using
>that specialized DTD syntax, you can describe the schema in XML. Of
>course, this implies that you can start using a host of XML tools for
>manipulating the actual schema.

Right.

>
>Sure, there are some parts of XML-Data that are used to define the
>constraints (and type) of an attribute, but it isn't very complete.
>
It has been superseded by DCD.

</Sean>

Sean Mc Grath - http://www.digitome.com/sean.htm
XML by Example:Building E-Commerce Applications 
	(http://www.amazon.com/exec/obidos/ISBN=0139601627/digitomeelectronA/)
ParseMe.1st - SGML for Software Developers
	(http://www.amazon.com/exec/obidos/ISBN=0134889673/digitomeelectronA/)


From akuchlin@cnri.reston.va.us  Mon Sep  7 18:53:09 1998
From: akuchlin@cnri.reston.va.us (A.M. Kuchling)
Date: Mon, 7 Sep 1998 13:53:09 -0400
Subject: [XML-SIG] IBM XML developer survey
Message-ID: <199809071753.NAA00568@207-172-56-245.s245.tnt12.ann.erols.com>

Found this on scripting.com: IBM is running a survey of XML
developers, in order to design an XML Web site resource.
Interestingly, there's one section which asks you to list your skills
in various areas; Python is listed along with HTML, CGI, PageMill, and
various other (mostly Web-related) items. 

http://www.networking.ibm.com/survey/survey.nsf/surveyone

This is part 1 of the survey; fill out both parts, and you get a free
T-shirt.

-- 
A.M. Kuchling			http://starship.skyport.net/crew/amk/
Consumers are like roaches -- you spray them and they get immune after a while.
    -- David Lubars


From ken@bitsko.slc.ut.us  Tue Sep  8 03:12:04 1998
From: ken@bitsko.slc.ut.us (Ken MacLeod)
Date: 07 Sep 1998 21:12:04 -0500
Subject: [XML-SIG] Re: Marshalling to XML, again
Message-ID: <m3ww7f8ia3.fsf@biff.bitsko.slc.ut.us>

A.M. Kuchling <akuchlin@cnri.reston.va.us> writes:
> Here's another version of the xml.marshal module.  (That name will
> have to be changed now, though, since xml.marshal uses Python's
> original marshal to handle code objects.  Any suggestions?)

> For example, take the recursive list produced by this code:

> recursive_list = [None, 1, pow(3,65L), '<fake tag>', 1+5j]
> recursive_list.append( recursive_list )

> There's been some discussion of marshalling scripting language data
> types on the Casbah list and on the Perl-XML list recently;

Speaking from the Casbah project, we're just about to go 0.1 on our
Lightweight Distributed Objects.  0.1 won't have the XML serialization
implemented, but it is specified and will be plug-in compatible with
the current binary implementation.  The Python implementation is
browsable at:
  <http://www.ntlug.org/cgi-bin/cvsweb/lotos/python/>

The implementation for 0.1 includes the binary serialization, the
connection, and a remote proxy (method forwarder).  The last items to
clean up for 0.1 are the documentation and creating a coherent
release.

Initial notes on the XML serialization are in text format at:

  <http://www.bitsko.slc.ut.us/~ken/casbah/xml-serialization.txt>

The LDO equivalent serialization would look something like this:

    <list id="1">
      <null/>
      <value>1</value>
      <value>10301051460877537453973547267843</value>
      <value>&lt;fake tag&gt;</value>
      <dict type="complex">
        <value>real</value><value>1.0</value>
        <value>imaginary</value><value>5.0</value>
      </dict>
      <ref id="1"/>
    </list>


In a followup message, Greg Stein <gstein@lyra.org> comments:
> Note: a Python client talking to a Python server could recognize
> that fact, marshal using the builtin, and then embed the data into a
> PCDATA element (or maybe encode using base64 for simplicity). It
> would fall back to the above marshalling for unknown targets.

We're leaving a hook to support this in LDO, but we're assuming that the
built-in marshaling would completely replace the XML or binary
marshaling, rather than embedding it in XML.


This is more DO-SIG related, but from reading the above sample you can
probably guess an issue we're facing in the Python implementation: LDO
assumes that an implementation supports automatic or explicit coercion
from strings to numerics.

LDO supports numeric types (ints and floats), but doesn't require
them.  If the scenario was Python-to-Python, this wouldn't be a
problem because the calling Python code would encode using a numeric
type and the called Python code would decode the numeric type.  The
problem comes from command-line or TCL calling code talking to a
Python called code, or a Python calling code talking to a shell or TCL
called code -- numeric and string types aren't distinguished and
Python doesn't coerce between strings and numerics.

The solutions we've thought of so far are:

 1) require implementations to indicate numeric values (a real problem
for shells and TCL)

 2) require Python code to handle remote calls specially (a problem
specific to Python users that we had hoped to avoid)

 3) use CORBA IDLs (no worse than Java, C++, or CORBA)

 4) migrate Python to use non-math operators for string functions so
that math operators can signal coercion (an unlikely option)

(4) is the most elegant, but also the most difficult.  For now we're
starting with (2) and expecting (3) to be the ``final'' solution.
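
Option (2) might amount to something like the following on the Python
side. This is a sketch under assumptions, written for a current Python 3:
the helper names coerce and call_with_coercion are hypothetical and are
not part of LDO.

```python
def coerce(text):
    """Turn a string from a typeless caller into int or float if it parses."""
    for convert in (int, float):
        try:
            return convert(text)
        except ValueError:
            pass
    return text          # not numeric: leave it as a string

def call_with_coercion(func, *string_args):
    """Call func with every string argument coerced first."""
    return func(*[coerce(arg) for arg in string_args])

def add(a, b):
    return a + b

total = call_with_coercion(add, "1", "2.5")
print(total)
```

The weakness is visible in the sketch: a caller that really means the
string "1" gets the integer 1, so the called code has to know which
arguments to coerce, which is the burden on Python users mentioned above.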

-- 
  Ken MacLeod
  ken@bitsko.slc.ut.us


From MHammond@skippinet.com.au  Tue Sep  8 02:08:50 1998
From: MHammond@skippinet.com.au (Mark Hammond)
Date: Tue, 8 Sep 1998 11:08:50 +1000
Subject: [XML-SIG] XBEL DTD
Message-ID: <00a601bddafe$cf377b30$1301a8c0@bobcat.skippinet.com.au>

>Some argue that attributes should only be used for content that is not
>logically part of the document. I.e. if it should not disappear when you
>strip tags, don't put it in an attribute. Others argue that attributes
>are redundant and should be used sparingly if at all. Me? I throw

Hmm.  This sounds like a reasonable "rule of thumb" to me.  Does anyone
disagree with this?

This does seem to fit the existing HTML model - eg, an "IMG" tag - the size
attributes dont really form part of the document.

Dont know about an "anchor" tag - the HREF is an attribute - IMO this is a
necessary part of the document.

But if we stick with this definition, then the DTD with only elements seems
correct.

>a small drop of 10 year old Irish Whiskey over my left shoulder
>whilst standing on one leg. If one of the little people appears,
>I use an attribute, otherwise PCDATA.

:-)  I can relate to that!  Hopefully this means you only use attributes
very rarely (or after a _long_ session :-)


>>Sean asked about the CaseOfTheTags??  No one seemed to go with that idea?  I
>>kinda like it.
>SoDoI. XBEL documents are gonna LOOK REALLY LOUD. All lowercase is, I think,
>preferable to all uppercase, whatever about camelcase...

OK - no one making noises, so I will use lower case (all our elements are
single words, so no need for mixed case)
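
With lower-case names, a small document following the element structure
of the XBEL DTD posted earlier in this thread might be built like this.
A sketch only, assuming a current Python 3 with xml.etree.ElementTree;
the owner and bookmark values are placeholders.

```python
import xml.etree.ElementTree as ET

# xbel -> (info, folder+); folder -> (name?, (bookmark|folder)+)
xbel = ET.Element("xbel")
info = ET.SubElement(xbel, "info")
ET.SubElement(info, "owner").text = "Example Owner"      # placeholder value

folder = ET.SubElement(xbel, "folder")
ET.SubElement(folder, "name").text = "Python"

# bookmark -> (name, url, added?, visited?, modified?)
bookmark = ET.SubElement(folder, "bookmark")
ET.SubElement(bookmark, "name").text = "Python home page"
ET.SubElement(bookmark, "url").text = "http://www.python.org/"

document = ET.tostring(xbel, encoding="unicode")
print(document)
```

ElementTree does not validate against a DTD, so the child order here is
kept correct by hand.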

Interesting about "CamelCase".  Fredrik thought it means "Perl" (the obvious
Camel reference).  Personally, I took it as being derived from the
silhouette of a real camel - the humps relate to the caps in the middle of
the word.  I wonder where it derived from - does it really mean "Perl"?

Maybe we should call it "Kangaroo Case" ;-) (coined by someone from
"skippi-net" - coincidence, or conspiracy - you be the judge :-)

Mark.


From larsga@ifi.uio.no  Tue Sep  8 09:17:18 1998
From: larsga@ifi.uio.no (Lars Marius Garshol)
Date: Tue, 08 Sep 1998 10:17:18 +0200
Subject: [XML-SIG] a small question
In-Reply-To: <19980906211221.A9066@transas.com>
Message-ID: <3.0.1.32.19980908101718.006c967c@ifi.uio.no>

* Michael Sobolev
>
>Is this declaration valid?
>
>    <!ENTITY % lang.params "lang   CDATA #REQUIRED">
>    <!ELEMENT comment (#PCDATA)>
>    <!ATTLIST comment
>        %lang.params;>

Yes, this declaration is perfectly valid.

>If so, why does xmlproc not process it properly? :)

xmlproc currently only allows parameter entity references between declarations
and not inside them. I've now found a way to implement this (I think), so this
may appear in 0.60.

--Lars M.


From bottoni@cadlab.it  Tue Sep  8 10:00:43 1998
From: bottoni@cadlab.it (Alessandro Bottoni)
Date: Tue, 8 Sep 1998 11:00:43 +0200
Subject: [XML-SIG] Any example of HTML Processing with Python/SAX?
Message-ID: <005d01bddb07$28cbc9a0$172b2bc1@pc6d2.cadlab.it>

I'm starting to work with Python on HTML and XML documents, so I'm looking
for sample applications of HTML and XML processing with Python, XMLLIB,
HTMLLIB and, most important, SAX.

Does anybody know where I could find a few good examples?
(Of course, I have already searched www.python.org , starship.skyport.net ,
http://www.stud.ifi.uio.no/~larsga/download/python/xml/index.html and
www.pythonware.com )
Does anybody want to share any source code example/fragment with me?

TIA
------------------------------
Alessandro Bottoni
Technical Writer
Cad.Lab SPA
Bologna, Italy
---------------------


From akuchlin@cnri.reston.va.us  Tue Sep  8 15:18:56 1998
From: akuchlin@cnri.reston.va.us (Andrew Kuchling)
Date: Tue,  8 Sep 1998 10:18:56 -0400 (EDT)
Subject: [XML-SIG] Re: FREE DOM
In-Reply-To: <35F0DA72.563D1632@totten.com>
References: <35F0DA72.563D1632@totten.com>
Message-ID: <13813.15100.631957.233022@newcnri.cnri.reston.va.us>

John Totten writes (in a private message):
>Anyone working on a Python version of SAXDOM/FREEDOM?
>					John Totten

	[Cc'ed to xml-sig@python.org, because the answer is of interest]
	
	I spent some of this weekend working on the PyDOM code, trying
to bring it into compliance with the most recent DOM spec.  Nothing
releasable yet, though it hopefully won't take much longer.
DOM's moving through the W3C's process faster than I expected,
possibly becoming a Recommendation in September.  Therefore I think a
DOM implementation should be part of 1.0, instead of being postponed
until after 1.0.

	Garbage collection is going to be a problem, though.  DOM
nodes allow retrieving both the parent node, and the children.  The
obvious implementation is to have .parent and .children attributes,
but those create cycles, which will lead to uncollected garbage.  

	One solution is to require calling a .destroy() (or
similarly-named) method when you're done with a node.  The method
would then do something like:

        def destroy(self):
            del self.parent
            for i in self.children:
                i.destroy()
            del self.children

This is simple to implement, but it means that you have to remember to
call .destroy().  Does anyone see a representation that would avoid
the necessity of doing this?  I was thinking of just having .children
in each node, and then there would be a global dictionary that mapped
nodes to their parent objects.  Because it's global, it wouldn't
participate in any cycles, but cleaning it up is also a pain.  

	Anyone have a suggestion?  (Other than continually visiting
Guido and whining for non-refcounting GC?)
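
One representation that avoids the cycle is a weak parent pointer. The
sketch below assumes a current Python 3 and its weakref module, which was
added to Python later and was not an option when this was written; the
Node class is made up for the illustration and is not PyDOM code.

```python
import gc
import weakref

class Node:
    def __init__(self, name):
        self.name = name
        self.children = []       # strong references, parent to child
        self._parent = None      # weak reference to the parent, or None

    @property
    def parent(self):
        """The parent node, or None for a root or a vanished parent."""
        return self._parent() if self._parent is not None else None

    def append(self, child):
        child._parent = weakref.ref(self)
        self.children.append(child)
        return child

root = Node("root")
child = root.append(Node("child"))
parent_name = child.parent.name

probe = weakref.ref(root)   # lets us observe whether root is freed
del root                    # no cycle, so no destroy() call is needed
gc.collect()                # for implementations without refcounting
print(probe() is None, child.parent)
```

The cost is that a child no longer keeps its tree alive: holding only a
child lets the parent disappear, and .parent then returns None.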

-- 
A.M. Kuchling			http://starship.skyport.net/crew/amk/
It was a wasted life, but God forbid that one should be hard upon it, or upon
anything in this world that is not deliberately and coldly wrong . . .
    -- Charles Dickens, in a letter to his friend John Forster.


From fredrik@pythonware.com  Tue Sep  8 16:53:34 1998
From: fredrik@pythonware.com (Fredrik Lundh)
Date: Tue, 8 Sep 1998 16:53:34 +0100
Subject: [XML-SIG] Re: FREE DOM
Message-ID: <000901bddb40$d71947b0$f29b12c2@pythonware.com>

> One solution is to require calling a .destroy() (or
>similarly-named) method when you're done with a node.  The method
>would then do something like:
>
> def destroy(self):
>     del self.parent
>     for i in self.children:
>         i.destroy()
>     del self.children
>
>This is simple to implement, but it means that you have to remember to
>call .destroy().  Does anyone see a representation that would avoid
>the necessity of doing this?  I was thinking of just having .children
>in each node, and then there would be a global dictionary that mapped
>nodes to their parent objects.  Because it's global, it wouldn't
>participate in any cycles, but cleaning it up is also a pain.  

Yup. *When* should you do the clean-up in that case?  Since all
nodes will have an extra reference (from the global dictionary),
they'll never go away unless you explicitly call a cleanup function...

(alright, you can have a "purge" function that kills nodes with
reference count=1, and use a background thread to call that
function now and then...)

I definitely prefer the "destroy" pattern (or rather, I prefer to
use visitors for this, but that's another story).

> Anyone have a suggestion?  (Other than continually visiting
>Guido and whining for non-refcounting GC?)

Well, I see no reason why you cannot keep on doing that as well ;-)

Cheers /F

PS. Does anyone have pointers to SAXDOM and/or FreeDOM?


From larsga@ifi.uio.no  Tue Sep  8 15:47:41 1998
From: larsga@ifi.uio.no (Lars Marius Garshol)
Date: Tue, 08 Sep 1998 16:47:41 +0200
Subject: [XML-SIG] Re: FREE DOM
In-Reply-To: <000901bddb40$d71947b0$f29b12c2@pythonware.com>
Message-ID: <3.0.1.32.19980908164741.0075b2e4@ifi.uio.no>

* Fredrik Lundh
>
>PS. Does anyone have pointers to SAXDOM and/or FreeDOM?

SAXDOM has changed name to FreeDOM, which has changed name to The Docuverse
DOM SDK. You can find it at:

<URL:http://www.docuverse.com/domsdk/index.html>


When looking for free XML tools, this is (IMHO) the place to start:

<URL:http://www.stud.ifi.uio.no/~larsga/linker/XMLtools.html>

--Lars M.


From wunder@infoseek.com  Tue Sep  8 18:27:52 1998
From: wunder@infoseek.com (Walter Underwood)
Date: Tue, 08 Sep 1998 10:27:52 -0700
Subject: [XML-SIG] Useless fun thing for XML - comments or helpers?
In-Reply-To: <3.0.5.32.19980905083816.007b62f0@ifi.uio.no>
References: <3.0.5.32.19980904143609.00c2b8e0@corp>
 <13808.21923.568032.333624@newcnri.cnri.reston.va.us>
 <3.0.5.32.19980903095358.00bf7290@corp>
 <002901bdd756$c28f9170$f29b12c2@pythonware.com>
 <3.0.5.32.19980903095358.00bf7290@corp>
Message-ID: <3.0.5.32.19980908102752.00a48df0@corp>

At 08:38 AM 9/5/98 +0200, Lars Marius Garshol wrote:
>* Walter Underwood
>>
>> [...] Well, we do pay attention to one tag -- the first <title>
>>or <TITLE> tag is considered to be the title of the document for 
>>purposes of displaying search hits.
>
>Hmmm. Have you considered using architectural forms to give page authors
>more freedom, but still allow you to discover which elements are the
>equivalents of 'TITLE' and 'AUTHOR' etc?

The general form of our answer for feature requests is "if paying
customers want it, we'll look at it". Of course, we're providing
XML even though we only have one customer asking for it (so far).

The Architectural Forms proposal looks interesting, and I actually
hope it catches on, since it could make our job easier. The search
engine only needs to know a little bit of info, basically, what is
content, what is meta-content, and what is formatting. Actual 
interpretation and display is the job of some other program. That
is why the search engine only needs well-formed XML, rather than
valid XML. But a *small* set of common base architectural forms
could allow the parser to sort out some of the basic data/metadata
elements.

Interestingly, this supports the earlier rule-of-thumb in the 
attribute vs. element discussion. If it is something that should
be searchable, represent it with an element.

At 04:04 PM 9/5/98 -0700, Lisa Rein wrote:
>I am very curious how exactly XML is being utilized in the search engine
>if the only tag  being taken into account is the (first) TITLE tag (just
>like a search engine would use during a "bag of words" approach) and not
>using a DTD -- making any semantic associations impossible.  
>
>If you're not going to deal with the text until after it's parsed, why
>are you using XML?  Are you doing some kind of indexing or another
>variation I haven't heard of?  Do tell ;-)

The goal is to make XML documents "findable" via web search. If we
treated them as raw text, the element names would show up in search
results and would swamp queries like "xml" or "doctype" with irrelevant
hits. Parsing the XML allows us to give quality results. Being independent
of the DTD allows us to handle the widest variety of documents. So far,
that looks like a "sweet spot" in XML support. DTD-specific search
can get very complex, very fast.

Remember, the web server still serves the document. The search engine
only provides a URL to it. So the search engine just needs enough
info to serve a URL. Anything else gets in the way.

One clarification -- this feature is for the Ultraseek Server
product (http://software.infoseek.com), a search engine that people 
can buy and run locally. Ultraseek Server features are somewhat
independent of features for www.infoseek.com, the on-line search service. 

Finally, the XML market is very new, and this will be the first release
of our XML support. As the market matures, customers will tell us 
what they want and don't want, and we'll respond.

wunder


Walter R. Underwood
wunder@infoseek.com
wunder@best.com (home)
http://www.best.com/~wunder/
1-408-543-6946


From mss@transas.com  Tue Sep  8 19:47:27 1998
From: mss@transas.com (Michael Sobolev)
Date: Tue, 8 Sep 1998 22:47:27 +0400
Subject: [XML-SIG] yet another question.
Message-ID: <19980908224727.A24349@transas.com>

Let's suppose the following DTD.

    <!ELEMENT foo (bar+)>

    <!ELEMENT bar (#PCDATA)>

I believe that the following text conforms to the above specification:

    <foo>
        <bar>1</bar>
        <bar>2</bar>
    </foo>

If I run pyexpat parser on the above text, I will get something like:

    start_element foo
    pcdata \n
    pcdata '   '
    start_element bar
    pcdata 1
    end_element bar
    pcdata \n
    pcdata '   '
    start_element bar
    pcdata 2
    end_element bar
    pcdata \n
    end_element foo

This is fine since expat is not a validating parser.  What should I expect from
a validating one?  After the declaration, foo cannot have any pcdata at all.
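
A trace like the one above can be reproduced with the expat binding that ships with current Python (xml.parsers.expat, the successor to pyexpat). This is a sketch only: the trace_events helper and the event labels are made up for illustration, and buffer_text is switched on so that a newline and the indentation following it arrive as one callback instead of two.

```python
import xml.parsers.expat

DOCUMENT = "<foo>\n    <bar>1</bar>\n    <bar>2</bar>\n</foo>"


def trace_events(document):
    """Parse `document` with expat and return the callbacks as (kind, value) pairs."""
    events = []
    parser = xml.parsers.expat.ParserCreate()
    # Coalesce adjacent character data so each text run arrives in one callback.
    parser.buffer_text = True
    parser.StartElementHandler = lambda name, attrs: events.append(("start_element", name))
    parser.EndElementHandler = lambda name: events.append(("end_element", name))
    parser.CharacterDataHandler = lambda data: events.append(("pcdata", data))
    parser.Parse(document, True)
    return events


events = trace_events(DOCUMENT)
for kind, value in events:
    print(kind, repr(value))
```

The whitespace between the elements is reported through the same character data callback as the content of bar, because a non-validating parser never consults the content model of foo.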

TIA,

--
Mike


From ken@bitsko.slc.ut.us  Tue Sep  8 20:06:03 1998
From: ken@bitsko.slc.ut.us (Ken MacLeod)
Date: 08 Sep 1998 14:06:03 -0500
Subject: [XML-SIG] Re: FREE DOM
In-Reply-To: "Fredrik Lundh"'s message of Tue, 8 Sep 1998 16:53:34 +0100
References: <000901bddb40$d71947b0$f29b12c2@pythonware.com>
Message-ID: <m3k93e8lwk.fsf@biff.bitsko.slc.ut.us>

"Fredrik Lundh" <fredrik@pythonware.com> writes:

> > One solution is to require calling a .destroy() (or
> >similarly-named) method when you're done with a node.  The method
> >would then do something like:
> >
> > def destroy(self):
> >     del self.parent
> >     for i in self.children:
> >         i.destroy()
> >     del self.children
> >
> >This is simple to implement, but it means that you have to remember
> >to call .destroy().  Does anyone see a representation that would
> >avoid the necessity of doing this?  I was thinking of just having
> >.children in each node, and then there would be a global dictionary
> >that mapped nodes to their parent objects.  Because it's global, it
> >wouldn't participate in any cycles, but cleaning it up is also a
> >pain.

> I definitely prefer the "destroy" pattern (or rather, I prefer to
> use visitors for this, but that's another story).

I've used a proxy-iterator to solve this problem and it seems to be
working well.

When you build the tree, don't include parent references.  But when
somebody asks for a tree object, return a proxy for the tree object
that includes a parent reference.  Create iterator methods in the
proxy object that return new proxies with a correct parent
proxy-iterator.

The proxy-iterator classes are shadow classes for the object model
classes, so there's a one-to-one correspondence.  The tree objects end
up being simple data objects, it's the proxy-iterator that conforms to
the DOM interface.

For any ``active'' proxy-iterators, there will be a reference, but as
soon as the proxy-iterator is collected, the reference will go away,
leaving only the root of the tree as the primary reference -- release
the root and the entire tree is collected.

A side benefit of the proxy-iterator is that you can now share tree
fragments during processing, because the child-parent relationship is
contained in the proxy-iterator, not in the tree.

-- 
  Ken MacLeod
  ken@bitsko.slc.ut.us


From akuchlin@cnri.reston.va.us  Tue Sep  8 20:33:24 1998
From: akuchlin@cnri.reston.va.us (Andrew M. Kuchling)
Date: Tue,  8 Sep 1998 15:33:24 -0400 (EDT)
Subject: [XML-SIG] yet another question.
In-Reply-To: <19980908224727.A24349@transas.com>
References: <19980908224727.A24349@transas.com>
Message-ID: <13813.33682.769880.357155@amarok.cnri.reston.va.us>

Michael Sobolev writes:
>    <foo>
>        <bar>1</bar>
>        <bar>2</bar>
>    </foo>
>This is fine since expat is not a validating parser.  What should I
>expect from a validating one?  After the declaration, foo cannot have
>any pcdata at all. 

Consult the annotated XML spec at www.xml.com.  Section 2.10 discusses
this:

	An XML processor must always pass all characters in a document
	that are not markup through to the application. A validating
	XML processor must also inform the application which of these
	characters constitute white space appearing in element
	content.

"Element content" is defined in section 3.2.1 as:

	An element type has element content when elements of that type must
	contain only child elements (no character data), optionally
	separated by white space (characters matching the nonterminal
	S).

So, a validating parser must still tell the application that this
whitespace is present, though it might not use the same mechanism it
uses for #PCDATA content.  For example, in the SAX interface there's a
method called ignorableWhitespace that would be used.  I'd imagine
that few applications will care about this, since few will treat
<bar>1</bar>\n<bar>2</bar> differently from <bar>1</bar><bar>2</bar>.
XML editors are probably the big exception to this, since an editor
would want to preserve whitespace when editing a document.
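
As a rough illustration of the distinction, here is a sketch using the SAX module in current Python. The expat-based driver it uses is non-validating, so as far as I know it never calls ignorableWhitespace itself; the handler below therefore imitates a validating parser by hard-coding which element types have element content. The ELEMENT_CONTENT set and the class name are assumptions for the example, standing in for what a real validating parser would learn from the DTD.

```python
import xml.sax

DOCUMENT = "<foo>\n    <bar>1</bar>\n    <bar>2</bar>\n</foo>"

# Assumption: element types declared with element content, i.e. (bar+) for foo.
ELEMENT_CONTENT = {"foo"}


class WhitespaceAwareHandler(xml.sax.ContentHandler):
    """Separate real character data from white space in element content."""

    def __init__(self):
        super().__init__()
        self.stack = []       # names of the currently open elements
        self.text = []        # character data the application cares about
        self.ignorable = []   # white space found inside element content

    def startElement(self, name, attrs):
        self.stack.append(name)

    def endElement(self, name):
        self.stack.pop()

    def characters(self, content):
        in_element_content = bool(self.stack) and self.stack[-1] in ELEMENT_CONTENT
        if in_element_content and content.strip() == "":
            # Reroute by hand what a validating parser would report separately.
            self.ignorableWhitespace(content)
        else:
            self.text.append(content)

    def ignorableWhitespace(self, whitespace):
        self.ignorable.append(whitespace)


handler = WhitespaceAwareHandler()
xml.sax.parseString(DOCUMENT.encode("utf-8"), handler)
print(repr("".join(handler.text)), repr("".join(handler.ignorable)))
```

An application that does not care about layout can simply leave ignorableWhitespace empty, while an editor would keep what it receives there so the document can be written back unchanged.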

-- 
A.M. Kuchling			http://starship.skyport.net/crew/amk/
Most of my ideas were rejected and I got used to it. One can get fond of
almost anything, even rejection.
    -- Tom Baker, in his autobiography


From fredrik@pythonware.com  Tue Sep  8 21:49:49 1998
From: fredrik@pythonware.com (Fredrik Lundh)
Date: Tue, 8 Sep 1998 21:49:49 +0100
Subject: [XML-SIG] Re: FREE DOM
Message-ID: <001b01bddb6a$39945280$f29b12c2@pythonware.com>

Ken MacLeod wrote:
>
> /F wrote:
>> I definitely prefer the "destroy" pattern (or rather, I prefer to
>> use visitors for this, but that's another story).
>
>I've used a proxy-iterator to solve this problem and it seems to be
>working well.

Now that you mention it...

(short break while /F loads the "opal" project into opal)

module opal.core.XML:
    class XMLNode
    class XMLParser
    class XMLTreeBuilder(XMLParser)
    class XMLIterator
    def load
    def dump

(You're right, of course; if it doesn't violate the COM API,
this is a much better solution...)

Cheers /F
fredrik@pythonware.com
http://www.pythonware.com


From fleck@informatik.uni-bonn.de  Tue Sep  8 21:47:32 1998
From: fleck@informatik.uni-bonn.de (Markus Fleck)
Date: Tue, 08 Sep 1998 22:47:32 +0200
Subject: [XML-SIG] HUMOR: oos.org - "Our Own Standards"...
Message-ID: <35F597E4.4AA5@informatik.uni-bonn.de>

Hi!

For all those who are tired of reading (and implementing) overly
complicated standards documents, you might enjoy having a quick
look at

     <http://www.oos.org>,

the "Our Own Standards" organization :-), who have just published
their LML ("Lightweight Markup Language") specification, v1.1.
Their rule is "KEIS",  "Keep It Even Simpler".

The cool thing is that LML is HTML backwards-compatible and
can be displayed by any of the more popular WWW browsers...

OOS.ORG is offering Basic Membership for US$2.500 per
year (you won't be allowed to make suggestions, and may
not take part in the decision-making process, though :-),
and Full Membership at US$25.000 per year (note: special
offer for first 200 applicants only).

If you send them a nice anecdote "about why you dislike all those
new standards" to <mailto:info@oss.org>, you may also qualify for
their "Oppressed Engineer Support Plan".

Have fun. :-)

Yours,
Markus.

PS: Please CC any anecdotes... :-)

-- 
SCSI: System Can't See It 
ISDN: It Still Does Nothing
PCMCIA: People Can't Memorize Computer Industry Acronyms
TWAIN: Technology Without An Interesting Name (really!)


From fredrik@pythonware.com  Wed Sep  9 15:28:50 1998
From: fredrik@pythonware.com (Fredrik Lundh)
Date: Wed, 9 Sep 1998 15:28:50 +0100
Subject: [XML-SIG] XBEL DTD
Message-ID: <009501bddbfe$2c1c7300$f29b12c2@pythonware.com>

>>Some argue that attributes should only be used for content that is not
>>logically part of the document. I.e. if it should not disappear when you
>>strip tags, don't put it in an attribute. Others argue that attributes
>>are redundant and should be used sparingly if at all. Me? I throw
>
>Hmm.  This sounds like a reasonable "rule of thumb" to me.  Does anyone
>disagree with this?

The following just appeared in my mailbox:

    From: "John E. Simpson" <simpson@POLARIS.NET>
    Subject:      Re: Attributes and Elements
    To: XML-L@LISTSERV.HEANET.IE

    >What is the difference between attributes and elements and when should they be
    >used?

    Look here for Robin Cover's excellent discussion of the issues:
            http://www.sil.org/sgml/elementsAndAttrs.html

    (The domain has changed, but I don't have the new one at hand -- for now,
    the above URL will work.)

Cheers /F


From akuchlin@cnri.reston.va.us  Wed Sep  9 15:03:06 1998
From: akuchlin@cnri.reston.va.us (Andrew M. Kuchling)
Date: Wed,  9 Sep 1998 10:03:06 -0400 (EDT)
Subject: [XML-SIG] Re: FREE DOM
In-Reply-To: <m3k93e8lwk.fsf@biff.bitsko.slc.ut.us>
References: <000901bddb40$d71947b0$f29b12c2@pythonware.com>
 <m3k93e8lwk.fsf@biff.bitsko.slc.ut.us>
Message-ID: <13814.35247.953457.226574@amarok.cnri.reston.va.us>

Ken MacLeod writes:
>When you build the tree, don't include parent references.  But when
>somebody asks for a tree object, return a proxy for the tree object
>that includes a parent reference.  Create iterator methods in the
>proxy object that return new proxies with a correct parent
>proxy-iterator.

	That seems like a reasonable strategy, but how do you
determine what the parent reference should be, in general?  It's
obviously trivial to construct a proxy for some special cases, such as 
the children of a node, but how would you find the parent of a node
without actually storing a reference to it?  Storing a non-reference,
such as an integer ID?  Walking the tree?  Something else?

-- 
A.M. Kuchling			http://starship.skyport.net/crew/amk/
First, you must know what the thing is, and then after learn the use of the
same.
    -- Robert Recorde


From fredrik@pythonware.com  Wed Sep  9 17:34:46 1998
From: fredrik@pythonware.com (Fredrik Lundh)
Date: Wed, 9 Sep 1998 17:34:46 +0100
Subject: [XML-SIG] Re: FREE DOM
Message-ID: <00e701bddc0f$c291d800$f29b12c2@pythonware.com>

> That seems like a reasonable strategy, but how do you
>determine what the parent reference should be, in general?  It's
>obviously trivial to construct a proxy for some special cases, such as 
>the children of a node, but how would you find the parent of a node
>without actually storing a reference to it?  Storing a non-reference,
>such as an integer ID?  Walking the tree?  Something else?

The iterator uses a parent list which is updated when you
move around in the tree.  If you go down, it adds the current
node to the parent list.  If you go up, it removes a node.
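
A minimal sketch of that idea in present-day Python follows. The class and method names are assumptions for illustration and are not the opal.core.XML API.

```python
class PlainNode:
    """Tree node that knows its children but stores no parent reference."""

    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)


class TreeIterator:
    """Cursor over a tree; the way back up lives here, not in the nodes."""

    def __init__(self, root):
        self.current = root
        self.parents = []  # ancestors of the current node, outermost first

    def down(self, index=0):
        """Move to a child and remember the node we came from."""
        child = self.current.children[index]  # may raise IndexError; state is untouched then
        self.parents.append(self.current)
        self.current = child

    def up(self):
        """Move back to the parent by popping the parent list."""
        self.current = self.parents.pop()


tree = PlainNode("root", [PlainNode("a", [PlainNode("b")])])
cursor = TreeIterator(tree)
cursor.down()
cursor.down()
path = [node.name for node in cursor.parents] + [cursor.current.name]
cursor.up()
```

Since the nodes hold no back-references there are no cycles, and dropping the root (and any iterators) is enough to free the whole tree.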

Cheers /F


From ken@bitsko.slc.ut.us  Wed Sep  9 17:38:04 1998
From: ken@bitsko.slc.ut.us (Ken MacLeod)
Date: Wed, 9 Sep 1998 11:38:04 -0500 (CDT)
Subject: [XML-SIG] Re: FREE DOM
Message-ID: <199809091638.LAA13523@bitsko.slc.ut.us>

Fredrik Lundh writes:
> Andrew M. Kuchling writes:
> > That seems like a reasonable strategy, but how do you determine
> >what the parent reference should be, in general?  It's obviously
> >trivial to construct a proxy for some special cases, such as the
> >children of a node, but how would you find the parent of a node
> >without actually storing a reference to it?  Storing a
> >non-reference, such as an integer ID?  Walking the tree?  Something
> >else?

> The iterator uses a parent list which is updated when you move
> around in the tree.  If you go down, it adds the current node to the
> parent list.  If you go up, it removes a node.

The way I implemented it, I created a new proxy object for next, prev,
first_child, etc.  The proxy object carried a `parent' member that
pointed back to the parent _proxy_, so instead of a list it was a
chain back up to the parent.

In this case, the proxy-iterator isn't an iterator in the sense that
it has a ``current node'' and you point the iterator to new nodes by
calling next, prev, first_child, etc.  Instead, the iterator functions
actually return a new proxy.

This technique also allows you to pass proxy-iterators around as
easily as nodes themselves.
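
A sketch of how such proxies might look in present-day Python. All names here (DataNode, NodeProxy, the index bookkeeping) are assumptions made for the example, not the code described above.

```python
class DataNode:
    """Plain data object: children only, no parent reference."""

    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)


class NodeProxy:
    """Wraps a DataNode and carries a reference to the parent *proxy*."""

    def __init__(self, node, parent=None, index=0):
        self.node = node
        self.parent = parent  # parent proxy, or None for the root
        self.index = index    # position among the parent's children

    @property
    def name(self):
        return self.node.name

    def _sibling(self, index):
        if self.parent is None:
            return None
        siblings = self.parent.node.children
        if 0 <= index < len(siblings):
            return NodeProxy(siblings[index], self.parent, index)
        return None

    def first_child(self):
        if not self.node.children:
            return None
        return NodeProxy(self.node.children[0], self, 0)

    def next(self):
        return self._sibling(self.index + 1)

    def prev(self):
        return self._sibling(self.index - 1)


# One fragment shared by two trees: each proxy still reports the right parent.
shared = DataNode("shared")
tree_a = DataNode("a", [shared, DataNode("tail")])
tree_b = DataNode("b", [shared])
in_a = NodeProxy(tree_a).first_child()
in_b = NodeProxy(tree_b).first_child()
```

Every navigation call returns a fresh proxy, so the chain of parent proxies lives only as long as somebody holds on to one of them.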


From bwaumg@urc.tue.nl  Wed Sep  9 21:31:11 1998
From: bwaumg@urc.tue.nl (Marc van Grootel)
Date: Wed, 09 Sep 1998 22:31:11 +0200
Subject: [XML-SIG] XBEL DTD
Message-ID: <199809092031.WAA11519@asterix.urc.tue.nl>

Hi,


For the purpose of discussion I added my attempt
at what is dubbed the XBEL DTD.

I took Mark Hammond's as a starting point.

I scoped it a little wider and included most features
of the Netscape bookmark format. Between MSIE and NS
the latter offers more and the bookmark organizer is
much much better. When a user saves a bookmark file
to XBEL I think it should be possible to convert it 
back to Netscape without much loss of data. Although 
this useless fun thing started with Mark's idea for
the MS favourites I would like the DTD to be able to
express the NS format too (dunno about the
Opera bookmark format).

Here are most of the changes from Mark's initial DTD:

- optional 'folded' attribute (to restore the state of a folder in NS)
- optional description element for 'folder' and 'bookmark' elements (NS)
- added 'id' attribute to 'folder' and 'bookmark'
- added 'alias' element with 'ref' attribute to reference bookmarks
  (NS) maybe this can be implemented with shortcuts on MS Favourites?
- simplified the top level to (info,folder)
- changed 'name' into 'title' and made it a required element
- put all timestamps in attributes (where 'added' belongs to
  'bookmark' and the others to the 'url')
- no dates, only timestamps (more precise, and all these attributes
  can be treated with the same code)
- optional 'separator' element (not very useful, but NS uses it in its
  menus)
- added 'added' timestamp attribute for folders too
- added link checking attributes to 'url' element (MSIE offers the
  possibility to subscribe and notify you of changes so we need the
  last checked time and status code)

Some other issues:

- Are duplicate names/titles allowed? Favourites use the title as the
  filename, so titles are restricted in characters and length, and
  duplicate names are not possible (there is no way the parser could
  check for this).
- Should the folder hierarchy be a forest or a tree? In NS the
  top level can have a description and title. Instead of adding these
  to the xbel element, using (info,folder) or (info?,folder) lets
  the folder element itself take care of that.
- The visited, modified, etc. attributes belong to the url, not
  to the bookmark itself. The added attribute belongs to the
  bookmark element.

As I said this is for discussion. I looked at it mainly from the
modelling side and did not consider implementation in any of the
Python XML parsers.

---
Marc
bwaumg@urc.tue.nl


Here's the DTD:

================ snip snip snip ==================

<!ELEMENT xbel     (info, folder)>
<!ATTLIST xbel
            version    CDATA   #IMPLIED
>

<!-- contents of info needs some more thought. Adding a meta    -->
<!-- element (like in HTML) makes this open-ended               -->

<!ELEMENT info    (owner,machinename)>
<!ELEMENT owner       (#PCDATA)>
<!ELEMENT machinename (#PCDATA)>


<!ELEMENT folder   (title, desc?, (bookmark|folder|separator|alias)+)>
<!ATTLIST folder
            id       ID       #IMPLIED
            added    CDATA    #IMPLIED 
            folded   (yes|no) 'yes'
>

<!ELEMENT bookmark (title,desc?,url)>
<!ATTLIST bookmark
            id       ID       #IMPLIED
            added    CDATA    #IMPLIED
>

<!ELEMENT title      (#PCDATA)>
<!ELEMENT desc       (#PCDATA)>
<!ELEMENT url        (#PCDATA)>
<!ATTLIST url
            visited  CDATA    #IMPLIED
            modified CDATA    #IMPLIED
            response CDATA    #IMPLIED
            checked  CDATA    #IMPLIED
>


<!ELEMENT separator EMPTY>

<!ELEMENT alias EMPTY>
<!ATTLIST alias
            ref       IDREF    #REQUIRED  
>
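
To make the DTD concrete, here is a small instance document together with a sketch that walks it using xml.etree.ElementTree from current Python. The owner, machine name, ids and timestamps are invented sample data, and list_bookmarks is a hypothetical helper. ElementTree does not validate, so the sample is only meant to follow the DTD, not to prove conformance.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<xbel version="0.1">
  <info>
    <owner>example user</owner>
    <machinename>example-host</machinename>
  </info>
  <folder id="f1" added="905000000" folded="no">
    <title>Python</title>
    <desc>Python and XML links</desc>
    <bookmark id="b1" added="905000001">
      <title>XML-SIG</title>
      <url visited="905000002">http://www.python.org/mailman/listinfo/xml-sig</url>
    </bookmark>
    <separator/>
    <alias ref="b1"/>
  </folder>
</xbel>"""


def list_bookmarks(folder, path=()):
    """Yield (folder path, title, url) for every bookmark below `folder`."""
    path = path + (folder.findtext("title"),)
    for child in folder:
        if child.tag == "bookmark":
            yield path, child.findtext("title"), child.findtext("url")
        elif child.tag == "folder":
            yield from list_bookmarks(child, path)


root = ET.fromstring(SAMPLE)
bookmarks = list(list_bookmarks(root.find("folder")))
for path, title, url in bookmarks:
    print("/".join(path), title, url)
```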


From fdrake@acm.org  Wed Sep  9 21:55:57 1998
From: fdrake@acm.org (Fred L. Drake, Jr.)
Date: Wed, 9 Sep 1998 16:55:57 -0400 (EDT)
Subject: [XML-SIG] XBEL DTD
In-Reply-To: <199809092031.WAA11519@asterix.urc.tue.nl>
References: <199809092031.WAA11519@asterix.urc.tue.nl>
Message-ID: <13814.60253.831905.443040@weyr.cnri.reston.va.us>

  This is starting to look like a potentially interesting bookmarks
format.  Once the DTD shapes up (and it looks like it's well on the
way with Marc's contribution), I'll add support for XBEL in Grail.
;-)


  -Fred

--
Fred L. Drake, Jr.	     <fdrake@acm.org>
Corporation for National Research Initiatives
1895 Preston White Dr.	    Reston, VA  20191


From heaney@mail.cambridge.scr.slb.com  Fri Sep 11 15:01:28 1998
From: heaney@mail.cambridge.scr.slb.com (Steven Heaney)
Date: 11 Sep 98 15:01:28 +0100
Subject: [XML-SIG] (fwd) WebDAV extensions to urllib/httplib?
Message-ID: <B21EEBD6-FB477@134.32.101.215>

On Mon, Aug 31, 1998 10:06 am, Greg Stein <mailto:gstein@lyra.org> wrote:
>Andrew M. Kuchling wrote:
>> 
>> "Steven Heaney" writes:
>> >Can anybody point me to some software to kick-start development of
>> >a client library for interacting with a 'WebDAV' server?  Specifically,
>> >I have in mind a forms-based interface to the Netscape Web Publisher
>> >functionality.
>> 
>>         I don't know of anyone who's started on implementing bits of
>> WebDAV in Python, but most of the pieces--httplib.py, XML parsing--are
>> probably already in place, and you'd only have to glue them together.
>> (This is gathered from a cursory glance at the WebDAV draft, so take
>> it with a grain of salt.)
>
>I'd be interested in following your work on this, as I had planned to
>start a similar library in about a month. I'll happily consult on info
>for a while, and take a more direct role later.
>
>thx
>-g
>

Andrew, Greg,

Thanks for the input.

Right now, I'm going to follow the line of least resistance which is 
to use JPython to access the Java client library provided by Netscape 
to communicate with their Web Publishing server.

I'm pretty sure this does not conform with the current draft of the 
standard or, of course, provide the basis for a 'standard' Python 
module, but it fits my purposes at the moment.  I simply need to 
create some wrappers (for convenience) to the Java classes provided 
and I'm up and running.

It's also introduced me to JPython, which is one of those 'wow' things 
you come across occasionally.

Cheers,

Steve.

........................................................................
Steven Heaney
Schlumberger

http://www.slb.com/cgi-bin/people.pl?type=person&name=steven%20heaney


From akuchlin@cnri.reston.va.us  Fri Sep 11 16:51:39 1998
From: akuchlin@cnri.reston.va.us (Andrew M. Kuchling)
Date: Fri, 11 Sep 1998 11:51:39 -0400 (EDT)
Subject: [XML-SIG] XBEL DTD
In-Reply-To: <199809092031.WAA11519@asterix.urc.tue.nl>
References: <199809092031.WAA11519@asterix.urc.tue.nl>
Message-ID: <13817.16056.184603.582709@amarok.cnri.reston.va.us>

Marc van Grootel writes:
>For the purpose of discussion I added my attempt
>at what is dubbed the XBEL DTD.
>I took Mark Hammond's as a starting point.
>I scoped it a little wider and included most features
>of the Netscape bookmark format. Between MSIE and NS
	
	What was the group's reaction to Marc's revised DTD?  I saw no
problems with it; while it makes the format a bit more complicated, it
seems to be required in order to support lossless conversion from
Netscape format.

	I like this little effort, and the XBEL programs will make a
good demo to include with the XML software.  To that end, I've written
a little program to read Lynx bookmarks and generate the corresponding
XBEL (using Mark's original DTD).

	Given that various people have already written programs to
convert Netscape or IE bookmarks to XBEL, we now only need something
to convert an XBEL document to an attractive HTML version, to make it
easy to display our bookmark lists on the Web.  Displaying XBEL files
could be done with XSL, or with the XML rendering features in Mozilla,
but that would limit the potential user base greatly; rendering to
HTML seems the obvious course to follow.
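
	A rough sketch of such a converter, written against Marc's DTD in present-day Python. The render_folder function and the HTML layout are assumptions for illustration; alias elements are simply skipped here.

```python
import html
import xml.etree.ElementTree as ET

SAMPLE = """<xbel version="0.1">
  <info><owner>example user</owner><machinename>example-host</machinename></info>
  <folder>
    <title>Links</title>
    <bookmark>
      <title>Python &amp; XML</title>
      <url>http://www.python.org/</url>
    </bookmark>
    <separator/>
  </folder>
</xbel>"""


def render_folder(folder, level=1):
    """Return a list of HTML lines for one folder element and everything below it."""
    level = min(level, 6)  # HTML only has h1..h6
    title = html.escape(folder.findtext("title", ""))
    lines = ["<h%d>%s</h%d>" % (level, title, level), "<ul>"]
    for child in folder:
        if child.tag == "bookmark":
            text = html.escape(child.findtext("title", ""))
            href = html.escape(child.findtext("url", ""), quote=True)
            lines.append('<li><a href="%s">%s</a></li>' % (href, text))
        elif child.tag == "folder":
            lines.append("<li>")
            lines.extend(render_folder(child, level + 1))
            lines.append("</li>")
        elif child.tag == "separator":
            lines.append("<li><hr></li>")
    lines.append("</ul>")
    return lines


html_lines = render_folder(ET.fromstring(SAMPLE).find("folder"))
print("\n".join(html_lines))
```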

	Longer-term, what applications are enabled if you have lots of
people's bookmark files in a machine-readable form?  You could build a
high-quality list of links by finding the most commonly linked-to
pages; I can't think of any other use off the top of my head.
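
	The link-counting idea could be sketched like this in present-day Python, assuming each person's bookmarks are available as XBEL text with url elements as in Marc's DTD. The most_common_urls helper and the example.org URLs are invented for the illustration.

```python
from collections import Counter
import xml.etree.ElementTree as ET


def most_common_urls(documents, n=10):
    """Count in how many bookmark files each URL appears; return the top n."""
    counts = Counter()
    for text in documents:
        root = ET.fromstring(text)
        # Use a set so a URL bookmarked twice by one person counts once.
        urls = {url.text.strip() for url in root.iter("url") if url.text}
        counts.update(urls)
    return counts.most_common(n)


def make_doc(*urls):
    """Build a tiny XBEL-like document for the demonstration."""
    items = "".join(
        "<bookmark><title>t</title><url>%s</url></bookmark>" % u for u in urls
    )
    return "<xbel><folder><title>f</title>%s</folder></xbel>" % items


docs = [
    make_doc("http://a.example.org/", "http://b.example.org/", "http://c.example.org/"),
    make_doc("http://a.example.org/", "http://b.example.org/", "http://a.example.org/"),
    make_doc("http://a.example.org/"),
]
ranking = most_common_urls(docs)
```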

-- 
A.M. Kuchling			http://starship.skyport.net/crew/amk/
I am not here to mourn him. I mourned the loss of my love a long time ago. I
am here to say goodbye to a stranger who once did me a good turn. And to the
man who gave my son the death he craved.
    -- Calliope, in SANDMAN #71, part two of "The Wake"


From fdrake@acm.org  Fri Sep 11 17:23:05 1998
From: fdrake@acm.org (Fred L. Drake, Jr.)
Date: Fri, 11 Sep 1998 12:23:05 -0400 (EDT)
Subject: [XML-SIG] XBEL DTD
In-Reply-To: <13817.16056.184603.582709@amarok.cnri.reston.va.us>
References: <199809092031.WAA11519@asterix.urc.tue.nl>
 <13817.16056.184603.582709@amarok.cnri.reston.va.us>
Message-ID: <13817.20073.56664.26194@weyr.cnri.reston.va.us>

Andrew M. Kuchling writes:
 > 	What was the group's reaction to Marc's revised DTD?  I saw no

  I liked it.  I'll be glad to add support for XBEL in Grail (both
internally and in the external bookmarks2html script, which should be
renamed...).


  -Fred

--
Fred L. Drake, Jr.	     <fdrake@acm.org>
Corporation for National Research Initiatives
1895 Preston White Dr.	    Reston, VA  20191


From bwaumg@urc.tue.nl  Sun Sep 13 01:31:50 1998
From: bwaumg@urc.tue.nl (Marc van Grootel)
Date: Sun, 13 Sep 1998 02:31:50 +0200
Subject: [XML-SIG] XBEL DTD
Message-ID: <199809130031.CAA15303@asterix.urc.tue.nl>

Hi,

I attached a modified version of Lars's nsparse.py and bookmarks.py. I
changed nsparse to use htmllib since I thought it could cause problems 
when xmlproc gets HTML (empty elements: <hr> vs. <hr/> ?). I didn't
check that though.

I changed bookmarks.py to output xbel XML according to the dtd I
recently sent (oops, that's a lie -- it doesn't output an info element,
and in the dtd I defined it as a required element). Oh, and I removed
the dump_adr methods 'cause I didn't know how to implement the new
features for Opera.

  cat bookmark.htm | nsparse.py -ns >bookmark2.html
  cat bookmark.htm | nsparse.py >bookmark.xml

I only ran this one time on my big bookmark file and it worked. Don't
hit me if it blows up. It's just an illustration for getting stuff
into the new dtd. The code may be a bit messy too. I'm a recent Python
convert, so I hope it's not too much baby-talk ;)

I also thought that it would be nice to be able to store extra info in
the xbel file on different levels. This could be done by borrowing the
HTML meta tag idea:

  <xbel>
    <info>
      <meta name="generator" content="grail:?)"/>
      <meta name="created" content="123456789"/>
    </info>
    <folder>
      <info>
        <meta name="x" content="10"/>
      </info>
      <bookmark>...</bookmark>
      <bookmark>
        <info><meta name="y" content="20"/></info>
        ...
      </bookmark>

  etc.

We could then store arbitrary data with the major elements
(xbel,folder,bookmark). It's an easy enough addition without adding
much complexity. And if you don't need it, just ignore the info
elements. Maybe it could be used in web-maintenance tools like
linbot.

What bookmark formats should be supported? I would like to see
excerpts of different kinds (like Lynx, Opera) and see if any of those
makes changes to the dtd necessary. It would be nice if xbel could
be used to express most of these without loss of information.


Oh,well...

Marc

Here are the two scripts:

#
# nsparse.py
#
import sys
from htmllib import *
from formatter import NullFormatter
import bookmark

class NSBookmarkParser(HTMLParser):

    def __init__(self):
        HTMLParser.__init__(self,NullFormatter())
        self.inBookmark = 0
        self.inDesc = 0
        self.inFolder = 0
        self.added = None
        self.folded = None
        self.desc = None
        self.title = None
        self.url_href = None
        self.url_modified = None
        self.id = None
        self.ref = None
        self.url_visited = None
        self.bms = bookmark.Bookmarks()
        
    def start_h1(self,attrs):
        self.inFolder = 1
        self.save_bgn()
        
    def end_h1(self):
        self.title = self.save_end()

    def start_h3(self,attrs):
        self.inFolder = 1
        for a in attrs:
            if a[0]=='add_date':
                self.added=a[1]
            elif a[0]=='folded':
                self.folded='yes'
        self.save_bgn()

    def end_h3(self):
        self.title = self.save_end()

    def start_dl(self,attrs):
        self.flush()
        
    def end_dl(self):
        self.flush()
        self.bms.leave_folder()
        self.inFolder = 0
        
    def do_hr(self,attrs):
        self.flush()
        self.bms.add_separator()
        
    def do_dt(self,attrs):
        self.flush()
        
    def do_dd(self,attrs):
        self.inDesc = 1
        self.save_bgn()        
        
    def start_a(self,attrs):
        for a in attrs:
            if a[0]=='href':
                self.url_href=a[1]
            elif a[0]=='add_date':
                self.added=a[1]
            elif a[0]=='last_visit':
                self.url_visited=a[1]
            elif a[0]=='last_modified':
                self.url_modified=a[1]
            elif a[0]=='aliasid':
                self.id=a[1]
            elif a[0]=='aliasof':
                self.ref=a[1]                
        self.inBookmark = 1
        self.save_bgn()

    def end_a(self):
        self.title = self.save_end()

    def dump_xbel(self):
        self.bms.dump_xbel()

    def dump_netscape(self):
        self.bms.dump_netscape()
        
    def flush(self):
        if self.inDesc == 1:
            self.desc = self.save_end()
            self.inDesc = 0
        if self.inBookmark == 1:
            if self.ref:
                self.bms.add_alias(self.ref)
            else:
                self.bms.add_bookmark(self.added,self.title,self.desc,self.id,self.url_href,self.url_visited,self.url_modified,None,None)
            self.inBookmark = 0
        elif self.inFolder == 1:
            self.bms.add_folder(self.title,self.desc,self.added,self.folded)
            self.inFolder = 0
        self.desc=None
        self.folded=None
        self.added=None
        self.title=None
        self.url_href=None
        self.url_modified=None
        self.url_visited=None
        self.ref=None
        self.id=None
        
if __name__ == '__main__':

    p = NSBookmarkParser()
    p.feed(sys.stdin.read())
    p.close()

    if "-ns" in sys.argv:
        p.dump_netscape()
    else:
        p.dump_xbel()


#
# bookmark.py
#
#
"""
Classes to store bookmarks and dump them to XBEL.
"""

import sys,string

# --- maintain a store of objects keyed by id
IDs = {}

def StoreID(id,obj):
    IDs[id]=obj
        
def GetID(id):
    return IDs[id]
    
# --- Class for bookmark container

class Bookmarks:

    def __init__(self):
        self.folders=[]
        self.folder_stack=[]

    def add_folder(self,title,desc,added,folded):
        nf = Folder(title,desc,added,folded)
        if self.folder_stack==[]:
            self.folders.append(nf)
        else:
            self.folder_stack[-1].add_child(nf)
        self.folder_stack.append(nf)

    def add_bookmark(self,added,title,desc,id,href,visited,modified,checked,response):
        nb = Bookmark(added,title,desc,id,href,visited,modified,checked,response)
        if id: StoreID(id,nb)
        if self.folder_stack!=[]:
            self.folder_stack[-1].add_child(nb)
        else:
            self.folders.append(nb)

    def add_separator(self):
        sep = Separator()
        if self.folder_stack!=[]:
            self.folder_stack[-1].add_child(sep)
        else:
            self.folders.append(sep)
            
    def add_alias(self,ref):
        al = Alias(ref)
        if self.folder_stack!=[]:
            self.folder_stack[-1].add_child(al)
        else:
            self.folders.append(al)
        
    def leave_folder(self):
        if self.folder_stack!=[]:
            del self.folder_stack[-1]

    def dump_xbel(self,out=sys.stdout):
        # The XML declaration has to come first, before the DOCTYPE.
        out.write("<?xml version=\"1.0\"?>\n")
        out.write("<!DOCTYPE xbel SYSTEM \"xbel.dtd\">\n")
        out.write("<xbel version=\"0.1\">\n")
        for folder in self.folders:
            folder.dump_xbel(out)
        out.write("</xbel>")

    def dump_netscape(self,out=sys.stdout):
        out.write("<!DOCTYPE NETSCAPE-Bookmark-file-1>\n")
        out.write("<!-- This is an automatically generated file.\n")
        out.write("It will be read and overwritten.\n")
        out.write("Do Not Edit! -->\n")
        # output first folder specially
        f = self.folders[0]
        out.write("<TITLE>%s</TITLE>\n" % f.title)
        out.write("<H1>%s</H1>\n" % f.title)
        out.write("<DD>%s\n<DL><p>\n" % f.desc)
        for folder in f.children:
            folder.dump_netscape(out)
        out.write("  </DL><p>\n")
                  
class Folder:

    def __init__(self,title,desc,added,folded):
        self.added=added
        self.folded=folded
        self.title=title
        self.desc=desc
        # valid children are folders,bookmarks,separators and aliases
        self.children=[]

    def add_child(self,child):
        self.children.append(child)

    def dump_xbel(self,out):
        out.write("  <folder")
        if self.added: out.write(" added=\"%s\"" % self.added)
        if self.folded: out.write(" folded=\"%s\"" % self.folded)
        out.write(">\n")
        out.write("    <title>%s</title>\n" % self.title)
        if self.desc: out.write("    <desc>%s</desc>\n" % self.desc)
        for child in self.children:
            child.dump_xbel(out)
        out.write("  </folder>\n\n")

    def dump_netscape(self,out):
        # if toplevel then output title and h1
        #if self.folders: #??"
        out.write("    <DT><H3")
        if self.folded: out.write(" FOLDED")
        out.write(">%s</H3>\n" % self.title)
        if self.desc: out.write("  <DD>%s" % self.desc)
        out.write("  <DL><p>\n")
        for child in self.children:
            child.dump_netscape(out)            
        out.write("  </DL><p>\n")

# --- Class for bookmarks
        
class Bookmark:

    def __init__(self,added,title,desc,id,href,visited,modified,checked,response):
        self.id=id
        self.added=added
        self.title=title
        self.desc=desc
        self.href=href
        self.visited=visited
        self.modified=modified
        self.checked=checked
        self.response=response

    def dump_xbel(self,out):
        out.write("    <bookmark")
        if self.id: out.write(" id=\"%s\"" % self.id)
        if self.added: out.write(" added=\"%s\"" % self.added)
        out.write(">\n")
        out.write("      <title>%s</title>\n" % self.title)
        if self.desc:  out.write("      <desc>%s</desc>\n" % self.desc)
        out.write("      <url")
        if self.modified: out.write(" modified=\"%s\"" % self.modified)
        if self.visited: out.write(" visited=\"%s\"" % self.visited)
        if self.id: out.write(" id=\"%s\"" % self.id)
        if self.checked: out.write(" checked=\"%s\"" % self.checked)
        if self.response: out.write(" response=\"%s\"" % self.response)
        out.write(">%s</url>\n" % self.href)
        out.write("    </bookmark>\n")

    def dump_netscape(self,out):
        out.write("    <DT><A HREF=\"%s\"" % self.href)
        if self.id:
            out.write(" ALIASID=\"%s\"" % self.id)
        if self.added:
            out.write(" ADD_DATE=\"%s\"" % self.added)
        else:
            out.write(" ADD_DATE=\"0\"")
        if self.visited:
            out.write(" LAST_VISIT=\"%s\"" % self.visited)
        if self.modified:
            out.write(" LAST_MODIFIED=\"%s\"" % self.modified)
        out.write(">%s</A>\n" % self.title)
        if self.desc:
            out.write("    <DD>%s" % self.desc)
        
class Alias:

    def __init__(self,ref):
        self.ref=ref
        
    def dump_xbel(self,out):
        out.write("    <alias ref=\"%s\"/>\n" % self.ref)
        
    def dump_netscape(self,out):
        bookref=GetID(self.ref)
        out.write("    <DT><A HREF=\"%s\"" % bookref.href)
        out.write(" ALIASOF=\"%s\"" % self.ref)
        if bookref.added:
            out.write(" ADD_DATE=\"%s\"" % bookref.added)
        else:
            out.write(" ADD_DATE=\"0\"")
        if bookref.visited:
            out.write(" LAST_VISIT=\"%s\"" % bookref.visited)
        if bookref.modified:
            out.write(" LAST_MODIFIED=\"%s\"" % bookref.modified)
        out.write(">%s</A>\n" % bookref.title)
        if bookref.desc:
            out.write("    <DD>%s" % bookref.desc)

class Separator:
    
    def dump_xbel(self,out):
        out.write("      <separator/>\n")

    def dump_netscape(self,out):
        out.write("<HR>\n")
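
The dump methods above write titles, descriptions and URLs straight into
the output, so a title containing & or < would give XBEL that is not
well-formed. Here is a minimal sketch of one way to guard against that,
assuming a current Python and the standard xml.sax.saxutils helpers. The
names title_element and attribute are made up for illustration and are
not part of bookmark.py.

```python
from xml.sax.saxutils import escape, quoteattr


def title_element(title, indent="    "):
    """Return a <title> line with &, < and > in the text escaped."""
    return "%s<title>%s</title>\n" % (indent, escape(title))


def attribute(name, value):
    """Return ' name="value"' with the value quoted and escaped."""
    return " %s=%s" % (name, quoteattr(value))


line = title_element("Tom & Jerry <home>")
href = attribute("href", "http://example.org/?a=1&b=2")
```

The same two helpers could be applied wherever a dump_xbel method
formats a value with "%s".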


From akuchlin@cnri.reston.va.us  Sun Sep 13 15:06:12 1998
From: akuchlin@cnri.reston.va.us (A.M. Kuchling)
Date: Sun, 13 Sep 1998 10:06:12 -0400
Subject: [XML-SIG] XBEL: Lynx bookmark parser
Message-ID: <199809131406.KAA00647@207-172-46-194.s194.tnt9.ann.erols.com>

Here's a script to parse Lynx bookmark files, and output them as XBEL.
It seems reasonable to modify ns_parse.py, msie_parse.py, and
adr_parse.py to always output XBEL.  I'm now also working on a SAX
handler that converts XBEL to a Bookmarks instance and then outputs it
in one of the browser formats.

-- 
A.M. Kuchling			http://starship.skyport.net/crew/amk/
For animals, the entire universe has been neatly divided into things to (a)
mate with, (b) eat, (c) run away from, and (d) rocks.
    -- Terry Pratchett, _Equal Rites_

#!/usr/bin/env python
#
# lynx_parse.py :
# Reads a list of Lynx bookmark files, specified on the command line,
# and outputs the corresponding XBEL document.
#
# Sample usage: ./lynx_parse.py 
#

import bookmark
import re

def parse_lynx_file(bms, input):
    """Parse a Lynx 2.8 bookmark file, reading from the input file
    object, and add its contents to the Bookmarks instance bms."""

    # Read the whole file into memory
    data = input.read()

    # Get the title
    m = re.search("<title>(.*?)</title>", data, re.IGNORECASE)
    if m is None: title = "Untitled"
    else: title = m.group(1)

    bms.add_folder( title, None, None)
    
    hrefpat = re.compile( r"""^ \s* <li> \s*
<a \s+ href \s* = \s* "(?P<url> [^"]* )" \s*>
(?P<name> .*? ) </a>""",
    re.IGNORECASE| re.DOTALL | re.VERBOSE | re.MULTILINE)
    pos = 0
    while 1:
        m = hrefpat.search(data, pos)
        if m is None: break
        pos = m.end()
        url, name = m.group(1,2)
        bms.add_bookmark( name, None, None, url)

    bms.leave_folder()

if __name__ == '__main__':
    import sys
    bms = bookmark.Bookmarks()

    # Determine the owner on Unix platforms
    import os, pwd
    uid = os.getuid()
    t = pwd.getpwuid( uid )
    bms.owner = t[4]

    for file in sys.argv[1:]:
        input = open(file)
        parse_lynx_file(bms, input)

    bms.dump_xbel()


From akuchlin@cnri.reston.va.us  Sun Sep 13 17:39:35 1998
From: akuchlin@cnri.reston.va.us (A.M. Kuchling)
Date: Sun, 13 Sep 1998 12:39:35 -0400
Subject: [XML-SIG] PyExpat module swallowing exceptions
Message-ID: <199809131639.MAA09760@mira.erols.com>

I've come across a curious bug; the SAX PyExpat module seems to
swallow exceptions, but I can't figure out why this is happening.
Here's a test program:

from xml.sax import saxexts,saxlib
class ExcHandler(saxlib.HandlerBase):
    def startElement(self, name, attrs):
        raise SystemError

import StringIO
h = ExcHandler()
p=saxexts.XMLParserFactory.make_parser("xml.sax.drivers.drv_pyexpat")
p.setDocumentHandler( h )
p.parseFile( StringIO.StringIO("<anything>data</anything>") )

Notice that the startElement method raises an exception.  Run the
above code, and it quietly runs to completion:

[amk@mira xbel]$ python t.py
[amk@mira xbel]$

Change it to use another parser, such as xmllib, and you get an
exception:

Traceback (innermost last):
  File "t.py", line 11, in ?
    p.parseFile( StringIO.StringIO("<anything>data</anything>") )
   ... lots of stack frames deleted ...
  File "t.py", line 5, in startElement
    raise SystemError
SystemError

It looks to me as if, in the event of an exception being raised from a
handler, there's no way to tell the Expat parser "Hey!  That handler
didn't work, so stop parsing!", and the handlers keep getting called,
the exception being discarded somewhere.  I came across this when
debugging my XBEL reading code; I had written self.add_folder instead
of self.bms.add_folder, but never saw the AttributeError exception
that would have pointed out the problem.  Obviously this is a bad
thing when debugging code and the Expat module is selected as the
parser.

This seems like a glaring flaw in Expat, that there's no way to end
parsing prematurely.  Has anyone told James Clark about this?  Failing
a change to Expat, an apparent fix would be to add "if
(PyErr_Occurred()) return;" to all the handler functions in pyexpat.c,
in order to do nothing.  However, I tried this, and the exception
still is never raised.

What's confusing me is: why is the exception just vanishing?  I
couldn't find an 'except:' responsible, or a PyErr_Clear() in
pyexpat.c.  Anyone got any clues? 
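
To make the "record the error, skip later handlers, re-raise afterwards"
idea concrete, here is a rough sketch. It is written against the
xml.parsers.expat module of a current Python standard library, not the
1998 pyexpat.c, and my understanding is that current versions already
propagate handler exceptions on their own. GuardedHandlers and
parse_with_guard are hypothetical names used only for this illustration.

```python
import xml.parsers.expat


class GuardedHandlers:
    """Remember the first exception a callback raises and mute later ones."""

    def __init__(self):
        self.pending = None  # first exception seen, if any

    def guard(self, callback):
        def wrapper(*args):
            if self.pending is not None:
                return  # an earlier handler failed; do nothing more
            try:
                callback(*args)
            except Exception as exc:
                self.pending = exc
        return wrapper


def parse_with_guard(data, start_element):
    """Parse data, then re-raise whatever the start handler raised."""
    guards = GuardedHandlers()
    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = guards.guard(start_element)
    parser.Parse(data, True)
    if guards.pending is not None:
        raise guards.pending
```

The parser still runs to the end of the input here; the wrapper only
makes sure the caller sees the original exception once parsing is over.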

-- 
A.M. Kuchling			http://starship.skyport.net/crew/amk/
May you go safe, my friend, across that dizzy way / No wider than a hair, by
which your people go / From earth to Paradise; may you go safe today / With
stars and space above, and time and stars below.
    -- Lord Dunsany


From fermigie@math.jussieu.fr  Mon Sep 14 14:26:41 1998
From: fermigie@math.jussieu.fr (Stefane Fermigier)
Date: Mon, 14 Sep 1998 15:26:41 +0200
Subject: [XML-SIG] Fun with the DOM.
In-Reply-To: <3512F83B.6A7F88FA@technologist.com>; from Paul Prescod on Fri, Mar 20, 1998 at 06:14:04PM -0500
References: <wkwwdpqbrk.fsf@ifi.uio.no> <3512F83B.6A7F88FA@technologist.com>
Message-ID: <19980914152641.A7042@riemann.math.jussieu.fr>

Hi,

I had some fun with my own implementation of the DOM yesterday.  I made a
toy linuxdoc -> LaTeX transformation engine using ideas from my somewhat
clumsy SGML -> SGML transformer. Basically, the idea is that having
a tree in memory is nice because you can transform nodes in a bottom
-> up fashion (assuming your tree grows downwards, like they usually
do), i.e. you transform the children first then pass the result to the
parents. This is much simpler than an event based transformer (a la
ASP), where you call start_XXX and end_XXX for each node traversed.
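
As a small illustration of the bottom-up idea (not the sg2tex program
itself), here is a sketch that uses the standard xml.etree.ElementTree
module in place of my DOM. The element names and the RULES table are
invented for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical rules: each one receives the already-transformed content
# of an element and returns the LaTeX text for the element itself.
RULES = {
    "article": lambda body: body,
    "sect": lambda body: "\\section{%s}\n" % body,
    "p": lambda body: body + "\n\n",
    "em": lambda body: "\\emph{%s}" % body,
}


def transform(node):
    """Transform the children first, then hand the result to the parent."""
    parts = [node.text or ""]
    for child in node:
        parts.append(transform(child))
        parts.append(child.tail or "")
    body = "".join(parts)
    rule = RULES.get(node.tag, lambda text: text)  # unknown tags pass through
    return rule(body)


doc = ET.fromstring(
    "<article><sect>Intro</sect><p>Hello <em>world</em>!</p></article>")
latex = transform(doc)
```

Each rule only sees finished text from below, which is what keeps the
individual rules so short.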

The program uses my outdated DOM core implementation along with my
completely bogus ESIS -> DOM builder, and it doesn't implement the
full DTD. You have to have nsgmls or a similar tool to use my program.

The program, which is just a toy but I find rather nice, is available
at the URL: <http://www.math.jussieu.fr/~fermigie/python/dom/sg2tex>.

There is an open issue regarding really complex transformations, like
transposing a table for instance.

Cheers,

	S.

-- 
Stéfane Fermigier, MdC à l'Université Paris 7. Tel: 01.44.27.61.01 (Bureau).
<www.math.jussieu.fr/~fermigie/>, <www.aful.org>, <www.linux-center.org>. 
"Without hardware memory protection, the machine-dependent actions taken
after an error can cause a machine crash [...]. (MacIntosh users experience
this problem on a daily basis)." Andrew K. Wright.


From larsga@ifi.uio.no  Tue Sep 15 09:42:37 1998
From: larsga@ifi.uio.no (Lars Marius Garshol)
Date: Tue, 15 Sep 1998 10:42:37 +0200 (MET DST)
Subject: [XML-SIG] xmlproc: Version 0.52 released!
Message-ID: <199809150842.KAA01625@ifi.uio.no>

I've just released version 0.52 of xmlproc. The main improvements are:

 - Major speed increase for well-formedness parsing (parse time down from 50 
   seconds to 30 seconds on my benchmark suite), and definite improvements
   for validating parsing as well.
 - Error reporting improved. Better error messages, and support for error 
   messages in different languages.
 - xvcmd.py option interpretation improved (-l and -o options added)
 - Numerous minor parse bug fixes
 - Some API extensions:
   - CatalogManager.get_public_ids() method added
   - DTD.get_elements() method added
   - Parser.set_error_language() method added
   - optional bufsize argument added to Parser.parse_resource()

Because of the speed increase all xmlproc users are recommended to upgrade
to the new version. xmlproc is now nearly twice as fast as xmllib when not
validating and when validating it is also faster unless the DTD is very
large compared to the document.

The error reporting improvement means that you can now get error messages
in Norwegian and English. If anyone wants to add support for more languages
they are encouraged to do so. The error messages are in the errors.py file.
There is an API for plugging in new languages, but this is still prototypical
and so is not documented yet. 

The API extensions were made because someone (Michael Sobolev, thanks 
Michael) requested them. If you have any wishes in that direction, please 
let me know, and I'll see what I can do.

I'm thinking of adding a list of other programs that use xmlproc to the
xmlproc home page. If anyone knows of such a program please email me so I
can add it.

--Lars M.


From bwaumg@urc.tue.nl  Tue Sep 15 13:47:55 1998
From: bwaumg@urc.tue.nl (Marc van Grootel)
Date: Tue, 15 Sep 1998 14:47:55 +0200
Subject: [XML-SIG] xmlproc and Docbook XML DTD
Message-ID: <199809151247.OAA00361@asterix.urc.tue.nl>

Hi,

I just installed the newest xmlproc. I'm trying to get it to validate
with Norman Walsh's Docbook XML DTD (db3xml10.dtd). 

It looks as if xmlproc has problems with some of the parameter
entities.

Right out of the box it reports:

ERROR: One of 'IGNORE' or 'INCLUDE' expected at db3xml10.dtd:32:4
TEXT: '%ISOamsa.m'

The whole parameter name is %ISOamsa.module

After removing a series of these because they were ignored anyway
it got to:

dbpoolx.mod:137:4
TEXT: '%dbpool.re'

Its whole name is dbpool.redecl.module

After replacing the section that uses this PE with IGNORE it finally
came to this traceback (only last one showed):

File "C:\Python\site\xml\parsers\xmlproc\xmlproc.py", line 897, in
parse_pe_ref
   self.report_error(3038,name)
NameError: name

It seems as if these longer PE's aren't parsed properly (I looked at
the regexps used but that seems ok).

Is there a restriction on their length?

Or is there anyone who succeeded in using this DTD unmodified?

Marc

---
Marc van Grootel
bwaumg@urc.tue.nl


From larsga@ifi.uio.no  Tue Sep 15 14:05:54 1998
From: larsga@ifi.uio.no (Lars Marius Garshol)
Date: Tue, 15 Sep 1998 15:05:54 +0200
Subject: [XML-SIG] xmlproc and Docbook XML DTD
In-Reply-To: <199809151247.OAA00361@asterix.urc.tue.nl>
Message-ID: <3.0.1.32.19980915150554.006b0ab8@ifi.uio.no>

* Marc van Grootel
>
>I just installed the newest xmlproc. I'm trying to get it to validate
>with Norman Walsh's Docbook XML DTD (db3xml10.dtd). 
>
>It looks as if xmlproc has problems with some of the parameter
>entities.
>
>Right out of the box it reports:
>
>ERROR: One of 'IGNORE' or 'INCLUDE' expected at db3xml10.dtd:32:4
>TEXT: '%ISOamsa.m'

This is because xmlproc does not support parameter entities inside
declarations yet. xmlproc 0.52 was released now because I wanted to
start working on this now and expected that to take a while. For now
I'm afraid you'll have to normalize the DTD before you use it. 
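
As a stopgap, the one construct that triggers the error above, a
parameter entity used as the keyword of a conditional section, can be
rewritten with a small script. This is only a sketch under strong
assumptions: it looks at entity declarations whose literal value is
INCLUDE or IGNORE, it does not check whether a declaration itself sits
inside an ignored section, and it is not a general DTD normalizer. The
function name is hypothetical.

```python
import re

# <!ENTITY % name "INCLUDE"> or <!ENTITY % name 'IGNORE'>
PE_DECL = re.compile(
    r'''<!ENTITY\s+%\s+([\w.:-]+)\s+["'](INCLUDE|IGNORE)["']\s*>''')
# <![%name;[   (the start of a conditional section)
COND_START = re.compile(r'<!\[\s*%([\w.:-]+);\s*\[')


def resolve_conditional_keywords(dtd_text):
    """Replace %name; in conditional section starts by its declared value."""
    values = {}
    for match in PE_DECL.finditer(dtd_text):
        # In XML the first declaration of an entity is the binding one.
        values.setdefault(match.group(1), match.group(2))

    def substitute(match):
        name = match.group(1)
        if name not in values:
            return match.group(0)  # leave unknown references untouched
        return "<![%s[" % values[name]

    return COND_START.sub(substitute, dtd_text)
```

Parameter entity references inside the declarations themselves still
have to be expanded by some other means.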

>File "C:\Python\site\xml\parsers\xmlproc\xmlproc.py", line 897, in
>parse_pe_ref
>   self.report_error(3038,name)
>NameError: name

"/)"(#"#!#

This is a bug. I'll fix it tomorrow and post a fix on Thursday.

>It seems as if these longer PE's aren't parsed properly (I looked at
>the regexps used but that seems ok).

Part of the reason why 0.52 is so much faster than 0.51 is that it
no longer uses regexps to parse names. Anyway, I don't think this is
the problem, it looks as though the 'name' variable for some reason
isn't initialized.

>Is there a restriction on their length?

No.

--Lars M.


From fdrake@acm.org  Tue Sep 15 14:28:40 1998
From: fdrake@acm.org (Fred L. Drake, Jr.)
Date: Tue, 15 Sep 1998 09:28:40 -0400 (EDT)
Subject: [XML-SIG] XBEL DTD
In-Reply-To: <199809130031.CAA15303@asterix.urc.tue.nl>
References: <199809130031.CAA15303@asterix.urc.tue.nl>
Message-ID: <13822.27528.568968.331881@weyr.cnri.reston.va.us>

  I started working on Grail support for XBEL last night, and would
like to suggest a small change to the DTD.
  There doesn't appear to be any reason to make the info element or its
children required, so I suggest all three be made optional.  The
machine name, in particular, does not appear to be very useful.  I can 
also envision shared-bookmarks applications where the owner may vary
from folder to folder, so I'd also allow info within each folder.
Allowing it in the folder makes the outermost info superfluous; an
info within the outermost folder would work just fine.  (I'll leave
the folder inside the xbel element, since there may be good reason for 
adding things outside the folder in some applications.)  I've attached
the modified DTD below.


  -Fred

--
Fred L. Drake, Jr.	     <fdrake@acm.org>
Corporation for National Research Initiatives
1895 Preston White Dr.	    Reston, VA  20191


<!ELEMENT xbel     (folder)>
<!ATTLIST xbel
            version    CDATA   #IMPLIED
>

<!-- contents of info needs some more thought. Adding a meta    -->
<!-- element (like in HTML) makes this open-ended               -->

<!ELEMENT info    (owner?,machinename?)>
<!ELEMENT owner       (#PCDATA)>
<!ELEMENT machinename (#PCDATA)>


<!ELEMENT folder   (title, info?, desc?, (bookmark|folder|separator|alias)+)>
<!ATTLIST folder
            id       ID       #IMPLIED
            added    CDATA    #IMPLIED 
            folded   (yes|no) 'yes'
>

<!ELEMENT bookmark (title,desc?,url)>
<!ATTLIST bookmark
            id       ID       #IMPLIED
            added    CDATA    #IMPLIED
>

<!ELEMENT title      (#PCDATA)>
<!ELEMENT desc       (#PCDATA)>
<!ELEMENT url        (#PCDATA)>
<!ATTLIST url
            visited  CDATA    #IMPLIED
            modified CDATA    #IMPLIED
            response CDATA    #IMPLIED
            checked  CDATA    #IMPLIED
>


<!ELEMENT separator EMPTY>

<!ELEMENT alias EMPTY>
<!ATTLIST alias
            ref       IDREF    #REQUIRED  
>



From bwaumg@urc.tue.nl  Tue Sep 15 16:10:13 1998
From: bwaumg@urc.tue.nl (Marc van Grootel)
Date: Tue, 15 Sep 1998 17:10:13 +0200
Subject: [XML-SIG] XBEL DTD
Message-ID: <199809151510.RAA08403@asterix.urc.tue.nl>

Hi,

I was well on my way to suggesting some changes too. Some of them
are the same as suggested by Fred.

I'm working on a few other suggestions, which I'll post about in
a later mail since they need some more explanation and I would like
your opinions.

In addition to the changes suggested by Fred I want to suggest the
following:

  - make the 'info' element contain zero or more 'meta' tags, this
    way we don't have to fight too much about how to name them
    and can add new ones without breaking the DTD

  - also allow 'info' in a bookmark so extra data can be associated
    with a single bookmark

  - drop the 'title' for bookmarks and put that into 'url' and
    put the href itself back into the 'url' as an 'href' attribute.

    So instead of:

      <title>My link</title>
      <url>http://foo</url>

    Write:

      <url href="http://foo">My link</url>

    In the HTML3.2 DTD there's the following quote:

    "The term URL means a CDATA attribute whose value is a Uniform
    Resource Locator, See RFC1808 (June 95) and RFC1738 (Dec 94)."
 
    Putting the href out into element content as #PCDATA seems too
    broad to me.

  - make the 'title' optional too

  - instead of a bookmark a bare 'url' element should be allowed
    too. This should be considered a bookmark without a description
    and/or info.

  - allowing bookmark,url,alias etc directly under xbel

The three last suggestions may seem to come from out of the blue but
they have to do with the use of XBEL as a meta-dtd (see next post) and
the ability to extract XBEL from almost arbitrary XML documents.

I'll post my current DTD next too. Then it's about time I think to
reach a consensus and freeze it so Fred can go on ;)

>   I started working on Grail support for XBEL last night, and would
> like to suggest a small change to the DTD.
>   There doesn't appear any reason to make the info element or its
> children required, so I suggest all three be made optional.

Yup, I did that too already.

> The
> machine name, in particular, does not appear to be very useful.  I can
> also envision shared-bookmarks applications where the owner may vary    
> from folder to folder, so I'd also allow info within each folder.

I'm not sure but maybe the reason Mark suggested it was so a processor
could make certain assumptions. Maybe platform is a better name
anyway.

I even thought about adding optional info to a single bookmark.

There's a problem lurking with shared-bookmarks though. Currently id
is declared as ID. This means a valid document must have unique
id's. Merging different bookmark files may break that constraint.

A possible solution would be to rename id's when merging bookmark
files. An easy way out would be to declare id as CDATA but that's
not a real solution.
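
A rough sketch of the renaming approach, assuming a current Python with
xml.etree.ElementTree and the id/ref attributes from the DTD. The
function name merge_bookmarks and the "merged" prefix are made up for
the example.

```python
import xml.etree.ElementTree as ET


def merge_bookmarks(target, source, prefix="merged"):
    """Append source's children to target, renaming ids that collide.

    Returns a dict mapping each replaced id to its new value.
    """
    target_ids = {el.get("id") for el in target.iter() if el.get("id")}
    source_ids = {el.get("id") for el in source.iter() if el.get("id")}
    taken = target_ids | source_ids
    renamed = {}
    counter = 0
    for el in source.iter():
        old = el.get("id")
        if old in target_ids:
            new = old
            while new in taken:  # find an id nobody uses yet
                counter += 1
                new = "%s%d" % (prefix, counter)
            taken.add(new)
            el.set("id", new)
            renamed[old] = new
    # Keep aliases pointing at the bookmark they referred to before.
    for alias in source.iter("alias"):
        ref = alias.get("ref")
        if ref in renamed:
            alias.set("ref", renamed[ref])
    for child in list(source):
        target.append(child)
    return renamed
```

Only ids that actually clash are touched, so references from outside
the file to the other ids keep working.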

> Allowing it in the folder makes the outermost info superfluous; an
> info within the outermost folder would work just fine.  (I'll leave
> the folder inside the xbel element, since there may be good reason for
> adding things outside the folder in some applications.)  I've attached
> the modified DTD below.

I would like to throw out owner and machine in favor of a generic htmlish
meta tag.

  <info>
    <meta name="owner" content="me"/>
  </info>

If owner and/or machine name are needed you could claim a standard
meta tag with owner, machine or whatever your application needs. This
way all info could be processed by the same code.
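
To show what "processed by the same code" could look like, here is a
small sketch, assuming a current Python with xml.etree.ElementTree and
meta elements with name and content attributes as in the example above.
read_info is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET


def read_info(info_element):
    """Collect the meta children of an info element into a dict of lists."""
    values = {}
    for meta in info_element.findall("meta"):
        # A name may occur more than once (keywords, for instance).
        values.setdefault(meta.get("name"), []).append(meta.get("content"))
    return values


info = ET.fromstring(
    '<info>'
    '<meta name="owner" content="me"/>'
    '<meta name="keyword" content="xml"/>'
    '<meta name="keyword" content="python"/>'
    '</info>')
settings = read_info(info)
```

An application that cares about the owner just looks up that one key
and ignores the rest.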

Another use for a meta tag is for adding keywords. When these are
preserved it's trivial to put a HTML version on the web and have it
handled properly by the search bots.

Bye

Marc
--
Marc van Grootel
bwaumg@urc.tue.nl


From bwaumg@urc.tue.nl  Tue Sep 15 16:34:58 1998
From: bwaumg@urc.tue.nl (Marc van Grootel)
Date: Tue, 15 Sep 1998 17:34:58 +0200
Subject: [XML-SIG] XBEL DTD as a meta-dtd
Message-ID: <199809151534.RAA09232@asterix.urc.tue.nl>

Hi,

This post became rather long (my DTD is at the bottom).

I did some experimenting with Geir's xmlarch.py and it works nicely
(once you update to the newest sax stuff).

I made this effort because I thought XBEL could be used in a Website
management tool that checks external links and reports on them
(something like linbot). Such a processor could store information
about the links in the 'info' elements and the id's could refer back
to the original XML document.

In order for XBEL to function as a meta-DTD I needed to loosen some
restrictions and make a few changes to the XBEL DTD. With these
changes it is possible to derive XBEL from many XML documents just by
specifying how the mapping has to take place. This architectural
processing is standardized (annex A.3 of ISO/IEC 10744:1997) so I
could use other architectural engines to do the same (for example XAF
by David Megginson). No coding of specialized XML processors
needed. The XBEL is like a virtual document automatically derived from
the XML source.

For some more examples and explanations look at the documentation for
XAF (http://www.megginson.com/XAF).

Thanks, Geir for making this possible in Python.

At the end I included the xbel dtd as I use it now. Maybe we could
reach a consensus. The DTD is looser now which makes processing it a
little more difficult. Processors that output XBEL are not affected
much since they could always output a more restricted form of XBEL but
it would be nice if a processor that reads XBEL could cope with the
looser XBEL DTD.

Here's an example of two simplified XML fragments:

  <tei>
    <div1><head>Chapter 1</head>
      <p>This is <xref href="a">A</xref>.</p>
      <p>This is <xref href="b">B</xref>.</p>
      <div2><head>Chapter 2</head>
        <p>This is <xref href="c">C</xref>.</p>
      </div2>
    </div1>
  </tei>

[This is not real TEI since it lacks an easy way to refer to an url]

  <book><title>My Book</title>
    <chapter><title>Chapter 1</title>
      <para>This is <ulink url="a">A</ulink>.</para>
      <para>This is <ulink url="b">B</ulink>.</para>
    </chapter>
    <chapter><title>Chapter 2</title>
      <para>This is <ulink url="c">C</ulink>.</para>
    </chapter>
  </book>

Obviously there are some structural differences. Also, in the first a
paragraph is called 'p' in the other 'para', in the first a chapter
is called 'div1' and in the other 'chapter'.

With architectural forms you can extract a structured list of url's
from both of these without creating a separate processor for
each. Just specify how the derivation should work and process the
document with an architectural forms processor (like xmlarch.py).

To show how that works I used the 'book' example:

Here's the complete document:

<?xml version='1.0'?>
<?IS10744:arch name="xbel"
               auto="nArcAuto"
               renamer-att="xbel-atts"
               dtd-system-id="xbel.dtd"
               suppressor-att="suppress"
               ignore-data-att="ignore"
?>
<!DOCTYPE book SYSTEM "db3xml10.dtd" [

  <!ATTLIST title   xbel NMTOKEN "title">
  <!ATTLIST chapter xbel NMTOKEN "folder"
                    xbel-atts NMTOKENS ""
  >
  <!ATTLIST ulink   xbel NMTOKEN "url"
                    xbel-atts NMTOKENS "url href baz #DEFAULT"
                    ignore NMTOKEN "nArcIgnD"
  > 
  <!ATTLIST para    suppress NMTOKEN "sArcNone">
]>

<book><title>My Book</title>

  <chapter id="ch1">
    <title>Chapter 1</title> 

    <para>This is <ulink id="A101" url="a"><acronym>
      <emphasis>A</emphasis></acronym></ulink></para>

    <para>This is <ulink id="A123" 
      url="b">B</ulink></para>

  </chapter>

  <chapter id="ch2">
    <title>Chapter 2</title>

    <para>This is <ulink id="A23" url="c">C</ulink></para>

  </chapter>
</book>

Feeding this to xmlarch.py results in the following architectural (or
virtual) document:

<xbel><title>My Book</title>

  <folder id="ch1">
    <title>Chapter 1</title> 

    <url href="a" id="A101">A</url>

    <url href="b" id="A123">B</url>

  </folder>

  <folder id="ch2">
    <title>Chapter 2</title>

    <url href="c" id="A23">C</url>

  </folder>
</xbel>

As you can see xmlarch.py derived the xbel document from the book
document. The chapter elements are changed to folders. The ulinks
are changed to urls and every url attribute is changed to an href
attribute. It also stripped the elements inside the first ulink.
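
For readers who want to see the mapping spelled out, here is a toy
sketch of the same derivation in plain Python. It is not xmlarch.py and
does not read the architectural attributes from the DTD; the three
tables below are hard-coded assumptions that stand in for the ATTLIST
declarations in the example.

```python
import xml.etree.ElementTree as ET

ELEMENT_MAP = {"book": "xbel", "title": "title",
               "chapter": "folder", "ulink": "url"}
ATTR_MAP = {"ulink": {"url": "href"}}   # client attribute -> XBEL attribute
FLATTEN = {"ulink"}                     # keep only the text of these elements


def derive(node):
    """Return the list of derived elements for node (possibly empty)."""
    derived_children = []
    for child in node:
        derived_children.extend(derive(child))
    new_tag = ELEMENT_MAP.get(node.tag)
    if new_tag is None:
        # Element has no architectural form: drop it and its text,
        # but pass its mapped descendants up to the parent.
        return derived_children
    new = ET.Element(new_tag)
    renames = ATTR_MAP.get(node.tag, {})
    for key, value in node.attrib.items():
        new.set(renames.get(key, key), value)
    if node.tag in FLATTEN:
        new.text = "".join(node.itertext()).strip()
    else:
        new.text = (node.text or "").strip() or None
        new.extend(derived_children)
    return [new]


book = ET.fromstring(
    '<book><title>My Book</title><chapter id="ch1">'
    '<title>Chapter 1</title>'
    '<para>This is <ulink id="A101" url="a"><acronym>'
    '<emphasis>A</emphasis></acronym></ulink></para>'
    '</chapter></book>')
xbel = derive(book)[0]
```

The para elements disappear, their ulink children move up into the
folder, and the markup inside the ulink is reduced to its text.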

If we want to use XBEL to work as a meta-dtd for doing these kinds of
things some changes to the DTD are in order. Architectural forms can
do many things but they cannot completely reorder the original
document so the XBEL DTD (meta DTD) and the XML DTD used (client DTD)
need to have some structural similarities.

=========== my current XBEL DTD ================

<!ELEMENT xbel     (title?,info?, (bookmark|folder|url|alias|separator)*)>
<!ATTLIST xbel
            version CDATA   #IMPLIED
>

<!--=================== Info block ================================-->

<!ELEMENT info    (meta)*>

<!ELEMENT meta    EMPTY>
<!ATTLIST meta
            name    CDATA #REQUIRED
            content CDATA #REQUIRED
>


<!--=================== Folder ====================================-->

<!ELEMENT folder   (title?,info?,desc?,(bookmark|folder|separator|alias|url)*)>
<!ATTLIST folder
            id       ID       #IMPLIED
            added    CDATA    #IMPLIED
            folded   (yes|no) 'yes'   
>

<!--=================== URL ======================================-->

<!ELEMENT url        (#PCDATA)>
<!ATTLIST url
            id       ID       #IMPLIED
            href     CDATA    #REQUIRED
            added    CDATA    #IMPLIED
            visited  CDATA    #IMPLIED
            modified CDATA    #IMPLIED
            response CDATA    #IMPLIED
            checked  CDATA    #IMPLIED
>

<!--=================== Bookmark ==================================-->
<!-- a wrapper around an url when it has to contain extra info
     like a description and info

-->
<!ELEMENT bookmark (info?,url,desc?)>

<!ELEMENT desc       (#PCDATA)>

<!--=================== Separator =================================-->

<!ELEMENT separator EMPTY>

<!--=================== Alias =====================================-->

<!ELEMENT alias EMPTY>
<!ATTLIST alias
            ref       IDREF    #REQUIRED
>


From fdrake@acm.org  Tue Sep 15 17:19:37 1998
From: fdrake@acm.org (Fred L. Drake, Jr.)
Date: Tue, 15 Sep 1998 12:19:37 -0400 (EDT)
Subject: [XML-SIG] XBEL DTD
In-Reply-To: <199809151510.RAA08403@asterix.urc.tue.nl>
References: <199809151510.RAA08403@asterix.urc.tue.nl>
Message-ID: <13822.37785.563540.803670@weyr.cnri.reston.va.us>

Marc van Grootel writes:
 >   - make the 'info' element contain zero or more 'meta' tags, this
 >     way we don't have to fight too much about how to name them
 >     and can add new ones without breaking the DTD

  I'm OK with this; I would use a slightly different structure than
you did in the example you give farther down in your post.  I'd
prefer:

	<!ELEMENT meta (#PCDATA)>
	<!ATTLIST meta name CDATA #REQUIRED>

so the markup would be:

	<meta name="frobnitz">my data</meta>

 >   - also allow 'info' in a bookmark so extra data can be associated
 >     with a single bookmark

  OK.

 >   - drop the 'title' for bookmarks and put that into 'url' and
 >     put the href itself back into the 'url' as an 'href' attribute.

  I'm not sure I really like this; for a bookmarks list, the URL
itself really is content.

 >     In the HTML3.2 DTD there's the following quote:
...

  The quoted discussion seems very specific to the HTML spec, and is
not general.  If there's some relevant context I'm missing, please
quote that as well.

 >   - make the 'title' optional too
 > 
 >   - instead of a bookmark a bare 'url' element should be allowed
 >     too. This should be considered a bookmark without a description
 >     and/or info.
 >

 >   - allowing bookmark,url,alias etc directly under xbel

  OK, OK, OK.

I said:
 > > The
 > > machine name, in particular, does not appear to be very useful.  I can
 > > also envision shared-bookmarks applications where the owner may vary    
 > > from folder to folder, so I'd also allow info within each folder.

And Marc replied:
 > I'm not sure but maybe the reason Mark suggested it was so a processor
 > could make certain assumptions. Maybe platform is a better name

  Maybe.  I don't know what sort of assumptions would be reasonable
unless it could be used to check the availability of file: URLs.

 > There's a problem lurking with shared-bookmarks though. Currently id
 > is declared as ID. This means a valid document must have unique
 > id's. Merging different bookmark files may break that constraint.

  Generating new IDs on an as-needed basis would be the best
solution for merging, with the option of it being treated as an error
also being available, but this does not affect the shared bookmarks
application I was thinking of.
  I was thinking of a single xbel instance being accessed simultaneously
by several users (presumably through a server of some sort), and all
actions on the instance could be immediately reflected in each user's
UI.  This could be useful in maintaining links to shared reference
material.  The info element could be used to store access control
information, approved/unapproved flags, etc.
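
  The merge case mentioned above (generating new IDs as needed) might
look roughly like the sketch below.  It uses xml.etree.ElementTree from
the modern Python standard library, which is an assumption of this
illustration and not something the thread relies on; the function name
and the "m1", "m2", ... naming scheme are likewise hypothetical.

```python
import itertools
import xml.etree.ElementTree as ET


def merge_with_fresh_ids(base, other):
    """Append other's children to base, renaming ids that collide."""
    # Every id already present in the tree we are merging into.
    used = {el.get("id") for el in base.iter() if el.get("id")}
    counter = itertools.count(1)
    for el in other.iter():
        old = el.get("id")
        if old is None:
            continue
        new = old
        # Only touch an id when it clashes with one already in use.
        while new in used:
            new = "m%d" % next(counter)
        used.add(new)
        el.set("id", new)
    base.extend(list(other))
    return base


a = ET.fromstring('<xbel><bookmark id="b1"/><bookmark id="b2"/></xbel>')
b = ET.fromstring('<xbel><bookmark id="b1"/><bookmark id="b3"/></xbel>')
merged = merge_with_fresh_ids(a, b)
merged_ids = [el.get("id") for el in merged.iter("bookmark")]
```

  Note that this sketch does not rewrite references to a renamed id
(an alias pointing at it, say); a real merge would have to track the
old-to-new mapping and update those as well.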

 > Another use for a meta tag is for adding keywords. When these are
 > preserved it's trivial to put a HTML version on the web and have it
 > handled properly by the search bots.

  And can be used to improve searching by the browser; as the user
visits various pages, any keywords or other useful meta information
could be pulled into the bookmarks database and (optionally)
automatically updated on future visits.


  -Fred

--
Fred L. Drake, Jr.	     <fdrake@acm.org>
Corporation for National Research Initiatives
1895 Preston White Dr.	    Reston, VA  20191


From Jack.Jansen@cwi.nl  Tue Sep 15 22:26:29 1998
From: Jack.Jansen@cwi.nl (Jack Jansen)
Date: Tue, 15 Sep 1998 23:26:29 +0200
Subject: [XML-SIG] XBEL DTD
In-Reply-To: Message by "Fred L. Drake" <fdrake@cnri.reston.va.us> ,
 Tue, 15 Sep 1998 12:19:37 -0400 (EDT) , <13822.37785.563540.803670@weyr.cnri.reston.va.us>
Message-ID: <UTC199809152126.XAA11016.jack@snelboot.cwi.nl>

Okay, if we're all putting in requests for our pet feature in the XBEL 
DTD I have one too: I'd like an empty ("pass", in Python terms)
element, with only an id attribute.

With this it should be possible to do two-way syncing of bookmark
files between machines. If each machine generates IDs in a unique
manner, and replaces items (or folders or whatever) with the pass item
when you delete them, you win something important: on the next sync you
can differentiate between the situation where the bookmark was deleted
on machine A and the one where it was added on machine B.
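
A minimal sketch of that idea, assuming each bookmark file has already
been reduced to a mapping from id to title, with None standing in for
the "pass" element (the function name and the data are made up for
illustration):

```python
def sync(a, b):
    """Merge two {id: title-or-None} maps; None marks a deleted entry."""
    merged = {}
    for key in set(a) | set(b):
        if key in a and key in b:
            # Known on both sides: a tombstone on either side wins.
            merged[key] = None if None in (a[key], b[key]) else a[key]
        else:
            # Seen on one side only, so it was added there.
            merged[key] = a.get(key, b.get(key))
    return merged


machine_a = {"A-1": "Python", "A-2": None}
machine_b = {"A-1": "Python", "A-2": "Grail", "B-1": "XML-SIG"}
result = sync(machine_a, machine_b)
```

Here "A-2" stays deleted because machine A left a tombstone for it,
while "B-1" survives because machine A has never seen that id.  When
both sides hold a live entry the sketch simply keeps A's copy; real
conflict handling is left out.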
--
Jack Jansen             | ++++ stop the execution of Mumia Abu-Jamal ++++
Jack.Jansen@cwi.nl      | ++++ if you agree copy these lines to your sig ++++
http://www.cwi.nl/~jack | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm 


From gstein@lyra.org  Tue Sep 15 22:34:51 1998
From: gstein@lyra.org (Greg Stein)
Date: Tue, 15 Sep 1998 14:34:51 -0700
Subject: [XML-SIG] XBEL DTD
References: <199809151510.RAA08403@asterix.urc.tue.nl>
Message-ID: <35FEDD7B.551CCDBC@lyra.org>

Marc van Grootel wrote:
> ...
> I would like to throw out owner and machine in favor of a generic htmlish
> meta tag.
> 
>   <info>
>     <meta name="owner" content="me">
>   </info>
> 
> If owner and/or machine name are needed you could claim a standard
> meta tag with owner, machine or whatever your application needs. This
> way all info could be processed by the same code.
>...

This is kind of silly. XML is intended to encode the "name" as the
actual tag. Why push this down another level? Using an "owner" tag, you
can extract this information directly from the parse tree. Using a
"meta" tag like above, now the software has to iterate through the meta
tags looking for the information.

XML is enough of an abstraction; you don't want to start creating
additional layers in there. The tendency should be towards additional
tags and less "control" type elements. It does not hurt anything to
specify an optional tag, yet it can make many things easier.
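
As a rough illustration of the difference, here is a sketch using
xml.etree.ElementTree from the modern Python standard library (an
assumption of this example, not a tool named in the thread).  The
<owner> element is hypothetical, and the <meta> tag is written as an
empty element so that the fragment is well-formed XML.

```python
import xml.etree.ElementTree as ET

EXPLICIT = "<info><owner>me</owner></info>"
GENERIC = '<info><meta name="owner" content="me"/></info>'


def owner_from_explicit(info):
    """Dedicated element: a single direct lookup by tag name."""
    return info.findtext("owner")


def owner_from_meta(info):
    """Generic meta elements: scan until the wanted name turns up."""
    for meta in info.findall("meta"):
        if meta.get("name") == "owner":
            return meta.get("content")
    return None


explicit_owner = owner_from_explicit(ET.fromstring(EXPLICIT))
generic_owner = owner_from_meta(ET.fromstring(GENERIC))
```

Both lookups return the same value; the second one just needs the
extra loop and the comparison on the name attribute.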

-g

--
Greg Stein (gstein@lyra.org)


From fdrake@acm.org  Tue Sep 15 22:53:50 1998
From: fdrake@acm.org (Fred L. Drake)
Date: Tue, 15 Sep 1998 17:53:50 -0400 (EDT)
Subject: [XML-SIG] XBEL DTD
In-Reply-To: <UTC199809152126.XAA11016.jack@snelboot.cwi.nl>
References: <fdrake@cnri.reston.va.us>
 <13822.37785.563540.803670@weyr.cnri.reston.va.us>
 <UTC199809152126.XAA11016.jack@snelboot.cwi.nl>
Message-ID: <13822.57838.776721.726280@weyr.cnri.reston.va.us>

Jack Jansen writes:
 > Okay, if we're all putting in requests for our pet feature in the XBEL 
 > DTD I have one too: I'd like an empty ("pass", in Python terms)
 > element, with only an id attribute.

  Your rationale is good; I'll go for it.  ;-)


  -Fred

--
Fred L. Drake, Jr.	     <fdrake@acm.org>
Corporation for National Research Initiatives
1895 Preston White Dr.	    Reston, VA  20191


From gstein@lyra.org  Tue Sep 15 23:00:12 1998
From: gstein@lyra.org (Greg Stein)
Date: Tue, 15 Sep 1998 15:00:12 -0700
Subject: [XML-SIG] XBEL DTD as a meta-dtd
References: <199809151534.RAA09232@asterix.urc.tue.nl>
Message-ID: <35FEE36C.A9181E3@lyra.org>

I'd highly recommend using a different DTD for generic URL extraction.
XBEL is for _bookmark_ representation. The nice thing about XML is the
ability to use multiple DTDs as necessary. XML is also supposed to
convey structured information; the more generic it becomes, the less
useful XML becomes.

While on this point, somebody should establish the XBEL DTD somewhere
(XML SIG page?) so that people can refer to it with a namespace
declaration, then augment their tags with the namespace. For example:
<xbel:info ...>

-g

--
Greg Stein (gstein@lyra.org)


From akuchlin@cnri.reston.va.us  Tue Sep 15 23:40:23 1998
From: akuchlin@cnri.reston.va.us (Andrew M. Kuchling)
Date: Tue, 15 Sep 1998 18:40:23 -0400 (EDT)
Subject: [XML-SIG] XBEL DTD as a meta-dtd
In-Reply-To: <35FEE36C.A9181E3@lyra.org>
References: <199809151534.RAA09232@asterix.urc.tue.nl>
 <35FEE36C.A9181E3@lyra.org>
Message-ID: <13822.60371.108025.289138@amarok.cnri.reston.va.us>

Greg Stein writes:
>I'd highly recommend using a different DTD for generic URL extraction.
>XBEL is for _bookmark_ representation. The nice thing about XML is the

	Agreed; it seems to complicate XBEL, more than seems necessary 
for a fairly simple application like maintaining a bookmark file.

>While on this point, somebody should establish the XBEL DTD somewhere
>(XML SIG page?) so that people can refer to it with a namespace
>declaration, then augment their tags with the namespace. For example:
><xbel:info ...>

	Good point, but I'm not sure at what URI it should live.
python.org/sigs/xml-sig/ isn't permanent; SIGs are supposed to die
when they've fulfilled their purpose, and the XML-SIG will probably do
so eventually.  That leaves somewhere in /topics/xml/; perhaps
/topics/xml/DTD/ can be used for such DTDs.

-- 
A.M. Kuchling			http://starship.skyport.net/crew/amk/
prompt. n. (Unix) A symbol on the screen indicating which shell is attacking
you.
    -- Stan Kelly-Bootle, _The Computer Contradictionary_


From fdrake@acm.org  Wed Sep 16 13:49:19 1998
From: fdrake@acm.org (Fred L. Drake)
Date: Wed, 16 Sep 1998 08:49:19 -0400 (EDT)
Subject: [XML-SIG] XBEL DTD as a meta-dtd
In-Reply-To: <13822.60371.108025.289138@amarok.cnri.reston.va.us>
References: <199809151534.RAA09232@asterix.urc.tue.nl>
 <35FEE36C.A9181E3@lyra.org>
 <13822.60371.108025.289138@amarok.cnri.reston.va.us>
Message-ID: <13823.46031.723233.700011@weyr.cnri.reston.va.us>

Greg Stein writes:
 > I'd highly recommend using a different DTD for generic URL extraction.
 > XBEL is for _bookmark_ representation. The nice thing about XML is the

Andrew M. Kuchling writes:
 > 	Agreed; it seems to complicate XBEL, more than seems necessary 
 > for a fairly simple application like maintaining a bookmark file.

  I don't think the proposed DTD is too complicated, but it probably
shouldn't get much more complicated.  Jack's "pass" element makes
sense and should be added since it is directly related to bookmark
management within applications like Grail.

 > python.org/sigs/xml-sig/ isn't permanent; SIGs are supposed to die
 > when they've fulfilled their purpose, and the XML-SIG will probably do
 > so eventually.  That leaves somewhere in /topics/xml/; perhaps
 > /topics/xml/DTD/ can be used for such DTDs.

  This last variant is almost exactly what I'm spitting out from
Grail; the only difference is that I spelled "DTD" as "dtds" (take
your pick for capitalization, but I think the plural makes sense).


  -Fred

--
Fred L. Drake, Jr.	     <fdrake@acm.org>
Corporation for National Research Initiatives
1895 Preston White Dr.	    Reston, VA  20191


From akuchlin@cnri.reston.va.us  Wed Sep 16 15:14:23 1998
From: akuchlin@cnri.reston.va.us (Andrew M. Kuchling)
Date: Wed, 16 Sep 1998 10:14:23 -0400 (EDT)
Subject: [XML-SIG] XBEL DTD as a meta-dtd
In-Reply-To: <13823.46031.723233.700011@weyr.cnri.reston.va.us>
References: <199809151534.RAA09232@asterix.urc.tue.nl>
 <35FEE36C.A9181E3@lyra.org>
 <13822.60371.108025.289138@amarok.cnri.reston.va.us>
 <13823.46031.723233.700011@weyr.cnri.reston.va.us>
Message-ID: <13823.50440.29860.876752@amarok.cnri.reston.va.us>

Fred L. Drake writes:
>  I don't think the proposed DTD is too complicated, but it probably
>shouldn't get much more complicated.  Jack's "pass" element makes
>sense and should be added since it directly related to bookmark
>management within applications like Grail.

	I was agreeing more with Greg's reaction to turning it into a
meta-DTD.  The basic problem of expressing a bookmark file is fairly
simple, and the DTD should also be fairly simple.  It's nice to keep
it at the level of complexity where people (such as me) say "Oh, that
looks neat; I'll take an hour and implement it" instead of "Gosh, that
looks awfully complicated; I'll pull the covers over my head and hope
it goes away."  In addition, the XBEL code as-is makes an excellent
set of sample programs for the Python/XML package.

> I wrote:
> > so eventually.  That leaves somewhere in /topics/xml/; perhaps
> > /topics/xml/DTD/ can be used for such DTDs.
>
>  This last variant is almost exactly what I'm spitting out from
>Grail; the only difference is that I spelled "DTD" as "dtds" (take
>your pick for capitalization, but I think the plural makes sense).

	Good suggestion, though I tend to read the URL components as
qualifiers, not categories, and hence usually go for the singular:
"dtd" instead of "dtds". Anyway, there's now a page for them at:
http://www.python.org/topics/xml/dtds/

	Add xbel.dtd to the end of that URL to download the DTD; you
can use this in namespace declarations.  When I get time, I'll
probably add the DTD used by the xml.marshal function to that page as
well (unless xml.marshal is obsoleted by Lotos or some other DTD).
This isn't going to be a massive collection of DTDs, just a stable
home for any DTDs that originate within the Python community.

	(The XBEL DTD used is the original one.  When we can settle on
a final version of the DTD, I'll update it.)

-- 
A.M. Kuchling			http://starship.skyport.net/crew/amk/
What can I wish to the youth of my country who devote themselves to
science?... Thirdly, passion. Remember that science demands from a man all his
life. If you had two lives that would not be enough for you. Be passionate in
your work and in your searching.
    -- Ivan Pavlov


From grove@infotek.no  Wed Sep 16 15:46:50 1998
From: grove@infotek.no (Geir Ove Gronmo)
Date: Wed, 16 Sep 1998 16:46:50 +0200
Subject: [XML-SIG] XBEL DTD as a meta-dtd
In-Reply-To: <13823.50440.29860.876752@amarok.cnri.reston.va.us>
References: <13823.46031.723233.700011@weyr.cnri.reston.va.us>
 <199809151534.RAA09232@asterix.urc.tue.nl>
 <35FEE36C.A9181E3@lyra.org>
 <13822.60371.108025.289138@amarok.cnri.reston.va.us>
 <13823.46031.723233.700011@weyr.cnri.reston.va.us>
Message-ID: <199809161445.QAA14942@mail.infotek.no>

At 10:14 16.09.98 -0400, you wrote:
>Fred L. Drake writes:
>>  I don't think the proposed DTD is too complicated, but it probably
>>shouldn't get much more complicated.  Jack's "pass" element makes
>>sense and should be added since it directly related to bookmark
>>management within applications like Grail.
>
>	I was agreeing more with Greg's reaction to turning it into a
>meta-DTD.  

All DTDs can be meta-DTDs (architectural DTDs). The complexity of the DTD
doesn't really matter. The only thing that might be "harder" - is the
mapping from the instance (which is to be architecturally processed) to the
meta-dtd.

So whether anybody calls XBEL a meta-dtd, or not, doesn't matter.

Geir O.

 ==================  Geir Ove Grønmo  ==================
|  STEP Infotek as, Gjerdrumsvei 12, 0486 Oslo, Norway  |
|        grove@infotek.no http://www.infotek.no/        |
 -------------------------------------------------------


From grove@infotek.no  Wed Sep 16 15:58:02 1998
From: grove@infotek.no (Geir Ove Gronmo)
Date: Wed, 16 Sep 1998 16:58:02 +0200
Subject: [XML-SIG] xmlarch: Version 0.11 released
Message-ID: <199809161456.QAA15228@mail.infotek.no>

xmlarch.py: An XML architectural forms processor written in Python

Version:  0.11
Author:   Geir Ove Grønmo
Email:    grove@infotek.no
Released: September 15th 1998

Homepage: http://www.infotek.no/~grove/software/xmlarch/index.html

---

What is xmlarch.py?

The xmlarch.py module contains an XML architectural forms processor written 
in Python. It allows you to process XML architectural forms using any 
parser that uses the SAX interfaces. The module allows you to process
several architectures in one parse-pass. Architectural document events
for an architecture can even be broadcast to multiple DocumentHandlers.

What's new?

There are no new features in this release. The module should now be placed 
in the xml.arch package. The demo tools have been updated to support the 
new package structure.

The problem with <?IS10744 arch ...?> not being recognized as an
architecture use declaration is now fixed. Both <?IS10744:arch ...?> and
<?IS10744 arch ...?> are now supported.

get_bridge_form() was called get_bridge_elem_form() in a couple of places.
This is now fixed.

---

Enjoy!

Geir Ove Grønmo


From bwaumg@urc.tue.nl  Wed Sep 16 16:02:50 1998
From: bwaumg@urc.tue.nl (Marc van Grootel)
Date: Wed, 16 Sep 1998 17:02:50 +0200
Subject: [XML-SIG] XBEL DTD as a meta-dtd
Message-ID: <199809161502.RAA19327@asterix.urc.tue.nl>

Hi,

So the consensus is, more or less, that 'less is more'. I can agree
with that. My experiment with architectural forms may have led me too far
from the goals of XBEL. I agree with Greg that such URL extraction is
better left to another DTD.

So the scope is 'hierarchical storage for bookmarks'? But is
a lossless conversion between XBEL and Netscape still a goal? If not,
I think that 'separator' should go since it serves no real purpose.

Even with something that looks so simple there are some important
issues which show up mostly after people start implementing
applications with it. In my opinion we need an escape-hatch to provide
for some of these yet unknown applications. This is what I had in mind
with:

  <info>
    <meta name=".." content="..">
    ...
  </info>

Or as Fred suggested:

  <info>
    <meta name="..">...</meta>
    ...
  </info>

Greg Stein wrote:

> This is kind of silly. XML is intended to encode the "name" as the
> actual tag. Why push this down another level? Using an "owner" tag, you
> can extract this information directly from the parse tree. Using a
> "meta" tag like above, now the software has to iterate through the meta
> tags looking for the information.
>
> XML is enough of an abstraction; you don't want to start creating
> additional layers in there. The tendency should be towards additional
> tags and less "control" type elements. It does not hurt anything to
> specify an optional tag, yet it can make many things easier.

I think it can extend the lifetime of the DTD. Maybe then at a later
stage common conventions could make it into the DTD as explicit
elements. This situation is better than defining only a few explicit
elements for info, which can lead to tag-abuse by different authors and
applications. These catch-all mechanisms are not uncommon and I don't
think they violate the idea of XML. I would rather have one well-crafted
DTD than multiple DTDs with only minor differences.

If info like 'owner' is so important that it should be declared
explicitly it can also be an (optional) attribute of the elements to
which it belongs (folder and bookmark).

As to the form of the meta element:

Maybe the 'name' attribute should be declared as NMTOKEN to restrict
it to a name token.  With <meta name="..">my data</meta> the content is
#PCDATA so if there are certain characters in the data they should be
encoded ('<' => '&lt;' etc.). For a 'content' attribute things like '<'
and '>' can stay as they are (but watch out for '&' -- see below). 

Where to put the URL's?

Although it may seem like nitpicking I think it is not.

One of the reasons for putting the url itself in an attribute would be
the stricter constraints of CDATA and being able to make it
#REQUIRED. As element content the parser cannot check if the element
really contains a value at all since:

  <url></url> will look ok to the parser.

There's another reason though. 

I looked through my bookmark list and there were several url's that
looked like:

  http://someserver/somepage.html&var=x

A parser will complain when it sees this, since '&' preceding a
name-character starts a general entity reference, which is
probably not defined. Then it encounters the '=', which generates
a warning since a general entity reference should end with ';'.

I thought it would be safe to put the url in a CDATA attribute. Alas,
it turns out that even in a CDATA attribute a parser would still try
to resolve a general entity. In David Megginson's book (Structuring
XML Documents - p. 19) I found the following explanation:

  CDATA attribute type:

    Note that an attribute type applies to the value of the attribute
    *after* the attribute string has been normalized - general entities
    will still be recognized as part of that normalization process.

So, although I thought putting url's in a CDATA attribute is safe, it
is not. 

The solution might be to url-encode url's. So the above url
becomes:

  http%3A%2F%2Fsomeserver%2Fsomepage.html%26var%3Dx

Hmmm. Not a pretty sight.
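
For reference, a small sketch of what URL-encoding the whole string
produces, using urllib.parse from the modern Python standard library
(an assumption of this example; the thread does not name a tool):

```python
from urllib.parse import quote, unquote

url = "http://someserver/somepage.html&var=x"

# safe="" asks quote() to encode every reserved character, including
# ':' and '/', so that nothing XML-significant is left in the string.
encoded = quote(url, safe="")

# The encoding is reversible, so the original URL can be recovered.
decoded = unquote(encoded)
```

The price is that every reader of the file has to decode the value
before it can be used as a URL again.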

Maybe a structure like:

  <bookmark id=".." href=".." visited=".." ...>
    <title>..</title>
    <desc>..</desc>
  </bookmark>

is not so bad (maybe even with an optional info element?).

Finally, what about the main level? Forest or Tree?

  <xbel>
    <folder>..</folder>
    <folder>..</folder>
    <bookmark>..</bookmark>
  </xbel>

Or:

  <xbel>
    <folder>
      <folder>..</folder>
      <bookmark>..</bookmark>
    </folder>
  </xbel>

I like Fred's suggestion that in the latter an info element directly
under xbel (so outside a folder) could convey other info than the info
elements inside a folder (or maybe even a bookmark). Maybe this even
warrants naming that specific element differently ('header'?).

Do we have to fix a limit for the depth of recursion, or should this be
left to every application? Maybe we should say that an XBEL
application should at least be able to handle a depth of x folders.


Marc
---
Marc van Grootel
bwaumg@urc.tue.nl


From bwaumg@urc.tue.nl  Wed Sep 16 16:13:59 1998
From: bwaumg@urc.tue.nl (Marc van Grootel)
Date: Wed, 16 Sep 1998 17:13:59 +0200
Subject: [XML-SIG] XBEL DTD as a meta-dtd
Message-ID: <199809161513.RAA19851@asterix.urc.tue.nl>

Geir wrote:
> 
> All DTDs can be meta-DTDs (architectural DTDs). The complexity of the DTD
> doesn't really matter. The only thing that might be "harder" - is the
> mapping from the instance (which is to be architecturally processed) to the
> meta-dtd.
> 
> So whether anybody calls XBEL a meta-dtd, or not, doesn't matter.

So although a DTD can always be used as a meta-dtd I thought it would
be nice if mapping from an instance to XBEL would be easy. But in the
end I agree that structuring the DTD so that this would be easy puts
too much strain on the design of XBEL. 


Marc

--
Marc van Grootel
bwaumg@urc.tue.nl


From grove@infotek.no  Wed Sep 16 16:37:25 1998
From: grove@infotek.no (Geir Ove Gronmo)
Date: Wed, 16 Sep 1998 17:37:25 +0200
Subject: [XML-SIG] XBEL DTD as a meta-dtd
In-Reply-To: <199809161513.RAA19851@asterix.urc.tue.nl>
Message-ID: <199809161536.RAA16351@mail.infotek.no>

At 17:13 16.09.98 +0200, Marc van Grootel wrote:
>So although a DTD can always be used as a meta-dtd I thought it would
>be nice if mapping from an instance to XBEL would be easy. 

Yes, I also think that the XBEL DTD should be kept simple.

>But in the
>end I agree that structuring the DTD so that this would be easy puts
>too much strain on the design of XBEL. 

This all depends on which DTDs you have in mind. 

Some factors that come to my mind:

 o The similarity of the structure in the instance and the meta-DTD

 o Mapping from element content to element content is much easier than
mapping from element content to attribute values and vice versa (This is
not yet implemented in xmlarch, but will be soon).

 o Reordering is not possible (as far as I know).

Geir O.


 ==================  Geir Ove Grønmo  ==================
|  STEP Infotek as, Gjerdrumsvei 12, 0486 Oslo, Norway  |
|        grove@infotek.no http://www.infotek.no/        |
 -------------------------------------------------------


From fdrake@acm.org  Wed Sep 16 16:47:29 1998
From: fdrake@acm.org (Fred L. Drake)
Date: Wed, 16 Sep 1998 11:47:29 -0400 (EDT)
Subject: [XML-SIG] XBEL DTD as a meta-dtd
In-Reply-To: <199809161502.RAA19327@asterix.urc.tue.nl>
References: <199809161502.RAA19327@asterix.urc.tue.nl>
Message-ID: <13823.56721.237919.41358@weyr.cnri.reston.va.us>

Marc van Grootel writes:
 > So the scope is 'hierarchical storage for bookmarks'?. But is
 > a lossless conversion between XBEL and Netscape still a goal? If not
 > I think that 'separator' should go since it serves no real purpose. 

  I'd leave it in; making XBEL specific to bookmarking (rather than
bookmark extraction) does not mean that the requirement for supporting 
everything supported by Navigator and MSIE goes away.  If it can't do
that, it can't be effectively used as an interchange medium, which I
think it should.  (That's what the tools offered here provide, after
all.)  Jack's extension also makes sense within this context.  (It
needs a name, though, perhaps <jacksExtension/>? ;-)

Greg Stein wrote:
 > This is kind of silly. XML is intended to encode the "name" as the
 > actual tag. Why push this down another level? Using an "owner" tag, you
...
 > XML is enough of an abstraction; you don't want to start creating
 > additional layers in there. The tendency should be towards additional

Marc van Grootel wrote:
 > I think it can extend the life-time of the DTD. Maybe then at a later
 > stage common conventions could make it into the DTD as an explicit

  I think Greg has a very good point.  There's no reason that the
contents of <info> or <description> or <title> or anything else can't
be structured.  The instance remains a well-formed XBEL document and
can be down-converted to valid XBEL easily if required.

 > element. This situation is better then defining only a few explicit
 > elements for info which can lead to tag-abuse by different authors and

  Actually, the catch-all is a form of tag abuse, whereas introducing
new elements for specific applications is not.  This doesn't mean that 
there shouldn't be something like <meta>, only that we should be very
clear in the intended use of the element; it may not be as free-form
as we've left it at this point.  (I still think capturing additional
data from Web pages is useful, and <meta> makes a lot of sense as a
mirror for data extracted from <meta> elements in the HTML documents.)

 > think they violate the idea of XML. I rather like one well-crafted DTD
 > then having multiple DTD's with only minor differences.

  There should be one well-crafted base to start from, but as
information becomes more application-specific, it makes sense to use
"subclassed" DTDs.  I have no problems with this; I just want to be
able to determine that the documents are XBEL documents, even if
actually of a "subclass", so that I can load them easily.  But maybe
an architecture declaration would be just as useful.  ;-)

 > Maybe the 'name' attribute should be declared as NMTOKEN to restrict
 > it to a name token.  With <meta name="..">my data</meta> the content is

  This is good, if <meta> is kept.

 > One of the reasons for putting the url itself in an attribute would be
 > the stricter constraints of CDATA and being able to make it
 > #REQUIRED. As element content the parser cannot check if the element

  This is a good reason; I support this.

 > I looked through my bookmark list and there were several url's that
 > looked like:
 > 
 >   http://someserver/somepage.html&var=x
[URL data discussion...]

  The appropriate solution is probably to spit out character
references for special characters (specifically, "<" and "&").  This
is trivial to implement, and the input would have to be handled
correctly according to XML rules anyway.  There is no need to invoke
additional standards here; "URL encoding" is irrelevant in XML, and
has everything to do with the HTTP requests.  Bookmarks are not
limited to the http: scheme, so why should we need that particular
encoding?
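
  A minimal sketch of that approach, using xml.sax.saxutils.quoteattr
and xml.etree.ElementTree from the modern Python standard library
(assumptions of this illustration; the bookmark element and href
attribute follow the structure proposed earlier in the thread).  Note
that quoteattr() emits the predefined entity &amp; rather than a
numeric character reference; a parser treats the two the same way.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import quoteattr

url = "http://someserver/somepage.html&var=x"

# quoteattr() escapes &, < and > and adds the surrounding quotes.
markup = "<bookmark href=%s/>" % quoteattr(url)

# A conforming parser turns &amp; back into & when reading the attribute.
round_tripped = ET.fromstring(markup).get("href")

# The raw URL, pasted in unescaped, is rejected as not well-formed.
try:
    ET.fromstring('<bookmark href="%s"/>' % url)
    raw_accepted = True
except ET.ParseError:
    raw_accepted = False
```

  The application on either side only ever sees the plain URL; the
escaping lives entirely in the serialized file.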

 > Do we have to fix a limit for the depth of recursion or should this be
 > left to every application. Maybe we should say that an XBEL
 > application should at least be able to handle a depth of x folders.

  No.  The DTD & associated documentation is about a data model, not
processing limitations.  This issue is strictly a processing issue.


  -Fred

--
Fred L. Drake, Jr.	     <fdrake@acm.org>
Corporation for National Research Initiatives
1895 Preston White Dr.	    Reston, VA  20191


From bwaumg@urc.tue.nl  Wed Sep 16 17:16:40 1998
From: bwaumg@urc.tue.nl (Marc van Grootel)
Date: Wed, 16 Sep 1998 18:16:40 +0200
Subject: [XML-SIG] XBEL DTD as a meta-dtd
Message-ID: <199809161616.SAA22462@asterix.urc.tue.nl>

Fred L. Drake writes:
> Marc van Grootel writes:
>  > So the scope is 'hierarchical storage for bookmarks'?. But is
>  > a lossless conversion between XBEL and Netscape still a goal? If not
>  > I think that 'separator' should go since it serves no real purpose. 
> 
>   I'd leave it in; making XBEL specific to bookmarking (rather than
> bookmark extraction) does not mean that the requirement for supporting 
> everything supported by Navigator and MSIE goes away.  If it can't do
> that, it can't be effectively used as an interchange medium, which I
> think it should.  (That's what the tools offered here provide, after
> all.)

It's ok with me, leave it in. 

> 
> Greg Stein wrote:
>  > This is kind of silly. XML is intended to encode the "name" as the
>  > actual tag. Why push this down another level? Using an "owner" tag, you
> ...
>  > XML is enough of an abstraction; you don't want to start creating
>  > additional layers in there. The tendency should be towards additional
> 
> Marc van Grootel wrote:
>  > I think it can extend the life-time of the DTD. Maybe then at a later
>  > stage common conventions could make it into the DTD as an explicit
> 
>   I think Greg has a very good point.  There's no reason that the
> contents of <info> or <description> or <title> or anything else can't
> be structured.  The instance remains a well-formed XBEL document and
> can be down-converted to valid XBEL easily if required.

I also didn't mean to banish these more explicit elements if there's a
good reason for them to be there.
> 
>  > element. This situation is better then defining only a few explicit
>  > elements for info which can lead to tag-abuse by different authors and
> 
>   Actually, the catch-all is a form of tag abuse, whereas introducing
> new elements for specific applications is not.  This doesn't mean that 
> there shouldn't be something like <meta>, only that we should be very
> clear in the intended use of the element; it may not be as free-form
> as we've left it at this point.  (I still think captuing additional
> data from Web pages is useful, and <meta> makes a lot of sense as a
> mirror for data extracted from <meta> elements in the HTML documents.)

Agreed. Both situations can lead to tag-abuse. For a first DTD I think
the escape should be there (on-parole). 

> 
>  > think they violate the idea of XML. I rather like one well-crafted DTD
>  > then having multiple DTD's with only minor differences.
> 
>   There should be one well-crafted base to start from, but as
> information becomes more application-specific, it makes sense to use
> "subclassed" DTDs.  I have no problems with this; I just want to be
> able to determine that the documents are XBEL documents, even if
> actually of a "subclass", so that I can load them easily.  But maybe
> an architecture declaration would be just as useful.  ;-)

Right. Why twitch about that? :-)

>  > Maybe the 'name' attribute should be declared as NMTOKEN to restrict
>  > it to a name token.  With <meta name="..">my data</meta> the content is
> 
>   This is good, if <meta> is kept.
> 
>  > One of the reasons for putting the url itself in an attribute would be
>  > the stricter constraints of CDATA and being able to make it
>  > #REQUIRED. As element content the parser cannot check if the element
> 
>   This is a good reason; I support this.
> 
>  > I looked through my bookmark list and there were several url's that
>  > looked like:
>  > 
>  >   http://someserver/somepage.html&var=x
> [URL data discussion...]
> 
>   The appropriate solution is probably to spit out character
> references for special characters (specifically, "<" and "&").  This
> is trivial to implement, and the input would have to be handled
> correctly according to XML rules anyway.  There is no need to invoke
> additional standards here; "URL encoding" is irrelevant in XML, and
> has everything to do with the HTTP requests.  Bookmarks are not
> limited to the http: scheme, so why should we need that particular
> encoding?

Well, some sort of encoding was in order. I picked the first one that
came to me. What it boils down to is that just sticking a URL somewhere
is not enough. These kinds of issues should be clearly documented and
belong to the DTD (the informal part).
> 
>  > Do we have to fix a limit for the depth of recursion or should this be
>  > left to every application. Maybe we should say that an XBEL
>  > application should at least be able to handle a depth of x folders.
> 
>   No.  The DTD & associated documentation is about a data model, not
> processing limitations.  This issue is strictly an processing issue.

I don't say it's absolutely necessary. But it's a
consequence of our data model and somewhere there should be a hint
about this. The DTD does not consist only of the formal data model but
also other aspects that cannot be expressed formally in a DTD. Things
like extra constraints on data etc (like the URL stuff).

Marc

---
Marc van Grootel
bwaumg@urc.tue.nl


From bwaumg@urc.tue.nl  Wed Sep 16 17:21:28 1998
From: bwaumg@urc.tue.nl (Marc van Grootel)
Date: Wed, 16 Sep 1998 18:21:28 +0200
Subject: [XML-SIG] XBEL DTD as a meta-dtd
Message-ID: <199809161621.SAA22530@asterix.urc.tue.nl>

Geir Ove Gronmo wrote: 
> This all depends on which DTDs you have in mind. 
> 

TEI-lite and Docbook. 

>  o Mapping between element content to element content is much easier that
> mapping from element content to attribute values and vice versa (This is
> not yet implemented in xmlarch, but will be soon).

I was just about to try that :{

> 
>  o Reordering is not possible (as far as I know).

Don't think so either. 

Marc

---
Marc van Grootel
bwaumg@urc.tue.nl


From akuchlin@cnri.reston.va.us  Wed Sep 16 17:34:49 1998
From: akuchlin@cnri.reston.va.us (Andrew M. Kuchling)
Date: Wed, 16 Sep 1998 12:34:49 -0400 (EDT)
Subject: [XML-SIG] Anonymous CVS access, and current status
Message-ID: <13823.57389.252087.244902@amarok.cnri.reston.va.us>

Anonymous CVS access to the source tree of the Python/XML package is
now available.  A page with instructions is at
	http://www.python.org/sigs/xml-sig/anon-cvs.html 

Briefly: 
Run the following command to log in (the password is "xmlcvs"):

	cvs -d :pserver:xmlcvs@cvs.python.org:/projects/cvsroot login

To check out the source tree, run:

	cvs -z3 -d :pserver:xmlcvs@cvs.python.org:/projects/cvsroot co xml

That will place everything in a subdirectory named "xml".  To update
the code, run:

	cvs -z3 update -d -P

Comments on all aspects of the package are welcomed.  To propagate
changes back into the source tree, post patches or suggestions on the
SIG mailing list, send them to me privately, or, if you're maintaining
a module, just release a new version and announce it.  

	Other notes on the current status of things:

	* The CVS tree now also contains version 0.11 of Geir Ove
Grønmo's xmlarch module.  It would be imported as "from xml.arch
import xmlarch".  Geir, I've also taken the sample code from your
xmlarch Web page and added it to the XML HOWTO.  Reference
documentation for the classes in xmlarch still has to be written,
though.

	* The demo/ directory has been reorganized, with everything
being split up into separate subdirectories instead of being all
dumped in the same place.  The most interesting new demo is the XBEL
code, in demo/xbel/; this is mostly as it was posted by various
people, and hacked around by me a bit, to make the {msie,ns,adr}_parse
modules read the bookmark file and dump it as XBEL.  xbel_parse.py can
then read an XBEL file and dump it in various formats.  Everything
will need to be updated to use the final DTD.  

	* The critical area, to my eyes, is still the DOM
implementation; I'm partway through an attempt at matching the
Proposed Recommendation, but the code doesn't even run yet, much less
function properly. 

	* Better-placed people in the XML community, please correct me
on this: besides DOM, I don't see any XML-related technologies or
standards that will be finalized any time soon.  The first public XSL
working draft just got released, and there are various XML-Data/DCD,
XSchema, and other things being worked on, but none of those things
will be finished within the next 6 months or so.  Is my perception
correct?

	Therefore, I draw the conclusion that, once the DOM
implementation is updated, there's nothing very significant left to
implement for 1.0 of the Python package, so we need only document
things, have some nice sample code, and then we're done with XML
proper for a while.  Wide string support will remain as a problem, but
that's a String-SIG problem.  That's pretty much the same conclusion
as in my last status update.

-- 
A.M. Kuchling			http://starship.skyport.net/crew/amk/
Not all readers are prepared, at all times, to make independent judgments. But
the failure of modern education to equip them to do so even when they have the
inclination creates a serious gap in modern culture.
    -- Robertson Davies, _A Voice from the Attic_


From wunder@infoseek.com  Wed Sep 16 17:41:04 1998
From: wunder@infoseek.com (Walter Underwood)
Date: Wed, 16 Sep 1998 09:41:04 -0700
Subject: [XML-SIG] XBEL DTD as a meta-dtd
In-Reply-To: <199809161502.RAA19327@asterix.urc.tue.nl>
Message-ID: <3.0.5.32.19980916094104.00cef710@corp>

At 05:02 PM 9/16/98 +0200, Marc van Grootel wrote:
>
>I looked through my bookmark list and there were several url's that
>looked like:
>
>  http://someserver/somepage.html&var=x
>
> [...]
>
>The solution might be to url-encode url's. So the above url
>becomes:
>
>  http:%3A%2F%2Fsomeserver%2Fsomepage.html%26var%3Dx

Use XML entities. Using two different kinds of escaping (XML and
HTTP) in the same file is unnecessary and confusing.

I've been saving URLs in XML in my product, and entities work
fine. It turns out that you need the entities in other text too,
since someone might use them in a bookmark name ("Arts & Crafts",
"O'Reilly Books"). So just entify them. Here is a snippet of re-hackery
to entify a string:

import re

# This pattern and replacement function are used to map characters
# in a string to XML entities, like this:  entities.sub(entsub,s)
entities = re.compile('[&<>"\']')
def entsub(matchobj):
    c = matchobj.group()
    if   c == '&': return '&amp;'
    elif c == '<': return '&lt;'
    elif c == '>': return '&gt;'
    elif c == "'": return '&apos;'
    elif c == '"': return '&quot;'
    else:          return ''      # logs a message here in my application

Always, always entify strings as you generate XML. If you slip
in an unescaped special character, you can lose a whole
file's worth of data by making it un-parseable (or make someone 
manually edit it to get it back).
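
Applied to the URL quoted earlier in this message, the same idea can be
sketched in present-day Python with a lookup table instead of an if/elif
chain. The names ENTITY_MAP, SPECIALS and entify, and the
bookmark/href/title markup, are invented for the illustration and are
not taken from any XBEL draft.

```python
import re
import xml.etree.ElementTree as ET

# The five characters with predefined XML entities.
ENTITY_MAP = {"&": "&amp;", "<": "&lt;", ">": "&gt;",
              "'": "&apos;", '"': "&quot;"}
SPECIALS = re.compile("[&<>\"']")

def entify(text):
    """Replace each XML special character with its predefined entity."""
    return SPECIALS.sub(lambda m: ENTITY_MAP[m.group()], text)

url = "http://someserver/somepage.html&var=x"
doc = '<bookmark href="%s"><title>%s</title></bookmark>' % (
    entify(url), entify("Arts & Crafts"))

# Parsing the result gives back the original, unescaped strings.
elem = ET.fromstring(doc)
print(elem.get("href") == url)
print(elem.find("title").text)
```

The URL survives the round trip unchanged, so no second layer of
URL-encoding is needed just to store it in XML.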

Finally, XBEL is doing things that are also done by the Resource
Description Framework (RDF). Though the RDF spec is hard to read,
and may fail just because it is drowning in AI-speak rather than 
being useful, it is worth taking a look at.

wunder

Walter R. Underwood
wunder@infoseek.com
wunder@best.com (home)
http://www.best.com/~wunder/
1-408-543-6946


From fdrake@acm.org  Wed Sep 16 18:21:33 1998
From: fdrake@acm.org (Fred L. Drake)
Date: Wed, 16 Sep 1998 13:21:33 -0400 (EDT)
Subject: [XML-SIG] Anonymous CVS access, and current status
In-Reply-To: <13823.57389.252087.244902@amarok.cnri.reston.va.us>
References: <13823.57389.252087.244902@amarok.cnri.reston.va.us>
Message-ID: <13823.62365.211321.912636@weyr.cnri.reston.va.us>

Andrew M. Kuchling writes:
 > Anonymous CVS access to the source tree of the Python/XML package is
 > now available.  A page with instructions is at

  Cool; works great!

 > dumped in the same place.  The most interesting new demo is the XBEL
 > code, in demo/xbel/; this is mostly as it was posted by various
 > people, and hacked around by me a bit, to make the {msie,ns,adr}_parse

  This will definitely be useful, esp. once the DTD is updated a
little.  I'm not sure of the current state, actually.  (Marc, are you
planning to post an update, or would you like me to integrate the most 
recent discussion?  I'm not sure of the state of <meta>; I don't
recall any conclusive "it must be this way".)


  -Fred

--
Fred L. Drake, Jr.	     <fdrake@acm.org>
Corporation for National Research Initiatives
1895 Preston White Dr.	    Reston, VA  20191


From fdrake@acm.org  Wed Sep 16 18:23:49 1998
From: fdrake@acm.org (Fred L. Drake)
Date: Wed, 16 Sep 1998 13:23:49 -0400 (EDT)
Subject: [XML-SIG] XBEL DTD as a meta-dtd
In-Reply-To: <199809161616.SAA22462@asterix.urc.tue.nl>
References: <199809161616.SAA22462@asterix.urc.tue.nl>
Message-ID: <13823.62501.26409.535399@weyr.cnri.reston.va.us>

Marc van Grootel writes:
 > I don't say it's absolutely necessary. But it's a
 > consequence of our datamodel and somewhere there should be a hint
 > about this. The DTD does not consist only of the formal data model but
 > also other aspects that cannot be expressed formally in a DTD. Things
 > like extra constraints on data etc (like the URL stuff).

  Ok, this is good.  We should include in the document comments about
these processing issues, pointing out that data needs to be "entified" 
and that the model doesn't restrict the depth of the hierarchy.
  Has anyone started on the "informal" part of the DTD?


  -Fred

--
Fred L. Drake, Jr.	     <fdrake@acm.org>
Corporation for National Research Initiatives
1895 Preston White Dr.	    Reston, VA  20191


From akuchlin@cnri.reston.va.us  Wed Sep 16 18:55:37 1998
From: akuchlin@cnri.reston.va.us (Andrew M. Kuchling)
Date: Wed, 16 Sep 1998 13:55:37 -0400 (EDT)
Subject: [XML-SIG] XBEL DTD as a meta-dtd
In-Reply-To: <3.0.5.32.19980916094104.00cef710@corp>
References: <199809161502.RAA19327@asterix.urc.tue.nl>
 <3.0.5.32.19980916094104.00cef710@corp>
Message-ID: <13823.63954.24840.806558@amarok.cnri.reston.va.us>

Walter Underwood writes:
>Always, always entify strings as you generate XML. If you slip
>in an unescaped special character, you can lose a whole
>file's worth of data by making it un-parseable (or make someone 
>manually edit it to get it back).

	That reminds me of something: quite a lot of XML-related code
will need to entify strings, so there should really be a utility function
available to do this.  Other generally useful functions may become
apparent, too, so I propose an xml.util module.  For now, it'll only
have 1 function:

def escape(string, entity_dict = {}):
    r"""Escapes &, ", and ' in string.  If entity_dict is specified, it
    must be a dictionary mapping strings to their entity replacements.
    For example, passing {'\352': '&ecirc;'} would cause the 8-bit
    character chr(234) to be replaced with &ecirc;."""

Thoughts?  Should there be a way to specify a character range which
would be escaped numerically, as &#42; or whatever?
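
One possible shape for such a function, as a sketch in present-day
Python rather than the xml.util code itself. Two things here are
assumptions that go beyond the proposal: the numeric_range parameter is
hypothetical, and the sketch also escapes < and > and only handles
single-character keys in entity_dict.

```python
def escape(string, entity_dict=None, numeric_range=None):
    """Escape &, <, >, " and ' in string.

    entity_dict maps single characters to entity replacements.
    numeric_range is a hypothetical (low, high) pair of code points;
    characters in that inclusive range become &#N; references.
    """
    table = {"&": "&amp;", "<": "&lt;", ">": "&gt;",
             '"': "&quot;", "'": "&apos;"}
    table.update(entity_dict or {})
    out = []
    for ch in string:
        if ch in table:
            out.append(table[ch])
        elif numeric_range and numeric_range[0] <= ord(ch) <= numeric_range[1]:
            out.append("&#%d;" % ord(ch))
        else:
            out.append(ch)
    return "".join(out)

print(escape("f\xeate & <b>", {"\xea": "&ecirc;"}))
print(escape("caf\xe9", numeric_range=(128, 255)))
```

Named entities take priority over the numeric range, so a caller can
mix readable names for a few characters with numeric references for the
rest.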

-- 
A.M. Kuchling			http://starship.skyport.net/crew/amk/
I was honestly very nervous of Constance Wheatcroft. And I wasn't the only
one. Her entire family was afraid of her. Dogs were afraid of her. Bindweed in
the hedge would wither as she passed; birds would forget their nesting
instincts and fly back to north Africa at the sound of her hideous cries.
    -- Tom Baker, in his autobiography


From larsga@ifi.uio.no  Fri Sep 18 10:19:45 1998
From: larsga@ifi.uio.no (Lars Marius Garshol)
Date: 18 Sep 1998 11:19:45 +0200
Subject: [XML-SIG] XBEL DTD as a meta-dtd
References: <199809161502.RAA19327@asterix.urc.tue.nl> 	<3.0.5.32.19980916094104.00cef710@corp> <13823.63954.24840.806558@amarok.cnri.reston.va.us>
Message-ID: <wkbtodu6a6.fsf@ifi.uio.no>

* Andrew M. Kuchling
| 
| Thoughts?  Should there be a way to specify a character range which
| would be escaped numerically, as &#42; or whatever?

I think the xml.util module makes perfect sense, as does the escape
function. I think we'll also eventually want an XMLWriter class to
simplify XML generation as well. I was about to create a module for
myself with these two things anyway.

Here is the escape function I use for element content:

def escape(str):
    return string.replace(string.replace(str,'&',"&amp;"),"<","&lt;")
        
Here is my XMLWriter (note that it is written for data-oriented
documents, not document-like ones):

# A simple XML-generator

import sys,string

class XMLWriter:

    def __init__(self,out=sys.stdout):
        self.out=out
        self.stack=[]

    def doctype(self,root,pubid,sysid):
        if pubid==None:
            self.out.write("<!DOCTYPE %s SYSTEM '%s'>\n" % (root,sysid))
        else:
            self.out.write("<!DOCTYPE %s PUBLIC '%s' '%s'>\n" % (root,pubid,
                                                                 sysid))
        
    def push(self,elem,attrs={}):
        self.__indent()
        self.out.write("<"+elem)
        for (a,v) in attrs.items():
            self.out.write(" %s='%s'" % (a,self.__escape_attr(v)))
        self.out.write(">\n")

        self.stack.append(elem)

    def elem(self,elem,content,attrs={}):
        self.__indent()
        self.out.write("<"+elem)
        for (a,v) in attrs.items():
            self.out.write(" %s='%s'" % (a,self.__escape_attr(v)))
        self.out.write(">%s</%s>\n" % (self.__escape_cont(content),elem))

    def empty(self,elem,attrs={}):
        self.__indent()
        self.out.write("<"+elem)
        for (a,v) in attrs.items():
            self.out.write(" %s='%s'" % (a,self.__escape_attr(v)))
        self.out.write("/>\n")
        
    def pop(self):
        elem=self.stack[-1]
        del self.stack[-1]
        self.__indent()
        self.out.write("</%s>\n" % elem)
    
    def __indent(self):
        self.out.write(" "*(len(self.stack)*2))
    
    def __escape_cont(self,str):
        return string.replace(string.replace(str,'&',"&amp;"),"<","&lt;")

    def __escape_attr(self,str):
        str=string.replace(str,'&',"&amp;")
        return string.replace(string.replace(str,"'","&apos;"),"<","&lt;")
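
For readers on a current interpreter, the push/elem/pop idea can be
tried with the reduced port below. It is a sketch, not the class above:
the doctype and empty methods are left out, the helper names _start and
_escape are new, and the folder/title markup and the id attribute are
invented for the demonstration.

```python
import io
import sys

class XMLWriter:
    """Python 3 port of the stack-based writer idea (a sketch)."""

    def __init__(self, out=sys.stdout):
        self.out = out
        self.stack = []

    def push(self, elem, attrs=None):
        """Open an element and remember it so pop() can close it."""
        self._start(elem, attrs)
        self.out.write(">\n")
        self.stack.append(elem)

    def elem(self, elem, content, attrs=None):
        """Write a complete element holding only character data."""
        self._start(elem, attrs)
        self.out.write(">%s</%s>\n" % (self._escape(content), elem))

    def pop(self):
        """Close the most recently opened element."""
        elem = self.stack.pop()
        self.out.write("  " * len(self.stack) + "</%s>\n" % elem)

    def _start(self, elem, attrs):
        # Indent by nesting depth, then write the tag name and attributes.
        self.out.write("  " * len(self.stack) + "<" + elem)
        for name, value in (attrs or {}).items():
            value = self._escape(value).replace("'", "&apos;")
            self.out.write(" %s='%s'" % (name, value))

    @staticmethod
    def _escape(text):
        return text.replace("&", "&amp;").replace("<", "&lt;")

buf = io.StringIO()
writer = XMLWriter(buf)
writer.push("folder", {"id": "f1"})
writer.elem("title", "Arts & Crafts")
writer.pop()
print(buf.getvalue(), end="")
```

Because the writer keeps its own stack of open elements, the caller
never has to repeat an element name to close it.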

--Lars M.


From grove@infotek.no  Fri Sep 18 13:48:18 1998
From: grove@infotek.no (Geir Ove Gronmo)
Date: Fri, 18 Sep 1998 14:48:18 +0200
Subject: [XML-SIG] Anonymous CVS access, and current status
In-Reply-To: <13823.57389.252087.244902@amarok.cnri.reston.va.us>
Message-ID: <199809181247.OAA24090@mail.infotek.no>

At 12:34 16.09.98 -0400, A.M. Kuchling	 wrote:
>	* Better-placed people in the XML community, please correct me
>on this: besides DOM, I don't see any XML-related technologies or
>standards that will be finalized any time soon.  The first public XSL
>working draft just got released, and there are various XML-Data/DCD,
>XSchema, and other things being worked on, but none of those things
>will be finished within the next 6 months or so.  Is my perception
>correct?

There are some XML-related technologies that I would like to
see implemented/integrated into the Python environment. I wouldn't expect
any of them to be included into the version 1.0 of the Python/XML package
though.

   o XLink ( http://www.w3.org/TR/1998/WD-xlink-19980303 )
   o The Hytime modules ( http://www.hytime.org/ )
   o The SGML Extended Facilities (
http://www.ornl.gov/sgml/wg8/docs/n1920/html/clause-A.html )

   o Topic Navigation Maps ( http://www.hightext.com/tnm/ )
   o Resource Description Framework ( http://www.w3.org/RDF/ )

   o A Python wrapper module for the SP package written by James Clark. (
http://www.jclark.com/sp/ )
   o A SAX driver for nsgmls

Geir O.

 ==================  Geir Ove Grønmo  ==================
|  STEP Infotek as, Gjerdrumsvei 12, 0486 Oslo, Norway  |
|        grove@infotek.no http://www.infotek.no/        |
 -------------------------------------------------------


From larsga@ifi.uio.no  Fri Sep 18 13:59:35 1998
From: larsga@ifi.uio.no (Lars Marius Garshol)
Date: 18 Sep 1998 14:59:35 +0200
Subject: [XML-SIG] Anonymous CVS access, and current status
In-Reply-To: <199809181247.OAA24090@mail.infotek.no>
References: <199809181247.OAA24090@mail.infotek.no>
Message-ID: <wklnnhshjc.fsf@ifi.uio.no>

* Geir Ove Gronmo
|
|    o A Python wrapper module for the SP package written by James Clark. (
| http://www.jclark.com/sp/ )

We have this already:

<URL:http://itrc.uwaterloo.ca:80/~papresco/pysgml/>

|    o A SAX driver for nsgmls

I'm currently working on this. (Or rather, a general ESIS parser,
since that can be used with other parsers as well.)

However, I have a problem with os.popen in that error messages are
written to the console (on Win95) and not to my Python process. Once
I've figured that one out we have a SAX driver for nsgmls.
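
On a current Python the standard subprocess module can collect a
child's error stream separately from its output, which is one way
around this kind of problem. The sketch below uses a child Python
process as a stand-in for the external parser, because the actual
nsgmls command line is not given here; the "(DOC" line and the error
text are made up.

```python
import subprocess
import sys

# Stand-in for an external parser: a child Python process that writes
# one line to stdout and one to stderr.  A real driver would launch
# nsgmls here instead; its command line is not shown in this thread.
CHILD = [sys.executable, "-c",
         "import sys; print('(DOC'); sys.stderr.write('E: bad markup\\n')"]

result = subprocess.run(CHILD, capture_output=True, text=True)

# Both streams arrive as strings in the parent rather than on the console.
print("stdout:", result.stdout.strip())
print("stderr:", result.stderr.strip())
```

A driver built this way could turn each captured error line into a
call on the SAX error handler.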

Of course, an OLE-based driver would also be nice, but I'm leaving
that for someone else (for now at least). It should be pretty easy to
do, since Paul Prescod's PySGML (see above) has a module that uses OLE
to communicate with nsgmls (well, SP).

Other than this I whole-heartedly agree with Geir Ove's suggestions.

--Lars M.


From akuchlin@cnri.reston.va.us  Fri Sep 18 16:01:31 1998
From: akuchlin@cnri.reston.va.us (Andrew Kuchling)
Date: Fri, 18 Sep 1998 11:01:31 -0400 (EDT)
Subject: [XML-SIG] Anonymous CVS access, and current status
In-Reply-To: <199809181247.OAA24090@mail.infotek.no>
References: <13823.57389.252087.244902@amarok.cnri.reston.va.us>
 <199809181247.OAA24090@mail.infotek.no>
Message-ID: <13826.30133.623948.638103@newcnri.cnri.reston.va.us>

Geir Ove Gronmo writes:
>   o XLink ( http://www.w3.org/TR/1998/WD-xlink-19980303 )

	Some form of XLink support seems reasonable.

>   o The Hytime modules ( http://www.hytime.org/ )
>   o The SGML Extended Facilities (
>http://www.ornl.gov/sgml/wg8/docs/n1920/html/clause-A.html )

	I couldn't figure out from the above links exactly what you're
suggesting.  Is it simply the specified architectural form support
described at both the above links?

>   o Topic Navigation Maps ( http://www.hightext.com/tnm/ )
>   o Resource Description Framework ( http://www.w3.org/RDF/ )

	Topic Navigation Maps seems to be a meta-DTD, and RDF is a
DTD, so I'm not sure that they belong in the basic package, but
certainly they could be developed and distributed separately.  I don't
think specific DTDs or meta-DTDs are suitable for the basic package,
unless they turn out to be really, *really* common.  They're OK for
demos, of course, but I don't want to install lots of code that most
people won't use much.

>   o A Python wrapper module for the SP package written by James Clark. (
>http://www.jclark.com/sp/ )
>   o A SAX driver for nsgmls

	The Python code should also work fine in JPython (unless
1.5-isms have crept in), so a driver for Java SAX interfaces would
also be useful.  Before 1.0 I'll look into that.

-- 
A.M. Kuchling			http://starship.skyport.net/crew/amk/
First, you must know what the thing is, and then after learn the use of the
same.
    -- Robert Recorde


From akuchlin@cnri.reston.va.us  Mon Sep 21 21:04:45 1998
From: akuchlin@cnri.reston.va.us (Andrew M. Kuchling)
Date: Mon, 21 Sep 1998 16:04:45 -0400 (EDT)
Subject: [XML-SIG] DOM: backward compatibility
Message-ID: <13830.45088.437497.816628@amarok.cnri.reston.va.us>

Working on the DOM code this weekend, I realized that quite a few
things will be broken by going to the most recent draft.  How
important is backwards compatibility with the current DOM code?  

	I'm not sure how much existing DOM code is out there that will
be broken by incompatible changes, because I'm not sure if people were
seriously using it or not.  So if you're using the current DOM code
for something, please let me know so I can gauge how important
compatibility is.

	Thanks!

-- 
A.M. Kuchling			http://starship.skyport.net/crew/amk/
And as soon as he was sure that he was dead, he got up and shook himself, and
looked around, and there waiting for him on the bed was his wife, with long
claws out, and her eyes blazing like a green cat ready to spring.
    -- Magda's story, in SANDMAN #62: "The Kindly Ones:6"


From guido@CNRI.Reston.Va.US  Mon Sep 21 21:29:27 1998
From: guido@CNRI.Reston.Va.US (Guido van Rossum)
Date: Mon, 21 Sep 1998 16:29:27 -0400
Subject: [XML-SIG] DOM: backward compatibility
In-Reply-To: Your message of "Mon, 21 Sep 1998 16:04:45 EDT."
 <13830.45088.437497.816628@amarok.cnri.reston.va.us>
References: <13830.45088.437497.816628@amarok.cnri.reston.va.us>
Message-ID: <199809212029.QAA07280@eric.CNRI.Reston.Va.US>

> Working on the DOM code this weekend, I realized that quite a few
> things will be broken by going to the most recent draft.  How
> important is backwards compatibility with the current DOM code?  
> 
> 	I'm not sure how much existing DOM code is out there that will
> be broken by incompatible changes, because I'm not sure if people were
> seriously using it or not.  So if you're using the current DOM code
> for something, please let me know so I can gauge how important
> compatibility is.

I don't know anything about DOM code or how much it is used, but I
would like to relate an anecdote that I once heard about the original
Unix "Make" program.  This was probably around Unix v6.  Someone
inside Bell Labs complained to the author about a particular
misfeature (I believe it was about requiring an actual tab character,
-- instead of any whitespace -- to start a command).  The author
responded that he agreed that it was a misfeature, but that there were
already more than ten users (all inside Bell Labs), so that for
reasons of backwards compatibility he couldn't change it.

Please, be considerate of the future and make things right!

--Guido van Rossum (home page: http://www.python.org/~guido/)


From larsga@ifi.uio.no  Mon Sep 21 21:30:21 1998
From: larsga@ifi.uio.no (Lars Marius Garshol)
Date: 21 Sep 1998 22:30:21 +0200
Subject: [XML-SIG] DOM: backward compatibility
In-Reply-To: <13830.45088.437497.816628@amarok.cnri.reston.va.us>
References: <13830.45088.437497.816628@amarok.cnri.reston.va.us>
Message-ID: <wkvhmh2opu.fsf@ifi.uio.no>

* Andrew M. Kuchling
|
| How important is backwards compatibility with the current DOM code?
 
I'd say that conforming to the final DOM recommendation must have
priority over backwards compatibility. 

Those to whom backward compatibility is very important can just keep
using the old version anyway.

| I'm not sure how much existing DOM code is out there that will be
| broken by incompatible changes, because I'm not sure if people were
| seriously using it or not.

I'm using it, both for personal conversion scripts (the XML tools
list, for example) and PyPointers, but I've known right from the start
that I would have to update my code to follow later DOM releases. In
other words: no complaints from me.

--Lars M.


From akuchlin@cnri.reston.va.us  Mon Sep 21 22:27:32 1998
From: akuchlin@cnri.reston.va.us (Andrew M. Kuchling)
Date: Mon, 21 Sep 1998 17:27:32 -0400 (EDT)
Subject: [XML-SIG] DOM: backward compatibility
In-Reply-To: <199809212029.QAA07280@eric.CNRI.Reston.Va.US>
References: <13830.45088.437497.816628@amarok.cnri.reston.va.us>
 <199809212029.QAA07280@eric.CNRI.Reston.Va.US>
Message-ID: <13830.50176.16488.120316@amarok.cnri.reston.va.us>

Guido van Rossum writes:
>Unix "Make" program.  This was probably around Unix v6.  Someone
>inside Bell Labs complained to the author about a particular
>misfeature (I believe it was about requiring an actual tab character,
>-- instead of any whitespace -- to start a command).  The author
>responded that he agreed that it was a misfeature, but that there were
>already more than ten users (all inside Bell Labs), so that for
>reasons of backwards compatibility he couldn't change it.

	I've heard that story.  I also once read an interview
somewhere where Dennis Ritchie, when asked if he would have done
anything differently in Unix, answered "Spelled creat() correctly."

-- 
A.M. Kuchling			http://starship.skyport.net/crew/amk/
In Rome, in Leningrad, in Darwin. "The door flew open, in he ran, the great,
long, red-legged scissorman."
    -- From DOOM PATROL #20


From akuchlin@cnri.reston.va.us  Mon Sep 21 23:08:18 1998
From: akuchlin@cnri.reston.va.us (Andrew M. Kuchling)
Date: Mon, 21 Sep 1998 18:08:18 -0400 (EDT)
Subject: [XML-SIG] Utility functions (was: XBEL DTD as a meta-dtd)
In-Reply-To: <wkbtodu6a6.fsf@ifi.uio.no>
References: <199809161502.RAA19327@asterix.urc.tue.nl>
 <3.0.5.32.19980916094104.00cef710@corp>
 <13823.63954.24840.806558@amarok.cnri.reston.va.us>
 <wkbtodu6a6.fsf@ifi.uio.no>
Message-ID: <13830.52482.568860.535654@amarok.cnri.reston.va.us>

Lars Marius Garshol writes:
>def escape(str):
>    return string.replace(string.replace(str,'&',"&amp;"),"<","&lt;")

	According to section 2.4 of the spec, > also needs to be
escaped as &gt; when it's preceded by ]] ; ]]> needs to be ]]&gt;.
It's probably simplest to always escape > as &gt;, even when it's not
necessary.
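
A sketch of the "always escape >" variant in present-day Python string
methods (the function name simply mirrors the one quoted above):

```python
def escape(text):
    """Escape character data; & must be replaced first."""
    return (text.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;"))

print(escape("if a[b[0]]>1 & c<2"))
```

Replacing & first matters: doing it last would re-escape the
ampersands introduced by the other two replacements.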

>Here is my XMLWriter (note that it is written for data-oriented
>documents, not document-like ones):

	An interesting class.  What do people think: should it be
added somewhere?  One could obtain similar results by creating a DOM
tree and then linearising it, but that's also more complicated to
learn, so I don't think the XMLWriter class would be completely
redundant.  On the other hand, perhaps it should be layered on top of
DOM, and if it turns out that most XML users know the DOM API anyway,
then XMLWriter really is redundant after all.
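
For comparison, the DOM route looks roughly like this with the
xml.dom.minidom module found in current Python distributions (not the
DOM code under discussion here). The folder/title/bookmark names and
the href attribute are only sample markup.

```python
from xml.dom.minidom import Document

doc = Document()
folder = doc.createElement("folder")
doc.appendChild(folder)

title = doc.createElement("title")
title.appendChild(doc.createTextNode("Arts & Crafts"))
folder.appendChild(title)

bookmark = doc.createElement("bookmark")
bookmark.setAttribute("href", "http://someserver/somepage.html&var=x")
folder.appendChild(bookmark)

# Linearise the tree; the serializer escapes & in text and attributes.
xml_text = folder.toxml()
print(xml_text)
```

The escaping comes for free, at the cost of building every node in
memory before anything is written.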

-- 
A.M. Kuchling			http://starship.skyport.net/crew/amk/
You played me well, mortal. But I have played me for time out of mind. And I
do Robin Goodfellow better than anyone.
    -- Robin Goodfellow, in SANDMAN #19: "A Midsummer Night's Dream"


From Jeff.Johnson@stn.siemens.com  Wed Sep 23 15:38:00 1998
From: Jeff.Johnson@stn.siemens.com (Jeff.Johnson@stn.siemens.com)
Date: Wed, 23 Sep 1998 10:38:00 -0400
Subject: [XML-SIG] DOM - where can we get the new stuff?
Message-ID: <85256688.00507A97.00@BI01.boca.ssc.siemens.com>


I spoke (via email) to Stefane Fermigier about the parent-child circular
references in the old DOM package and she said that someone else was
working on it.  I've been hoping to get the new package so I could stop
leaking memory.  Is there a way to get the updates now?
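
The leak comes from each child holding a strong reference to its parent
while the parent holds its children. In present-day Python one common
way to avoid that is a weak parent link. The sketch below uses the
standard weakref module, which did not exist when this was written, and
the Node class is hypothetical rather than anything from the DOM
package.

```python
import gc
import weakref

class Node:
    """Tree node whose parent link is a weak reference."""

    def __init__(self, name):
        self.name = name
        self.children = []
        self._parent = None          # weakref.ref to the parent, or None

    @property
    def parent(self):
        # Dereference the weak link; None if there is no live parent.
        return self._parent() if self._parent is not None else None

    def append(self, child):
        self.children.append(child)          # strong reference downwards
        child._parent = weakref.ref(self)    # weak reference upwards
        return child

root = Node("root")
child = root.append(Node("child"))
print(child.parent.name)

del root            # drop the only strong reference to the root
gc.collect()        # not needed on CPython, harmless elsewhere
print(child.parent)
```

The trade-off is that a subtree can outlive its parent, so code that
walks upwards has to cope with parent being None.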

* Andrew M. Kuchling
|
| How important is backwards compatibility with the current DOM code?

As far as backwards compatibility goes, I'll rewrite my code; no complaints from
me and thanks for working on it!


From akuchlin@cnri.reston.va.us  Wed Sep 23 16:51:51 1998
From: akuchlin@cnri.reston.va.us (Andrew M. Kuchling)
Date: Wed, 23 Sep 1998 11:51:51 -0400 (EDT)
Subject: [XML-SIG] DOM - where can we get the new stuff?
In-Reply-To: <85256688.00507A97.00@BI01.boca.ssc.siemens.com>
References: <85256688.00507A97.00@BI01.boca.ssc.siemens.com>
Message-ID: <13833.6128.887744.183677@amarok.cnri.reston.va.us>

Jeff.Johnson@stn.siemens.com writes:
>I spoke (via email) to Stefane Fermigier about the parent-child circular
>references in the old DOM package and she said that someone else was
>working on it.  I've been hoping to get the new package so I could stop
>leaking memory.  Is there a way to get the updates now?

	I haven't yet committed any of the updates to the CVS tree,
because they're not finished yet.  Because the changes are so
extensive, I'll send out an announcement to this list when I actually
commit them.  (Currently I'm still going through the DOM draft, and
implementing all the methods and attributes described.)

-- 
A.M. Kuchling			http://starship.skyport.net/crew/amk/
It would take days to catalog your sins, Abbé. I simply don't have the time.
    -- Sebastian, in SEBASTIAN O #2


From fdrake@acm.org  Wed Sep 23 17:41:45 1998
From: fdrake@acm.org (Fred L. Drake)
Date: Wed, 23 Sep 1998 12:41:45 -0400 (EDT)
Subject: [XML-SIG] XBEL revision
Message-ID: <13833.7733.555953.186008@weyr.cnri.reston.va.us>

  In working on the bookmarks support for Grail, I'm taking a hard
look at how I do XML processing.  To do everything I want with a
well-formed (but not necessarily valid) XBEL instance, a fair amount
of special treatment may be needed to avoid being destructive of
additional content in the file.  I'll try and discuss the general
issues of programmatic editing of well-formed XML in another message,
probably not today.
  The issue which immediately concerns me in this message is the
<info> element.  Marc van Grootel has proposed that it simply contain
(meta*), for whatever definition of <meta> is decided on.  Greg Stein
rightly pointed out that there's a level of silliness to specifying a
particular construct for potentially ad-hoc data that can be stored in 
the <info> element if we use well-formed XML rather than valid XML
(which is supposedly one advantage for XML over SGML).  This is not
to say that there isn't a need for something that stores ad-hoc
information that has some level of structure.
  It is reasonable to separate information stored about the resource
identified by a <bookmark> and application information which relates
to the <bookmark>.  I'd like to propose that distinct elements be
defined for each, and include an attribute on the application-data
element which can be used to specify which processing application it
pertains to.  This allows each application to locate its own data
while more easily avoiding contamination of other applications'
state.
  Specifically, let's define <metadata> and <appdata>, adjusting
<bookmark> and <xbel> accordingly:

<!ELEMENT xbel     (title?, (bookmark|folder|url|alias|separator)*)>
<!ELEMENT bookmark (metadata?, url, desc?, appdata*)>
<!ELEMENT metadata (#PCDATA)>
<!ELEMENT appdata  (#PCDATA)>
<!ATTLIST appdata
	    application CDATA #REQUIRED
>
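
A sketch of how an application might pick out its own data under this
content model, written in present-day Python with
xml.etree.ElementTree. The sample instance, the application names, the
payload strings and the text content of <url> are all assumptions made
for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical instance following the proposed content model; the
# application names and payloads are invented for the example.
SAMPLE = """<xbel>
  <bookmark>
    <metadata>Reviewed 1998-09</metadata>
    <url>http://www.python.org/</url>
    <desc>Python home page</desc>
    <appdata application="grail">last-visited=906568905</appdata>
    <appdata application="other-browser">icon=python.gif</appdata>
  </bookmark>
</xbel>"""

def appdata_for(bookmark, application):
    """Return the text of every <appdata> owned by the given application."""
    return [node.text for node in bookmark.findall("appdata")
            if node.get("application") == application]

bookmark = ET.fromstring(SAMPLE).find("bookmark")
print(appdata_for(bookmark, "grail"))
```

An application that rewrites the file would replace only the nodes this
function selects and copy every other <appdata> through untouched.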

  Structuring it this way and documenting our intentions for
<metadata> and <appdata> makes processing XBEL a bit more clear for
applications which want more than simple hierarchical bookmarks, while 
maintaining a fairly simple exchange DTD usable for advanced
applications as well.  (The original application, as I recall! ;)
  Note that a name for Jack Jansen's "pass" node still needs to be
determined, and the appropriate content-model changes incorporated
into the DTD.  Jack, if you can come up with a good name, I'll be glad 
to integrate it into the DTD.  "pass" probably isn't clear enough
outside the Python community.  (Yes, I think we're getting this into
shape to be a very usable document type.)


  -Fred

--
Fred L. Drake, Jr.	     <fdrake@acm.org>
Corporation for National Research Initiatives
1895 Preston White Dr.	    Reston, VA  20191


From Jack.Jansen@cwi.nl  Thu Sep 24 11:56:33 1998
From: Jack.Jansen@cwi.nl (Jack Jansen)
Date: Thu, 24 Sep 1998 12:56:33 +0200
Subject: [XML-SIG] XBEL revision
In-Reply-To: Message by "Fred L. Drake" <fdrake@cnri.reston.va.us> ,
 Wed, 23 Sep 1998 12:41:45 -0400 (EDT) , <13833.7733.555953.186008@weyr.cnri.reston.va.us>
Message-ID: <UTC199809241056.MAA18471.jack@snelboot.cwi.nl>

>   Note that a name for Jack Jansen's "pass" node still needs to be
> determined, and the appropriate content-model changes incorporated
> into the DTD.  Jack, if you can come up with a good name, I'll be glad 
> to integrate it into the DTD.  "pass" probably isn't clear enough
> outside the Python community.  (Yes, I think we're getting this into
> shape to be a very usable document type.)

After a little more thinking I'm not sure whether "pass" is worth it. It can 
solve the problem of determining whether a node was deleted in one bookmark 
file or added in the other, but there are various other issues it doesn't 
solve, such as moves. All in all I'm not sure anymore whether a feature that 
solves only 90% of the cases is really worth it...
--
Jack Jansen             | ++++ stop the execution of Mumia Abu-Jamal ++++
Jack.Jansen@cwi.nl      | ++++ if you agree copy these lines to your sig ++++
http://www.cwi.nl/~jack | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm 


From Jack.Jansen@cwi.nl  Thu Sep 24 14:14:36 1998
From: Jack.Jansen@cwi.nl (Jack Jansen)
Date: Thu, 24 Sep 1998 15:14:36 +0200
Subject: [XML-SIG] Converting HTML to XML, advise wanted
Message-ID: <UTC199809241314.PAA20323.jack@snelboot.cwi.nl>

I have to do a (partial) translation of HTML to an XML-based format (RealText, 
to be specific), and I'm a bit uncertain as to how to proceed. Half a year ago 
I would have just used htmllib.py to parse the html and used a formatter.py 
based class to generate the XML, but nowadays I sort of have the feeling that 
a DOM based approach might be a better path. But, as I've only glanced at the 
DOM stuff on this list I'm not 100% convinced that this is indeed the best way 
to go. It seems PyDOM contains all the needed stuff, but again I'm not 
completely sure of this.

Does anyone have any insights to share?
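
One event-driven shape for such a conversion, sketched in present-day
Python with html.parser and xml.etree.ElementTree instead of htmllib or
PyDOM. The TAG_MAP vocabulary and the document wrapper element are
hypothetical, since the RealText element names are not given here, and
the sketch assumes properly nested input.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# Hypothetical mapping from HTML tags to target-format tags; the real
# RealText vocabulary is not given in this thread.
TAG_MAP = {"p": "para", "b": "bold", "i": "italic"}

class HtmlToXml(HTMLParser):
    """Feed HTML in, build an ElementTree of mapped elements."""

    def __init__(self):
        super().__init__()
        self.builder = ET.TreeBuilder()
        self.builder.start("document", {})

    def handle_starttag(self, tag, attrs):
        if tag in TAG_MAP:
            self.builder.start(TAG_MAP[tag], {})

    def handle_endtag(self, tag):
        if tag in TAG_MAP:
            self.builder.end(TAG_MAP[tag])

    def handle_data(self, data):
        self.builder.data(data)      # text of unmapped tags is kept

    def result(self):
        self.close()                 # flush any buffered text
        self.builder.end("document")
        return self.builder.close()

converter = HtmlToXml()
converter.feed("<p>Fish &amp; <b>chips</b> <u>today</u></p>")
xml_text = ET.tostring(converter.result(), encoding="unicode")
print(xml_text)
```

Tags outside the mapping are dropped while their text is kept, which
is what makes the translation partial.
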
--
Jack Jansen             | ++++ stop the execution of Mumia Abu-Jamal ++++
Jack.Jansen@cwi.nl      | ++++ if you agree copy these lines to your sig ++++
http://www.cwi.nl/~jack | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm 


From fdrake@acm.org  Thu Sep 24 15:50:42 1998
From: fdrake@acm.org (Fred L. Drake)
Date: Thu, 24 Sep 1998 10:50:42 -0400 (EDT)
Subject: [XML-SIG] XBEL revision
In-Reply-To: <UTC199809241056.MAA18471.jack@snelboot.cwi.nl>
References: <fdrake@cnri.reston.va.us>
 <13833.7733.555953.186008@weyr.cnri.reston.va.us>
 <UTC199809241056.MAA18471.jack@snelboot.cwi.nl>
Message-ID: <13834.23378.9690.598955@weyr.cnri.reston.va.us>

Jack Jansen writes:
 > solve the problem of determining whether a node was deleted in one bookmark 
 > file or added in the other, but there are various other issues it doesn't 
 > solve, such as moves. All in all I'm not sure anymore whether a feature that 
 > solves only 90% of the cases is really worth it...

  Good point.  When you figure out how to better approach it, we
can devise an XBEL 2.0, or something else that includes the required
constructs.  I think a version/variant that supports the kind of
syncing that you seem to be dealing with should be published to allow
other browsers to support it as well.


  -Fred

--
Fred L. Drake, Jr.	     <fdrake@acm.org>
Corporation for National Research Initiatives
1895 Preston White Dr.	    Reston, VA  20191


From fdrake@acm.org  Thu Sep 24 19:47:16 1998
From: fdrake@acm.org (Fred L. Drake, Jr.)
Date: Thu, 24 Sep 1998 14:47:16 -0400 (EDT)
Subject: [XML-SIG] XBEL DTD
Message-ID: <13834.37812.512942.169983@weyr.cnri.reston.va.us>

--/kOlW/3UHa
Content-Type: text/plain; charset=us-ascii
Content-Description: message body text
Content-Transfer-Encoding: 7bit


  Since I've not heard any comments other than Jack's regarding the
changes I've suggested to XBEL, I'm attaching a new (complete) DTD
below.  I think Andrew really wants to get an updated version into the 
CVS repository, and I'd like to get it finalized as well.  If there
are no problems with the DTD over the next several days, I'll start on
the documentation.


  -Fred

--
Fred L. Drake, Jr.	     <fdrake@acm.org>
Corporation for National Research Initiatives
1895 Preston White Dr.	    Reston, VA  20191


--/kOlW/3UHa
Content-Type: text/xml
Content-Description: Proposed XBEL DTD
Content-Disposition: inline;
	filename="xbel.dtd"
Content-Transfer-Encoding: 7bit

<!ELEMENT xbel     (title?, (bookmark|folder|url|alias|separator)*)>
<!ATTLIST xbel
            version CDATA   #FIXED "1.0"
>

<!ELEMENT title	  (#PCDATA)>

<!--=================== Info blocks ===============================-->

<!-- There's an implicit understanding that metadata and appdata will
     not just be #PCDATA but will contain application-specific elements.
     There may be some need for multiple metadata elements tagged
     similarly to the appdata elements.
  -->

<!ELEMENT metadata (#PCDATA)>
<!ELEMENT appdata  (#PCDATA)>
<!ATTLIST appdata
	    id		ID    #IMPLIED
	    application CDATA #REQUIRED
>

<!--=================== Folder ====================================-->

<!ELEMENT folder   (title?,info?,desc?,(bookmark|folder|separator|alias|url)*)>
<!ATTLIST folder
            id	     ID	      #IMPLIED
	    added     CDATA   #IMPLIED
            folded   (yes|no) 'yes'   
>

<!--=================== URL ======================================-->

<!ELEMENT url        (#PCDATA)>
<!ATTLIST url
            id	     ID	      #IMPLIED
	    added    CDATA    #IMPLIED
            href     CDATA    #REQUIRED
            visited  CDATA    #IMPLIED
            modified CDATA    #IMPLIED
            response CDATA    #IMPLIED
            checked  CDATA    #IMPLIED
>

<!--=================== Bookmark ==================================-->

<!-- a wrapper around an url when it has to contain extra info
     like a description and info blocks
  -->

<!ELEMENT bookmark (metadata?, url, desc?, appdata*)>

<!ELEMENT desc       (#PCDATA)>

<!--=================== Separator =================================-->

<!ELEMENT separator EMPTY>

<!--=================== Alias =====================================-->

<!ELEMENT alias EMPTY>
<!ATTLIST alias
            ref       IDREF    #REQUIRED
>

--/kOlW/3UHa--


From akuchlin@cnri.reston.va.us  Thu Sep 24 20:50:31 1998
From: akuchlin@cnri.reston.va.us (Andrew M. Kuchling)
Date: Thu, 24 Sep 1998 15:50:31 -0400 (EDT)
Subject: [XML-SIG] XBEL DTD
In-Reply-To: <13834.37812.512942.169983@weyr.cnri.reston.va.us>
References: <13834.37812.512942.169983@weyr.cnri.reston.va.us>
Message-ID: <13834.40878.343153.718626@amarok.cnri.reston.va.us>

Fred L. Drake writes:
>changes I've suggested to XBEL, I'm attaching a new (complete) DTD
>below.  I think Andrew really wants to get an updated version into the 
>CVS repository, and I'd like to get it finallized as well.  If there
>are no problems with the DTD over the next several days, I'll start on
>the documentation.

	I'm interested in seeing the XBEL work completed for a few
reasons.  First, it makes a pretty good demonstration program for the
XML toolkit, because it does some real work, but it's not so
complicated that it's difficult to understand.  Second, XBEL is
something that could be pretty useful.  The conversion scripts can be
made really useful with a little work; they simply need to find the
current user's bookmark file, and automatically dump it out as XBEL.
I'll eventually do this for lynx_parse.py and xbel_parse.py, though
it'll be a while before I manage to do that; if anyone wants to grab
the CVS tree and update the code, feel free.
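
	As a rough illustration of the "dump it out as XBEL" step, here
is a minimal Python sketch.  It is not the code in lynx_parse.py or the
CVS tree: the function name bookmarks_to_xbel and the (title, href)
input format are made up for the example, and putting the bookmark
title in the <url> text is an assumption, since the draft DTD only says
#PCDATA.  It uses the present-day standard-library
xml.etree.ElementTree module rather than the toolkit discussed here.

```python
import xml.etree.ElementTree as ET


def bookmarks_to_xbel(title, bookmarks):
    """Build an XBEL document string from (title, href) pairs.

    Hypothetical helper: follows the draft DTD's <xbel> root with a
    fixed version attribute, an optional <title>, and <url> children
    that carry a required href attribute.
    """
    root = ET.Element("xbel", {"version": "1.0"})
    ET.SubElement(root, "title").text = title
    for text, href in bookmarks:
        url = ET.SubElement(root, "url", {"href": href})
        url.text = text  # assumption: the #PCDATA holds the title
    return ET.tostring(root, encoding="unicode")


sample = bookmarks_to_xbel(
    "My bookmarks",
    [("Python", "http://www.python.org/")],
)
print(sample)
```

Finding the current user's bookmark file is browser-specific and is
left out of the sketch.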

	Once the DTD is settled and the software is updated to match,
it would then be nice to publicize the DTD a little bit: list it on
schema.net, see if the Mozilla people are interested, etc.

-- 
A.M. Kuchling			http://starship.skyport.net/crew/amk/
I do not have a psychiatrist and I do not want one, for the simple reason that
if he listened to me long enough, he might become disturbed.
    -- James Thurber, "Carpe Noctem, If You Can", in _Credos and Curios_ (1962)


From larsga@ifi.uio.no  Thu Sep 24 23:27:17 1998
From: larsga@ifi.uio.no (Lars Marius Garshol)
Date: 25 Sep 1998 00:27:17 +0200
Subject: [XML-SIG] XBEL DTD
In-Reply-To: <13834.40878.343153.718626@amarok.cnri.reston.va.us>
References: <13834.37812.512942.169983@weyr.cnri.reston.va.us> <13834.40878.343153.718626@amarok.cnri.reston.va.us>
Message-ID: <wkyar9xi2i.fsf@ifi.uio.no>

* Andrew M. Kuchling
| 
| Once the DTD is settled and the software is updated to match, it
| would then be nice to publicize the DTD a little bit: list it on
| schema.net, see if the Mozilla people are interested, etc.

xml-dev would be a very logical place to publicize it, I think.
--Lars M.


From bwaumg@urc.tue.nl  Fri Sep 25 21:05:34 1998
From: bwaumg@urc.tue.nl (Marc van Grootel)
Date: Fri, 25 Sep 1998 22:05:34 +0200
Subject: [XML-SIG] XBEL DTD
Message-ID: <199809252005.WAA13905@asterix.urc.tue.nl>

Hi,

I've got a couple of comments on the latest XBEL DTD.

The folder element still had a reference to an 'info' element which
didn't exist anymore. However I would vote for leaving 'info' in as a
container for the appdata and metadata. This makes it easier to skip
the whole block of info at once.

Here's some stuff that could go into the beginning of the DTD.

  <!ENTITY lt "&#38;#60;">
  <!ENTITY gt "&#62;">
  <!ENTITY amp "&#38;#38;">
  <!ENTITY apos "&#39;">
  <!ENTITY quot "&#34;">

What other entities should be included? Or should everything be
encoded with &# references?

  <!ENTITY % SPAMCANS 'bookmark|folder|url|alias|separator'>

I would like to suggest the following content models:

  xbel     (title?,info?,desc?,(&SPAMCANS;)*
  folder   (title?,info?,desc?,(&SPAMCANS;)*
  bookmark (url,info?,desc?)

It's not clear to me why there's only one metadata element
allowed. How will metadata be used and how should one choose between
metadata and appdata? I would guess that appdata is kinda private to a
certain app. And metadata is data that one would like to share with
other apps (public) like a list of keywords. If there's only one
metadata element should new metadata stuff be appended to its
contents? I believe a minimal XML structure even for metadata is
better than just declaring it #PCDATA.


Marc
--
Marc van Grootel
bwaumg@urc.tue.nl


From fdrake@acm.org  Fri Sep 25 22:03:30 1998
From: fdrake@acm.org (Fred L. Drake, Jr.)
Date: Fri, 25 Sep 1998 17:03:30 -0400 (EDT)
Subject: [XML-SIG] XBEL DTD
In-Reply-To: <199809252005.WAA13905@asterix.urc.tue.nl>
References: <199809252005.WAA13905@asterix.urc.tue.nl>
Message-ID: <13836.1314.478066.340517@weyr.cnri.reston.va.us>

--0d9vCTG2ok
Content-Type: text/plain; charset=us-ascii
Content-Description: message body text
Content-Transfer-Encoding: 7bit


Marc van Grootel writes:
 > The folder element still had a reference to an 'info' element which
 > didn't exist anymore. However I would vote for leaving 'info' in as a
 > container for the appdata and metadata. This makes it easier to skip

  Oops, my fault; I shouldn't drive XEmacs so fast!  ;-)
  I'm fine with using <info> as a container for the <metadata> and
<appdata> elements.

 >   <!ENTITY lt "&#38;#60;">
 >   <!ENTITY gt "&#62;">
 >   <!ENTITY amp "&#38;#38;">
 >   <!ENTITY apos "&#39;">
 >   <!ENTITY quot "&#34;">
 > 
 > What other entities should be included? Or should everything be
 > encoded with &# references.

  Lots of stuff would be reasonable, but this is sufficient given that 
(I expect) most editing will be done by software other than a text
editor.

 >   <!ENTITY % SPAMCANS 'bookmark|folder|url|alias|separator'>
 > 
 > I would like to suggest the following content models:
 > 
 >   xbel     (title?,info?,desc?,(&SPAMCANS;)*
 >   folder   (title?,info?,desc?,(&SPAMCANS;)*
 >   bookmark (url,info?,desc?)

  Ok, this looks good.

 > It's not clear to me why there's only one metadata element
 > allowed. How will metadata be used and how should one choose between
 > metadata and appdata? I would guess that appdata is kinda private to a
 > certain app. And metadata is data that one would like to share with
 > other apps (public) like a list of keywords. If there's only one
 > metadata element should new metadata stuff be appended to it's
 > contents. I believe a minimal XML structure even for metadata is
 > better then just declaring it #PCDATA. 

  Hmm, my initial thought was that <metadata> would be (essentially)
for things that are provided with the document, perhaps the contents
of HTML <meta> and <link> elements, but you bring up a valid point:
why the distinction between the two types of "related" data and why
just one <metadata>.  After taking another (brief) look at the
immediate plans for the Dublin Core and the embedding-in-HTML approach 
those folks are advocating as a first step, I'll revise this stuff a
little:

  <!ELEMENT info (metadata*)>

  <!ELEMENT metadata (meta*)>
  <!ATTLIST metadata
	    id		ID    #IMPLIED
	    scheme	CDATA #IMPLIED
	    lang	CDATA #IMPLIED
  >
  <!ELEMENT meta (#PCDATA)>
  <!ATTLIST meta
	    name	CDATA #REQUIRED
  >

  An application that wants its own area to write in can simply use a
private value for the scheme attribute.  I've left the attribute
#IMPLIED instead of #REQUIRED with the presumption that a <metadata>
without a scheme can be used to stash HTML <meta> elements which don't
specify a scheme attribute.  Alternately, we can simply specify a
scheme for this.
  (Should we register an owner identifier so we can create new FPIs?
I think there's an option to use Internet domain names, so we could be 
"-//IDN python.org//" or something like that.  At least we should make 
a recommendation regarding what a scheme identifier should be (URL,
URN), but I think we still want to be able to assign FPIs to any DTDs
that come out of our efforts.)
  I've attached the complete DTD as found in my emacs buffer below.


  -Fred

--
Fred L. Drake, Jr.	     <fdrake@acm.org>
Corporation for National Research Initiatives
1895 Preston White Dr.	    Reston, VA  20191


--0d9vCTG2ok
Content-Type: text/xml
Content-Description: Proposed XBEL DTD
Content-Disposition: inline;
	filename="xbel.dtd"
Content-Transfer-Encoding: 7bit

<!ENTITY lt "&#38;#60;">
<!ENTITY gt "&#62;">
<!ENTITY amp "&#38;#38;">
<!ENTITY apos "&#39;">
<!ENTITY quot "&#34;">

<!ENTITY % NODES 'bookmark|folder|url|alias|separator'>

<!ELEMENT xbel     (title?, info?, desc?, (%NODES;)*)>
<!ATTLIST xbel
            version CDATA   #FIXED "1.0"
>

<!ELEMENT title	  (#PCDATA)>

<!--=================== Info blocks ===============================-->

<!ELEMENT info (metadata*)>

<!ELEMENT metadata (meta*)>
<!ATTLIST metadata
	    id		ID    #IMPLIED
	    scheme	CDATA #IMPLIED
	    lang	CDATA #IMPLIED
>
<!ELEMENT meta (#PCDATA)>
<!ATTLIST meta
	    name	CDATA #REQUIRED
>

<!--=================== Folder ====================================-->

<!ELEMENT folder   (title?,info?,desc?,(%NODES;)*)>
<!ATTLIST folder
            id	     ID	      #IMPLIED
	    added     CDATA   #IMPLIED
            folded   (yes|no) 'yes'   
>

<!--=================== URL ======================================-->

<!ELEMENT url        (#PCDATA)>
<!ATTLIST url
            id	     ID	      #IMPLIED
	    added    CDATA    #IMPLIED
            href     CDATA    #REQUIRED
            visited  CDATA    #IMPLIED
            modified CDATA    #IMPLIED
            response CDATA    #IMPLIED
            checked  CDATA    #IMPLIED
>
<!-- response: HTTP response code? -->

<!--=================== Bookmark ==================================-->

<!-- a wrapper around an url when it has to contain extra info
     like a description and info blocks
  -->

<!ELEMENT bookmark (url, info?, desc?)>

<!ELEMENT desc       (#PCDATA)>

<!--=================== Separator =================================-->

<!ELEMENT separator EMPTY>

<!--=================== Alias =====================================-->

<!ELEMENT alias EMPTY>
<!ATTLIST alias
            ref       IDREF    #REQUIRED
>

--0d9vCTG2ok--


From lisarein@finetuning.com  Fri Sep 25 22:26:54 1998
From: lisarein@finetuning.com (Lisa Rein)
Date: Fri, 25 Sep 1998 14:26:54 -0700
Subject: [XML-SIG] Re: XML-SIG digest, Vol 1 #107 - 3 msgs
References: <199809251600.MAA22311@python.org>
Message-ID: <360C0A9E.3861F940@finetuning.com>

> | Once the DTD is settled and the software is updated to match, it
> | would then be nice to publicize the DTD a little bit: list it on
> | schema.net, see if the Mozilla people are interested, etc.
> 
> xml-dev would be a very logical place to publicize it, I think.
> --Lars M.

i was thinking about writing about xbel in my book --would anyone
object?

lisa


From digitome@iol.ie  Sat Sep 26 10:20:21 1998
From: digitome@iol.ie (Sean Mc Grath)
Date: Sat, 26 Sep 1998 10:20:21 +0100
Subject: [XML-SIG] XBEL DTD
In-Reply-To: <199809252005.WAA13905@asterix.urc.tue.nl>
Message-ID: <3.0.6.32.19980926102021.0092f5c0@gpo.iol.ie>

[Marc van Grootel]
>Here's some stuff that could go into the beginning of the DTD.
>
>  <!ENTITY lt "&#38;#60;">
>  <!ENTITY gt "&#62;">
>  <!ENTITY amp "&#38;#38;">
>  <!ENTITY apos "&#39;">
>  <!ENTITY quot "&#34;">
>
These are built in to all conforming XML parsers. There is no need to
declare them.

...
>I would like to suggest the following content models:
>
>  xbel     (title?,info?,desc?,(&SPAMCANS;)*
>  folder   (title?,info?,desc?,(&SPAMCANS;)*
>  bookmark (url,info?,desc?)

This needs to be %SPAMCANS; (a parameter entity reference rather than a
general entity reference).
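
To make the difference concrete, here is a small Python sketch of what
a parameter entity reference amounts to inside a DTD: plain text
substitution of %NAME; by the declared value.  It is only an
illustration under simplifying assumptions (a real parser handles
nesting, external entities and whitespace rules that this ignores), and
expand_parameter_entities is a made-up name, not part of any toolkit.

```python
import re

DTD = """\
<!ENTITY % SPAMCANS 'bookmark|folder|url|alias|separator'>
<!ELEMENT folder (title?,info?,desc?,(%SPAMCANS;)*)>
"""

# Matches declarations such as <!ENTITY % NAME 'value'>.
_PE_DECL = re.compile(r"""<!ENTITY\s+%\s+(\w+)\s+(['"])(.*?)\2\s*>""")
# Matches references such as %NAME; (but not &NAME;).
_PE_REF = re.compile(r"%(\w+);")


def expand_parameter_entities(dtd_text):
    """Replace each %NAME; with the value declared for NAME, if any."""
    entities = {m.group(1): m.group(3) for m in _PE_DECL.finditer(dtd_text)}
    return _PE_REF.sub(
        lambda m: entities.get(m.group(1), m.group(0)), dtd_text
    )


expanded = expand_parameter_entities(DTD)
print(expanded)
```

A general entity reference written as &SPAMCANS; is left untouched by
this substitution, which is why it does not work in a content model.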

Cheers,


Sean Mc Grath

def Get_URI_Of_Superlative_Scripting_Language():
	return "http://www.python.org"


From bwaumg@urc.tue.nl  Sat Sep 26 14:33:39 1998
From: bwaumg@urc.tue.nl (Marc van Grootel)
Date: Sat, 26 Sep 1998 15:33:39 +0200
Subject: [XML-SIG] XBEL DTD
Message-ID: <199809261333.PAA26598@asterix.urc.tue.nl>

> 
> [Marc van Grootel]
> >Here's some stuff that could go into the beginning of the DTD.
> >
> >  <!ENTITY lt "&#38;#60;">
> >  <!ENTITY gt "&#62;">
> >  <!ENTITY amp "&#38;#38;">
> >  <!ENTITY apos "&#39;">
> >  <!ENTITY quot "&#34;">
> >

> Sean Mc Grath wrote:

> These are built-in in to all conforming XML parsers. There is no need to
> declare them.
> 

Yes I know but the XML Rec. states that:

  [4.6 Predefined Entities]
  For interoperability, valid XML documents should declare these
  entities, like any others, before using them.
 
BTW I think it is a good idea to at least include the entities that
HTML includes. Netscape's bookmark file is HTML and maybe there are
others that store their bookmarks in HTML. It's silly if an
application translates something like &Auml; into &amp;uml;
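
For the other option mentioned earlier (encoding everything with &#
references), a converter could rewrite HTML named entities as numeric
character references, so the DTD would not have to declare them.  A
minimal Python sketch follows; the function name is hypothetical, and
html.entities is the present-day standard-library table, not part of
the toolkit discussed on this list.

```python
import re
from html.entities import name2codepoint

# The five entities predefined by XML are left as they are.
XML_PREDEFINED = {"lt", "gt", "amp", "apos", "quot"}

_ENTITY_RE = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def html_entities_to_charrefs(text):
    """Replace known HTML named entities with numeric references."""
    def replace(match):
        name = match.group(1)
        if name in XML_PREDEFINED or name not in name2codepoint:
            return match.group(0)  # unknown or predefined: keep as is
        return "&#%d;" % name2codepoint[name]
    return _ENTITY_RE.sub(replace, text)


print(html_entities_to_charrefs("M&uuml;nchen &amp; K&ouml;ln"))
```

Unknown names are passed through unchanged here; a real converter
would have to decide whether to report them instead.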


> ..
> >I would like to suggest the following content models:
> >
> >  xbel     (title?,info?,desc?,(&SPAMCANS;)*
> >  folder   (title?,info?,desc?,(&SPAMCANS;)*
> >  bookmark (url,info?,desc?)
> 
> This needs to be %SPAMCANS; (A parameter entity rather than a general
> entity reference.

Oops.
 
> Cheers,

Marc
--
Marc van Grootel
bwaumg@urc.tue.nl


From bwaumg@urc.tue.nl  Sat Sep 26 16:43:04 1998
From: bwaumg@urc.tue.nl (Marc van Grootel)
Date: Sat, 26 Sep 1998 17:43:04 +0200
Subject: [XML-SIG] Re: XML-SIG digest, Vol 1 #107 - 3 msgs.
Message-ID: <199809261543.RAA28183@asterix.urc.tue.nl>

> 
> i was thinking about writing about xbel in my book --would anyone
> object?
> 
> lisa

Nice. What's it about? The book, I mean.

Marc
--
Marc van Grootel
bwaumg@urc.tue.nl


From fdrake@acm.org  Tue Sep 29 20:24:25 1998
From: fdrake@acm.org (Fred L. Drake, Jr.)
Date: Tue, 29 Sep 1998 15:24:25 -0400 (EDT)
Subject: [XML-SIG] ISO 8601 date support
Message-ID: <13841.13289.766318.206344@weyr.cnri.reston.va.us>

  I've just sent Andrew a module that parses and formats ISO 8601
dates, at least as far as the W3C profile supports.  See
http://www.w3.org/TR/NOTE-datetime for the profile.  We're planning on 
it going into the xml.utils package.
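
  For readers who have not looked at the profile, here is a rough
Python sketch of parsing the formats it lists (year, year-month,
date, and date plus time with a required time zone designator).  This
is not the module sent to Andrew and does not reflect its interface;
parse_w3c_datetime is a made-up name, and treating date-only values as
midnight UTC is an assumption of the sketch.

```python
import re
from datetime import datetime, timedelta, timezone

_W3C_DATETIME = re.compile(
    r"^(?P<year>\d{4})"
    r"(?:-(?P<month>\d{2})"
    r"(?:-(?P<day>\d{2})"
    r"(?:T(?P<hour>\d{2}):(?P<minute>\d{2})"
    r"(?::(?P<second>\d{2})(?:\.(?P<fraction>\d+))?)?"
    r"(?P<tz>Z|[+-]\d{2}:\d{2})"
    r")?)?)?$"
)


def parse_w3c_datetime(text):
    """Parse a W3C-profile date string into an aware datetime."""
    m = _W3C_DATETIME.match(text)
    if m is None:
        raise ValueError("not a W3C-profile date: %r" % text)
    year = int(m.group("year"))
    month = int(m.group("month") or 1)
    day = int(m.group("day") or 1)
    if m.group("tz") is None:
        # Date-only forms: midnight UTC is assumed for this sketch.
        return datetime(year, month, day, tzinfo=timezone.utc)
    second = int(m.group("second") or 0)
    # Pad or truncate the decimal fraction to microseconds.
    micro = int(((m.group("fraction") or "") + "000000")[:6])
    tz = m.group("tz")
    if tz == "Z":
        tzinfo = timezone.utc
    else:
        sign = 1 if tz[0] == "+" else -1
        offset = timedelta(hours=int(tz[1:3]), minutes=int(tz[4:6]))
        tzinfo = timezone(sign * offset)
    return datetime(year, month, day, int(m.group("hour")),
                    int(m.group("minute")), second, micro, tzinfo=tzinfo)


print(parse_w3c_datetime("1998-09-29T15:24:25-04:00").isoformat())
```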


  -Fred

--
Fred L. Drake, Jr.	     <fdrake@acm.org>
Corporation for National Research Initiatives
1895 Preston White Dr.	    Reston, VA  20191


From larsga@ifi.uio.no  Tue Sep 29 22:02:44 1998
From: larsga@ifi.uio.no (Lars Marius Garshol)
Date: 29 Sep 1998 22:02:44 +0100
Subject: [XML-SIG] ISO 8601 date support
In-Reply-To: <13841.13289.766318.206344@weyr.cnri.reston.va.us>
References: <13841.13289.766318.206344@weyr.cnri.reston.va.us>
Message-ID: <wk7lymws23.fsf@ifi.uio.no>

* Fred L. Drake
|
| I've just sent Andrew a module that parses and formats ISO 8601
| dates, at least as far as the W3C profile supports.  See
| http://www.w3.org/TR/NOTE-datetime for the profile.  We're planning
| on it going into the xml.utils package.

Great! Definitely a highly useful piece for the cabal. I can think of
several different places where this might be useful in my current XML
stuff.

--Lars M.


From fdrake@acm.org  Wed Sep 30 15:56:55 1998
From: fdrake@acm.org (Fred L. Drake, Jr.)
Date: Wed, 30 Sep 1998 10:56:55 -0400 (EDT)
Subject: [XML-SIG] <url> checked and response attributes
Message-ID: <13842.18103.510049.341490@weyr.cnri.reston.va.us>

  I seem to recall that the "checked" and "response" attributes of the 
<url> element were added to XBEL to support stuff that's tracked by
MSIE.  Could someone please clarify the purpose of these attributes
for me?  My current understanding is that "checked" should store the
time when the browser last attempted to access the resource,
regardless of success.  Is the "response" attribute expected to store
the HTTP response code?  The code and the message?  Or something else?
How is it handled for resources not accessed via HTTP (omitted,
perhaps?)?
  I'm planning to send Andrew the public text of the DTD shortly for
updating the repository, but would like some clarification on these
attributes before that.
  The DTD will be assigned the public identifier:

	-//IDN python.org//DTD XML Bookmark Exchange Language 1.0//EN

  I'll start working on the documentation this weekend.


  -Fred

--
Fred L. Drake, Jr.	     <fdrake@acm.org>
Corporation for National Research Initiatives
1895 Preston White Dr.	    Reston, VA  20191


From bwaumg@urc.tue.nl  Wed Sep 30 17:48:45 1998
From: bwaumg@urc.tue.nl (Marc van Grootel)
Date: Wed, 30 Sep 1998 18:48:45 +0200
Subject: [XML-SIG] <url> checked and response attributes
Message-ID: <199809301648.SAA19336@asterix.urc.tue.nl>

> 
>   I seem to recall that the "checked" and "response" attributes of the 
> <url> element were added to XBEL to support stuff that's tracked by
> MSIE.

I suggested them in the very beginning because I thought they
would be useful (not because it is tracked by MSIE - maybe it does
but, does it?).
 
>  Could someone please clarify the purpose of these attributes
> for me?  My current understanding is that "checked" should store the
> time when the browser last attempted to access the resource,
> regardless of success.  Is the "response" attribute expected to store
> the HTTP response code?  

That was the idea.

> The code and the message?  Or something else?
> How is it handled for resources not accessed via HTTP (omitted,
> perhaps?)?

Didn't think of that. 

Attributes for storing the status of a link are still a good idea
but I didn't give the choice of attributes for supporting it
much thought.


>   I'm planning to send Andrew the public text of the DTD shortly for
> updating the repository, but would like some clarification on these
> attributes before that.
>   The DTD will be assigned the public identifier:
> 
> 	-//IDN python.org//DTD XML Bookmark Exchange Language 1.0//EN
> 
>   I'll start working on the documentation this weekend.
> 
> 
>   -Fred

BTW what about the contents of the scheme and lang attributes? I
don't know the Dublin Core but I understand you got it from there?
Could you give an example of an info element?

Marc
--
Marc van Grootel
bwaumg@urc.tue.nl 


From fdrake@acm.org  Wed Sep 30 19:07:22 1998
From: fdrake@acm.org (Fred L. Drake, Jr.)
Date: Wed, 30 Sep 1998 14:07:22 -0400 (EDT)
Subject: [XML-SIG] <url> checked and response attributes
In-Reply-To: <199809301648.SAA19336@asterix.urc.tue.nl>
References: <199809301648.SAA19336@asterix.urc.tue.nl>
Message-ID: <13842.29530.662350.571366@weyr.cnri.reston.va.us>

Marc van Grootel writes:
 > I suggested them in the very beginning because I thought they
 > would be useful (not because it is tracked by MSIE - maybe it does
 > but, does it?).

  Good question; 

 > Attributes for storing the status of a link is still a good idea
 > but I didn't give the choice of attributes for supporting it

  I agree that storing this kind of information is potentially very
useful, especially if there's any software that can use it.  I think a 
link-update monitor could very easily use such information.  I am
concerned about adding several "untested" attributes to a primary
element in the hope that someone will actually write enough software
that uses it (more than one app.).  We might want to drop the
attributes from <url> and create a <metadata> profile for it.

 > BTW what about the contents of the scheme and  language attributes? I
 > don't know the Dublin Core but I understand you got it from there?
 > Could you give an example of an info element?

  Good idea.
  Metadata about objects (including documents) is typically given as a 
set of key/value pairs.  The keys are usually just strings (like RFC
822 headers), and values may be strings (possibly constrained by the
definition of the bit of metadata, i.e., it may be boolean, or numeric, 
or a date), or they may be structured in some way (XML, SGML, or
whatever).
  The catch with metadata is to understand what it means (no AI here,
though).  To "understand" metadata, you need to understand what schema 
it conforms to.  As an example, consider a library's cataloging
system.  To understand what a catalog number means, you need to know
what kind of number was assigned: Dewey Decimal, U.S. Library of
Congress (LOC), or something else.  Since there's no reason not to
assign catalog numbers for both Dewey Decimal and LOC use, an <info>
for a book (with completely made up numbers; I don't remember either
system well enough) might look like this:

	<info>
	  <metadata scheme="Library of Congress">
	    <meta name="catalog number">TR567 A45.1</meta>
	    </metadata>
	  <metadata scheme="Dewey Decimal">
	    <meta name="catalog number">Z567 12</meta>
	    </metadata>
	  </info>

  The Dublin Core is a specific metadata system; particular bits of
data about a resource are defined and given identifying keys.  It is
being used on the Web by some projects and the working group has dealt 
with issues related to embedding in HTML as well as semantics.  An
example of storing Dublin Core data in XBEL:

	<info>
	  <metadata scheme="http://purl.oclc.org/metadata/dublin_core/"
		    lang="en">
	    <meta name="Creator">Fred L. Drake, Jr.
	      and Roger E. Masse</meta>
	    <meta name="Publisher">Corporation for National
	      Research Initiatives</meta>
	    <meta name="Description">Python interface to the
	      Kerberos V5 security package.</meta>
	    <meta name="Identifier">
	      URN:hdl:1895.22/1001</meta>
	    <meta name="Identifier">
	      URL:ftp://ftp.python.org/pub/python/contrib/System/krb5module-0.1.tar.gz</meta>
	    </metadata>
	  </info>

  Adding the scheme and lang attributes to <metadata> seems to make
the most sense; typically, several metadata items from a single scheme 
will be used, with natural language text in a single language.
  More information on the Dublin Core is available at
<http://purl.oclc.org/metadata/dublin_core/>.  I'll try to include
some useful examples and links in the XBEL documentation.
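
  On the consuming side, an application could pick out the schemes it
understands and ignore the rest.  A minimal Python sketch of that,
using the library catalog example above: metadata_by_scheme is a
made-up name, and the code uses the present-day standard-library
xml.etree.ElementTree module rather than the toolkit in the
repository.

```python
import xml.etree.ElementTree as ET

INFO = """\
<info>
  <metadata scheme="Library of Congress">
    <meta name="catalog number">TR567 A45.1</meta>
  </metadata>
  <metadata scheme="Dewey Decimal">
    <meta name="catalog number">Z567 12</meta>
  </metadata>
</info>
"""


def metadata_by_scheme(info_xml):
    """Map each scheme to a list of (name, value) pairs."""
    result = {}
    for metadata in ET.fromstring(info_xml).findall("metadata"):
        scheme = metadata.get("scheme")  # None when the attribute is omitted
        pairs = result.setdefault(scheme, [])
        for meta in metadata.findall("meta"):
            pairs.append((meta.get("name"), (meta.text or "").strip()))
    return result


catalog = metadata_by_scheme(INFO)
print(catalog["Dewey Decimal"])
```

  The values are kept as a list of pairs rather than a dictionary
because a name may repeat within one scheme, as "Identifier" does in
the Dublin Core example.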


  -Fred

--
Fred L. Drake, Jr.	     <fdrake@acm.org>
Corporation for National Research Initiatives
1895 Preston White Dr.	    Reston, VA  20191


From fdrake@acm.org  Wed Sep 30 21:47:39 1998
From: fdrake@acm.org (Fred L. Drake, Jr.)
Date: Wed, 30 Sep 1998 16:47:39 -0400 (EDT)
Subject: [XML-SIG] <url> checked and response attributes
In-Reply-To: <199809301648.SAA19336@asterix.urc.tue.nl>
References: <199809301648.SAA19336@asterix.urc.tue.nl>
Message-ID: <13842.39147.458594.392740@weyr.cnri.reston.va.us>

Marc van Grootel writes:
 > I suggested them in the very beginning because I thought they
 > would be useful (not because it is tracked by MSIE - maybe it does
 > but, does it?).

  I think I sent my message before finishing my response on this.
  As far as I can tell, it does not.  But that doesn't mean MSIE
doesn't stash the information somewhere; I could only find the
individual files that represent each bookmark.  The only information
was the title (encoded as the file name, sheesh!), the URL, and the
modification time (in some undetermined format).


  -Fred

--
Fred L. Drake, Jr.	     <fdrake@acm.org>
Corporation for National Research Initiatives
1895 Preston White Dr.	    Reston, VA  20191