From lisarein@finetuning.com Tue Sep 1 17:47:52 1998 From: lisarein@finetuning.com (Lisa Rein) Date: Tue, 01 Sep 1998 09:47:52 -0700 Subject: [XML-SIG] Re: XML-SIG digest, Vol 1 #86 - 1 msg References: <199809011600.MAA14717@python.org> Message-ID: <35EC2538.F78FC6E@finetuning.com> Actually you guys I'm doing a story on these programs and I gotta ask you -- is CVS really any good. Are there specific needs you've found that it can't address? Just trying to get a reality check on what's out there. Thanks, lisa rein http://www.finetuning.com/collect.html xml-sig-admin@python.org wrote: > > Send XML-SIG maillist submissions to > xml-sig@python.org > > To subscribe or unsubscribe via the web, visit > http://www.python.org/mailman/listinfo/xml-sig > or, via email, send a message with subject or body 'help' to > xml-sig-request@python.org > You can reach the person managing the list at > xml-sig-admin@python.org > > (When replying, please edit your Subject line so it is more specific than > "Re: Contents of XML-SIG digest...") > > ------------------------------------------------------------------------ > Today's Topics: > > 1. Re: Could we use a public CVS tree? (Andrew M. Kuchling) > > ------------------------------------------------------------------------ > > Subject: Re: [XML-SIG] Could we use a public CVS tree? > Date: Mon, 31 Aug 1998 17:26:46 -0400 (EDT) > From: "Andrew M. Kuchling" > To: xml-sig@python.org > References: > <13798.47324.921033.764667@amarok.cnri.reston.va.us> > > > Jack Jansen writes: > >I think it definitely would help. However, write access may pose a problem, > >especially if we don't want everyone to be able to change every bit of the > >tree. What may be easier is semi-automatic updates, with a human (i.e. you:-) > >in the loop. Developers would mail diffs to you in an easy to recognize way (a > >different mail alias would probably be easiest), and after a quick check you > >would just feed the mail messages into patch and do the commit. 
> > I agree that write-access is less vital; people are working on > separate components of the package, so it's quite simple for me to > drop in the latest sgmlop.c or saxlib or whatever, and commit the > resulting changes. > > I'll be working on setting this up, and hope to have it > operational later this week. > > -- > A.M. Kuchling http://starship.skyport.net/crew/amk/ > You'll have to leave my meals on a tray outside the door because I'll be > working pretty late on the secret of making myself invisible, which may take > me almost until eleven o'clock. > -- S.J. Perelman, "Captain Future, Block That Kick!" From rherath@cs.monash.edu.au Wed Sep 2 03:05:45 1998 From: rherath@cs.monash.edu.au (Ravindra N Herath) Date: Wed, 2 Sep 1998 12:05:45 +1000 (EST) Subject: [XML-SIG] XML sample files Message-ID: I am new to the list and am learning XML, could someone help me out and post a sample file that is not complex or direct me to one, so that I have a basic understanding of XML. Thanks, Ravi From fredrik@pythonware.com Wed Sep 2 09:16:43 1998 From: fredrik@pythonware.com (Fredrik Lundh) Date: Wed, 2 Sep 1998 09:16:43 +0100 Subject: [XML-SIG] XML sample files Message-ID: <016c01bdd64b$9a458320$f29b12c2@pythonware.com> >I am new to the list and am learning XML, could someone help me out and >post a sample file that is not complex or direct me to one, so that I have >a basic understanding of XML. 
Check the XML topic guide: http://www.python.org/topics/xml/ you may wish to start with: http://www.stud.ifi.uio.no/~larsga/download/xml/xml_eng.html for further study, here's some link collections: http://www.sil.org/sgml/xml.html http://www.ucc.ie/xml/ Cheers /F fredrik@pythonware.com http://www.pythonware.com From jprewitt@justintime.com Thu Sep 3 01:55:27 1998 From: jprewitt@justintime.com (Johnny Prewitt) Date: Wed, 02 Sep 1998 17:55:27 -0700 Subject: [XML-SIG] XML Parsing, SF Message-ID: <35EDE8FF.E251D73B@justintime.com> Just in Time Solutions is a serious, pre I.P.O., product development organization.. Just in Time Solutions is developing and deploying Internet bill presentment software. We are leaders in Internet billing and are well positioned to capitalize on this emerging market. We are seeking a Senior Engineer to support our development by analyzing existing parser routines and tools and selecting and implementing the appropriate tool. This effort will involve JAVA, C++, XML and development of APIs. Upon implementation of a parser solution, the Engineer will be assigned other development tasks. As a young, 60-employee organization, based in San Francisco, Just in Time Solutions offers a relaxed, casual work environment and generous compensation including stock options. Our development environment is primarily Java/CORBA. If you know of someone who may be interested, give us a call, or pass the word along. We offer a “bounty “ for referrals that come to work with us. Johnny Prewitt, Recruiting Manager Just In Time Solutions 444 De Haro St. Suite 100 San Francisco, CA. 94107 Tel. 415-553-6481 or 888-652-0864 x6481 Fax 415-553-6496 www.justintime.com From MHammond@skippinet.com.au Thu Sep 3 05:58:37 1998 From: MHammond@skippinet.com.au (Mark Hammond) Date: Thu, 3 Sep 1998 14:58:37 +1000 Subject: [XML-SIG] Useless fun thing for XML - comments or helpers? 
Message-ID: <012601bdd6f7$d3daf410$1301a8c0@bobcat.skippinet.com.au> For no good reason at all, I am toying with the idea of the following little mini-project. If you dont use MSWindows, and/or dont use MSIE, then read no further! MSIE has the concept of "Favorites" (Bookmarks in Netscape speak) built in at the Operating System level. It is really quite trivial - a special folder (directory) called "Favorites" exists, and this is filled with normal Windows95 "shortcuts". If this folder contains sub-folders, then these are shown as sub-menus on the favorites menu. It has bothered me for a while that this makes it quite hard to "publish" (or even archive) the Favourites. So my idea for a mini project is: * Python code can locate and traverse this "favorites" directory. It can use the Windows "shortcuts" API to determine the underlying URL, and other attributes (such as the time the link was last updated, etc). * The above code can generate XML - the attributes for each shortcut can appear in the XML. * Code can be written to format the XML into pretty HTML, so people could publish their favorites, as seemed to be common a while ago * Later code could be written to parse an existing XML file, and update the favorites themselves. This would allow me to send my favorites to someone else, and have them imported locally, for example. As I said, a fairly useless little tool, but does appear to me to be a reasonable starting point to get me going with XML (and at the same time beef up Python's Win95 shell integration features :-) The main benefit is some direct XML experience. Anyone care to help with this? Im happy to do all the Windows specific stuff, but the XML stuff will no doubt cause me to struggle somewhat...Two minds are better than 1, even if they are both clueless :-) Mark. 
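
The first two steps of the plan above (walk the Favorites folder, emit XML) can be sketched roughly as follows in present-day Python. This is only an illustration under stated assumptions: the element names folder, bookmark, name and url are made up here, not an agreed format, and the shortcut layout (an "[InternetShortcut]" line followed by a "URL=" line) is taken from what is reported later in this thread rather than from any Windows API.

```python
import os
import tempfile
import xml.etree.ElementTree as ET


def read_shortcut_url(path):
    """Return the URL= value of an [InternetShortcut] file, or None."""
    try:
        with open(path, encoding="utf-8", errors="replace") as fp:
            if fp.readline().strip() != "[InternetShortcut]":
                return None
            for line in fp:
                if line.startswith("URL="):
                    return line[4:].strip()
    except OSError:
        pass
    return None


def folder_to_xml(path, name=None):
    """Build a <folder> element (hypothetical name) for a directory tree."""
    folder = ET.Element("folder")
    if name is not None:
        ET.SubElement(folder, "name").text = name
    for entry in sorted(os.listdir(path)):
        full = os.path.join(path, entry)
        if os.path.isdir(full):
            # Sub-folders become nested <folder> elements.
            folder.append(folder_to_xml(full, entry))
        else:
            url = read_shortcut_url(full)
            if url:
                bookmark = ET.SubElement(folder, "bookmark")
                ET.SubElement(bookmark, "name").text = os.path.splitext(entry)[0]
                ET.SubElement(bookmark, "url").text = url
    return folder


def demo():
    """Create a throwaway favourites tree and return its XML as text."""
    with tempfile.TemporaryDirectory() as root:
        os.mkdir(os.path.join(root, "Python"))
        with open(os.path.join(root, "Python", "Home.url"), "w") as fp:
            fp.write("[InternetShortcut]\nURL=http://www.python.org/?a=1&b=2\n")
        return ET.tostring(folder_to_xml(root), encoding="unicode")


if __name__ == "__main__":
    print(demo())
```

ElementTree escapes "&" and "<" in text content when serialising, so the generator does not have to do that by hand. Going the other way (XML back to shortcut files) would be a second walk over the parsed tree.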
From larsga@ifi.uio.no Thu Sep 3 08:22:38 1998 From: larsga@ifi.uio.no (Lars Marius Garshol) Date: Thu, 03 Sep 1998 09:22:38 +0200 Subject: [XML-SIG] Useless fun thing for XML - comments or helpers? In-Reply-To: <012601bdd6f7$d3daf410$1301a8c0@bobcat.skippinet.com.au> Message-ID: <3.0.1.32.19980903092238.00686650@ifi.uio.no> * Mark Hammond > >As I said, a fairly useless little tool, but does appear to me to be a >reasonable starting point to get me going with XML (and at the same time >beef up Python's Win95 shell integration features :-) The main benefit is >some direct XML experience. How about calling it XML Bookmark Exchange Language (XBEL) and adding conversion routines to and from Netscape bookmarks and Opera bookmarks? It could still do what you suggested, but would actually be useful as well... :-) >Anyone care to help with this? Im happy to do all the Windows specific >stuff, but the XML stuff will no doubt cause me to struggle somewhat...Two >minds are better than 1, even if they are both clueless :-) Why not make a stab at the XML stuff and post it here for comments? 184 minds should be even better than 2. :-) --Lars M. From digitome@iol.ie Thu Sep 3 09:06:35 1998 From: digitome@iol.ie (Sean Mc grath) Date: Thu, 03 Sep 1998 09:06:35 +0100 Subject: [XML-SIG] Useless fun thing for XML - comments or helpers? Message-ID: <1.5.4.32.19980903080635.008f7348@gpo.iol.ie> Mark, I am very glad to help on the XML side. What is more, if you wish, I can include the XML application as a Python/XML example in my next book which I am currently working on. Why don't we do it all here on the XML-SIG. At 02:58 PM 9/3/98 +1000, you wrote: >For no good reason at all, I am toying with the idea of the following little >mini-project. If you dont use MSWindows, and/or dont use MSIE, then read no >further! > >MSIE has the concept of "Favorites" (Bookmarks in Netscape speak) built in >at the Operating System level. 
It is really quite trivial - a special >folder (directory) called "Favorites" exists, and this is filled with normal >Windows95 "shortcuts". If this folder contains sub-folders, then these are >shown as sub-menus on the favorites menu. > >It has bothered me for a while that this makes it quite hard to "publish" >(or even archive) the Favourites. > >So my idea for a mini project is: >* Python code can locate and traverse this "favorites" directory. It can >use the Windows "shortcuts" API to determine the underlying URL, and other >attributes (such as the time the link was last updated, etc). >* The above code can generate XML - the attributes for each shortcut can >appear in the XML. >* Code can be written to format the XML into pretty HTML, so people could >publish their favorites, as seemed to be common a while ago >* Later code could be written to parse an existing XML file, and update the >favorites themselves. This would allow me to send my favorites to someone >else, and have them imported locally, for example. > >As I said, a fairly useless little tool, but does appear to me to be a >reasonable starting point to get me going with XML (and at the same time >beef up Python's Win95 shell integration features :-) The main benefit is >some direct XML experience. > >Anyone care to help with this? Im happy to do all the Windows specific >stuff, but the XML stuff will no doubt cause me to struggle somewhat...Two >minds are better than 1, even if they are both clueless :-) > >Mark. > > >_______________________________________________ >XML-SIG maillist - XML-SIG@python.org >http://www.python.org/mailman/listinfo/xml-sig > > Sean Mc Grath http://www.digitome.com/sean.htm +353 96 47391 "Imagine a world without hypothetical situations..." From fredrik@pythonware.com Thu Sep 3 11:28:02 1998 From: fredrik@pythonware.com (Fredrik Lundh) Date: Thu, 3 Sep 1998 11:28:02 +0100 Subject: [XML-SIG] Useless fun thing for XML - comments or helpers? 
Message-ID: <013601bdd725$8ab0db50$f29b12c2@pythonware.com> >MSIE has the concept of "Favorites" (Bookmarks in Netscape speak) built in >at the Operating System level. It is really quite trivial - a special >folder (directory) called "Favorites" exists, and this is filled with normal >Windows95 "shortcuts". If this folder contains sub-folders, then these are >shown as sub-menus on the favorites menu. > >It has bothered me for a while that this makes it quite hard to "publish" >(or even archive) the Favourites. > >So my idea for a mini project is: >* Python code can locate and traverse this "favorites" directory. It can >use the Windows "shortcuts" API to determine the underlying URL, and other >attributes (such as the time the link was last updated, etc). >* The above code can generate XML - the attributes for each shortcut can >appear in the XML. >* Code can be written to format the XML into pretty HTML, so people could >publish their favorites, as seemed to be common a while ago >* Later code could be written to parse an existing XML file, and update the >favorites themselves. This would allow me to send my favorites to someone >else, and have them imported locally, for example. Here's a first stab. This is tested with MSIE 5.0 on a Swedish NT installation (so you definitely need to change the directory to run it -- a production version should of course use the registry to find out where the directory is located). Don't know if earlier versions used shell shortcuts; if that's the case, the "geturl" stuff needs to be rewritten. 
Cheers /F

#
# convert "favourites" directory to an XML file
#

import os, string
from cgi import escape

DIR = "Favoriter" # swedish version

class Node:
    def __init__(self, name):
        self.name = name
        self.data = []
    def append(self, item):
        self.data.append(item)
    def dump(self, level=0):
        if not level:
            print ""
        prefix = level * " "
        print prefix + ""
        if self.name:
            print prefix, "" + escape(self.name) + ""
        for item in self.data:
            if isinstance(item, Node):
                item.dump(level+1)
            else:
                name, url = item
                print prefix, ""
                print prefix, " " + escape(name) + ""
                print prefix, " " + escape(url) + ""
                print prefix, ""
        print prefix + ""

class Bookmarks:
    def dump(self):
        self.root.dump()

class MSIE(Bookmarks):
    # internet explorer
    def __init__(self):
        # FIXME: use registry for this!
        self.root = Node(None)
        self.path = os.path.join(os.environ["USERPROFILE"], DIR)
        self.__walk(self.root)
    def __walk(self, this, subpath=[]):
        # traverse favourites folder
        path = os.path.join(self.path, string.join(subpath, os.sep))
        for file in os.listdir(path):
            fullname = os.path.join(path, file)
            if os.path.isdir(fullname):
                node = Node(file)
                this.append(node)
                self.__walk(node, subpath + [file])
            else:
                url = self.__geturl(fullname)
                if url:
                    this.append((os.path.splitext(file)[0], url))
    def __geturl(self, file):
        try:
            fp = open(file)
            if fp.readline() != "[InternetShortcut]\n":
                return None
            while 1:
                s = fp.readline()
                if not s:
                    break
                if s[:4] == "URL=":
                    return s[4:-1]
        except IOError:
            pass
        return None

bookmarks = MSIE()
bookmarks.dump()

From MHammond@skippinet.com.au Thu Sep 3 14:49:21 1998
From: MHammond@skippinet.com.au (Mark Hammond)
Date: Thu, 3 Sep 1998 23:49:21 +1000
Subject: [XML-SIG] Useless fun thing for XML - comments or helpers?
Message-ID: <004501bdd741$ac2899f0$1301a8c0@bobcat.skippinet.com.au>

Thanks Lars and Sean! And Fredrik doesnt mess around - Thanks! :-)

OK - Here is a simple DTD Ive come up with based on Fredriks code.
If you havent run it, it will generate something like: Python Aussie Mirror - Python Language Website http://mirror.aarnet.edu.au/pub/python/www.python.org/ ... Here is my simple DTD, using Lars' "XBEL":-). Any comments? Its all way too simple :-) Then I'll have to knock up a tool to parse these back to a file structure on disk, and also something to generate a .html representation of the tree... Thanks, Mark. From larsga@ifi.uio.no Thu Sep 3 14:57:50 1998 From: larsga@ifi.uio.no (Lars Marius Garshol) Date: Thu, 03 Sep 1998 15:57:50 +0200 Subject: [XML-SIG] Useless fun thing for XML - comments or helpers? In-Reply-To: <004501bdd741$ac2899f0$1301a8c0@bobcat.skippinet.com.au> Message-ID: <3.0.1.32.19980903155750.0076bdf8@ifi.uio.no> * Mark Hammond > >Then I'll have to knock up a tool to parse these back to a file >structure on disk, and also something to generate a .html representation of >the tree... Sounds like a suitable project to learn SAX... :) > > > > Maybe NODE should be called FOLDER? It took me a while to figure out that that was what it was meant to be. > What are MACHINE and VERSION meant to contain? Other than that it looks good to me. I'll be giving a course this weekend and can probably make an Opera-to-XBEL converter (and vice versa) while my students do their exercises. If I do I'll post it when it's done. --Lars M. From grove@infotek.no Thu Sep 3 15:21:17 1998 From: grove@infotek.no (Geir Ove Gronmo) Date: Thu, 03 Sep 1998 16:21:17 +0200 Subject: [XML-SIG] Useless fun thing for XML - comments or helpers? In-Reply-To: <004501bdd741$ac2899f0$1301a8c0@bobcat.skippinet.com.au> Message-ID: <199809031428.QAA32554@mail.infotek.no> At 23:49 03.09.98 +1000, Mark Hammond wrote: >Thanks Lars and Sean! And Fredrik doesnt mess around - Thanks! :-) > >OK - Here is a simple DTD Ive come up with based on Fredriks code. 
If you >havent run it, it will generate something like: > > Python > > Aussie Mirror - Python Language Website > http://mirror.aarnet.edu.au/pub/python/www.python.org/ > >.. > >Here is my simple DTD, using Lars' "XBEL":-). Any comments? Its all way too >simple :-) Then I'll have to knock up a tool to parse these back to a file >structure on disk, and also something to generate a .html representation of >the tree... Notice that names in XML are case sensitive. You'll have to use the same case in the instance as in the DTD. :-) Geir O. ================== Geir Ove Grønmo ================== | STEP Infotek as, Gjerdrumsvei 12, 0486 Oslo, Norway | | grove@infotek.no http://www.infotek.no/ | ------------------------------------------------------- From fredrik@pythonware.com Thu Sep 3 16:44:56 1998 From: fredrik@pythonware.com (Fredrik Lundh) Date: Thu, 3 Sep 1998 16:44:56 +0100 Subject: [XML-SIG] Useless fun thing for XML - comments or helpers? Message-ID: <002d01bdd751$ce512870$f29b12c2@pythonware.com> > > > > > > > > > > > > > > > > Note that nodes can contain other nodes (bookmarks and nodes are mixed in the order they are found), and the top node doesn't have a name element. Let's see... Is the following valid syntax? I'm not that happy about the name "node" either... anyone have a better idea? ... And yes, MSIE also uses a timestamp for each bookmark: [InternetShortcut] URL=http://www.secretlabs.com/ Modified=107CD6B43F8ABD019D (haven't figured out how to decipher that one yet) Netscape uses at least three: ADD_DATE, LAST_VISIT, and LAST_MODIFIED (standard time_t's). How about: (where dates are stored according to http://www.w3.org/TR/NOTE-datetime or RFC1766 or something -- is there a "defacto standard" for dates in XML?) Cheers /F fredrik@pythonware.com http://www.pythonware.com From digitome@iol.ie Thu Sep 3 15:41:03 1998 From: digitome@iol.ie (Sean Mc Grath) Date: Thu, 3 Sep 1998 15:41:03 +0100 Subject: [XML-SIG] Useless fun thing for XML - comments or helpers? 
Message-ID: <199809031441.PAA21101@GPO.iol.ie>

Mark,

A couple of things.

1) XML is case sensitive. How about lowercase or CamelCase for the element
type names?

2) The characters "<" and "&" are special in XML and must be escaped if they
occur as part of the content (in URL's for example you can have "&").
This does not affect your DTD, but needs to be borne in mind when generating
XBEL files. "&" -> "&amp;". "<" -> "&lt;"

3) In XML there are no standard ways of specifying lexical structure in
PCDATA (yet). Attributes give better (but still unsatisfactory) control.
I am thinking primarily of the date element type.

is more checkable than 2005/12/01

4) There are many, many ways to go from XBEL to HTML and other formats:-
   DSSSL Stylesheet (James Clark's Jade)
   XSL StyleSheet (James Clark's XT via JPython)
   Custom Python Translator
   ...

5) There is a lot of stuff going on in XML at the moment that will all
impact on XBEL as it develops:-
   a) Rendering via XSL
   b) Hypertext linking via XLink
   c) Namespaces (making the vocabulary of XBEL formally public and
      documented via a DTD)
   d) DCD - A proposal for a more powerful schema language for XML than DTDs

I suggest we keep it all very simple for now! Even as it stands a tool like
sgrep - structured grepping - really shows up the advantage of XBEL.

[Mark Hammond]
> > > > > > > > > > > > > > > > > > >

Sean Mc Grath - http://www.digitome.com/sean.htm
XML by Example:Building E-Commerce Applications
(http://www.amazon.com/exec/obidos/ISBN=0139601627/digitomeelectronA/)
ParseMe.1st - SGML for Software Developers
(http://www.amazon.com/exec/obidos/ISBN=0134889673/digitomeelectronA/)

From larsga@ifi.uio.no Thu Sep 3 15:57:41 1998
From: larsga@ifi.uio.no (Lars Marius Garshol)
Date: Thu, 03 Sep 1998 16:57:41 +0200
Subject: [XML-SIG] Useless fun thing for XML - comments or helpers?
In-Reply-To: <002d01bdd751$ce512870$f29b12c2@pythonware.com> Message-ID: <3.0.1.32.19980903165741.0073f310@ifi.uio.no> * Fredrik Lundh > >Note that nodes can contain other nodes (bookmarks and nodes are >mixed in the order they are found), and the top node doesn't have >a name element. Let's see... Is the following valid syntax? > > Yes, but I'm a bit uneasy about making NAME optional. Maybe we should have a separate element for the top NODE? >How about: > > Looks good to me; Opera has CREATED and VISITED. >(where dates are stored according to http://www.w3.org/TR/NOTE-datetime >or RFC1766 or something -- is there a "defacto standard" for dates in XML?) Not at present, but ISO 8601 looks like a likely candidate. I think the 19980902 variant of ISO 8601 is the best one. --Lars M. From fredrik@pythonware.com Thu Sep 3 17:20:24 1998 From: fredrik@pythonware.com (Fredrik Lundh) Date: Thu, 3 Sep 1998 17:20:24 +0100 Subject: [XML-SIG] Useless fun thing for XML - comments or helpers? Message-ID: <002901bdd756$c28f9170$f29b12c2@pythonware.com> >1) XML is case sensitive. How about lowercase or CamelCase for the element >type names.? CamelCase!? Look what the CamelFolks have to say about that: Also, instead of writing your variables with leading caps for all of the words like this: MyVariableForLoop. We should instead use the underscore and write it like this: my_variable_for_loop. [tchrist] gave several good reasons, including the fact that Perl is now a global language and it can be hard for those who speak English as a second language to read the variables. Also, we are used to having spaces in words so it makes it more readable for us too. (http://www.perl.com/pace/pub/perldocs/1998/08/show/day4.html) On the other hand, they also say: ...if we need comments in our code, then we didn't write it properly... which only shows that one might as well ignore them... 
>2) The characters "<" and "&" are special in XML and must be escaped if they
>occur as part of the content (in URL's for example you can have "&").
>This does not affect your DTD, but needs to be borne in mind when generating
>XBEL files. "&" -> "&amp;". "<" -> "&lt;"

Hey, my class did that...

>3) In XML there are no standard ways of specifying lexical structure in
>PCDATA (yet). Attributes give better (but still unsatisfactory) control.
>I am thinking primarily of the date element type.
>
>

Ouch! ;-)

>I suggest we keep it all very simple for now! Even as it stands a tool like
>sgrep - structured grepping - really shows up the advantage of XBEL.

sgrep? Is this an existing utility? Where do I find it?

Cheers /F

fredrik@pythonware.com
http://www.pythonware.com

From Jack.Jansen@cwi.nl Thu Sep 3 16:17:10 1998
From: Jack.Jansen@cwi.nl (Jack Jansen)
Date: Thu, 03 Sep 1998 17:17:10 +0200
Subject: [XML-SIG] Useless fun thing for XML - comments or helpers?
In-Reply-To: Message by Lars Marius Garshol , Thu, 03 Sep 1998 15:57:50 +0200 , <3.0.1.32.19980903155750.0076bdf8@ifi.uio.no>
Message-ID:

Looking at Marks DTD (and the code it is based upon) I noticed that I would
have done things differently: I would have used elements only for the
BOOKMARK and NODE items, and used attributes for the rest.

Can anyone enlighten me which method is best, and why?

--
Jack Jansen | ++++ stop the execution of Mumia Abu-Jamal ++++
Jack.Jansen@cwi.nl | ++++ if you agree copy these lines to your sig ++++
http://www.cwi.nl/~jack | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm

From fleck@informatik.uni-bonn.de Thu Sep 3 17:19:26 1998
From: fleck@informatik.uni-bonn.de (Markus Fleck)
Date: Thu, 03 Sep 1998 18:19:26 +0200
Subject: [XML-SIG] Useless fun thing for XML - comments or helpers?
References: <002901bdd756$c28f9170$f29b12c2@pythonware.com>
Message-ID: <35EEC18E.5C8A@informatik.uni-bonn.de>

Fredrik Lundh wrote:
> >I suggest we keep it all very simple for now! Even as it stands a tool like
> >sgrep - structured grepping - really shows up the advantage of XBEL.
>
> sgrep? Is this an existing utility? Where do I find it?

"sgrep" allows for the definition of "regions" of text, which may then be
searched selectively. For example, I use the following "sgrep" macros to
split the text-only version of the Python FAQ (the version that gets posted
to comp.lang.python.announce) into sections:

--- CUT ---
define(FAQ1, (("1" in "\n\n1. ") .. ( "-\n\n" in "-\n\n2. ") ))
define(FAQ2, (("2" in "\n\n2. ") .. ( "-\n\n" in "-\n\n3. ") ))
define(FAQ3, (("3" in "\n\n3. ") .. ( "-\n\n" in "-\n\n4. ") ))
define(FAQ4, (("4" in "\n\n4. ") .. ( "-\n\n" in "-\n\n5. ") ))
define(FAQ5, (("5" in "\n\n5. ") .. ( "-\n\n" in "-\n\n6. ") ))
define(FAQ6, (("6" in "\n\n6. ") .. ( "-\n\n" in "-\n\n7. ") ))
define(FAQ7, (("7" in "\n\n7. ") .. ( "-\n\n" in "-\n\n8. ") ))
define(FAQ8, (("8" in "\n\n8. ") .. (("-\n\n" in "-\n\n1. ") or end) ))
--- CUT ---

Invoking "sgrep FAQ2 FAQ.txt" would then spit out section 2 of the FAQ.

You can also use "sgrep" to search only "Subject: " fields in mail headers
of a mailbox file, or only

-tagged text in an HTML file.

"sgrep" can be found at .

Yours,
Markus.

--
////////////////////////////////////////////////////////////////////////////
Markus B Fleck - University of Bonn - CS Department IV - fleck@isoc.de
UNIX Administrator - comp.lang.python.announce Moderator
PINN Open Source Internet Groupware Project - http://cscw.net/pinn/
\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\

From fermigie@math.jussieu.fr Thu Sep 3 17:27:24 1998
From: fermigie@math.jussieu.fr (Stefane Fermigier)
Date: Thu, 3 Sep 1998 18:27:24 +0200
Subject: [XML-SIG] Useless fun thing for XML - comments or helpers?
In-Reply-To: <013601bdd725$8ab0db50$f29b12c2@pythonware.com>; from Fredrik Lundh on Thu, Sep 03, 1998 at 11:28:02AM +0100
References: <013601bdd725$8ab0db50$f29b12c2@pythonware.com>
Message-ID: <19980903182724.A19852@riemann.math.jussieu.fr>

On Thu, Sep 03, 1998 at 11:28:02AM +0100, Fredrik Lundh wrote:
>
> Here's a first stab. This is tested with MSIE 5.0 on a Swedish NT installation
> (so you definitely need to change the directory to run it -- a production version
> should of course use the registry to find out where the directory is located).

Similar program using pydom:

import os, string
from xml.dom.writer import XmlWriter
from xml.dom.core import *

#
ROOT_DIR = 'favorites' # Fix this on your machine, I don't have NT.

dom_factory = DOMFactory()

class Name(Element):
    def __init__(self, name):
        Element.__init__(self, 'Name')
        self.appendChild(dom_factory.createTextNode(name))

class Url(Element):
    def __init__(self, url):
        Element.__init__(self, 'Url')
        self.appendChild(dom_factory.createTextNode(url))

class Folder(Element):
    def __init__(self, folder_name):
        Element.__init__(self, 'Folder')
        self.appendChild(Name(folder_name))

class Bookmark(Element):
    def __init__(self, name, url):
        Element.__init__(self, 'Bookmark')
        self.appendChild(Name(name))
        self.appendChild(Url(url))

# One could also use the factory everywhere instead of defining these
# classes.

# This class is almost untouched from Fredrik's version.
class MSIE:
    def __init__(self):
        self.root = Folder('')
        self.path = ROOT_DIR # Fix this if you're on Windows.
        self.__walk(self.root)

    def __walk(self, this, subpath=[]):
        # traverse favourites folder
        path = os.path.join(self.path, string.join(subpath, os.sep))
        for file_name in os.listdir(path):
            fullname = os.path.join(path, file_name)
            if os.path.isdir(fullname):
                node = Folder(file_name)
                this.appendChild(node)
                # recurse using the renamed loop variable
                self.__walk(node, subpath + [file_name])
            else:
                url = self.__geturl(fullname)
                if url:
                    this.appendChild(Bookmark(os.path.splitext(file_name)[0], url))

    def __geturl(self, file):
        try:
            fp = open(file)
            if fp.readline() != "[InternetShortcut]\n":
                return None
            while 1:
                s = fp.readline()
                if not s:
                    break
                if s[:4] == "URL=":
                    return s[4:-1]
        except IOError:
            pass
        return None

bookmarks = MSIE()
writer = XmlWriter()
writer.newline_after_start = ['Folder', 'Bookmark']
writer.newline_after_end = ['Name', 'Url', 'Folder', 'Bookmark']
writer.write(bookmarks.root)

Cheers,

S.

--
Stéfane Fermigier, MdC à l'Université Paris 7. Tel: 01.44.27.61.01 (Bureau).
, , .
"Python is so much easier to write and experiment with that I write it in
Python first, then translate to Java if necessary - despite being the
author of a Java book!"
Gordon McMillan From wunder@infoseek.com Thu Sep 3 17:53:58 1998 From: wunder@infoseek.com (Walter Underwood) Date: Thu, 03 Sep 1998 09:53:58 -0700 Subject: [XML-SIG] Useless fun thing for XML - comments or helpers? In-Reply-To: <002901bdd756$c28f9170$f29b12c2@pythonware.com> Message-ID: <3.0.5.32.19980903095358.00bf7290@corp> >>3) In XML there are no standard ways of specifying lexical structure in >>PCDATA (yet). Attributes give better (but still unsatisfactory) control. >>I am thinking primarily of the date element type. >> >> On the other hand, there are times to specify structure without using XML. The web profile of the ISO 8601 date format works fine in this case. See http://www.w3.org/TR/NOTE-datetime for the details. Here are some versions of the above using ISO 8601: 2005-12-01 2005-12-01 and so on. By the way, thanks for all the work on XML parsing. We're using this to add XML support in future versions of Ultraseek Server, our Python-based search engine. wunder Walter R. Underwood wunder@infoseek.com wunder@best.com (home) http://www.best.com/~wunder/ 1-408-543-6946 From mss@transas.com Thu Sep 3 19:48:14 1998 From: mss@transas.com (Michael Sobolev) Date: Thu, 3 Sep 1998 22:48:14 +0400 Subject: [XML-SIG] DTDs.. Message-ID: <19980903224814.A14927@transas.com> Hi, I am trying to figure out how the processed DTD is stored. I took xvcmd.py program that comes with python-xml (debian) distribution and parsed the document. Then I executed parser's get_dtd method (this, I guess, contains the DTD). How can I reverse engineer the DTD for my document? Or, to be more precise, how am I supposed to walk through content_model information of an element? TIA, -- Mike From jtauber@jtauber.com Fri Sep 4 06:33:04 1998 From: jtauber@jtauber.com (James Tauber) Date: Fri, 4 Sep 1998 13:33:04 +0800 Subject: [XML-SIG] Useless fun thing for XML - comments or helpers? 
Message-ID: <006a01bdd7c5$bcd13bc0$bc6118cb@caleb> >>>3) In XML there are no standard ways of specifying lexical structure in >>>PCDATA (yet). That's not *entirely* true. You can use notation attributes. > 2005-12-01 > > 2005-12-01 > The best would be something similar to the third one: 2005-12-01 where scheme is a notation attribute and iso-8601 is a notation referencing the ISO standard. James -- James Tauber / jtauber@jtauber.com http://www.jtauber.com/ Lecturer and Associate Researcher Electronic Commerce Network ( http://www.xmlinfo.com/ Curtin Business School ( http://www.xmlsoftware.com/ Perth, Western Australia ( http://www.schema.net/ From larsga@ifi.uio.no Fri Sep 4 16:16:49 1998 From: larsga@ifi.uio.no (Lars Marius Garshol) Date: Fri, 04 Sep 1998 17:16:49 +0200 Subject: [XML-SIG] DTDs.. In-Reply-To: <19980903224814.A14927@transas.com> Message-ID: <3.0.5.32.19980904171649.007b0520@ifi.uio.no> * Michael Sobolev > >I am trying to figure out how the processed DTD is stored. I took xvcmd.py >program that comes with python-xml (debian) distribution and parsed the >document. Then I executed parser's get_dtd method (this, I guess, contains the >DTD). It returns an object that contains the DTD information, yes. >How can I reverse engineer the DTD for my document? Or, to be more precise, >how am I supposed to walk through content_model information of an element? The content model of elements is parsed into a parse tree, converted to a non-deterministic finite automaton and then converted from there to a deterministic finite automaton. The original parse tree is then discarded, which means that you basically don't have any means of getting back to the original content model. However, if you can tell me what it is you need I may add it to the next version. The current DTD interface is just what I needed to implement validation, and may not be optimal for other kinds of uses. --Lars M. 
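
To make the point about the discarded parse tree concrete, here is a small hand-built deterministic automaton for a hypothetical content model, (name?, (bookmark | node)*). It is a sketch only: the content model is assumed for illustration, and the table layout is not xmlproc's actual internal representation.

```python
# Hand-built DFA for the hypothetical content model
#   (name?, (bookmark | node)*)
# State 0: nothing seen yet; state 1: past the optional name.
TRANSITIONS = {
    (0, "name"): 1,
    (0, "bookmark"): 1,
    (0, "node"): 1,
    (1, "bookmark"): 1,
    (1, "node"): 1,
}
ACCEPTING = {0, 1}


def is_valid(children):
    """Return True if the sequence of child element names is accepted."""
    state = 0
    for child in children:
        state = TRANSITIONS.get((state, child))
        if state is None:
            # No transition for this element name in the current state.
            return False
    return state in ACCEPTING
```

The same table would also result from the differently written model ((name | bookmark | node), (bookmark | node)*)?, which accepts exactly the same sequences. Once only the automaton is kept, there is no way to tell which of the two the DTD author wrote.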
From mss@transas.com Fri Sep 4 18:54:56 1998 From: mss@transas.com (Michael Sobolev) Date: Fri, 4 Sep 1998 21:54:56 +0400 Subject: [XML-SIG] DTDs.. In-Reply-To: <3.0.5.32.19980904171649.007b0520@ifi.uio.no>; from Lars Marius Garshol on Fri, Sep 04, 1998 at 05:16:49PM +0200 References: <19980903224814.A14927@transas.com> <3.0.5.32.19980904171649.007b0520@ifi.uio.no> Message-ID: <19980904215456.A17805@transas.com> On Fri, Sep 04, 1998 at 05:16:49PM +0200, Lars Marius Garshol wrote: > The content model of elements is parsed into a parse tree, converted to a > non-deterministic finite automaton and then converted from there to a > deterministic finite automaton. The original parse tree is then discarded, > which means that you basically don't have any means of getting back to the > original content model. You meant that I likely to get an equivalent form? > However, if you can tell me what it is you need I may add it to the next > version. The current DTD interface is just what I needed to implement > validation, and may not be optimal for other kinds of uses. Basically, I need more documentation. It is not obvious how to get all defined elements, for example. And more examples, if possible. :) What I want to know is how: to obtain the list of public identifiers from catalog; to parse a specific DTD (using its public or system id); to get DTD information for a given document. Under DTD information I understand the list of elements (with theirs attributes) and a way for figuring out how the elements may follow one another. Having written my previous message, I understood what the content_model is, and how to make use of it. I am only afraid that since it is not documented (and, therefore, is not fixed) it may easily be changed should you find a different way for validating XML files against DTD. Regards, -- Mike From larsga@ifi.uio.no Fri Sep 4 21:19:21 1998 From: larsga@ifi.uio.no (Lars Marius Garshol) Date: Fri, 04 Sep 1998 22:19:21 +0200 Subject: [XML-SIG] DTDs.. 
In-Reply-To: <19980904215456.A17805@transas.com> References: <3.0.5.32.19980904171649.007b0520@ifi.uio.no> <19980903224814.A14927@transas.com> <3.0.5.32.19980904171649.007b0520@ifi.uio.no> Message-ID: <3.0.5.32.19980904221921.007b2e60@ifi.uio.no> * Lars Marius Garshol > > The original parse tree is then discarded, > which means that you basically don't have any means of getting back to the > original content model. * Michael Sobolev > > You meant that I likely to get an equivalent form? (I assume there's a 'not' missing in that sentence.) Correct. If this is important to you I may consider adding a way to preserve the original content model structure. I've thought about doing so, but since nobody seemed to use the DTD interface I haven't bothered so far. * Michael Sobolev > >Basically, I need more documentation. It is not obvious how to get all >defined elements, for example. Not so strange, since you can't. :) Another thing I've been thinking about, but haven't yet added. It's just two or three lines, so I'll put it in in a couple of days. Expect it in the next release. >And more examples, if possible. :) Maybe I can add an example program that does something interesting with DTD information. >What I want to know is how: > > to obtain the list of public identifiers from catalog; Currently you can't. I'll add this. > to parse a specific DTD (using its public or system id); Hmmm. You can do this now by using the DTDParser class in the xmlproc module. Give it a DTDConsumer (see the DTD API doco) to receive events. I want to move the DTDParser and clean up the interface a little, so I haven't documented this yet, but the DTDParser understands the same methods as XMLProcessor, expcept that you set the DTD handler with 'set_dtd_consumer'. Note that this will break in a future version. > to get DTD information for a given document. Hmmm. Since you already know about the get_dtd method, I'm not sure what more you want. 
>Under DTD information I understand the list of elements (with theirs attributes) >and a way for figuring out how the elements may follow one another. Having >written my previous message, I understood what the content_model is, and how >to make use of it. I am only afraid that since it is not documented (and, >therefore, is not fixed) it may easily be changed should you find a different >way for validating XML files against DTD. It's not likely to change since the current method seems to work pretty well, but, yes, you do run that risk. This isn't a finished product and I want to keep my options open here... :) However, if you there's some specific information about the content models you want I'll see what I can do. Do you want to be able to reconstruct the original syntax of the declarations, or is there something else you want? --Lars M. From mss@transas.com Fri Sep 4 22:01:23 1998 From: mss@transas.com (Michael Sobolev) Date: Sat, 5 Sep 1998 01:01:23 +0400 Subject: [XML-SIG] DTDs.. In-Reply-To: <3.0.5.32.19980904221921.007b2e60@ifi.uio.no>; from Lars Marius Garshol on Fri, Sep 04, 1998 at 10:19:21PM +0200 References: <3.0.5.32.19980904171649.007b0520@ifi.uio.no> <19980903224814.A14927@transas.com> <3.0.5.32.19980904171649.007b0520@ifi.uio.no> <19980904215456.A17805@transas.com> <3.0.5.32.19980904221921.007b2e60@ifi.uio.no> Message-ID: <19980905010123.A22973@transas.com> On Fri, Sep 04, 1998 at 10:19:21PM +0200, Lars Marius Garshol wrote: > >Basically, I need more documentation. It is not obvious how to get all > >defined elements, for example. > Not so strange, since you can't. :) Another thing I've been thinking about, > but haven't yet added. It's just two or three lines, so I'll put it in in a > couple of days. Expect it in the next release. Well, with current version I can easily get all elements that are used for defining root element, can't I? For the most cases, it's sufficient. > >And more examples, if possible. 
:) > Maybe I can add an example program that does something interesting with > DTD information. Yes, please. > > to obtain the list of public identifiers from catalog; > Currently you can't. I'll add this. This would be nice. > > to parse a specific DTD (using its public or system id); > > Hmmm. You can do this now by using the DTDParser class in the xmlproc > module. Give it a DTDConsumer (see the DTD API doco) to receive events. > I want to move the DTDParser and clean up the interface a little, so I > haven't documented this yet, but the DTDParser understands the same > methods as XMLProcessor, expcept that you set the DTD handler with > 'set_dtd_consumer'. Note that this will break in a future version. What I mean here is rather an example than functionality. :) Regards, -- Mike From akuchlin@cnri.reston.va.us Fri Sep 4 22:06:18 1998 From: akuchlin@cnri.reston.va.us (Andrew Kuchling) Date: Fri, 4 Sep 1998 17:06:18 -0400 (EDT) Subject: [XML-SIG] Useless fun thing for XML - comments or helpers? In-Reply-To: <3.0.5.32.19980903095358.00bf7290@corp> References: <002901bdd756$c28f9170$f29b12c2@pythonware.com> <3.0.5.32.19980903095358.00bf7290@corp> Message-ID: <13808.21923.568032.333624@newcnri.cnri.reston.va.us> Walter Underwood writes: >By the way, thanks for all the work on XML parsing. We're using >this to add XML support in future versions of Ultraseek Server, >our Python-based search engine. That's very interesting. Can you say anything about the level of the API you're using? That is, are you using xmllib.py, xmllib.py + sgmlop.c, the PyExpat module, or something higher-level such as SAX? -- A.M. Kuchling http://starship.skyport.net/crew/amk/ Given time, you'll spin a yarn of what we saw in the ocean. Given time I'll tell the tale of the handsome cabin boy. But given enough time and the right audience, the darkest of secrets scum over into mere curiosities. 
    -- Hob Gadling, in SANDMAN #53: "Hob's Leviathan"

From wunder@infoseek.com Fri Sep 4 22:36:09 1998
From: wunder@infoseek.com (Walter Underwood)
Date: Fri, 04 Sep 1998 14:36:09 -0700
Subject: [XML-SIG] Useless fun thing for XML - comments or helpers?
In-Reply-To: <13808.21923.568032.333624@newcnri.cnri.reston.va.us>
References: <3.0.5.32.19980903095358.00bf7290@corp> <002901bdd756$c28f9170$f29b12c2@pythonware.com> <3.0.5.32.19980903095358.00bf7290@corp>
Message-ID: <3.0.5.32.19980904143609.00c2b8e0@corp>

At 05:06 PM 9/4/98 -0400, Andrew Kuchling wrote:
>Walter Underwood writes:
>>By the way, thanks for all the work on XML parsing. We're using
>>this to add XML support in future versions of Ultraseek Server,
>>our Python-based search engine.
>
> That's very interesting. Can you say anything about the level
>of the API you're using? That is, are you using xmllib.py, xmllib.py
>+ sgmlop.c, the PyExpat module, or something higher-level such as SAX?

Still on xmllib.py (version 0.1), since the work was first done
back in May. I'm planning on moving to SAX, and dropping in a
faster parser, probably via sgmlop support. We're using XML in
another part of the engine, but that is not speed-sensitive.

The search engine only requires that the XML be well-formed, since it
doesn't really need to know about the DTD, just the text that remains
after parsing. Well, we do pay attention to one tag -- the first <title>
or <TITLE> tag is considered to be the title of the document for
purposes of displaying search hits.

If people don't mind a commercial announcement, I'll let the list
know when we release the XML-savvy version.

wunder

Walter R. Underwood
wunder@infoseek.com
wunder@best.com (home)
http://www.best.com/~wunder/
1-408-543-6946

From larsga@ifi.uio.no Sat Sep 5 07:38:16 1998
From: larsga@ifi.uio.no (Lars Marius Garshol)
Date: Sat, 05 Sep 1998 08:38:16 +0200
Subject: [XML-SIG] Useless fun thing for XML - comments or helpers?
In-Reply-To: <3.0.5.32.19980904143609.00c2b8e0@corp>
References: <13808.21923.568032.333624@newcnri.cnri.reston.va.us> <3.0.5.32.19980903095358.00bf7290@corp> <002901bdd756$c28f9170$f29b12c2@pythonware.com> <3.0.5.32.19980903095358.00bf7290@corp>
Message-ID: <3.0.5.32.19980905083816.007b62f0@ifi.uio.no>

* Walter Underwood
>
>The search engine only requires that the XML be well-formed, since it
>doesn't really need to know about the DTD, just the text that remains
>after parsing. Well, we do pay attention to one tag -- the first <title>
>or <TITLE> tag is considered to be the title of the document for
>purposes of displaying search hits.

Hmmm. Have you considered using architectural forms to give page authors
more freedom, but still allow you to discover which elements are the
equivalents of 'TITLE' and 'AUTHOR' etc?

--Lars M.

From larsga@ifi.uio.no Sat Sep 5 15:37:12 1998
From: larsga@ifi.uio.no (Lars Marius Garshol)
Date: Sat, 05 Sep 1998 16:37:12 +0200
Subject: [XML-SIG] Bookmark parsers
Message-ID: <3.0.5.32.19980905163712.0079e8f0@ifi.uio.no>

Here are some scripts to convert from MSIE, Opera and Netscape bookmarks
to Opera, Netscape and XBEL. There's hardly any support for created,
visited and modified. Fredriks code has been looted to get the MSIE
support.

Testing has been minimal so far.

(adr_parse.py)

"""
Small utility to parse Opera bookmark files.
"""

import string,bookmark

# --- Constants

short_months={"Jan":"01","Feb":"02","Mar":"03","Apr":"04","May":"05",
              "Jun":"06","Jul":"07","Aug":"08","Sep":"09","Oct":"10",
              "Nov":"11","Dec":"12"}

# --- Parsing exception

class OperaParseException(Exception):
    pass

# --- Methods

def readfield(infile,fieldname):
    line=infile.readline()
    pos=string.find(line,fieldname+"=")
    if pos==-1:
        raise OperaParseException("Field '%s' missing" % fieldname)

    return line[pos+len(fieldname)+1:-1]

def swallow_rest(infile):
    "Reads input until first blank line."
    while 1:
        line=infile.readline()
        if line=="" or line=="\n":
            break

def parse_date(date):
    # CREATED=904923783 (Fri Sep 04 17:43:03 1998)
    # VISITED=0 (?)
    lp=string.find(date,"(")
    rp=string.find(date,")")
    if lp==-1 or rp==-1:
        raise OperaParseException("Date without parentheses")

    if date[lp:rp+1]=="(?)":
        return None

    month=short_months[date[lp+5:lp+8]]
    day=date[lp+9:lp+11]
    year=date[rp-4:rp]

    return "%s%s%s" % (year,month,day)

def parse_adr(filename):
    bms=bookmark.Bookmarks()

    infile=open(filename)
    version=infile.readline()

    while 1:
        line=infile.readline()
        if line=="":
            break

        if line[:-1]=="#FOLDER":
            name=readfield(infile,"NAME")
            created=parse_date(readfield(infile,"CREATED"))
            visited=parse_date(readfield(infile,"VISITED"))
            order=readfield(infile,"ORDER")

            swallow_rest(infile)
            bms.add_folder(name,created,visited)
        elif line[:-1]=="#URL":
            name=readfield(infile,"NAME")
            url=readfield(infile,"URL")
            created=parse_date(readfield(infile,"CREATED"))
            visited=parse_date(readfield(infile,"VISITED"))
            order=readfield(infile,"ORDER")

            swallow_rest(infile)
            bms.add_bookmark(name,created,visited,url)
        elif line[:-1]=="-":
            bms.leave_folder()

    return bms

# --- Test-program

bms=parse_adr(r"c:\programfiler\opera\opera3.adr")
bms.dump_netscape()

(msie_parse.py)

"""
Small utility to convert MSIE favourites to an object structure.

Originally written by Fredrik Lundh.
"""

import bookmark,os,string

DIR = "Favoritter" # Norwegian version

#USRDIR = os.environ["USERPROFILE"] # NT version
USRDIR = r"c:\windows" # 95 version

class MSIE:
    # internet explorer

    def __init__(self,bookmarks):
        # FIXME: use registry for this!
        self.bms=bookmarks
        self.root = None
        self.path = os.path.join(USRDIR, DIR)
        self.__walk()

    def __walk(self, subpath=[]):
        # traverse favourites folder
        path = os.path.join(self.path, string.join(subpath, os.sep))
        for file in os.listdir(path):
            fullname = os.path.join(path, file)
            if os.path.isdir(fullname):
                self.bms.add_folder(file,None,None)
                self.__walk(subpath + [file])
                # close the folder again, otherwise everything that
                # follows ends up nested inside it
                self.bms.leave_folder()
            else:
                url = self.__geturl(fullname)
                if url:
                    self.bms.add_bookmark(os.path.splitext(file)[0],None,
                                          None,url)

    def __geturl(self, file):
        try:
            fp = open(file)
            if fp.readline() != "[InternetShortcut]\n":
                return None
            while 1:
                s = fp.readline()
                if not s:
                    break
                if s[:4] == "URL=":
                    return s[4:-1]
        except IOError:
            pass
        return None

# --- Testprogram

msie=MSIE(bookmark.Bookmarks())
msie.bms.dump_xbel()

(ns_parse.py)

"""
Small utility that parses Netscape bookmarks.
"""

from xml.sax import saxexts,saxlib
import bookmark

# --- SAX handler for Netscape bookmarks

class NetscapeHandler(saxlib.HandlerBase):

    def __init__(self):
        self.bms=bookmark.Bookmarks()
        self.cur_elem=None
        self.added=None
        self.url=None
        self.visited=None
        self.last_modified=None

    def startElement(self,name,attrs):
        if name=="h3":
            self.cur_elem="h3"
            self.added=attrs["add_date"]
        elif name=="a":
            self.cur_elem="a"
            self.added=attrs["add_date"]
            self.url=attrs["href"]
            self.visited=attrs["last_visit"]
            self.last_modified=attrs["last_modified"]

    def characters(self,data,start,length):
        if self.cur_elem=="h3":
            self.bms.add_folder(data[start:start+length],None,None)
        elif self.cur_elem=="a":
            self.bms.add_bookmark(data[start:start+length],None,None,self.url)

    def endElement(self,name):
        if name=="h3":
            self.cur_elem=None
        elif name=="dl":
            self.bms.leave_folder()
        elif name=="a":
            self.cur_elem=None

# --- Main program

ns_handler=NetscapeHandler()

p=saxexts.SGMLParserFactory.make_parser()
p.setDocumentHandler(ns_handler)
p.parseFile(open(r"h:/internet/netscape/bookmark.htm"))
ns_handler.bms.dump_netscape()

(bookmark.py)

"""
Classes to store bookmarks and dump them to XBEL.
""" import sys,string # --- Class for bookmark container class Bookmarks: def __init__(self): self.folders=[] self.folder_stack=[] def add_folder(self,name,created,visited): nf=Folder(name,created,visited) if self.folder_stack==[]: self.folders.append(nf) else: self.folder_stack[-1].add_child(nf) self.folder_stack.append(nf) def add_bookmark(self,name,created,visited,url): nb=Bookmark(name,created,visited,url) if self.folder_stack!=[]: self.folder_stack[-1].add_child(nb) else: self.folders.append(nb) def leave_folder(self): if self.folder_stack!=[]: del self.folder_stack[-1] def dump_xbel(self,out=sys.stdout): out.write("<XBEL>\n") for folder in self.folders: folder.dump_xbel(out) out.write("<XBEL>") def dump_adr(self,out=sys.stdout): out.write("Opera Hotlist version 2.0\n\n") for folder in self.folders: folder.dump_adr(out) def dump_netscape(self,out=sys.stdout): out.write("<!DOCTYPE NETSCAPE-Bookmark-file-1>\n") out.write("<!-- This is an automatically generated file.\n") out.write("It will be read and overwritten.\n") out.write("Do Not Edit! -->\n") out.write("<TITLE>Skriv HELE NAVNET her's Bookmarks\n") out.write("

Skriv HELE NAVNET her's Bookmarks

\n\n") out.write("

\n") for folder in self.folders: folder.dump_netscape(out) out.write("

\n") # --- Superclass for folder and bookmarks class Node: def __init__(self,name,created,visited): self.name=name self.created=created self.visited=visited # --- Class for folders class Folder(Node): def __init__(self,name,created,visited): Node.__init__(self,name,created,visited) self.children=[] def add_child(self,child): self.children.append(child) def dump_xbel(self,out): out.write(" \n") out.write(" %s\n" % self.name) for child in self.children: child.dump_xbel(out) out.write(" \n") def dump_adr(self,out): out.write("#FOLDER\n") out.write("\tNAME=%s\n" % self.name) out.write("\tCREATED=%s\n" % "0 (?)") out.write("\tVISITED=%s\n" % "0 (?)") out.write("\tORDER=-1\n") out.write("\n") for child in self.children: child.dump_adr(out) out.write("\n") out.write("-\n") def dump_netscape(self,out): out.write("

%s

\n" % self.name) out.write("

\n") for child in self.children: child.dump_netscape(out) out.write("

\n") # --- Class for bookmarks class Bookmark(Node): def __init__(self,name,created,visited,url): Node.__init__(self,name,created,visited) self.url=url def dump_xbel(self,out): out.write(" \n") out.write(" %s\n" % self.name) out.write(" %s\n" % self.url) if self.created!=None: out.write(" %s\n" % self.created) if self.visited!=None: out.write(" %s\n" % self.visited) out.write(" %s\n" % (self.url,self.name)) --Lars M. From lisarein@finetuning.com Sun Sep 6 00:04:55 1998 From: lisarein@finetuning.com (Lisa Rein) Date: Sat, 05 Sep 1998 16:04:55 -0700 Subject: [XML-SIG] Re: XML-SIG digest, Vol 1 #90 - 7 msgs References: <199809051600.MAA10334@python.org> Message-ID: <35F1C397.317347AF@finetuning.com> Walter R. Underwood said: > > The search engine only requires that the XML be well-formed, since it > doesn't really need to know about the DTD, just the text that remains > after parsing. Well, we do pay attention to one tag -- the first > or <TITLE> tag is considered to be the title of the document for > purposes of displaying search hits. > Hello Walter: I am very curious how exactly XML is being utilized in the search engine if the only tag being taken into account is the (first) TITLE tag (just like a search engine would use during a "bag of words" approach) and not using a DTD -- making any semantic associations impossible. If you're not going to deal with the text until after it's parsed, why are you using XML? Are you doing some kind of indexing or another variation I haven't of? Do tell ;-) Thanks, lisa rein http://www.finetuning.com/editor.html From stuart.hungerford@webone.com.au Sun Sep 6 07:24:02 1998 From: stuart.hungerford@webone.com.au (Stuart Hungerford) Date: Sun, 6 Sep 1998 16:24:02 +1000 Subject: [XML-SIG] Status of XML python stuff on Win32? Message-ID: <000101bdd95e$f3718be0$0b2c08d2@alderman> Folks, Can someone tell me what the status of the Python XMl software tools is for the Win32 platform? 
I believe xmlproc should work "out of the box", but the collection
of tools would need extra work?

From akuchlin@cnri.reston.va.us Sun Sep 6 15:43:26 1998
From: akuchlin@cnri.reston.va.us (A.M. Kuchling)
Date: Sun, 6 Sep 1998 10:43:26 -0400
Subject: [XML-SIG] Marshalling to XML, again
Message-ID: <199809061443.KAA00545@207-172-112-146.s146.tnt4.ann.erols.com>

Here's another version of the xml.marshal module. (That name will
have to be changed now, though, since xml.marshal uses Python's
original marshal to handle code objects. Any suggestions?)

For example, take the recursive list produced by this code:

recursive_list = [None, 1, pow(3,65L), '<fake tag>', 1+5j]
recursive_list.append( recursive_list )

Here's the marshalled version (pretty-printed; the module just
produces one long line):

<?xml version="1.0"?>
<!DOCTYPE marshal SYSTEM "marshal.dtd">
<marshal>
  <list id="i135737736">
    <none/>
    <integer>1</integer>
    <long>10301051460877537453973547267843</long>
    <string>&lt;fake tag&gt;</string>
    <complex>
      <float>1.0</float>
      <float>5.0</float>
    </complex>
    <reference id="i135737736"/>
  </list>
</marshal>

The DTD for the marshalling format is available as the __dtd__
attribute of the module; does this seem like a useful convention for
future modules? Comments on the code, DTD, etc. are welcome.

There's been some discussion of marshalling scripting language data
types on the Casbah list and on the Perl-XML list recently; Dave
Winer's suggestion for XML-RPC bears some relation to this. It would
be very useful if some common DTD was agreed upon, which would allow
painlessly exchanging data between Python and Perl, Frontier, or
whatever. (However, I lack the time to read all the relevant mailing
lists and agitate for a specification.)

If no such common DTD arises, is this module still useful, and should
it be included?

--
A.M. Kuchling http://starship.skyport.net/crew/amk/
In a modern university if you ask for knowledge they will provide it in almost
any form -- though if you ask for out-of-fashion things they may say, like the
people in shops, "Sorry, there's no call for it."
    -- Robertson Davies, _The Rebel Angels_

# xml.marshal : Marshals simple Python data types into an XML-based
# format. The interface is the same as the built-in module of the
# same name, with four functions:
#    dump(value, file), load(file)
#    dumps(value), loads(string)

from types import *
import string

__dtd__ = """
<!ELEMENT marshal (integer | string | float | long | complex | code | none |
                   tuple | list | dictionary)>

<!ELEMENT none EMPTY>
<!ELEMENT reference EMPTY>
<!ELEMENT integer (#PCDATA)>
<!ELEMENT string (#PCDATA)>
<!ELEMENT float (#PCDATA)>
<!ELEMENT long (#PCDATA)>
<!ELEMENT code (#PCDATA)>
<!ELEMENT complex (float, float)>

<!ELEMENT tuple (integer | string | float | long | complex | code | none |
                 tuple | list | dictionary | reference)*>
<!ELEMENT list (integer | string | float | long | complex | code | none |
                tuple | list | dictionary | reference)*>
<!ELEMENT dictionary ( (integer | string | long | float | complex | code |
                        tuple | reference ),
                       (integer | string | float | long | complex | code |
                        none | tuple | list | dictionary | reference)
                     )* >

<!ATTLIST list id ID #REQUIRED>
<!ATTLIST dictionary id ID #REQUIRED>
<!ATTLIST reference id IDREF #REQUIRED>
"""

# Dictionary mapping some of the simple types to the corresponding tag
_mapping = {StringType:'string', IntType:'integer', FloatType:'float'}

# XML version and DOCTYPE declaration
PROLOGUE = """<?xml version="1.0"?>
<!DOCTYPE marshal SYSTEM "marshal.dtd">
"""

def _marshal(value, dict):
    L = []
    t = type(value) ; i = str( id(value) )
    if dict.has_key( i ):
        # This object has already been marshalled, so
        # emit a reference element.
        L.append( '<reference id="i%s"/>' % (i, ) )
    elif _mapping.has_key( t ):
        # Some simple type: integer, string, or float
        name = _mapping[t]
        L.append( '<'+name + '>')
        s = str(value)
        if '&' in s or '>' in s or '<' in s:
            # Escape markup characters; '&' must be handled first
            s = string.replace(s, '&', '&amp;')
            s = string.replace(s, '<', '&lt;')
            s = string.replace(s, '>', '&gt;')
        L.append( s )
        L.append( '</' + name + '>')
    elif t == LongType:
        L.append('<long>%s</long>' % (str(value)[:-1],) )
    elif t == TupleType:
        L.append( '<tuple>')
        for elem in value:
            L = L + _marshal(elem, dict)
        L.append( '</tuple>')
    elif t == ListType:
        dict[ i ] = 1
        L.append( '<list id="i%s">' %(i,) )
        for elem in value:
            L = L + _marshal(elem, dict)
        L.append( '</list>')
    elif t == DictType:
        dict[ i ] = 1
        L.append( '<dictionary id="i%s">' %(i,) )
        for key, v in value.items():
            L = L + _marshal(key, dict)
            L = L + _marshal(v, dict)
        L.append( '</dictionary>')
    elif t == NoneType:
        L.append( '<none/>')
    elif t == ComplexType:
        # XXX should it be <complex><real>...</real><imag>...</imag></complex>?
        L.append( '<complex><float>' )
        L.append( str(value.real) )
        L.append( '</float><float>' )
        L.append( str(value.imag) )
        L.append( '</float>' )
        L.append( '</complex>' )
    elif t == CodeType:
        # The full information about code objects is only available
        # from the C level, so we'll use the built-in marshal module
        # to convert the code object into a string, and include it in
        # the XML.
        import marshal, base64
        L.append( '<code>' )
        s = marshal.dumps(value)
        s = base64.encodestring(s)
        L.append( s )
        L.append( '</code>' )
        dict[ i ] = 'code'
    return L

from xml.sax import saxlib

DICT = 'dict' ; LIST = 'list' ; TUPLE='tuple'

class _unmarshalHandler(saxlib.HandlerBase):
    def __init__(self):
        saxlib.HandlerBase.__init__(self)

    def startElement(self, name, attrs):
        if name == 'marshal':
            self.dict = {}
            self.data_stack = []
            return
        elif name == 'reference':
            assert attrs.has_key('id')
            id = attrs['id']
            assert self.dict.has_key(id)
            self.data_stack.append( self.dict[id] )
            # Stop here; falling through would also push an empty
            # list for the reference element.
            return

        if name=='dictionary':
            self.data_stack.append(DICT)
            d = {}
            id = attrs[ 'id']
            self.dict[ id ] = d
            self.data_stack.append( d )
        elif name=='list':
            self.data_stack.append(LIST)
            L = []
            id = attrs[ 'id']
            self.dict[ id ] = L
            self.data_stack.append( L )
        elif name=='tuple':
            self.data_stack.append(TUPLE)
        else:
            self.data_stack.append( [] )

    def characters(self, ch, start, length):
        self.data_stack[-1].append(ch[start:start+length])

    def endElement(self, name):
        ds = self.data_stack
        if name == 'string':
            ds[-1] = string.join(ds[-1], "")
        elif name == 'integer':
            ds[-1] = string.join(ds[-1], "")
            ds[-1] = string.atoi( ds[-1] )
        elif name == 'long':
            ds[-1] = string.join(ds[-1], "")
            ds[-1] = string.atol( ds[-1] )
        elif name == 'float':
            ds[-1] = string.join(ds[-1], "")
            ds[-1] = string.atof( ds[-1] )
        elif name == 'none':
            ds[-1] = None
        elif name == 'complex':
            c = ds[-2] + ds[-1]*1j
            ds[-3:] = [c]
        elif name == 'code':
            import marshal, base64
            s = string.join(ds[-1], "")
            s = base64.decodestring( s )
            ds[-1] = marshal.loads(s)
        elif name == 'dictionary':
            for index in range(len(ds)-1, -1, -1):
                if ds[index] is DICT: break
            assert index!=-1
            d = ds[index+1]
            for i in range(index+2, len(ds), 2):
                key = ds[i] ; value =ds[i+1]
                d[key] = value
            ds[index:] = [ d ]
        elif name == 'list':
            for index in range(len(ds)-1, -1, -1):
                if ds[index] is LIST: break
            assert index!=-1
            L = ds[index+1]
            L[:] = ds[index+2 : len(ds)]
            ds[index:] = [ L ]
        elif name == 'tuple':
            for index in range(len(ds)-1, -1, -1):
                if ds[index] is TUPLE: break
            assert index!=-1
            t = tuple( ds[index+1 : len(ds)] )
            ds[index:] = [ t ]

def dump(value, file):
    "Write the value on the open file"
    L = _marshal(value, {} )
    L = [PROLOGUE + '<marshal>'] + L + ['</marshal>']
    file.write( string.join(L, "") )

def load(file):
    "Read one value from the open file"
    h = _unmarshalHandler()
    from xml.sax import saxexts
    p=saxexts.make_parser()
    p.setDocumentHandler(h)
    p.parseFile(file)
    return h.data_stack[0]

def dumps(value):
    "Marshal value, returning the resulting string"
    L = _marshal(value, {} )
    L = [PROLOGUE + '<marshal>'] + L + ['</marshal>']
    return string.join(L, "")

def loads(string):
    "Read one value from the string"
    import StringIO
    file = StringIO.StringIO(string)
    return load(file)

if __name__ == '__main__':
    print "Testing XML marshalling..."

    L=[None, 1, pow(2,123L), 19.72, 1+5j,
       "here is a string & a <fake tag> ",
       (1,2,3),
       ['alpha', 'beta', 'gamma'],
       {'key':'value', 1:2},
       dumps.func_code
       ]

    # Try all the above bits of data
    import StringIO

    for item in L + [ L ]:
        s = dumps(item)
        print s
        output = loads(s)

        # Try it from a file
        file = StringIO.StringIO()
        dump(item, file)
        file.seek(0)
        output2 = load(file)

        print repr(item), s
        assert item==output and item==output2 and output==output2

    recursive_list = [None, 1, pow(3,65L), '<fake tag>', 1+5j]
    recursive_list.append( recursive_list )
    s = dumps(recursive_list)
    print s
    output = loads(s)
    print repr(output)

From mss@transas.com Sun Sep 6 18:12:21 1998
From: mss@transas.com (Michael Sobolev)
Date: Sun, 6 Sep 1998 21:12:21 +0400
Subject: [XML-SIG] a small question
Message-ID: <19980906211221.A9066@transas.com>

Is this declaration is valid?

<!ENTITY % lang.params "lang CDATA #REQUIRED">
<!ELEMENT comment (#PCDATA)>
<!ATTLIST comment
          %lang.params;>

If no, what exactly is incorrect? If yes, why xmlproc does not process
it properly?
:)

TIA,
--
Mike

From colds@nwlink.com Sun Sep 6 20:06:07 1998
From: colds@nwlink.com (Chris Olds)
Date: Sun, 06 Sep 1998 12:06:07 -0700
Subject: [XML-SIG] a small question
References: <19980906211221.A9066@transas.com>
Message-ID: <35F2DD1F.E5C2D166@nwlink.com>

This is legal unless it is in the internal subset, i.e. the part of the
DTD in the document instance. In the document instance, parameter
entities can only appear where an entity, element or attribute list
declaration can appear, and must yield a complete declaration.

As for why xmlproc, I haven't tried it on this yet (a complete document
example and an explanation of what you mean by not processing properly
would help).

Michael Sobolev wrote:
>
> Is this declaration is valid?
>
> <!ENTITY % lang.params "lang CDATA #REQUIRED">
> <!ELEMENT comment (#PCDATA)>
> <!ATTLIST comment
>           %lang.params;>
>
> If no, what exactly is incorrect? If yes, why xmlproc does not process
> it properly? :)

/cco

From mss@transas.com Sun Sep 6 20:46:28 1998
From: mss@transas.com (Michael Sobolev)
Date: Sun, 6 Sep 1998 23:46:28 +0400
Subject: [XML-SIG] a small question
In-Reply-To: <35F2DD1F.E5C2D166@nwlink.com>; from Chris Olds on Sun, Sep 06, 1998 at 12:06:07PM -0700
References: <19980906211221.A9066@transas.com> <35F2DD1F.E5C2D166@nwlink.com>
Message-ID: <19980906234628.A6858@transas.com>

OK. My document looks like:

<?xml version="1.0"?>
<!DOCTYPE info SYSTEM "my.dtd">
<info>
...
</info>

my.dtd:

<!ENTITY % common.decl SYSTEM "common.mod">
<!-- other similar declarations -->
%common.decl;

common.mod:

> > <!ENTITY % lang.params "lang CDATA #REQUIRED">
> > <!ELEMENT comment (#PCDATA)>
> > <!ATTLIST comment
> >           %lang.params;>

What I get:

/home/mss/xml/common.mod:8:5: Didn't match [A-Za-z_:][\-A-Za-z_:.0-9]*
/home/mss/xml/common.mod:8:5: Whitespace expected here
/home/mss/xml/common.mod:8:5: Expected type or alternative list
/home/mss/xml/common.mod:16:5: Didn't match [A-Za-z_:][\-A-Za-z_:.0-9]*
/home/mss/xml/common.mod:16:5: Whitespace expected here
/home/mss/xml/common.mod:16:5: Expected type or alternative list
/home/mss/xml/common.mod:22:5: Didn't match [A-Za-z_:][\-A-Za-z_:.0-9]*
/home/mss/xml/common.mod:22:5: Whitespace expected here
/home/mss/xml/common.mod:22:5: Expected type or alternative list
/home/mss/xml/other.mod:15:5: Didn't match [A-Za-z_:][\-A-Za-z_:.0-9]*
/home/mss/xml/other.mod:15:5: Whitespace expected here
/home/mss/xml/other.mod:15:5: Expected type or alternative list
info.xml:7:22: Unknown attribute 'lang'

Where common.mod:8:5 is for first %lang.params;.

Is it correct usage?

--
Mike

From Jack.Jansen@cwi.nl Sun Sep 6 22:29:31 1998
From: Jack.Jansen@cwi.nl (Jack Jansen)
Date: Sun, 06 Sep 1998 23:29:31 +0200
Subject: [XML-SIG] Marshalling to XML, again
In-Reply-To: Message by "A.M. Kuchling" <amk1@erols.com> , Sun, 6 Sep 1998 10:43:26 -0400 , <199809061443.KAA00545@207-172-112-146.s146.tnt4.ann.erols.com>
Message-ID: <UTC199809062129.XAA15618.jack@snelboot.cwi.nl>

Recently, "A.M. Kuchling" <amk1@erols.com> said:
> There's been some discussion of marshalling scripting language data
> types on the Casbah list and on the Perl-XML list recently; Dave
> Winer's suggestion for XML-RPC bears some relation to this. It would
> be very useful if some common DTD was agreed upon, which would allow
> painlessly exchanging data between Python and Perl, Frontier, or
> whatever.
This used to be my view, but after a bit more thinking I think that what we want is not a common DTD but a number of easily convertible DTDs, possibly with a common subset. The various object types in the various languages each have their idiosyncrasies, and it may be important to keep these. Unless you have to convert the data structures to your language of choice, in which case you want to read the objects in the most logical but still representable form, of course. The question then is whether it is possible, upon reading an XML representation of a yet-unknown language, to automatically convert the objects to the nearest representation of your language.
--
Jack Jansen | ++++ stop the execution of Mumia Abu-Jamal ++++
Jack.Jansen@cwi.nl | ++++ if you agree copy these lines to your sig ++++
http://www.cwi.nl/~jack | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm

From gstein@lyra.org Sun Sep 6 22:25:50 1998
From: gstein@lyra.org (Greg Stein)
Date: Sun, 06 Sep 1998 14:25:50 -0700
Subject: [XML-SIG] Marshalling to XML, again
References: <199809061443.KAA00545@207-172-112-146.s146.tnt4.ann.erols.com>
Message-ID: <35F2FDDE.3C979D78@lyra.org>

A.M. Kuchling wrote:
>
> Here's another version of the xml.marshal module. (That name will
> have to be changed now, though, since xml.marshal uses Python's
> original marshal to handle code objects. Any suggestions?)

Why does it need to change?

> For example, take the recursive list produced by this code:
> ...
>
> There's been some discussion of marshalling scripting language data
> types on the Casbah list and on the Perl-XML list recently; Dave
> Winer's suggestion for XML-RPC bears some relation to this. It would
> be very useful if some common DTD was agreed upon, which would allow
> painlessly exchanging data between Python and Perl, Frontier, or
> whatever. (However, I lack the time to read all the relevant mailing
> lists and agitate for a specification.)

Dave, MSFT, and another company are defining an XML-based RPC thing, which they're calling SOAP (Simple Object Access Protocol). They haven't released a spec yet, but the intent is to provide a low-level RPC that would slide in underneath the various Distributed Object systems. This would allow, say, a Windows-based system to use DCOM to call an object on a Linux system, where the calls and parameters are marshalled in XML.

The data types that they would use will follow what the IE5 version of MSXML can do for data typing. It is detailed here: http://www.microsoft.com/xml/authoring/dataTypes/dataTypes.htm

I'm not on those lists -- are there web archives somewhere? I'd be interested in reading those threads.

> If no such common DTD arises, is this module still useful, and should
> it be included?

Yes, definitely. When SOAP is completed, I'd like to hook up the Linux end of it :-) (and the marshalling will be needed). Sure, it could change or whatever, but for any type of RPC system, the XML-based marshalling done by this module will be cool.

Note: a Python client talking to a Python server could recognize that fact, marshal using the builtin, and then embed the data into a PCDATA element (or maybe encode using base64 for simplicity). It would fall back to the above marshalling for unknown targets. MSFT will similarly try to use a faster marshalling between its own platforms ("interoperable, but it works better if you use Windows all around" is always their motto :-)

-g
--
Greg Stein (gstein@lyra.org)

From MHammond@skippinet.com.au Sun Sep 6 14:29:24 1998
From: MHammond@skippinet.com.au (Mark Hammond)
Date: Sun, 6 Sep 1998 23:29:24 +1000
Subject: [XML-SIG] XBEL DTD
Message-ID: <01e101bdd9ed$22d0fcc0$1301a8c0@bobcat.skippinet.com.au>

Fredrik and Jack both hit exactly on 2 questions I had. I would really like some comments on them. Jack asked:

> I would have used elements only
> for the BOOKMARK and NODE items,
> and used attributes for the rest.
>Can anyone enlighten me which method is best, and why? Any comments? The best I came up with is that attributes require quotes, and elements dont?? But logically I agree many of these things are actually attributes. Should they be attributes instead of elements? And Fredrik asked: >a name element. Let's see... Is the following valid syntax? > > <!ELEMENT NODE (NAME?, (BOOKMARK|NODE)+)> I have no idea, and I could not find an answer myself. Im glad you noticed! I am running with it for now :-) Sean asked about the CaseOfTheTags?? No one seemed to go with that idea? I kinda like it. And lastly, the discussion on dates seemed to settle with James indicating the XML would look like: <date scheme="iso-8601">2005-12-01</date> But I am unsure what this means to the DTD?? So the new DTD (only a few mods) now looks like: <!-- DTD for XBEL - XML Bookmark Exchange Language --> <!ELEMENT XBEL (INFO, FOLDER+)> <!ELEMENT FOLDER (NAME?, (BOOKMARK|FOLDER)+)> <!ELEMENT BOOKMARK (NAME, URL, ADDED?, VISITED?, MODIFIED?)> <!ELEMENT INFO (OWNER, DATE?, MACHINENAME?)> <!ELEMENT OWNER (#PCDATA)> <!ELEMENT MACHINENAME (#PCDATA)> <!ELEMENT DATE (#PCDATA)> <!ELEMENT NAME (#PCDATA)> <!ELEMENT URL (#PCDATA)> <!ELEMENT ADDED (#PCDATA)> <!ELEMENT VISITED (#PCDATA)> <!ELEMENT MODIFIED (#PCDATA)> Mark. From digitome@iol.ie Mon Sep 7 08:20:54 1998 From: digitome@iol.ie (Sean Mc grath) Date: Mon, 07 Sep 1998 08:20:54 +0100 Subject: [XML-SIG] XBEL DTD Message-ID: <1.5.4.32.19980907072054.0094dfdc@gpo.iol.ie> [Mark Hammond] >Fredrik and Jack both hit exactly on 2 questions I had. I would really like >some comments on them. Jack asked: > >> I would have used elements only >> for the BOOKMARK and NODE items, >> and used attributes for the rest. >>Can anyone enlighten me which method is best, and why? > >Any comments? The best I came up with is that attributes require quotes, >and elements dont?? But logically I agree many of these things are actually >attributes. 
Should they be attributes instead of elements?

The attribute versus element debate is one of the nuggets of the SGML/XML world. You can express some extra (though rather minor) validity constraints for attribute values in that they can be one of a pre-defined set of types. Attribute values can also be layered on at parse-time rather than added to the document itself. This has a number of very useful applications culminating in the powerful notion of a document architecture. Let's leave document architectures alone for now...

Some argue that attributes should only be used for content that is not logically part of the document. I.e. if it should not disappear when you strip tags, don't put it in an attribute. Others argue that attributes are redundant and should be used sparingly if at all. Me? I throw a small drop of 10 year old Irish Whiskey over my left shoulder whilst standing on one leg. If one of the little people appears, I use an attribute, otherwise PCDATA.

>
>And Fredrik asked:
>>a name element. Let's see... Is the following valid syntax?
>>
>> <!ELEMENT NODE (NAME?, (BOOKMARK|NODE)+)>
>
>I have no idea, and I could not find an answer myself. Im glad you noticed!
>I am running with it for now :-)

Perfectly valid syntax.

>
>Sean asked about the CaseOfTheTags?? No one seemed to go with that idea? I
>kinda like it.

SoDoI. XBEL documents are gonna LOOK REALLY LOUD. all lowercase is, i think preferable to all uppercase whatever about camelcase...

>
>And lastly, the discussion on dates seemed to settle with James indicating
>the XML would look like:
><date scheme="iso-8601">2005-12-01</date>
>But I am unsure what this means to the DTD??

XML parsers do not know anything about dates. You need to layer on a program that knows about iso-8601. The above is fine XML markup but you do not get the implied semantic check from XML.

Having said that, this stuff is on the way really soon now. The first salvo was a Tim Bray proposal for data typing in XML.
Then came a formal submission to the W3C called XML-Data. The latest state of play is a joint Microsoft/IBM/Tim Bray proposal called DCD. Full info to be found on W3C.ORG. Me? I use (abuse?) fixed attributes and Python:-

    <!ATTLIST date
        value        CDATA #REQUIRED
        python-value CDATA #FIXED "Is8601Date">

I have a Python program that kicks in immediately after a parse and hunts for attributes of the form "python-X". This attribute value is treated as a Python predicate function and passed the real value of attribute X. You get the idea.

I am not saying this is the way to go. I think DCD syntax is the way to go because DCD will be built into a bunch of tools including the Python ones. What I am saying is that right now, we have to roll our own validation code for dates.

Sean Mc Grath
http://www.digitome.com/sean.htm
+353 96 47391
"Imagine a world without hypothetical situations..."

From gstein@lyra.org Mon Sep 7 09:38:34 1998
From: gstein@lyra.org (Greg Stein)
Date: Mon, 07 Sep 1998 01:38:34 -0700
Subject: [XML-SIG] XBEL DTD
References: <1.5.4.32.19980907072054.0094dfdc@gpo.iol.ie>
Message-ID: <35F39B8A.579081E0@lyra.org>

Sean Mc grath wrote:
> ...
> Having said that, this stuff is on the way really soon now. The first
> salvo was a Tim Bray proposal for data typing in XML. Then came a formal
> submission to the W3C called XML-Data. The latest state of play is
> a joint Microsoft/IBM/Tim Bray proposal called DCD. Full info to be
> found on W3C.ORG. Me? I use (abuse?) fixed attributes and Python:-
> ...

Euh... unless I'm horribly mistaken, XML-Data is an XML DTD that describes a schema for describing schemas :-) i.e. rather than using that specialized DTD syntax, you can describe the schema in XML. Of course, this implies that you can start using a host of XML tools for manipulating the actual schema.

Sure, there are some parts of XML-Data that are used to define the constraints (and type) of an attribute, but it isn't very complete.
-g
--
Greg Stein (gstein@lyra.org)

From digitome@iol.ie Mon Sep 7 09:51:16 1998
From: digitome@iol.ie (Sean Mc Grath)
Date: Mon, 7 Sep 1998 09:51:16 +0100
Subject: [XML-SIG] XBEL DTD
Message-ID: <199809070851.JAA18542@GPO.iol.ie>

>Sean Mc grath wrote:
>> ...
>> Having said that, this stuff is on the way really soon now. The first
>> salvo was a Tim Bray proposal for data typing in XML. Then came a formal
>> submission to the W3C called XML-Data. The latest state of play is
>> a joint Microsoft/IBM/Tim Bray proposal called DCD. Full info to be
>> found on W3C.ORG. Me? I use (abuse?) fixed attributes and Python:-
>> ...
>
[Greg Stein]
>Euh... unless I'm horribly mistaken, XML-Data is an XML DTD that
>describes a schema for describing schemas :-)

Yes. A common and powerful technique in the SGML/XML world. Meta-DTDs.

>i.e. rather than using
>that specialized DTD syntax, you can describe the schema in XML. Of
>course, this implies that you can start using a host of XML tools for
>manipulating the actual schema.

Right.

>
>Sure, there are some parts of XML-Data that are used to define the
>constraints (and type) of an attribute, but it isn't very complete.
>

It has been superseded by DCD.

</Sean>

Sean Mc Grath - http://www.digitome.com/sean.htm
XML by Example:Building E-Commerce Applications (http://www.amazon.com/exec/obidos/ISBN=0139601627/digitomeelectronA/)
ParseMe.1st - SGML for Software Developers (http://www.amazon.com/exec/obidos/ISBN=0134889673/digitomeelectronA/)

From akuchlin@cnri.reston.va.us Mon Sep 7 18:53:09 1998
From: akuchlin@cnri.reston.va.us (A.M. Kuchling)
Date: Mon, 7 Sep 1998 13:53:09 -0400
Subject: [XML-SIG] IBM XML developer survey
Message-ID: <199809071753.NAA00568@207-172-56-245.s245.tnt12.ann.erols.com>

Found this on scripting.com: IBM is running a survey of XML developers, in order to design an XML Web site resource.
Interestingly, there's one section which asks you to list your skills in various areas; Python is listed along with HTML, CGI, PageMill, and various other (mostly Web-related) items. http://www.networking.ibm.com/survey/survey.nsf/surveyone This is part 1 of the survey; fill out both parts, and you get a free T-shirt. -- A.M. Kuchling http://starship.skyport.net/crew/amk/ Consumers are like roaches -- you spray them and they get immune after a while. -- David Lubars From ken@bitsko.slc.ut.us Tue Sep 8 03:12:04 1998 From: ken@bitsko.slc.ut.us (Ken MacLeod) Date: 07 Sep 1998 21:12:04 -0500 Subject: [XML-SIG] Re: Marshalling to XML, again Message-ID: <m3ww7f8ia3.fsf@biff.bitsko.slc.ut.us> A.M. Kuchling <akuchlin@cnri.reston.va.us> writes: > Here's another version of the xml.marshal module. (That name will > have to be changed now, though, since xml.marshal uses Python's > original marshal to handle code objects. Any suggestions?) > For example, take the recursive list produced by this code: > recursive_list = [None, 1, pow(3,65L), '<fake tag>', 1+5j] > recursive_list.append( recursive_list ) > There's been some discussion of marshalling scripting language data > types on the Casbah list and on the Perl-XML list recently; Speaking from the Casbah project, we're just about to go 0.1 on our Lightweight Distributed Objects. 0.1 won't have the XML serialization implemented, but it is specified and will be plug-in compatible with the current binary implementation. The Python implementation is browsable at: <http://www.ntlug.org/cgi-bin/cvsweb/lotos/python/> The implementation for 0.1 includes the binary serialization, the connection, and a remote proxy (method forwarder). The last items to clean up for 0.1 are the documentation and creating a coherent release. 
Initial notes on the XML serialization are in text format at:

<http://www.bitsko.slc.ut.us/~ken/casbah/xml-serialization.txt>

The LDO equivalent serialization would look something like this:

    <list id="1">
      <null/>
      <value>1</value>
      <value>10301051460877537453973547267843</value>
      <value>&lt;fake tag&gt;</value>
      <dict type="complex">
        <value>real</value><value>1.0</value>
        <value>imaginary</value><value>5.0</value>
      </dict>
      <ref id="1"/>
    </list>

In a followup message, Greg Stein <gstein@lyra.org> comments:

> Note: a Python client talking to a Python server could recognize
> that fact, marshal using the builtin, and then embed the data into a
> PCDATA element (or maybe encode using base64 for simplicity). It
> would fall back to the above marshalling for unknown targets.

We're leaving a hook to support this in LDO, but assuming that the built-in marshaling would completely replace the XML or binary marshaling, rather than embedding it in XML.

This is more DO-SIG related, but from reading the above sample you can probably guess an issue we're facing in the Python implementation: LDO assumes that an implementation supports automatic or explicit coercion from strings to numerics. LDO supports numeric types (ints and floats), but doesn't require them.

If the scenario was Python-to-Python, this wouldn't be a problem because the calling Python code would encode using a numeric type and the called Python code would decode the numeric type. The problem comes from command-line or TCL calling code talking to a Python called code, or a Python calling code talking to a shell or TCL called code -- numeric and string types aren't distinguished and Python doesn't coerce between strings and numerics.
The solutions we've thought of so far are:

1) require implementations to indicate numeric values (a real problem for shells and TCL)
2) require Python code to handle remote calls specially (a problem specific to Python users that we had hoped to avoid)
3) use CORBA IDLs (no worse than Java, C++, or CORBA)
4) migrate Python to use non-math operators for string functions so that math operators can signal coercion (an unlikely option)

(4) is the most elegant, but also the most difficult. For now we're starting with (2) and expecting (3) to be the ``final'' solution.

--
Ken MacLeod
ken@bitsko.slc.ut.us

From MHammond@skippinet.com.au Tue Sep 8 02:08:50 1998
From: MHammond@skippinet.com.au (Mark Hammond)
Date: Tue, 8 Sep 1998 11:08:50 +1000
Subject: [XML-SIG] XBEL DTD
Message-ID: <00a601bddafe$cf377b30$1301a8c0@bobcat.skippinet.com.au>

>Some argue that attributes should only be used for content that is not
>logically part of the document. I.e. if it should not disappear when you
>strip tags, don't put it in an attribute. Others argue that attributes
>are redundant and should be used sparingly if at all. Me? I throw

Hmm. This sounds like a reasonable "rule of thumb" to me. Does anyone disagree with this? This does seem to fit the existing HTML model - eg, an "IMG" tag - the size attributes don't really form part of the document. Don't know about an "anchor" tag - the HREF is an attribute - IMO this is a necessary part of the document. But if we stick with this definition, then the DTD with only elements seems correct.

>a small drop of 10 year old Irish Whiskey over my left shoulder
>whilst standing on one leg. If one of the little people appears,
>I use an attribute, otherwise PCDATA.

:-) I can relate to that! Hopefully this means you only use attributes very rarely (or after a _long_ session :-)

>>Sean asked about the CaseOfTheTags?? No one seemed to go with that idea? I
>>kinda like it.
>SoDoI. XBEL documents are gonna LOOK REALLY LOUD.
all lowercase is, i think
>preferable to all uppercase whatever about camelcase...

OK - no one making noises, so I will use lower case (all our elements are single words, so no need for mixed case)

Interesting about "CamelCase". Fredrik thought it means "Perl" (the obvious Camel reference). Personally, I took it as being derived from the silhouette of a real camel - the humps relate to the caps in the middle of the word. I wonder where it derived from - does it really mean "Perl"? Maybe we should call it "Kangaroo Case" ;-) (coined by someone from "skippi-net" - coincidence, or conspiracy - you be the judge :-)

Mark.

From larsga@ifi.uio.no Tue Sep 8 09:17:18 1998
From: larsga@ifi.uio.no (Lars Marius Garshol)
Date: Tue, 08 Sep 1998 10:17:18 +0200
Subject: [XML-SIG] a small question
In-Reply-To: <19980906211221.A9066@transas.com>
Message-ID: <3.0.1.32.19980908101718.006c967c@ifi.uio.no>

* Michael Sobolev
>
>Is this declaration valid?
>
> <!ENTITY % lang.params "lang CDATA #REQUIRED">
> <!ELEMENT comment (#PCDATA)>
> <!ATTLIST comment
> %lang.params;>

Yes, this declaration is perfectly valid.

>If yes, why does xmlproc not process it properly? :)

xmlproc currently only allows parameter entity references between declarations and not inside them. I've now found a way to implement this (I think), so this may appear in 0.60.

--Lars M.

From bottoni@cadlab.it Tue Sep 8 10:00:43 1998
From: bottoni@cadlab.it (Alessandro Bottoni)
Date: Tue, 8 Sep 1998 11:00:43 +0200
Subject: [XML-SIG] Any example of HTML Processing with Python/SAX?
Message-ID: <005d01bddb07$28cbc9a0$172b2bc1@pc6d2.cadlab.it>

I'm starting to work with Python on HTML and XML documents, so I'm looking for sample applications of HTML and XML processing with Python, XMLLIB, HTMLLIB and, most important, SAX. Does anybody know where I could find a few good examples?
(Of course, I have already searched www.python.org , starship.skyport.net , http://www.stud.ifi.uio.no/~larsga/download/python/xml/index.html and www.pythonware.com )

Does anybody want to share any source code example/fragment with me?

TIA

------------------------------
Alessandro Bottoni
Technical Writer
Cad.Lab SPA
Bologna, Italy
---------------------

From akuchlin@cnri.reston.va.us Tue Sep 8 15:18:56 1998
From: akuchlin@cnri.reston.va.us (Andrew Kuchling)
Date: Tue, 8 Sep 1998 10:18:56 -0400 (EDT)
Subject: [XML-SIG] Re: FREE DOM
In-Reply-To: <35F0DA72.563D1632@totten.com>
References: <35F0DA72.563D1632@totten.com>
Message-ID: <13813.15100.631957.233022@newcnri.cnri.reston.va.us>

John Totten writes (in a private message):
>Anyone working on a Python version of SAXDOM/FREEDOM?
> John Totten

[Cc'ed to xml-sig@python.org, because the answer is of interest]

I spent some of this weekend working on the PyDOM code, trying to bring it into compliance with the most recent DOM spec. Nothing releasable yet, though it hopefully won't take much longer.

DOM's moving through the W3C's process faster than I expected, possibly becoming a Recommendation in September. Therefore I think a DOM implementation should be part of 1.0, instead of being postponed until after 1.0.

Garbage collection is going to be a problem, though. DOM nodes allow retrieving both the parent node, and the children. The obvious implementation is to have .parent and .children attributes, but those create cycles, which will lead to uncollected garbage.

One solution is to require calling a .destroy() (or similarly-named) method when you're done with a node. The method would then do something like:

    def destroy(self):
        del self.parent
        for i in self.children:
            i.destroy()
        del self.children

This is simple to implement, but it means that you have to remember to call .destroy(). Does anyone see a representation that would avoid the necessity of doing this?
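The coercion gap described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `coerce_numeric` and `add` are hypothetical names, not part of LDO, and the rule "try int, then float, else leave the string alone" is one possible policy rather than anything the LDO notes specify.

```python
def coerce_numeric(value):
    """Return an int or float when `value` is a numeric-looking string.

    Hypothetical helper (not part of LDO): non-strings and strings that
    do not parse as numbers are returned unchanged.
    """
    if not isinstance(value, str):
        return value
    for convert in (int, float):
        try:
            return convert(value)
        except ValueError:
            pass  # not parseable by this converter, try the next one
    return value


def add(a, b):
    """A called Python method that tolerates string-typed arguments,
    as it might receive them from a shell or TCL caller."""
    return coerce_numeric(a) + coerce_numeric(b)
```

Without the explicit coercion, an expression such as `"2" + 3` raises a `TypeError` in Python, which is the behaviour that makes string-only callers awkward to serve.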
I was thinking of just having .children in each node, and then there would be a global dictionary that mapped nodes to their parent objects. Because it's global, it wouldn't participate in any cycles, but cleaning it up is also a pain. Anyone have a suggestion? (Other than continually visiting Guido and whining for non-refcounting GC?) -- A.M. Kuchling http://starship.skyport.net/crew/amk/ It was a wasted life, but God forbid that one should be hard upon it, or upon anything in this world that is not deliberately and coldly wrong . . . -- Charles Dickens, in a letter to his friend John Forster. From fredrik@pythonware.com Tue Sep 8 16:53:34 1998 From: fredrik@pythonware.com (Fredrik Lundh) Date: Tue, 8 Sep 1998 16:53:34 +0100 Subject: [XML-SIG] Re: FREE DOM Message-ID: <000901bddb40$d71947b0$f29b12c2@pythonware.com> > One solution is to require calling a .destroy() (or >similarly-named) method when you're done with a node. The method >would then do something like: > > def destroy(self): > del self.parent > for i in self.children: > i.destroy() > del self.children > >This is simple to implement, but it means that you have to remember to >call .destroy(). Does anyone see a representation that would avoid >the necessity of doing this? I was thinking of just having .children >in each node, and then there would be a global dictionary that mapped >nodes to their parent objects. Because it's global, it wouldn't >participate in any cycles, but cleaning it up is also a pain. Yup. *When* should you do the clean-up in that case? Since all nodes will have an extra reference (from the global dictionary), they'll never go away unless you explicitly call a cleanup function... (alright, you can have a "purge" function that kills nodes with reference count=1, and use a background thread to call that function now and then...) I definitely prefer the "destroy" pattern (or rather, I prefer to use visitors for this, but that's another story). > Anyone have a suggestion? 
(Other than continually visiting >Guido and whining for non-refcounting GC?) Well, I see no reason why you cannot keep on doing that as well ;-) Cheers /F PS. Does anyone have pointers to SAXDOM and/or FreeDOM? From larsga@ifi.uio.no Tue Sep 8 15:47:41 1998 From: larsga@ifi.uio.no (Lars Marius Garshol) Date: Tue, 08 Sep 1998 16:47:41 +0200 Subject: [XML-SIG] Re: FREE DOM In-Reply-To: <000901bddb40$d71947b0$f29b12c2@pythonware.com> Message-ID: <3.0.1.32.19980908164741.0075b2e4@ifi.uio.no> * Fredrik Lundh > >PS. Does anyone have pointers to SAXDOM and/or FreeDOM? SAXDOM has changed name to FreeDOM, which has changed name to The Docuverse DOM SDK. You can find it at: <URL:http://www.docuverse.com/domsdk/index.html> When looking for free XML tools, this is (IMHO) the place to start: <URL:http://www.stud.ifi.uio.no/~larsga/linker/XMLtools.html> --Lars M. From wunder@infoseek.com Tue Sep 8 18:27:52 1998 From: wunder@infoseek.com (Walter Underwood) Date: Tue, 08 Sep 1998 10:27:52 -0700 Subject: [XML-SIG] Useless fun thing for XML - comments or helpers? In-Reply-To: <3.0.5.32.19980905083816.007b62f0@ifi.uio.no> References: <3.0.5.32.19980904143609.00c2b8e0@corp> <13808.21923.568032.333624@newcnri.cnri.reston.va.us> <3.0.5.32.19980903095358.00bf7290@corp> <002901bdd756$c28f9170$f29b12c2@pythonware.com> <3.0.5.32.19980903095358.00bf7290@corp> Message-ID: <3.0.5.32.19980908102752.00a48df0@corp> At 08:38 AM 9/5/98 +0200, Lars Marius Garshol wrote: >* Walter Underwood >> >> [...] Well, we do pay attention to one tag -- the first <title> >>or <TITLE> tag is considered to be the title of the document for >>purposes of displaying search hits. > >Hmmm. Have you considered using architectural forms to give page authors >more freedom, but still allow you to discover which elements are the >equivalents of 'TITLE' and 'AUTHOR' etc? The general form of our answer for feature requests is "if paying customers want it, we'll look at it". 
Of course, we're providing XML even though we only have one customer asking for it (so far). The Architectural Forms proposal looks interesting, and I actually hope it catches on, since it could make our job easier. The search engine only needs to know a little bit of info, basically, what is content, what is meta-content, and what is formatting. Actual interpretation and display is the job of some other program. That is why the search engine only needs well-formed XML, rather than valid XML. But a *small* set of common base architectural forms could allow the parser to sort out some of the basic data/metadata elements. Interestingly, this supports the earlier rule-of-thumb in the attribute vs. element discussion. If it is something that should be searchable, represent it with an element. At 04:04 PM 9/5/98 -0700, Lisa Rein wrote: >I am very curious how exactly XML is being utilized in the search engine >if the only tag being taken into account is the (first) TITLE tag (just >like a search engine would use during a "bag of words" approach) and not >using a DTD -- making any semantic associations impossible. > >If you're not going to deal with the text until after it's parsed, why >are you using XML? Are you doing some kind of indexing or another >variation I haven't of? Do tell ;-) The goal is to make XML documents "findable" via web search. If we treated them as raw text, the elements names would show up in search results and would swamp queries like "xml" or "doctype" with irrelevant hits. Parsing the XML allows us to give quality results. Being independent of the DTD allows us to handle the widest variety of documents. So far, that looks like a "sweet spot" in XML support. DTD-specific search can get very complex, very fast. Remember, the web server still serves the document. The search engine only provides a URL to it. So the search engine just needs enough info to serve a URL. Anything else gets in the way. 
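As a rough illustration of DTD-independent indexing of this kind, the sketch below parses a well-formed document, takes the first title element as the display title, and indexes only character data so that element names never reach the word list. It uses `xml.etree.ElementTree` from the modern Python standard library; `index_fields` is a hypothetical name, and nothing here is claimed to match how Ultraseek Server actually works.

```python
import xml.etree.ElementTree as ET


def index_fields(xml_text):
    """Return (title, words) for a well-formed XML document.

    Hypothetical sketch: `title` is the text of the first element named
    title/TITLE (None if absent); `words` is all character data with
    whitespace normalised. Markup itself is never indexed.
    """
    root = ET.fromstring(xml_text)  # needs well-formedness only, no DTD

    title = None
    for elem in root.iter():
        # Namespaced tags look like "{uri}title" and are not matched here.
        if isinstance(elem.tag, str) and elem.tag.lower() == "title":
            title = (elem.text or "").strip()
            break

    # Join text nodes with spaces so adjacent elements do not fuse words.
    words = " ".join(" ".join(root.itertext()).split())
    return title, words
```

For example, `index_fields("<doc><title>Hello</title><body>XML <b>search</b> test</body></doc>")` gives the title `Hello` and a word string containing only the document text, so a query for "doc" or "body" would not match. Joining text nodes with spaces is a simplification: it would split a word that is interrupted by inline markup.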
One clarification -- this feature is for the Ultraseek Server product (http://software.infoseek.com), a search engine that people can buy and run locally. Ultraseek Server features are somewhat independent of features for www.infoseek.com, the on-line search service.

Finally, the XML market is very new, and this will be the first release of our XML support. As the market matures, customers will tell us what they want and don't want, and we'll respond.

wunder
Walter R. Underwood
wunder@infoseek.com
wunder@best.com (home)
http://www.best.com/~wunder/
1-408-543-6946

From mss@transas.com Tue Sep 8 19:47:27 1998
From: mss@transas.com (Michael Sobolev)
Date: Tue, 8 Sep 1998 22:47:27 +0400
Subject: [XML-SIG] yet another question.
Message-ID: <19980908224727.A24349@transas.com>

Let's suppose the following DTD.

    <!ELEMENT foo (bar+)>
    <!ELEMENT bar (#PCDATA)>

I believe that the following text conforms to the above specification:

    <foo>
    <bar>1</bar>
    <bar>2</bar>
    </foo>

If I run pyexpat parser on the above text, I will get something like:

    start_element foo
    pcdata \n
    pcdata ' '
    start_element bar
    pcdata 1
    end_element bar
    pcdata \n
    pcdata ' '
    start_element bar
    pcdata 2
    end_element bar
    pcdata \n
    end_element foo

This is fine since expat is not a validating parser. What should I expect from a validating one? After the declaration, foo cannot have any pcdata at all.

TIA,
--
Mike

From ken@bitsko.slc.ut.us Tue Sep 8 20:06:03 1998
From: ken@bitsko.slc.ut.us (Ken MacLeod)
Date: 08 Sep 1998 14:06:03 -0500
Subject: [XML-SIG] Re: FREE DOM
In-Reply-To: "Fredrik Lundh"'s message of Tue, 8 Sep 1998 16:53:34 +0100
References: <000901bddb40$d71947b0$f29b12c2@pythonware.com>
Message-ID: <m3k93e8lwk.fsf@biff.bitsko.slc.ut.us>

"Fredrik Lundh" <fredrik@pythonware.com> writes:

> > One solution is to require calling a .destroy() (or
> >similarly-named) method when you're done with a node.
The method > >would then do something like: > > > > def destroy(self): > > del self.parent > > for i in self.children: > > i.destroy() > > del self.children > > > >This is simple to implement, but it means that you have to remember > >to call .destroy(). Does anyone see a representation that would > >avoid the necessity of doing this? I was thinking of just having > >.children in each node, and then there would be a global dictionary > >that mapped nodes to their parent objects. Because it's global, it > >wouldn't participate in any cycles, but cleaning it up is also a > >pain. > I definitely prefer the "destroy" pattern (or rather, I prefer to > use visitors for this, but that's another story). I've used a proxy-iterator to solve this problem and it seems to be working well. When you build the tree, don't include parent references. But when somebody asks for a tree object, return a proxy for the tree object that includes a parent reference. Create iterator methods in the proxy object that return new proxies with a correct parent proxy-iterator. The proxy-iterator classes are shadow classes for the object model classes, so there's a one-to-one correspondence. The tree objects end up being simple data objects, it's the proxy-iterator that conforms to the DOM interface. For any ``active'' proxy-iterators, there will be a reference, but as soon as the proxy-iterator is collected, the reference will go away, leaving only the root of the tree as the primary reference -- release the root and the entire tree is collected. A side benefit of the proxy-iterator is that you can now share tree fragments during processing, because the child-parent relationship is contained in the proxy-iterator, not in the tree. -- Ken MacLeod ken@bitsko.slc.ut.us From akuchlin@cnri.reston.va.us Tue Sep 8 20:33:24 1998 From: akuchlin@cnri.reston.va.us (Andrew M. Kuchling) Date: Tue, 8 Sep 1998 15:33:24 -0400 (EDT) Subject: [XML-SIG] yet another question. 
In-Reply-To: <19980908224727.A24349@transas.com> References: <19980908224727.A24349@transas.com> Message-ID: <13813.33682.769880.357155@amarok.cnri.reston.va.us> Michael Sobolev writes: > <foo> > <bar>1</bar> > <bar>2</bar> > </foo> >This is fine since expat is not a validating parser. What should I >expect from a validating one? After the declaration, foo cannot have >any pcdata at all. Consult the annotated XML spec at www.xml.com. Section 2.10 discusses this: An XML processor must always pass all characters in a document that are not markup through to the application. A validating XML processor must also inform the application which of these characters constitute white space appearing in element content. "Element content" is defined in section 3.2.1 as: An element type has element content when elements of that type must contain only child elements (no character data), optionally separated by white space (characters matching the nonterminal S). So, a validating parser must still tell the application that this whitespace is present, though it might not use the same mechanism it uses for #PCDATA content. For example, in the SAX interface there's a method called ignorableWhitespace that would be used. I'd imagine that few applications will care about this, since few will treat <bar>1</bar>\n<bar>2</bar> differently from <bar>1</bar><bar>2</bar>. XML editors are probably the big exception to this, since an editor would want to preserve whitespace when editing a document. -- A.M. Kuchling http://starship.skyport.net/crew/amk/ Most of my ideas were rejected and I got used to it. One can get fond of almost anything, even rejection. 
-- Tom Baker, in his autobiography From fredrik@pythonware.com Tue Sep 8 21:49:49 1998 From: fredrik@pythonware.com (Fredrik Lundh) Date: Tue, 8 Sep 1998 21:49:49 +0100 Subject: [XML-SIG] Re: FREE DOM Message-ID: <001b01bddb6a$39945280$f29b12c2@pythonware.com> Ken MacLeod wrote: > > /F wrote: >> I definitely prefer the "destroy" pattern (or rather, I prefer to >> use visitors for this, but that's another story). > >I've used a proxy-iterator to solve this problem and it seems to be >working well. Now that you mention it... (short break while /F loads the "opal" project into opal) module opal.core.XML: class XMLNode class XMLParser class XMLTreeBuilder(XMLParser) class XMLIterator def load def dump (You're right, of course; if it doesn't violate the COM API, this is a much better solution...) Cheers /F fredrik@pythonware.com http://www.pythonware.com From fleck@informatik.uni-bonn.de Tue Sep 8 21:47:32 1998 From: fleck@informatik.uni-bonn.de (Markus Fleck) Date: Tue, 08 Sep 1998 22:47:32 +0200 Subject: [XML-SIG] HUMOR: oos.org - "Our Own Standards"... Message-ID: <35F597E4.4AA5@informatik.uni-bonn.de> Hi! For all those who are tired of reading (and implementing) overly complicated standards documents, you might enjoy having a quick look at <http://www.oos.org>, the "Our Own Standards" organization :-), who have just published their LML ("Lightweight Markup Language") specification, v1.1. Their rule is "KEIS", "Keep It Even Simpler". The cool thing is that LML is HTML backwards-compatible and can be displayed by any of the more popular WWW browsers... OOS.ORG is offering Basic Membership for US$2.500 per year (you won't be allowed to make suggestions, and may not take part in the decision-making process, though :-), and Full Membership at US$25.000 per year (note: special offer for first 200 applicants only). 
If you send them a nice anecdote "about why you dislike all those new standards" to <mailto:info@oss.org>, you may also qualify for their "Oppressed Engineer Support Plan". Have fun. :-) Yours, Markus. PS: Please CC any anecdotes... :-) -- SCSI: System Can't See It ISDN: It Still Does Nothing PCMCIA: People Can't Memorize Computer Industry Acronyms TWAIN: Technology Without An Interesting Name (really!) From fredrik@pythonware.com Wed Sep 9 15:28:50 1998 From: fredrik@pythonware.com (Fredrik Lundh) Date: Wed, 9 Sep 1998 15:28:50 +0100 Subject: [XML-SIG] XBEL DTD Message-ID: <009501bddbfe$2c1c7300$f29b12c2@pythonware.com> >>Some argue that attributes should only be used for content that is not >>logically part of the document. I.e. if it should not disappear when you >>strip tags, don't put it in an attribute. Others argue that attributes >>are redundant and should be used sparingly if at all. Me? I throw > >Hmm. This sounds like a reasonable "rule of thumb" to me. Does anyone >disagree with this. The following just appeared in my mailbox: From: "John E. Simpson" <simpson@POLARIS.NET> Subject: Re: Attributes and Elements To: XML-L@LISTSERV.HEANET.IE >What is the different between attributes and elements and when should they be >used? Look here for Robin Cover's excellent discussion of the issues: http://www.sil.org/sgml/elementsAndAttrs.html (The domain has changed, but I don't have the new one at hand -- for now, the above URL will work.) Cheers /F From akuchlin@cnri.reston.va.us Wed Sep 9 15:03:06 1998 From: akuchlin@cnri.reston.va.us (Andrew M. Kuchling) Date: Wed, 9 Sep 1998 10:03:06 -0400 (EDT) Subject: [XML-SIG] Re: FREE DOM In-Reply-To: <m3k93e8lwk.fsf@biff.bitsko.slc.ut.us> References: <000901bddb40$d71947b0$f29b12c2@pythonware.com> <m3k93e8lwk.fsf@biff.bitsko.slc.ut.us> Message-ID: <13814.35247.953457.226574@amarok.cnri.reston.va.us> Ken MacLeod writes: >When you build the tree, don't include parent references. 
But when >somebody asks for a tree object, return a proxy for the tree object >that includes a parent reference. Create iterator methods in the >proxy object that return new proxies with a correct parent >proxy-iterator. That seems like a reasonable strategy, but how do you determine what the parent reference should be, in general? It's obviously trivial to construct a proxy for some special cases, such as the children of a node, but how would you find the parent of a node without actually storing a reference to it? Storing a non-reference, such as an integer ID? Walking the tree? Something else? -- A.M. Kuchling http://starship.skyport.net/crew/amk/ First, you must know what the thing is, and then after learn the use of the same. -- Robert Recorde From fredrik@pythonware.com Wed Sep 9 17:34:46 1998 From: fredrik@pythonware.com (Fredrik Lundh) Date: Wed, 9 Sep 1998 17:34:46 +0100 Subject: [XML-SIG] Re: FREE DOM Message-ID: <00e701bddc0f$c291d800$f29b12c2@pythonware.com> > That seems like a reasonable strategy, but how do you >determine what the parent reference should be, in general? It's >obviously trivial to construct a proxy for some special cases, such as >the children of a node, but how would you find the parent of a node >without actually storing a reference to it? Storing a non-reference, >such as an integer ID? Walking the tree? Something else? The iterator uses a a parent list which is updated when you move around in the tree. If you go down, it adds the current node to the parent list. If you go up, it removes a node. Cheers /F From ken@bitsko.slc.ut.us Wed Sep 9 17:38:04 1998 From: ken@bitsko.slc.ut.us (Ken MacLeod) Date: Wed, 9 Sep 1998 11:38:04 -0500 (CDT) Subject: [XML-SIG] Re: FREE DOM Message-ID: <199809091638.LAA13523@bitsko.slc.ut.us> Fredrik Lundh writes: > Andrew M. Kuchling writes: > > That seems like a reasonable strategy, but how do you determine > >what the parent reference should be, in general? 
It's obviously > >trivial to construct a proxy for some special cases, such as the > >children of a node, but how would you find the parent of a node > >without actually storing a reference to it? Storing a > >non-reference, such as an integer ID? Walking the tree? Something > >else? > The iterator uses a a parent list which is updated when you move > around in the tree. If you go down, it adds the current node to the > parent list. If you go up, it removes a node. The way I implemented it, I created a new proxy object for next, prev, first_child, etc. The proxy object carried a `parent' member that pointed back to the parent _proxy_, so instead of a list it was a chain back up to the parent. In this case, the proxy-iterator isn't an iterator in the sense that it has a ``current node'' and you point the iterator to new nodes by calling next, prev, first_child, etc. Instead, the iterator functions actually return a new proxy. This technique also allows you to pass proxy-iterators around as easily as nodes themselves. From bwaumg@urc.tue.nl Wed Sep 9 21:31:11 1998 From: bwaumg@urc.tue.nl (Marc van Grootel) Date: Wed, 09 Sep 1998 22:31:11 +0200 Subject: [XML-SIG] XBEL DTD Message-ID: <199809092031.WAA11519@asterix.urc.tue.nl> Hi, For the purpose of discussion I added my attempt at what is dubbed the XBEL DTD. I took Mark Hammond's as a starting point. I scoped it a little wider and included most features of the Netscape bookmark format. Between MSIE and NS the latter offers more and the bookmark organizer is much much better. When a user saves a bookmark file to XBEL I think it should be possible to convert it back to Netscape without much loss of data. Although this useless fun thing started with Mark's idea for the MS favourites I would like the DTD to be able to express the NS format too (dunno about the Opera bookmark format). 
Here are most of the changes with Mark's initial DTD: - optional 'folded' attribute (to restore the state of a folder NS) - optional description element for 'folder' and 'bookmark' elements (NS) - added 'id' attribute to 'folder' and 'bookmark' - added 'alias' element with 'ref' attribute to reference bookmarks (NS) maybe this can be implemented with shortcuts on MS Favourites? - simplified the top level to (info,folder) - changed 'name' into 'title' and made it a required element - put all timestamps in attributes (where 'added' belongs to 'bookmark' and the other to the 'url') - no dates only timestamps (more precise and all these attributes can be treated with the same code) - optional 'separator' element (not very useful but NS uses it in it's menu's) - added 'added' timestamp attribute for folders too - added link checking attributes to 'url' element (MSIE offers the possibility to subscribe and notify you of changes so we need the last checked time and status code) Some other issues: - Are duplicate names/titles allowed? Since Favourites use the filename they are restricted in characters/length it is also not possible to have duplicate names (there is no way the parser could check for this). - Should the folder hierarchy be a forest or a tree. In NS the top level can have a description and title instead of adding these to the xbel element using (info,folder) or (info?,folder) lets the folder element itself take care of that. - The visited, modified, etc attributes belong to the url not to the bookmarks itself. The added attribute belongs to the bookmark element As I said this is for discussion. I looked at it mainly from the modelling side and did not consider implementation in any of the Python XML parsers. --- Marc bwaumg@urc.tue.nl Here's the DTD: ================ snip snip snip ================== <!ELEMENT xbel (info, folder)> <!ATTLIST xbel version CDATA #IMPLIED > <!-- contents of info needs some more thought. 
Adding a meta --> <!-- element (like in HTML) makes this open-ended --> <!ELEMENT info (owner,machinename)> <!ELEMENT owner (#PCDATA)> <!ELEMENT machinename (#PCDATA)> <!ELEMENT folder (title, desc?, (bookmark|folder|separator|alias)+)> <!ATTLIST folder id ID #IMPLIED added CDATA #IMPLIED folded (yes|no) 'yes' > <!ELEMENT bookmark (title,desc?,url)> <!ATTLIST bookmark id ID #IMPLIED added CDATA #IMPLIED > <!ELEMENT title (#PCDATA)> <!ELEMENT desc (#PCDATA)> <!ELEMENT url (#PCDATA)> <!ATTLIST url visited CDATA #IMPLIED modified CDATA #IMPLIED response CDATA #IMPLIED checked CDATA #IMPLIED > <!ELEMENT separator EMPTY> <!ELEMENT alias EMPTY> <!ATTLIST alias ref IDREF #REQUIRED > From Fred L. Drake, Jr." <fdrake@acm.org Wed Sep 9 21:55:57 1998 From: Fred L. Drake, Jr." <fdrake@acm.org (Fred L. Drake) Date: Wed, 9 Sep 1998 16:55:57 -0400 (EDT) Subject: [XML-SIG] XBEL DTD In-Reply-To: <199809092031.WAA11519@asterix.urc.tue.nl> References: <199809092031.WAA11519@asterix.urc.tue.nl> Message-ID: <13814.60253.831905.443040@weyr.cnri.reston.va.us> This is starting to look like a potentially interesting bookmarks format. Once the DTD shapes up (and it looks like it's well on the way with Marc's contribution), I'll add support for XBEL in Grail. ;-) -Fred -- Fred L. Drake, Jr. <fdrake@acm.org> Corporation for National Research Initiatives 1895 Preston White Dr. Reston, VA 20191 From heaney@mail.cambridge.scr.slb.com Fri Sep 11 15:01:28 1998 From: heaney@mail.cambridge.scr.slb.com (Steven Heaney) Date: 11 Sep 98 15:01:28 +0100 Subject: [XML-SIG] (fwd) WebDAV extensions to urllib/httplib? Message-ID: <B21EEBD6-FB477@134.32.101.215> On Mon, Aug 31, 1998 10:06 am, Greg Stein <mailto:gstein@lyra.org> wrote: >Andrew M. Kuchling wrote: >> >> "Steven Heaney" writes: >> >Can anybody point me to some software to kick-start development of >> >a client library for interacting with a 'WebDAV' server? 
Specifically, >> >I have in mind a forms-based interface to the Netscape Web Publisher >> >functionality. >> >> I don't know of anyone who's started on implementing bits of >> WebDAV in Python, but most of the pieces--httplib.py, XML parsing--are >> probably already in place, and you'd only have to glue them together. >> (This is gathered from a cursory glance at the WebDAV draft, so take >> it with a grain of salt.) > >I'd be interested in following your work on this, as I had planned to >start a similar library in about a month. I'll happily consult on info >for a while, and take a more direct role later. > >thx >-g > Andrew, Greg, Thanks for the input. Right now, I'm going to follow the line of least resistance which is to use JPython to access the Java client library provided by Netscape to communicate with their Web Publishing server. I'm pretty sure this does not conform with the current draft of the standard or, of course, provide the basis for a 'standard' Python module, but it fits my purposes at the moment. I simply need to create some wrappers (for convenience) to the Java classes provided and I'm up and running. It's also introduced me to JPython, which is one of those 'wow' things you come across occasionally. Cheers, Steve. ........................................................................ Steven Heaney Schlumberger http://www.slb.com/cgi-bin/people.pl?type=person&name=steven%20heaney From akuchlin@cnri.reston.va.us Fri Sep 11 16:51:39 1998 From: akuchlin@cnri.reston.va.us (Andrew M. Kuchling) Date: Fri, 11 Sep 1998 11:51:39 -0400 (EDT) Subject: [XML-SIG] XBEL DTD In-Reply-To: <199809092031.WAA11519@asterix.urc.tue.nl> References: <199809092031.WAA11519@asterix.urc.tue.nl> Message-ID: <13817.16056.184603.582709@amarok.cnri.reston.va.us> Marc van Grootel writes: >For the purpose of discussion I added my attempt >at what is dubbed the XBEL DTD. >I took Mark Hammond's as a starting point. 
>I scoped it a little wider and included most features >of the Netscape bookmark format. Between MSIE and NS What was the group's reaction to Marc's revised DTD? I saw no problems with it; while it makes the format a bit more complicated, it seems to be required in order to support lossless conversion from Netscape format. I like this little effort, and the XBEL programs will make a good demo to include with the XML software. To that end, I've written a little program to read Lynx bookmarks and generate the corresponding XBEL (using Mark's original DTD). Given that various people have already written programs to convert Netscape or IE bookmarks to XBEL, we now only need something to convert an XBEL document to an attractive HTML version, to make it easy to display our bookmark lists on the Web. Displaying XBEL files could be done with XSL, or with the XML rendering features in Mozilla, but that would limit the potential user base greatly; rendering to HTML seems the obvious course to follow. Longer-term, what applications are enabled if you have lots of people's bookmark files in a machine-readable form. You could build a high-quality list of links by finding the most commonly linked-to pages; I can't think of any other use off the top of my head. -- A.M. Kuchling http://starship.skyport.net/crew/amk/ I am not here to mourn him. I mourned the loss of my love a long time ago. I am here to say goodbye to a stranger who once did me a good turn. And to the man who gave my son the death he craved. -- Calliope, in SANDMAN #71, part two of "The Wake" From Fred L. Drake, Jr." <fdrake@acm.org Fri Sep 11 17:23:05 1998 From: Fred L. Drake, Jr." <fdrake@acm.org (Fred L. 
Drake) Date: Fri, 11 Sep 1998 12:23:05 -0400 (EDT) Subject: [XML-SIG] XBEL DTD In-Reply-To: <13817.16056.184603.582709@amarok.cnri.reston.va.us> References: <199809092031.WAA11519@asterix.urc.tue.nl> <13817.16056.184603.582709@amarok.cnri.reston.va.us> Message-ID: <13817.20073.56664.26194@weyr.cnri.reston.va.us> Andrew M. Kuchling writes: > What was the group's reaction to Marc's revised DTD? I saw no I liked it. I'll be glad to add support for XBEL in Grail (both internally and in the external bookmarks2html script (which should be renamed...). -Fred -- Fred L. Drake, Jr. <fdrake@acm.org> Corporation for National Research Initiatives 1895 Preston White Dr. Reston, VA 20191 From bwaumg@urc.tue.nl Sun Sep 13 01:31:50 1998 From: bwaumg@urc.tue.nl (Marc van Grootel) Date: Sun, 13 Sep 1998 02:31:50 +0200 Subject: [XML-SIG] XBEL DTD Message-ID: <199809130031.CAA15303@asterix.urc.tue.nl> Hi, I attached a modified version of Lars's nsparse.py and bookmarks.py. I changed nsparse to use htmllib since I thought it could cause problems when xmlproc gets HTML (empty elements: <hr> vs. <hr/> ?). I didn't check that though. I changed bookmarks.py to output xbel XML according to the dtd (oops that's a lie -- it doesn't output an info element and in the dtd I defined it as a required element) I recently sent. Oh, and I removed the dump_adr methods 'cause I didn't know how to implement the new features for Opera. cat bookmark.htm | nsparse.py -ns >bookmark2.html cat bookmark.htm | nsparse.py >bookmark.xml I only ran this one time on my big bookmark file and it worked. Don't hit me if it blows up. It's just an illustration for getting stuff into the new dtd. The code may be a bit messy too. I'm a recent Python convert hope it's not too much baby-talk ;) I also thought that it would be nice to be able to store extra info in the xbel file on different levels. 
This could be done by borrowing the HTML meta tag idea:

<xbel>
  <info>
    <meta name="generator" content="grail:?)"/>
    <meta name="created" content="123456789"/>
  </info>
  <folder>
    <info>
      <meta name="x" content="10"/>
    </info>
    <bookmark>...</bookmark>
    <bookmark>
      <info><meta name="y" content="20"/></info>
      ...
    </bookmark>
    etc.

We could then store arbitrary data with the major elements
(xbel,folder,bookmark). It's an easy enough addition without adding
much complexity. And if you don't need it, just ignore the info
elements. Maybe it could be used in web-maintenance tools like linbot.

What bookmark formats should be supported? I would like to see
excerpts of different kinds (like Lynx, Opera) and see if any of those
make changes to the dtd necessary. It would be nice if xbel could be
used to express most of these without loss of information.

Oh,well...

Marc

Here are the two scripts:

#
# nsparse.py
#

import sys

from htmllib import *
from formatter import NullFormatter
import bookmark

class NSBookmarkParser(HTMLParser):

    def __init__(self):
        HTMLParser.__init__(self,NullFormatter())
        # state flags: what kind of item is currently being collected
        self.inBookmark = 0
        self.inDesc = 0
        self.inFolder = 0
        # data collected for the current item
        self.added = None
        self.folded = None
        self.desc = None
        self.title = None
        self.url_href = None
        self.url_modified = None
        self.id = None
        self.ref = None
        self.url_visited = None
        self.bms = bookmark.Bookmarks()

    def start_h1(self,attrs):
        self.inFolder = 1
        self.save_bgn()

    def end_h1(self):
        self.title = self.save_end()

    def start_h3(self,attrs):
        self.inFolder = 1
        for a in attrs:
            if a[0]=='add_date':
                self.added=a[1]
            elif a[0]=='folded':
                self.folded='yes'
        self.save_bgn()

    def end_h3(self):
        self.title = self.save_end()

    def start_dl(self,attrs):
        self.flush()

    def end_dl(self):
        self.flush()
        self.bms.leave_folder()
        self.inFolder = 0

    def do_hr(self,attrs):
        self.flush()
        self.bms.add_separator()

    def do_dt(self,attrs):
        self.flush()

    def do_dd(self,attrs):
        self.inDesc = 1
        self.save_bgn()

    def start_a(self,attrs):
        for a in attrs:
            if a[0]=='href':
                self.url_href=a[1]
            elif a[0]=='add_date':
                self.added=a[1]
            elif a[0]=='last_visit':
                self.url_visited=a[1]
            elif a[0]=='last_modified':
                self.url_modified=a[1]
            elif a[0]=='aliasid':
                self.id=a[1]
            elif a[0]=='aliasof':
                self.ref=a[1]
        self.inBookmark = 1
        self.save_bgn()

    def end_a(self):
        self.title = self.save_end()

    def dump_xbel(self):
        self.bms.dump_xbel()

    def dump_netscape(self):
        self.bms.dump_netscape()

    def flush(self):
        # hand the item collected so far to the Bookmarks container
        if self.inDesc == 1:
            self.desc = self.save_end()
            self.inDesc = 0
        if self.inBookmark == 1:
            if self.ref:
                self.bms.add_alias(self.ref)
            else:
                self.bms.add_bookmark(self.added,self.title,self.desc,
                                      self.id,self.url_href,
                                      self.url_visited,self.url_modified,
                                      None,None)
            self.inBookmark = 0
        elif self.inFolder == 1:
            self.bms.add_folder(self.title,self.desc,self.added,self.folded)
            self.inFolder = 0
        # reset the collected data for the next item
        self.desc=None
        self.folded=None
        self.added=None
        self.title=None
        self.url_href=None
        self.url_modified=None
        self.url_visited=None
        self.ref=None
        self.id=None

if __name__ == '__main__':
    p = NSBookmarkParser()
    p.feed(sys.stdin.read())
    p.close()
    if "-ns" in sys.argv:
        p.dump_netscape()
    else:
        p.dump_xbel()

#
# bookmark.py
#
#

"""
Classes to store bookmarks and dump them to XBEL.
"""

import sys,string

# --- maintain a store for id objects

IDs = {}

def StoreID(id,obj):
    IDs[id]=obj

def GetID(id):
    return IDs[id]

# --- Class for bookmark container

class Bookmarks:

    def __init__(self):
        self.folders=[]
        self.folder_stack=[]

    def add_folder(self,title,desc,added,folded):
        nf = Folder(title,desc,added,folded)
        if self.folder_stack==[]:
            self.folders.append(nf)
        else:
            self.folder_stack[-1].add_child(nf)
        self.folder_stack.append(nf)

    def add_bookmark(self,added,title,desc,id,href,visited,modified,checked,response):
        nb = Bookmark(added,title,desc,id,href,visited,modified,checked,response)
        if id:
            StoreID(id,nb)
        if self.folder_stack!=[]:
            self.folder_stack[-1].add_child(nb)
        else:
            self.folders.append(nb)

    def add_separator(self):
        sep = Separator()
        if self.folder_stack!=[]:
            self.folder_stack[-1].add_child(sep)
        else:
            self.folders.append(sep)

    def add_alias(self,ref):
        al = Alias(ref)
        if self.folder_stack!=[]:
            self.folder_stack[-1].add_child(al)
        else:
            self.folders.append(al)

    def leave_folder(self):
        if self.folder_stack!=[]:
            del self.folder_stack[-1]

    def dump_xbel(self,out=sys.stdout):
        # the XML declaration has to come before the DOCTYPE
        out.write("<?xml version=\"1.0\"?>\n")
        out.write("<!DOCTYPE xbel SYSTEM \"xbel.dtd\">\n")
        out.write("<xbel version=\"0.1\">\n")
        for folder in self.folders:
            folder.dump_xbel(out)
        out.write("</xbel>")

    def dump_netscape(self,out=sys.stdout):
        out.write("<!DOCTYPE NETSCAPE-Bookmark-file-1>\n")
        out.write("<!-- This is an automatically generated file.\n")
        out.write("It will be read and overwritten.\n")
        out.write("Do Not Edit! -->\n")
        # output first folder specially
        f = self.folders[0]
        out.write("<TITLE>%s</TITLE>\n" % f.title)
        out.write("<H1>%s</H1>\n" % f.title)
        out.write("<DD>%s\n<DL><p>\n" % f.desc)
        for folder in f.children:
            folder.dump_netscape(out)
        out.write("</DL><p>\n")

class Folder:

    def __init__(self,title,desc,added,folded):
        self.added=added
        self.folded=folded
        self.title=title
        self.desc=desc
        # valid children are folders,bookmarks,separators and aliases
        self.children=[]

    def add_child(self,child):
        self.children.append(child)

    def dump_xbel(self,out):
        out.write("  <folder>\n")
        out.write("    <title>%s</title>\n" % self.title)
        if self.desc:
            out.write("    <desc>%s</desc>\n" % self.desc)
        for child in self.children:
            child.dump_xbel(out)
        out.write("  </folder>\n\n")

    def dump_netscape(self,out):
        # if toplevel then output title and h1
        #if self.folders:
        #??"
        out.write("<DT><H3>%s</H3>\n" % self.title)
        if self.desc:
            out.write("<DD>%s" % self.desc)
        out.write("<DL><p>\n")
        for child in self.children:
            child.dump_netscape(out)
        out.write("</DL><p>\n")

# --- Class for bookmarks

class Bookmark:

    def __init__(self,added,title,desc,id,href,visited,modified,checked,response):
        self.id=id
        self.added=added
        self.title=title
        self.desc=desc
        self.href=href
        self.visited=visited
        self.modified=modified
        self.checked=checked
        self.response=response

    def dump_xbel(self,out):
        out.write("    <bookmark>\n")
        out.write("      <title>%s</title>\n" % self.title)
        if self.desc:
            out.write("      <desc>%s</desc>" % self.desc)
        out.write("      <url>%s</url>\n" % self.href)
        out.write("    </bookmark>\n")

    def dump_netscape(self,out):
        out.write("<DT><A HREF=\"%s\">%s</A>\n" % (self.href,self.title))
        if self.desc:
            out.write("<DD>%s" % self.desc)

class Alias:

    def __init__(self,ref):
        self.ref=ref

    def dump_xbel(self,out):
        out.write("    <alias ref=\"%s\"/>" % self.ref)

    def dump_netscape(self,out):
        # an alias is written out as a copy of the bookmark it refers to
        bookref=GetID(self.ref)
        out.write("<DT><A HREF=\"%s\">%s</A>\n" % (bookref.href,bookref.title))
        if bookref.desc:
            out.write("<DD>%s" % bookref.desc)

class Separator:

    def dump_xbel(self,out):
        out.write("    <separator/>\n")

    def dump_netscape(self,out):
        out.write("<HR>\n")

From akuchlin@cnri.reston.va.us Sun Sep 13 15:06:12 1998
From: akuchlin@cnri.reston.va.us (A.M. Kuchling)
Date: Sun, 13 Sep 1998 10:06:12 -0400
Subject: [XML-SIG] XBEL: Lynx bookmark parser
Message-ID: <199809131406.KAA00647@207-172-46-194.s194.tnt9.ann.erols.com>

Here's a script to parse Lynx bookmark files, and output them as XBEL.
It seems reasonable to modify ns_parse.py, msie_parse.py, and
adr_parse.py to always output XBEL. I'm now also working on a SAX
handler that converts XBEL to a Bookmarks instance and then outputs it
in one of the browser formats.

--
A.M. Kuchling http://starship.skyport.net/crew/amk/
For animals, the entire universe has been neatly divided into things
to (a) mate with, (b) eat, (c) run away from, and (d) rocks.
    -- Terry Pratchett, _Equal Rites_

#!/usr/bin/env python
#
# lynx_parse.py :
#    Read a list of Lynx bookmark files, specified on the command line,
#    and output the corresponding XBEL document.
#
# Sample usage: ./lynx_parse.py
#

import bookmark
import re

def parse_lynx_file(bms, input):
    """Read a Lynx 2.8 bookmark file from the input file object and
    add its contents to the Bookmarks instance bms."""

    # Read the whole file into memory
    data = input.read()

    # Get the title
    m = re.search("<title>(.*?)</title>", data, re.IGNORECASE)
    if m is None: title = "Untitled"
    else: title = m.group(1)

    bms.add_folder( title, None, None)

    hrefpat = re.compile( r"""^ \s* <li> \s*
                              <a \s+ href \s* = \s* "(?P<url> [^"]* )" \s*>
                              (?P<name> .*? ) </a>""",
                          re.IGNORECASE| re.DOTALL | re.VERBOSE | re.MULTILINE)
    pos = 0
    while 1:
        m = hrefpat.search(data, pos)
        if m is None: break
        pos = m.end()
        url, name = m.group(1,2)
        bms.add_bookmark( name, None, None, url)

    bms.leave_folder()

if __name__ == '__main__':
    import sys
    bms = bookmark.Bookmarks()

    # Determine the owner on Unix platforms
    import os, pwd
    uid = os.getuid()
    t = pwd.getpwuid( uid )
    bms.owner = t[4]

    for file in sys.argv[1:]:
        input = open(file)
        parse_lynx_file(bms, input)
    bms.dump_xbel()

From akuchlin@cnri.reston.va.us Sun Sep 13 17:39:35 1998
From: akuchlin@cnri.reston.va.us (A.M. Kuchling)
Date: Sun, 13 Sep 1998 12:39:35 -0400
Subject: [XML-SIG] PyExpat module swallowing exceptions
Message-ID: <199809131639.MAA09760@mira.erols.com>

I've come across a curious bug; the SAX PyExpat module seems to
swallow exceptions, but I can't figure out why this is happening.
Here's a test program:

from xml.sax import saxexts,saxlib

class ExcHandler(saxlib.HandlerBase):
    def startElement(self, name, attrs):
        raise SystemError

import StringIO
h = ExcHandler()
p=saxexts.XMLParserFactory.make_parser("xml.sax.drivers.drv_pyexpat")
p.setDocumentHandler( h )
p.parseFile( StringIO.StringIO("data") )

Notice that the startElement method raises an exception. Run the
above code, and it quietly runs to completion:

[amk@mira xbel]$ python t.py
[amk@mira xbel]$

Change it to use another parser, such as xmllib, and you get an
exception:

Traceback (innermost last):
  File "t.py", line 11, in ?
    p.parseFile( StringIO.StringIO("data") )
  ... lots of stack frames deleted ...
  File "t.py", line 5, in startElement
    raise SystemError
SystemError

It looks to me as if, when an exception is raised from a handler,
there's no way to tell the Expat parser "Hey! That handler didn't
work, so stop parsing!", so the handlers keep getting called and the
exception is discarded somewhere.

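
Until the cause is found, one possible stopgap is to catch the exception on the Python side, remember it, and raise it again once the parser has returned. The following is only a minimal sketch in present-day Python: `ExceptionTrap`, `swallowing_parse` and `buggy_handler` are hypothetical names made up for the illustration, and `swallowing_parse` merely imitates a parser that discards handler errors; none of this is part of pyexpat or the SAX drivers.

```python
class ExceptionTrap:
    """Remember the first exception raised by any wrapped callback."""

    def __init__(self):
        self.error = None

    def wrap(self, callback):
        """Return a version of `callback` that records errors instead of losing them."""
        def guarded(*args, **kwargs):
            if self.error is not None:
                return None              # an earlier callback failed: do nothing
            try:
                return callback(*args, **kwargs)
            except Exception as exc:
                self.error = exc         # keep it so it can be raised later
                return None
        return guarded

    def reraise(self):
        """Raise the recorded exception, if there is one."""
        if self.error is not None:
            raise self.error


def swallowing_parse(names, start_element):
    """Hypothetical stand-in for a parser that discards callback errors."""
    for name in names:
        try:
            start_element(name)
        except Exception:
            pass


def buggy_handler(name):
    # Imitates a typo in a handler, such as calling a method that does not exist.
    raise AttributeError("add_folder")


trap = ExceptionTrap()
swallowing_parse(["xbel", "folder"], trap.wrap(buggy_handler))
```

Calling `trap.reraise()` after the parse surfaces the AttributeError that would otherwise have vanished. The wrapper cannot stop the parser early; it only makes the remaining callbacks do nothing.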
I came across this when debugging my XBEL reading code; I had written self.add_folder instead of self.bms.add_folder, but never saw the AttributeError exception that would have pointed out the problem. Obviously this is a bad thing when debugging code and the Expat module is selected as the parser. This seems like a glaring flaw in Expat, that there's no way to end parsing prematurely. Has anyone told James Clark about this? Failing a change to Expat, an apparent fix would be to add "if (PyErr_Occurred()) return;" to all the handler functions in pyexpat.c, in order to do nothing. However, I tried this, and the exception still is never raised. What's confusing me is: why is the exception just vanishing? I couldn't find an 'except:' responsible, or a PyErr_Clear() in pyexpat.c. Anyone got any clues? -- A.M. Kuchling http://starship.skyport.net/crew/amk/ May you go safe, my friend, across that dizzy way / No wider than a hair, by which your people go / From earth to Paradise; may you go safe today / With stars and space above, and time and stars below. -- Lord Dunsany From fermigie@math.jussieu.fr Mon Sep 14 14:26:41 1998 From: fermigie@math.jussieu.fr (Stefane Fermigier) Date: Mon, 14 Sep 1998 15:26:41 +0200 Subject: [XML-SIG] Fun with the DOM. In-Reply-To: <3512F83B.6A7F88FA@technologist.com>; from Paul Prescod on Fri, Mar 20, 1998 at 06:14:04PM -0500 References: <3512F83B.6A7F88FA@technologist.com> Message-ID: <19980914152641.A7042@riemann.math.jussieu.fr> Hi, I had some fun with my own implementation of the DOM yesterday. I made a toy linuxdoc -> LaTeX transformation engine using ideas from my somewhat clumsy SGML -> SGML transformer. Basically, the idea is that having a tree in memory is nice because you can transform nodes in a bottom -> up fashion (assuming your tree grows downwards, like they usually do), i.e. you transform the children first then pass the result to the parents. 
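
The bottom-up idea can be sketched roughly as follows. This is not the linuxdoc program itself, just a minimal illustration in present-day Python: the `Node` class, the `transform` function, and the two LaTeX rules are assumptions made up for the example.

```python
class Node:
    """A very small stand-in for a DOM node (hypothetical, not the DOM API)."""

    def __init__(self, tag, children=(), text=""):
        self.tag = tag
        self.children = list(children)
        self.text = text


def transform(node, rules):
    """Transform the children first, then hand their results to the parent's rule."""
    parts = [transform(child, rules) for child in node.children]
    rule = rules.get(node.tag)
    if rule is None:
        # No rule for this tag: pass text and child results through unchanged.
        return node.text + "".join(parts)
    return rule(node, parts)


# Each rule receives the node and the already-transformed children.
RULES = {
    "em": lambda node, parts: "\\emph{" + "".join(parts) + "}",
    "p": lambda node, parts: "".join(parts) + "\n\n",
}

tree = Node("p", [
    Node("#text", text="Hello "),
    Node("em", [Node("#text", text="world")]),
])

result = transform(tree, RULES)
```

Because every rule sees its children in finished form, a rule never has to keep state between a start event and the matching end event.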
This is much simpler than an event based transformer (a la ASP), where you call start_XXX and end_XXX for each node traversed. The program uses my outdated DOM core implementation along with my completely bogus ESIS -> DOM builder, and it doesn't implement the full DTD. You have to have nsgmls or a similar tool to use my program. The program, which is just a toy but I find rather nice, is available at the URL: . There is an open issue regarding really complex tranformation, like transposing a table for instance. Cheers, S. -- Stéfane Fermigier, MdC à l'Université Paris 7. Tel: 01.44.27.61.01 (Bureau). , , . "Without hardware memory protection, the machine-dependent actions taken after an arror can cause a machine crash [...]. (MacIntosh users experience this problem on a daily basis)." Adrew K. Wright. From larsga@ifi.uio.no Tue Sep 15 09:42:37 1998 From: larsga@ifi.uio.no (Lars Marius Garshol) Date: Tue, 15 Sep 1998 10:42:37 +0200 (MET DST) Subject: [XML-SIG] xmlproc: Version 0.52 released! Message-ID: <199809150842.KAA01625@ifi.uio.no> I've just released version 0.52 of xmlproc. The main improvements are: - Major speed increase for well-formedness parsing (parse down from 50 seconds to 30 seconds on my benchmark suite), and definite improvements for validating parsing as well. - Error reporting improved. Better error messages, and support for error messages in different languages. - xvcmd.py option interpretation improved (-l and -o options added) - Numerous minor parse bug fixes - Some API extensions: - CatalogManager.get_public_ids() method added - DTD.get_elements() method added - Parser.set_error_language() method added - optional bufsize argument added to Parser.parse_resource() Because of the speed increase all xmlproc users are recommended to upgrade to the new version. xmlproc is now nearly twice as fast as xmllib when not validating and when validating it is also faster unless the DTD is very large compared to the document. 
The error reporting improvement means that you can now get error messages in Norwegian and English. If anyone wants to add support for more languages they are encouraged to do so. The error messages are in the errors.py file. There is an API for plugging in new languages, but this is still prototypical and so is not documented yet. The API extensions were made because someone (Michael Sobolev, thanks Michael) requested them. If you have any wishes in that direction, please let me know, and I'll see what I can do. I'm thinking of adding a list of other programs that use xmlproc to the xmlproc home page. If anyone knows of such a program please email me so I can add it. --Lars M. From bwaumg@urc.tue.nl Tue Sep 15 13:47:55 1998 From: bwaumg@urc.tue.nl (Marc van Grootel) Date: Tue, 15 Sep 1998 14:47:55 +0200 Subject: [XML-SIG] xmlproc and Docbook XML DTD Message-ID: <199809151247.OAA00361@asterix.urc.tue.nl> Hi, I just installed the newest xmlproc. I'm trying to get it to validate with Norman Walsh's Docbook XML DTD (db3xml10.dtd). It looks as if xmlproc has problems with some of the parameter entities. Right out of the box it reports: ERROR: One of 'IGNORE' or 'INCLUDE' expected at db3xml10.dtd:32:4 TEXT: '%ISOamsa.m' The whole parameter name is %ISOamsa.module After removing a series of these because they were ignored anyway it got to: dbpoolx.mod:137:4 TEXT: '%dbpool.re' It's whole name is dbpool.redecl.module After replacing the section that uses this PE with IGNORE it finally came to this traceback (only last one showed): File "C:\Python\site\xml\parsers\xmlproc\xmlproc.py", line 897, in parse_pe_ref self.report_error(3038,name) NameError: name It seems as if these longer PE's aren't parsed properly (I looked at the regexps used but that seems ok). Is there a restriction on their length? Or is there anyone who succeeded in using this DTD unmodified? 
Marc --- Marc van Grootel bwaumg@urc.tue.nl From larsga@ifi.uio.no Tue Sep 15 14:05:54 1998 From: larsga@ifi.uio.no (Lars Marius Garshol) Date: Tue, 15 Sep 1998 15:05:54 +0200 Subject: [XML-SIG] xmlproc and Docbook XML DTD In-Reply-To: <199809151247.OAA00361@asterix.urc.tue.nl> Message-ID: <3.0.1.32.19980915150554.006b0ab8@ifi.uio.no> * Marc van Grootel > >I just installed the newest xmlproc. I'm trying to get it to validate >with Norman Walsh's Docbook XML DTD (db3xml10.dtd). > >It looks as if xmlproc has problems with some of the parameter >entities. > >Right out of the box it reports: > >ERROR: One of 'IGNORE' or 'INCLUDE' expected at db3xml10.dtd:32:4 >TEXT: '%ISOamsa.m' This is because xmlproc does not support parameter entities inside declarations yet. xmlproc 0.52 was releases now because I wanted to start working on this now and expected that to take a while. For now I'm afraid you'll have to normalize the DTD before you use it. >File "C:\Python\site\xml\parsers\xmlproc\xmlproc.py", line 897, in >parse_pe_ref > self.report_error(3038,name) >NameError: name "/)"(#"#!# This is a bug. I'll fix it tomorrow and post a fix on Thursday. >It seems as if these longer PE's aren't parsed properly (I looked at >the regexps used but that seems ok). Part of the reason why 0.52 is so much faster than 0.51 is that it no longer uses regexps to parse names. Anyway, I don't think this is the problem, it looks as though the 'name' variable for some reason isn't initialized. >Is there a restriction on their length? No. --Lars M. From Fred L. Drake, Jr." References: <199809130031.CAA15303@asterix.urc.tue.nl> Message-ID: <13822.27528.568968.331881@weyr.cnri.reston.va.us> --y/dsR0gwNl Content-Type: text/plain; charset=us-ascii Content-Description: message body text Content-Transfer-Encoding: 7bit I started working on Grail support for XBEL last night, and would like to suggest a small change to the DTD. 
There doesn't appear to be any reason to make the info element or its children required, so I suggest all three be made optional. The machine name, in particular, does not appear to be very useful. I can also envision shared-bookmarks applications where the owner may vary from folder to folder, so I'd also allow info within each folder. Allowing it in the folder makes the outermost info superfluous; an info within the outermost folder would work just fine. (I'll leave the folder inside the xbel element, since there may be good reason for adding things outside the folder in some applications.) I've attached the modified DTD below. -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives 1895 Preston White Dr. Reston, VA 20191 --y/dsR0gwNl Content-Type: text/xml Content-Description: Revised XBEL DTD. Content-Disposition: inline; filename="xbel.dtd" Content-Transfer-Encoding: 7bit --y/dsR0gwNl-- From bwaumg@urc.tue.nl Tue Sep 15 16:10:13 1998 From: bwaumg@urc.tue.nl (Marc van Grootel) Date: Tue, 15 Sep 1998 17:10:13 +0200 Subject: [XML-SIG] XBEL DTD Message-ID: <199809151510.RAA08403@asterix.urc.tue.nl> Hi, I was well on my way to suggesting some changes too. Some of them are the same as suggested by Fred. I'm working on a few other suggestions, which I'll post about in a later mail since they need some more explanation and I would like your opinions. In addition to the changes suggested by Fred I want to suggest the following: - make the 'info' element contain zero or more 'meta' tags, this way we don't have to fight too much about how to name them and can add new ones without breaking the DTD - also allow 'info' in a bookmark so extra data can be associated with a single bookmark - drop the 'title' for bookmarks and put that into 'url' and put the href itself back into the 'url' as an 'href' attribute.
So instead of: My link http://foo Write: My link In the HTML3.2 DTD there's the following quote: "The term URL means a CDATA attribute whose value is a Uniform Resource Locator, See RFC1808 (June 95) and RFC1738 (Dec 94)." Putting the href out into element content as #PCDATA seems too broad to me. - make the 'title' optional too - instead of a bookmark a bare 'url' element should be allowed too. This should be considered a bookmark without a description and/or info. - allowing bookmark, url, alias etc. directly under xbel The three last suggestions may seem to come from out of the blue but they have to do with the use of XBEL as a meta-dtd (see next post) and the ability to extract XBEL from almost arbitrary XML documents. I'll post my current DTD next too. Then it's about time I think to reach a consensus and freeze it so Fred can go on ;) > I started working on Grail support for XBEL last night, and would > like to suggest a small change to the DTD. > There doesn't appear any reason to make the info element or its > children required, so I suggest all three be made optional. Yup, I did that too already. > The > machine name, in particular, does not appear to be very useful. I can > also envision shared-bookmarks applications where the owner may vary > from folder to folder, so I'd also allow info within each folder. I'm not sure but maybe the reason Mark suggested it was so a processor could make certain assumptions. Maybe platform is a better name anyway. I even thought about adding optional info to a single bookmark. There's a problem lurking with shared-bookmarks though. Currently id is declared as ID. This means a valid document must have unique id's. Merging different bookmark files may break that constraint. A possible solution would be to rename id's when merging bookmark files. An easy way out would be to declare id as CDATA but that's not a real solution.
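The id-renaming idea above can be sketched roughly as follows. This is only an illustration, assuming two already-parsed trees handled with the standard xml.etree.ElementTree module; the function name merge_with_fresh_ids, the "m" prefix for generated ids, and the 'ref' attribute used for references are all hypothetical and not taken from any XBEL DTD.

```python
import xml.etree.ElementTree as ET


def merge_with_fresh_ids(base, other, prefix="m"):
    """Append other's children to base, renaming ids that collide.

    Returns a dict mapping each replaced id to its new value.
    """
    base_ids = {el.get("id") for el in base.iter() if el.get("id")}
    taken = base_ids | {el.get("id") for el in other.iter() if el.get("id")}
    renamed = {}
    counter = 0
    for el in other.iter():
        old = el.get("id")
        if old in base_ids:
            # Generate candidates until one is unused in both documents.
            while True:
                counter += 1
                new = f"{prefix}{counter}"
                if new not in taken:
                    break
            taken.add(new)
            renamed[old] = new
            el.set("id", new)
    # Keep references inside the merged-in file pointing at the same items.
    # 'ref' is a hypothetical attribute name, used here for illustration only.
    for el in other.iter():
        if el.get("ref") in renamed:
            el.set("ref", renamed[el.get("ref")])
    base.extend(list(other))
    return renamed
```

The returned mapping lets the caller report or log which ids changed, which matters if anything outside the two files refers to them.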
> Allowing it in the folder makes the outermost info superfluous; an > info within the outermost folder would work just fine. (I'll leave > the folder inside the xbel element, since there may be good reason for > adding things outside the folder in some applications.) I've attached > the modified DTD below. I would like to throw out owner and machine in favor of a generic htmlish meta tag. If owner and/or machine name are needed you could claim a standard meta tag with owner, machine or whatever your application needs. This way all info could be processed by the same code. Another use for a meta tag is for adding keywords. When these are preserved it's trivial to put a HTML version on the web and have it handled properly by the search bots. Bye Marc -- Marc van Grootel bwaumg@urc.tue.nl From bwaumg@urc.tue.nl Tue Sep 15 16:34:58 1998 From: bwaumg@urc.tue.nl (Marc van Grootel) Date: Tue, 15 Sep 1998 17:34:58 +0200 Subject: [XML-SIG] XBEL DTD as a meta-dtd Message-ID: <199809151534.RAA09232@asterix.urc.tue.nl> Hi, This post became rather long (my DTD is at the bottom). I did some experimenting with Geir's xmlarch.py and it works nicely (once you update to the newest sax stuff). I made this effort because I thought XBEL could be used in a Website management tool that checks external links and reports on them (something like linbot). Such a processor could store information about the links in the 'info' elements and the id's could refer back to the original XML document. In order for XBEL to function as a meta-DTD I needed to loosen some restrictions and make a few changes to the XBEL DTD. With these changes it is possible to derive XBEL from many XML documents just by specifying how the mapping has to take place. This architectural processing is standardized (annex A.3 of ISO/IEC 10744:1997) so I could use other architectural engines to do the same (for example XAF by David Megginson). No coding of specialized XML processors needed. 
The XBEL is like a virtual document automatically derived from the XML source. For some more examples and explanations look at the documentation for XAF (http://www.megginson.com/XAF). Thanks, Geir for making this possible in Python. At the end I included the xbel dtd as I use it now. Maybe we could reach a consensus. The DTD is looser now which makes processing it a little more difficult. Processors that output XBEL are not affected much since they could always output a more restricted form of XBEL but it would be nice if a processor that reads XBEL could cope with the looser XBEL DTD. Here's an example of two simplified XML fragments: Chapter 1

    This is A.

    This is B.

    Chapter 2

    This is C.

    [This is not real TEI since it lacks an easy way to refer to an url] My Book Chapter 1 This is A. This is B. Chapter 2 This is C. Obviously there are some structural differences. Also, in the first a paragraph is called 'p' in the other 'para', in the first a chapter is called 'div1' and in the other 'chapter'. With architectural forms you can extract a structured list of url's from both of these without creating a separate processor for each. Just specify how the derivation should work and process the document with an architectural forms processor (like xmlarch.py). To show how that works I used the 'book' example: Here's the complete document: ]> My Book Chapter 1 This is A This is B Chapter 2 This is C Feeding this to xmlarch.py results in the following architectural (or virtual) document: My Book Chapter 1 A B Chapter 2 C As you can see xmlarch.py derived the xbel document from the book document. The chapter element's are changed to folder's. The ulink's are changed to url's and every url attribute is changed to a href attribute. It also stripped the elements inside the first ulink. If we want to use XBEL to work as a meta-dtd for doing these kinds of things some changes to the DTD are in order. Architectural forms can do many things but they cannot completely reorder the original document so the XBEL DTD (meta DTD) and the XML DTD used (client DTD) need to have some structural similarities. =========== my current XBEL DTD ================ From Fred L. Drake, Jr." References: <199809151510.RAA08403@asterix.urc.tue.nl> Message-ID: <13822.37785.563540.803670@weyr.cnri.reston.va.us> Marc van Grootel writes: > - make the 'info' element contain zero or more 'meta' tags, this > way we don't have to fight too much about how to name them > and can add new ones without breaking the DTD I'm OK with this; I would use a slightly different structure than you did in the example you give farther down in your post. 
I'd prefer: so the markup would be: my data > - also allow 'info' in a bookmark so extra data can be associated > with a single bookmark OK. > - drop the 'title' for bookmarks and put that into 'url' and > put the href itself back into the 'url' as an 'href' attribute. I'm not sure I really like this; for a bookmarks list, the URL itself really is content. > In the HTML3.2 DTD there's the following quote: ... The quoted discussion seems very specific to the HTML spec, and is not general. If there's some relevant context I'm missing, please quote that as well. > - make the 'title' optional too > > - instead of a bookmark a bare 'url' element should be allowed > too. This should be considered a bookmark without a description > and/or info. > > - allowing bookmark,url,alias etc directly under xbel OK, OK, OK. I said: > > The > > machine name, in particular, does not appear to be very useful. I can > > also envision shared-bookmarks applications where the owner may vary > > from folder to folder, so I'd also allow info within each folder. And Marc replied: > I'm not sure but maybe the reason Mark suggested it was so a processor > could make certain assumptions. Maybe platform is a better name Maybe. I don't know what sort of assumptions would be reasonable unless it could be used to check the availability of file: URLs. > There's a problem lurking with shared-bookmarks though. Currently id > is declared as ID. This means a valid document must have unique > id's. Merging different bookmark files may break that constraint. Generating new IDs on an as-needed basis would be the best solution for merging, with the option of it being treated as an error also being available, but this does not affect the shared bookmarks application I was thinking of. I was thinking of a single xbel instance being access simultaneously by several users (presumably through a server of some sort), and all actions on the instance could be immediately reflected in each user's UI. 
This could be useful in maintaining links shared reference material. The info element could be used to store access control information, approved/unapproved flags, etc. > Another use for a meta tag is for adding keywords. When these are > preserved it's trivial to put a HTML version on the web and have it > handled properly by the search bots. And can be used to improve searching by the browser; as the user visits various pages, any keywords or other useful meta information could be pulled into the bookmarks database and (optionally) automatically updated on future visits. -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives 1895 Preston White Dr. Reston, VA 20191 From Jack.Jansen@cwi.nl Tue Sep 15 22:26:29 1998 From: Jack.Jansen@cwi.nl (Jack Jansen) Date: Tue, 15 Sep 1998 23:26:29 +0200 Subject: [XML-SIG] XBEL DTD In-Reply-To: Message by "Fred L. Drake" , Tue, 15 Sep 1998 12:19:37 -0400 (EDT) , <13822.37785.563540.803670@weyr.cnri.reston.va.us> Message-ID: Okay, if we're all putting in requests for our pet feature in the XBEL DTD I have one too: I'd like an empty ("pass", in Python terms) element, with only an id attribute. With this it should be possible to do two-way-syncing of bookmark files between machines. If each machine generates IDs in a unique manner, and replaces items (or folders or whatever) with the pass item when you delete them you win something important: on the next sync you can differentiate between the situation where the bookmark was deleted on machine A or added on machine B. 
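A rough sketch of how such a two-way sync could use the empty placeholder, assuming each machine's bookmark file has been reduced to a dict from id to entry. The TOMBSTONE sentinel stands in for the proposed empty element, and the sync function and sample ids are made up for illustration.

```python
# Sentinel standing in for the proposed empty "pass" element.
TOMBSTONE = object()


def sync(a, b):
    """Merge two id -> entry mappings, honouring deletions on either side."""
    merged = {}
    for key in sorted(a.keys() | b.keys()):
        if a.get(key) is TOMBSTONE or b.get(key) is TOMBSTONE:
            # Deleted on one machine: the deletion wins and the
            # placeholder is kept so later syncs still see it.
            merged[key] = TOMBSTONE
        elif key in a:
            merged[key] = a[key]
        else:
            # Present on B only and never deleted on A: a new bookmark.
            merged[key] = b[key]
    return merged


machine_a = {"A-1": "http://www.python.org/", "A-2": TOMBSTONE}
machine_b = {"A-1": "http://www.python.org/", "A-2": "http://old.example/",
             "B-1": "http://www.cwi.nl/"}
merged = sync(machine_a, machine_b)
```

Without the placeholder, "A-2" missing on machine A would look the same as "A-2" having been added on machine B, which is exactly the ambiguity described above.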
-- Jack Jansen | ++++ stop the execution of Mumia Abu-Jamal ++++ Jack.Jansen@cwi.nl | ++++ if you agree copy these lines to your sig ++++ http://www.cwi.nl/~jack | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm From gstein@lyra.org Tue Sep 15 22:34:51 1998 From: gstein@lyra.org (Greg Stein) Date: Tue, 15 Sep 1998 14:34:51 -0700 Subject: [XML-SIG] XBEL DTD References: <199809151510.RAA08403@asterix.urc.tue.nl> Message-ID: <35FEDD7B.551CCDBC@lyra.org> Marc van Grootel wrote: > ... > I would like to throw out owner and machine in favor of a generic htmlish > meta tag. > > > > > > If owner and/or machine name are needed you could claim a standard > meta tag with owner, machine or whatever your application needs. This > way all info could be processed by the same code. >... This is kind of silly. XML is intended to encode the "name" as the actual tag. Why push this down another level? Using an "owner" tag, you can extract this information directly from the parse tree. Using a "meta" tag like above, now the software has to iterate through the meta tags looking for the information. XML is enough of an abstraction; you don't want to start creating additional layers in there. The tendency should be towards additional tags and fewer "control" type elements. It does not hurt anything to specify an optional tag, yet it can make many things easier. -g -- Greg Stein (gstein@lyra.org) From Fred L. Drake, Jr." References: <13822.37785.563540.803670@weyr.cnri.reston.va.us> Message-ID: <13822.57838.776721.726280@weyr.cnri.reston.va.us> Jack Jansen writes: > Okay, if we're all putting in requests for our pet feature in the XBEL > DTD I have one too: I'd like an empty ("pass", in Python terms) > element, with only an id attribute. Your rationale is good; I'll go for it. ;-) -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives 1895 Preston White Dr.
Reston, VA 20191 From gstein@lyra.org Tue Sep 15 23:00:12 1998 From: gstein@lyra.org (Greg Stein) Date: Tue, 15 Sep 1998 15:00:12 -0700 Subject: [XML-SIG] XBEL DTD as a meta-dtd References: <199809151534.RAA09232@asterix.urc.tue.nl> Message-ID: <35FEE36C.A9181E3@lyra.org> I'd highly recommend using a different DTD for generic URL extraction. XBEL is for _bookmark_ representation. The nice thing about XML is the ability to use multiple DTDs as necessary. XML is also supposed to convey structured information; the more generic it becomes, the less useful XML becomes. While on this point, somebody should establish the XBEL DTD somewhere (XML SIG page?) so that people can refer to it with a namespace declaration, then augment their tags with the namespace. For example: -g -- Greg Stein (gstein@lyra.org) From akuchlin@cnri.reston.va.us Tue Sep 15 23:40:23 1998 From: akuchlin@cnri.reston.va.us (Andrew M. Kuchling) Date: Tue, 15 Sep 1998 18:40:23 -0400 (EDT) Subject: [XML-SIG] XBEL DTD as a meta-dtd In-Reply-To: <35FEE36C.A9181E3@lyra.org> References: <199809151534.RAA09232@asterix.urc.tue.nl> <35FEE36C.A9181E3@lyra.org> Message-ID: <13822.60371.108025.289138@amarok.cnri.reston.va.us> Greg Stein writes: >I'd highly recommend using a different DTD for generic URL extraction. >XBEL is for _bookmark_ representation. The nice thing about XML is the Agreed; it seems to complicate XBEL, more than seems necessary for a fairly simple application like maintaining a bookmark file. >While on this point, somebody should establish the XBEL DTD somewhere >(XML SIG page?) so that people can refer to it with a namespace >declaration, then augment their tags with the namespace. For example: > Good point, but I'm not sure at what URI it should live. python.org/sigs/xml-sig/ isn't permanent; SIGs are supposed to die when they've fulfilled their purpose, and the XML-SIG will probably do so eventually. That leaves somewhere in /topics/xml/; perhaps /topics/xml/DTD/ can be used for such DTDs.
-- A.M. Kuchling http://starship.skyport.net/crew/amk/ prompt. n. (Unix) A symbol on the screen indicating which shell is attacking you. -- Stan Kelly-Bootle, _The Computer Contradictionary_ From Fred L. Drake, Jr." References: <199809151534.RAA09232@asterix.urc.tue.nl> <35FEE36C.A9181E3@lyra.org> <13822.60371.108025.289138@amarok.cnri.reston.va.us> Message-ID: <13823.46031.723233.700011@weyr.cnri.reston.va.us> Greg Stein writes: > I'd highly recommend using a different DTD for generic URL extraction. > XBEL is for _bookmark_ representation. The nice thing about XML is the Andrew M. Kuchling writes: > Agreed; it seems to complicate XBEL, more than seems necessary > for a fairly simple application like maintaining a bookmark file. I don't think the proposed DTD is too complicated, but it probably shouldn't get much more complicated. Jack's "pass" element makes sense and should be added since it directly related to bookmark management within applications like Grail. > python.org/sigs/xml-sig/ isn't permanent; SIGs are supposed to die > when they've fulfilled their purpose, and the XML-SIG will probably do > so eventually. That leaves somewhere in /topics/xml/; perhaps > /topics/xml/DTD/ can be used for such DTDs. This last variant is almost exactly what I'm spitting out from Grail; the only difference is that I spelled "DTD" as "dtds" (take your pick for capitalization, but I think the plural makes sense). -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives 1895 Preston White Dr. Reston, VA 20191 From akuchlin@cnri.reston.va.us Wed Sep 16 15:14:23 1998 From: akuchlin@cnri.reston.va.us (Andrew M. 
Kuchling) Date: Wed, 16 Sep 1998 10:14:23 -0400 (EDT) Subject: [XML-SIG] XBEL DTD as a meta-dtd In-Reply-To: <13823.46031.723233.700011@weyr.cnri.reston.va.us> References: <199809151534.RAA09232@asterix.urc.tue.nl> <35FEE36C.A9181E3@lyra.org> <13822.60371.108025.289138@amarok.cnri.reston.va.us> <13823.46031.723233.700011@weyr.cnri.reston.va.us> Message-ID: <13823.50440.29860.876752@amarok.cnri.reston.va.us> Fred L. Drake writes: > I don't think the proposed DTD is too complicated, but it probably >shouldn't get much more complicated. Jack's "pass" element makes >sense and should be added since it directly related to bookmark >management within applications like Grail. I was agreeing more with Greg's reaction to turning it into a meta-DTD. The basic problem of expressing a bookmark file is fairly simple, and the DTD should also be fairly simple. It's nice to keep it at the level of complexity where people (such as me) say "Oh, that looks neat; I'll take an hour and implement it" instead "Gosh, that looks awfully complicated; I'll pull the covers over my head and hope it goes away." In addition, the XBEL code as-is makes an excellent set of sample programs for the Python/XML package. > I wrote: > > so eventually. That leaves somewhere in /topics/xml/; perhaps > > /topics/xml/DTD/ can be used for such DTDs. > > This last variant is almost exactly what I'm spitting out from >Grail; the only difference is that I spelled "DTD" as "dtds" (take >your pick for capitalization, but I think the plural makes sense). Good suggestion, though I tend to read the URL components as qualifiers, not categories, and hence usually go for the singular: "dtd" instead of "dtds". Anyway, there's now a page for them at: http://www.python.org/topics/xml/dtds/ Add xbel.dtd to the end of that URL to download the DTD; you can use this in namespace declarations. 
When I get time, I'll probably add the DTD used by the xml.marshal function to that page as well (unless xml.marshal is obsoleted by Lotos or some other DTD). This isn't going to be a massive collection of DTDs, just a stable home for any DTDs that originate within the Python community. (The XBEL DTD used is the original one. When we can settle on a final version of the DTD, I'll update it.) -- A.M. Kuchling http://starship.skyport.net/crew/amk/ What can I wish to the youth of my country who devote themselves to science?... Thirdly, passion. Remember that science demands from a man all his life. If you had two lives that would not be enough for you. Be passionate in your work and in your searching. -- Ivan Pavlov From grove@infotek.no Wed Sep 16 15:46:50 1998 From: grove@infotek.no (Geir Ove Gronmo) Date: Wed, 16 Sep 1998 16:46:50 +0200 Subject: [XML-SIG] XBEL DTD as a meta-dtd In-Reply-To: <13823.50440.29860.876752@amarok.cnri.reston.va.us> References: <13823.46031.723233.700011@weyr.cnri.reston.va.us> <199809151534.RAA09232@asterix.urc.tue.nl> <35FEE36C.A9181E3@lyra.org> <13822.60371.108025.289138@amarok.cnri.reston.va.us> <13823.46031.723233.700011@weyr.cnri.reston.va.us> Message-ID: <199809161445.QAA14942@mail.infotek.no> At 10:14 16.09.98 -0400, you wrote: >Fred L. Drake writes: >> I don't think the proposed DTD is too complicated, but it probably >>shouldn't get much more complicated. Jack's "pass" element makes >>sense and should be added since it directly related to bookmark >>management within applications like Grail. > > I was agreeing more with Greg's reaction to turning it into a >meta-DTD. All DTDs can be meta-DTDs (architectural DTDs). The complexity of the DTD doesn't really matter. The only thing that might be "harder" - is the mapping from the instance (which is to be architecturally processed) to the meta-dtd. So whether anybody calls XBEL a meta-dtd, or not, doesn't matter. Geir O. 
================== Geir Ove Grønmo ================== | STEP Infotek as, Gjerdrumsvei 12, 0486 Oslo, Norway | | grove@infotek.no http://www.infotek.no/ | ------------------------------------------------------- From grove@infotek.no Wed Sep 16 15:58:02 1998 From: grove@infotek.no (Geir Ove Gronmo) Date: Wed, 16 Sep 1998 16:58:02 +0200 Subject: [XML-SIG] xmlarch: Version 0.11 released Message-ID: <199809161456.QAA15228@mail.infotek.no> xmlarch.py: An XML architectural forms processor written in Python Version: 0.11 Author: Geir Ove Grønmo Email: grove@infotek.no Released: September 15th 1998 Homepage: http://www.infotek.no/~grove/software/xmlarch/index.html --- What is xmlarch.py? The xmlarch.py module contains an XML architectural forms processor written in Python. It allows you to process XML architectural forms using any parser that uses the SAX interfaces. The module allows you to process several architectures in one parse-pass. Architectural document events for an architecture can even be broadcast to multiple DocumentHandlers. What's new? There are no new features in this release. The module should now be placed in the xml.arch package. The demo tools have been updated to support the new package structure. Problem with not being recognized as an architecture use declaration is now fixed. Now both and are supported. get_bridge_form() was called get_bridge_elem_form() in a couple of places. This is now fixed. --- Enjoy! Geir Ove Grønmo From bwaumg@urc.tue.nl Wed Sep 16 16:02:50 1998 From: bwaumg@urc.tue.nl (Marc van Grootel) Date: Wed, 16 Sep 1998 17:02:50 +0200 Subject: [XML-SIG] XBEL DTD as a meta-dtd Message-ID: <199809161502.RAA19327@asterix.urc.tue.nl> Hi, So the consensus is, more or less, that 'less is more'. I can agree to that. My experiment with architectural forms may have led me too far from the goals of XBEL. I agree with Greg that such URL extraction is better left to another DTD. So the scope is 'hierarchical storage for bookmarks'?
But is a lossless conversion between XBEL and Netscape still a goal? If not I think that 'separator' should go since it serves no real purpose. Even with something that looks so simple there are some important issues which show up mostly after people start implementing applications with it. In my opinion we need an escape-hatch to provide for some of these yet unknown applications. This is what I had in mind with: ... Or as Fred suggested: ... ... Greg Stein wrote: > This is kind of silly. XML is intended to encode the "name" as the > actual tag. Why push this down another level? Using an "owner" tag, you > can extract this information directly from the parse tree. Using a > "meta" tag like above, now the software has to iterate through the meta > tags looking for the information. > > XML is enough of an abstraction; you don't want to start creating > additional layers in there. The tendency should be towards additional > tags and less "control" type elements. It does not hurt anything to > specify an optional tag, yet it can make many things easier. I think it can extend the life-time of the DTD. Maybe then at a later stage common conventions could make it into the DTD as an explicit element. This situation is better than defining only a few explicit elements for info which can lead to tag-abuse by different authors and applications. These catch-all mechanisms are not uncommon and I don't think they violate the idea of XML. I would rather have one well-crafted DTD than multiple DTD's with only minor differences. If info like 'owner' is so important that it should be declared explicitly it can also be an (optional) attribute of the elements to which it belongs (folder and bookmark). As to the form of the meta element: Maybe the 'name' attribute should be declared as NMTOKEN to restrict it to a name token. With <meta name="..">my data</meta> the content is #PCDATA so if there are certain characters in the data they should be encoded ('<' => '&lt;' etc.).
For a 'content' attribute things like '<' and '>' can stay as they are (but watch out for '&' -- see below). Where to put the URL's? Although it may seem like nitpicking I think it is not. One of the reasons for putting the url itself in an attribute would be the stricter constraints of CDATA and being able to make it #REQUIRED. As element content the parser cannot check if the element really contains a value at all since: will look ok to the parser. There's another reason though. I looked through my bookmark list and there were several url's that looked like: http://someserver/somepage.html&var=x A parser will complain when it sees this since '&' preceding a name-character starts a general entity reference, which is probably not defined. Then it encounters the '=' which generates a warning since a general entity should end with ';'. I thought it would be safe to put the url in a CDATA attribute. Alas, it turns out that even in a CDATA attribute a parser would still try to resolve a general entity. In David Megginson's book (Structuring XML Documents - p. 19) I found the following explanation: CDATA attribute type: Note that an attribute type applies to the value of the attribute *after* the attribute string has been normalized - general entities will still be recognized as part of that normalization process. So, although I thought putting url's in a CDATA attribute is safe, it is not. The solution might be to url-encode url's. So the above url becomes: http:%3A%2F%2Fsomeserver%2Fsomepage.html%26var%3Dx Hmmm. Not a pretty sight. Maybe a structure like: .. .. is not so bad (maybe even with an optional info element?). Finally, what about the main level? Forest or Tree? .. .. .. Or: .. .. I like Fred's suggestion that in the latter an info element directly under xbel (so outside a folder) could convey other info than the info elements inside a folder (or maybe even a bookmark). Maybe this even warrants naming that specific element differently ('header'?).
Do we have to fix a limit for the depth of recursion or should this be left to every application. Maybe we should say that an XBEL application should at least be able to handle a depth of x folders. Marc --- Marc van Grootel bwaumg@urc.tue.nl From bwaumg@urc.tue.nl Wed Sep 16 16:13:59 1998 From: bwaumg@urc.tue.nl (Marc van Grootel) Date: Wed, 16 Sep 1998 17:13:59 +0200 Subject: [XML-SIG] XBEL DTD as a meta-dtd Message-ID: <199809161513.RAA19851@asterix.urc.tue.nl> Geir wrote: > > All DTDs can be meta-DTDs (architectural DTDs). The complexity of the DTD > doesn't really matter. The only thing that might be "harder" - is the > mapping from the instance (which is to be architecturally processed) to the > meta-dtd. > > So whether anybody calls XBEL a meta-dtd, or not, doesn't matter. So although a DTD can always be used as a meta-dtd I thought it would be nice if mapping from an instance to XBEL would be easy. But in the end I agree that structuring the DTD so that this would be easy puts too much strain on the design of XBEL. Marc -- Marc van Grootel bwaumg@urc.tue.nl From grove@infotek.no Wed Sep 16 16:37:25 1998 From: grove@infotek.no (Geir Ove Gronmo) Date: Wed, 16 Sep 1998 17:37:25 +0200 Subject: [XML-SIG] XBEL DTD as a meta-dtd In-Reply-To: <199809161513.RAA19851@asterix.urc.tue.nl> Message-ID: <199809161536.RAA16351@mail.infotek.no> At 17:13 16.09.98 +0200, Marc van Grootel wrote: >So although a DTD can always be used as a meta-dtd I thought it would >be nice if mapping from an instance to XBEL would be easy. Yes, I also think that the XBEL DTD should be kept simple. >But in the >end I agree that structuring the DTD so that this would be easy puts >too much strain on the design of XBEL. This all depends on which DTDs you have in mind. 
Some factors that come to my mind: o The similarity of the structure in the instance and the meta-DTD o Mapping from element content to element content is much easier than mapping from element content to attribute values and vice versa (This is not yet implemented in xmlarch, but will be soon). o Reordering is not possible (as far as I know). Geir O. ================== Geir Ove Grønmo ================== | STEP Infotek as, Gjerdrumsvei 12, 0486 Oslo, Norway | | grove@infotek.no http://www.infotek.no/ | ------------------------------------------------------- From Fred L. Drake, Jr." References: <199809161502.RAA19327@asterix.urc.tue.nl> Message-ID: <13823.56721.237919.41358@weyr.cnri.reston.va.us> Marc van Grootel writes: > So the scope is 'hierarchical storage for bookmarks'?. But is > a lossless conversion between XBEL and Netscape still a goal? If not > I think that 'separator' should go since it serves no real purpose. I'd leave it in; making XBEL specific to bookmarking (rather than bookmark extraction) does not mean that the requirement for supporting everything supported by Navigator and MSIE goes away. If it can't do that, it can't be effectively used as an interchange medium, which I think it should be. (That's what the tools offered here provide, after all.) Jack's extension also makes sense within this context. (It needs a name, though, perhaps ? ;-) Greg Stein wrote: > This is kind of silly. XML is intended to encode the "name" as the > actual tag. Why push this down another level? Using an "owner" tag, you ... > XML is enough of an abstraction; you don't want to start creating > additional layers in there. The tendency should be towards additional Marc van Grootel wrote: > I think it can extend the life-time of the DTD. Maybe then at a later > stage common conventions could make it into the DTD as an explicit I think Greg has a very good point. There's no reason that the contents of <info> or <description> or <title> or anything else can't be structured.
The instance remains a well-formed XBEL document and can be down-converted to valid XBEL easily if required. > element. This situation is better then defining only a few explicit > elements for info which can lead to tag-abuse by different authors and Actually, the catch-all is a form of tag abuse, whereas introducing new elements for specific applications is not. This doesn't mean that there shouldn't be something like <meta>, only that we should be very clear in the intended use of the element; it may not be as free-form as we've left it at this point. (I still think capturing additional data from Web pages is useful, and <meta> makes a lot of sense as a mirror for data extracted from <meta> elements in the HTML documents.) > think they violate the idea of XML. I rather like one well-crafted DTD > then having multiple DTD's with only minor differences. There should be one well-crafted base to start from, but as information becomes more application-specific, it makes sense to use "subclassed" DTDs. I have no problems with this; I just want to be able to determine that the documents are XBEL documents, even if actually of a "subclass", so that I can load them easily. But maybe an architecture declaration would be just as useful. ;-) > Maybe the 'name' attribute should be declared as NMTOKEN to restrict > it to a name token. With <meta name="..">my data</meta> the content is This is good, if <meta> is kept. > One of the reasons for putting the url itself in an attribute would be > the stricter constraints of CDATA and being able to make it > #REQUIRED. As element content the parser cannot check if the element This is a good reason; I support this. > I looked through my bookmark list and there were several url's that > looked like: > > http://someserver/somepage.html&var=x [URL data discussion...] The appropriate solution is probably to spit out character references for special characters (specifically, "<" and "&").
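As a rough illustration of that approach, the sketch below writes the problematic URL into an attribute using numeric character references and reads it back with the standard xml.etree.ElementTree parser. The helper name char_refs is made up, the 'url' element with an 'href' attribute follows the proposal discussed in this thread rather than a settled DTD, and the double quote is also escaped only because the value sits in a double-quoted attribute.

```python
import xml.etree.ElementTree as ET


def char_refs(text):
    """Replace XML-special characters with numeric character references."""
    # '&' must be handled first, or the '&' introduced by the other
    # replacements would be escaped a second time.
    return (text.replace("&", "&#38;")
                .replace("<", "&#60;")
                .replace('"', "&#34;"))


url = "http://someserver/somepage.html&var=x"
document = '<url href="%s"/>' % char_refs(url)

# Any conforming parser resolves the references during attribute-value
# normalization, so the original URL comes back unchanged.
parsed = ET.fromstring(document)
```

No URL-encoding is involved: the stored value is the URL itself, and only its serialized form in the XML file changes.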
This is trivial to implement, and the input would have to be handled correctly according to XML rules anyway. There is no need to invoke additional standards here; "URL encoding" is irrelevant in XML, and has everything to do with the HTTP requests. Bookmarks are not limited to the http: scheme, so why should we need that particular encoding? > Do we have to fix a limit for the depth of recursion or should this be > left to every application. Maybe we should say that an XBEL > application should at least be able to handle a depth of x folders. No. The DTD & associated documentation is about a data model, not processing limitations. This issue is strictly an processing issue. -Fred -- Fred L. Drake, Jr. <fdrake@acm.org> Corporation for National Research Initiatives 1895 Preston White Dr. Reston, VA 20191 From bwaumg@urc.tue.nl Wed Sep 16 17:16:40 1998 From: bwaumg@urc.tue.nl (Marc van Grootel) Date: Wed, 16 Sep 1998 18:16:40 +0200 Subject: [XML-SIG] XBEL DTD as a meta-dtd Message-ID: <199809161616.SAA22462@asterix.urc.tue.nl> Fred L. Drake writes: > Marc van Grootel writes: > > So the scope is 'hierarchical storage for bookmarks'?. But is > > a lossless conversion between XBEL and Netscape still a goal? If not > > I think that 'separator' should go since it serves no real purpose. > > I'd leave it in; making XBEL specific to bookmarking (rather than > bookmark extraction) does not mean that the requirement for supporting > everything supported by Navigator and MSIE goes away. If it can't do > that, it can't be effectively used as an interchange medium, which I > think it should. (That's what the tools offered here provide, after > all.) It's ok with me, leave it in. > > Greg Stein wrote: > > This is kind of silly. XML is intended to encode the "name" as the > > actual tag. Why push this down another level? Using an "owner" tag, you > ... > > XML is enough of an abstraction; you don't want to start creating > > additional layers in there. 
The tendency should be towards additional > > Marc van Grootel wrote: > > I think it can extend the life-time of the DTD. Maybe then at a later > > stage common conventions could make it into the DTD as an explicit > > I think Greg has a very good point. There's no reason that the > contents of <info> or <description> or <title> or anything else can't > be structured. The instance remains a well-formed XBEL document and > can be down-converted to valid XBEL easily if required. I also didn't mean to banish these more explicit elements if there's a good reason for them to be there. > > > element. This situation is better then defining only a few explicit > > elements for info which can lead to tag-abuse by different authors and > > Actually, the catch-all is a form of tag abuse, whereas introducing > new elements for specific applications is not. This doesn't mean that > there shouldn't be something like <meta>, only that we should be very > clear in the intended use of the element; it may not be as free-form > as we've left it at this point. (I still think captuing additional > data from Web pages is useful, and <meta> makes a lot of sense as a > mirror for data extracted from <meta> elements in the HTML documents.) Agreed. Both situations can lead to tag-abuse. For a first DTD I think the escape should be there (on-parole). > > > think they violate the idea of XML. I rather like one well-crafted DTD > > then having multiple DTD's with only minor differences. > > There should be one well-crafted base to start from, but as > information becomes more application-specific, it makes sense to use > "subclassed" DTDs. I have no problems with this; I just want to be > able to determine that the documents are XBEL documents, even if > actually of a "subclass", so that I can load them easily. But maybe > an architecture declaration would be just as useful. ;-) Right. Why twitch about that? 
:-) > > Maybe the 'name' attribute should be declared as NMTOKEN to restrict > > it to a name token. With <meta name="..">my data</meta> the content is > > This is good, if <meta> is kept. > > > One of the reasons for putting the url itself in an attribute would be > > the stricter constraints of CDATA and being able to make it > > #REQUIRED. As element content the parser cannot check if the element > > This is a good reason; I support this. > > > I looked through my bookmark list and there were several url's that > > looked like: > > > > http://someserver/somepage.html&var=x > [URL data discussion...] > > The appropriate solution is probably to spit out character > references for special characters (specifically, "<" and "&"). This > is trivial to implement, and the input would have to be handled > correctly according to XML rules anyway. There is no need to invoke > additional standards here; "URL encoding" is irrelevant in XML, and > has everything to do with the HTTP requests. Bookmarks are not > limited to the http: scheme, so why should we need that particular > encoding? Well some sort of encoding was in order. I picked the first one that came to me. What it boils down to that just sticking an url somewhere is not enough. These kind of issues should be clearly documented and belong to the DTD (the informal part). > > > Do we have to fix a limit for the depth of recursion or should this be > > left to every application. Maybe we should say that an XBEL > > application should at least be able to handle a depth of x folders. > > No. The DTD & associated documentation is about a data model, not > processing limitations. This issue is strictly an processing issue. I don't say it's absolutely necessary. But it's a consequence of our datamodel and somehwere there should be a hint about this. The DTD does not consist only of the formal data model but also other aspects that cannot be expressed formally in a DTD. 
Things like extra constraints on data etc (like the URL stuff). Marc --- Marc van Grootel bwaumg@urc.tue.nl From bwaumg@urc.tue.nl Wed Sep 16 17:21:28 1998 From: bwaumg@urc.tue.nl (Marc van Grootel) Date: Wed, 16 Sep 1998 18:21:28 +0200 Subject: [XML-SIG] XBEL DTD as a meta-dtd Message-ID: <199809161621.SAA22530@asterix.urc.tue.nl> Geir Ove Gronmo wrote: > This all depends on which DTDs you have in mind. > TEI-lite and Docbook. > o Mapping between element content to element content is much easier that > mapping from element content to attribute values and vice versa (This is > not yet implemented in xmlarch, but will be soon). I was just about to try that :{ > > o Reordering is not possible (as far as I know). Don't think so either. Marc --- Marc van Grootel bwaumg@urc.tue.nl From akuchlin@cnri.reston.va.us Wed Sep 16 17:34:49 1998 From: akuchlin@cnri.reston.va.us (Andrew M. Kuchling) Date: Wed, 16 Sep 1998 12:34:49 -0400 (EDT) Subject: [XML-SIG] Anonymous CVS access, and current status Message-ID: <13823.57389.252087.244902@amarok.cnri.reston.va.us> Anonymous CVS access to the source tree of the Python/XML package is now available. A page with instructions is at http://www.python.org/sigs/xml-sig/anon-cvs.html Briefly: Run the following command to log in (the password is "xmlcvs"): cvs -d :pserver:xmlcvs@cvs.python.org:/projects/cvsroot login To check out the source tree, run: cvs -z3 -d :pserver:xmlcvs@cvs.python.org:/projects/cvsroot co xml That will place everything in a subdirectory named "xml". To update the code, run: cvs -z3 update -d -P Comments on all aspects of the package are welcomed. To propagate changes back into the source tree, post patches or suggestions on the SIG mailing list, send them to me privately, or, if you're maintaining a module, just release a new version and announce it. Other notes on the current status of things: * The CVS tree now also contains version 0.11 of Geir Ove Grønmo's xmlarch module. 
It would be imported as "from xml.arch import xmlarch". Geir, I've also taken the sample code from your xmlarch Web page and added it to the XML HOWTO. Reference documentation for the classes in xmlarch still has to be written, though. * The demo/ directory has been reorganized, with everything being split up into separate subdirectories instead of being all dumped in the same place. The most interesting new demo is the XBEL code, in demo/xbel/; this is mostly as it was posted by various people, and hacked around by me a bit, to make the {msie,ns,adr}_parse modules read the bookmark file and dump it as XBEL. xbel_parse.py can then read an XBEL file and dump it in various formats. Everything will need to be updated to use the final DTD. * The critical area, to my eyes, is still the DOM implementation; I'm partway through an attempt at matching the Proposed Recommendation, but the code doesn't even run yet, much less function properly. * Better-placed people in the XML community, please correct me on this: besides DOM, I don't see any XML-related technologies or standards that will be finalized any time soon. The first public XSL working draft just got released, and there are various XML-Data/DCD, XSchema, and other things being worked on, but none of those things will be finished within the next 6 months or so. Is my perception correct? Therefore, I draw the conclusion that, once the DOM implementation is updated, there's nothing very significant left to implement for 1.0 of the Python package, so we need only document things, have some nice sample code, and then we're done with XML proper for a while. Wide string support will remain as a problem, but that's a String-SIG problem. That's pretty much the same conclusion as in my last status update. -- A.M. Kuchling http://starship.skyport.net/crew/amk/ Not all readers are prepared, at all times, to make independent judgments. 
But the failure of modern education to equip them to do so even when they have the inclination creates a serious gap in modern culture.
-- Robertson Davies, _A Voice from the Attic_

From wunder@infoseek.com Wed Sep 16 17:41:04 1998
From: wunder@infoseek.com (Walter Underwood)
Date: Wed, 16 Sep 1998 09:41:04 -0700
Subject: [XML-SIG] XBEL DTD as a meta-dtd
In-Reply-To: <199809161502.RAA19327@asterix.urc.tue.nl>
Message-ID: <3.0.5.32.19980916094104.00cef710@corp>

At 05:02 PM 9/16/98 +0200, Marc van Grootel wrote:
>
>I looked through my bookmark list and there were several url's that
>looked like:
>
> http://someserver/somepage.html&var=x
>
> [...]
>
>The solution might be to url-encode url's. So the above url
>becomes:
>
> http:%3A%2F%2Fsomeserver%2Fsomepage.html%26var%3Dx

Use XML entities. Using two different kinds of escaping (XML and HTTP) in the same file is unnecessary and confusing. I've been saving URLs in XML in my product, and entities work fine.

It turns out that you need the entities in other text too, since someone might use them in a bookmark name ("Arts & Crafts", "O'Reilly Books"). So just entify them. Here is a snippet of re-hackery to entify a string:

# This pattern and replacement function are used to map characters
# in a string to XML entities, like this: entities.sub(entsub,s)
import re

entities = re.compile('[&<>"\']')

def entsub(matchobj):
    c = matchobj.group()
    if c == '&': return '&amp;'
    elif c == '<': return '&lt;'
    elif c == '>': return '&gt;'
    elif c == "'": return '&apos;'
    elif c == '"': return '&quot;'
    else: return ''  # logs a message here in my application

Always, always entify strings as you generate XML. If you slip in an unescaped special character, you can lose a whole file worth of data by making it un-parseable (or make someone manually edit it to get it back).

Finally, XBEL is doing things that are also done by the Resource Description Framework (RDF).
Though the RDF spec is hard to read, and may fail just because it is drowning in AI-speak rather than being useful, it is worth taking a look at. wunder Walter R. Underwood wunder@infoseek.com wunder@best.com (home) http://www.best.com/~wunder/ 1-408-543-6946 From Fred L. Drake, Jr." <fdrake@acm.org Wed Sep 16 18:21:33 1998 From: Fred L. Drake, Jr." <fdrake@acm.org (Fred L. Drake) Date: Wed, 16 Sep 1998 13:21:33 -0400 (EDT) Subject: [XML-SIG] Anonymous CVS access, and current status In-Reply-To: <13823.57389.252087.244902@amarok.cnri.reston.va.us> References: <13823.57389.252087.244902@amarok.cnri.reston.va.us> Message-ID: <13823.62365.211321.912636@weyr.cnri.reston.va.us> Andrew M. Kuchling writes: > Anonymous CVS access to the source tree of the Python/XML package is > now available. A page with instructions is at Cool; works great! > dumped in the same place. The most interesting new demo is the XBEL > code, in demo/xbel/; this is mostly as it was posted by various > people, and hacked around by me a bit, to make the {msie,ns,adr}_parse This will definately be useful, esp. once the DTD is updated a little. I'm not sure of the current state, actually. (Marc, are you planning to post an update, or would you like me to integrate the most recent discussion? I'm not sure of the state of <meta>; I don't recall any conclusive "it must be this way".) -Fred -- Fred L. Drake, Jr. <fdrake@acm.org> Corporation for National Research Initiatives 1895 Preston White Dr. Reston, VA 20191 From Fred L. Drake, Jr." <fdrake@acm.org Wed Sep 16 18:23:49 1998 From: Fred L. Drake, Jr." <fdrake@acm.org (Fred L. Drake) Date: Wed, 16 Sep 1998 13:23:49 -0400 (EDT) Subject: [XML-SIG] XBEL DTD as a meta-dtd In-Reply-To: <199809161616.SAA22462@asterix.urc.tue.nl> References: <199809161616.SAA22462@asterix.urc.tue.nl> Message-ID: <13823.62501.26409.535399@weyr.cnri.reston.va.us> Marc van Grootel writes: > I don't say it's absolutely necessary. 
But it's a
> consequence of our datamodel and somewhere there should be a hint
> about this. The DTD does not consist only of the formal data model but
> also other aspects that cannot be expressed formally in a DTD. Things
> like extra constraints on data etc (like the URL stuff).

Ok, this is good. We should include in the document comments about these processing issues, pointing out that data needs to be "entified" and that the model doesn't restrict the depth of the hierarchy.
Has anyone started on the "informal" part of the DTD?

-Fred

--
Fred L. Drake, Jr. <fdrake@acm.org>
Corporation for National Research Initiatives
1895 Preston White Dr. Reston, VA 20191

From akuchlin@cnri.reston.va.us Wed Sep 16 18:55:37 1998
From: akuchlin@cnri.reston.va.us (Andrew M. Kuchling)
Date: Wed, 16 Sep 1998 13:55:37 -0400 (EDT)
Subject: [XML-SIG] XBEL DTD as a meta-dtd
In-Reply-To: <3.0.5.32.19980916094104.00cef710@corp>
References: <199809161502.RAA19327@asterix.urc.tue.nl> <3.0.5.32.19980916094104.00cef710@corp>
Message-ID: <13823.63954.24840.806558@amarok.cnri.reston.va.us>

Walter Underwood writes:
>Always, always entify strings as you generate XML. If you slip
>in an unescaped special character, you can lose a whole
>file worth of data by making it un-parseable (or make someone
>manually edit it to get it back).

That reminds me of something: quite a lot of XML-related code will need to entify code, so there should really be a utility function available to do this. Other generally useful functions may become apparent, too, so I propose an xml.util module. For now, it'll only have 1 function:

def escape(string, entity_dict = {}):
    Escapes &, ", and ' in string. If entity_dict is specified, it must be a dictionary mapping strings to their entity replacements. For example, passing {'\352': '&ecirc;'} would cause the 8-bit character chr(234) to be replaced with &ecirc;.

Thoughts? Should there be a way to specify a character range which would be escaped numerically, as &#42; or whatever?
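
A minimal sketch of the utility proposed above, written for present-day Python rather than the 1.5-era string module. It assumes that the characters to escape are &, <, and >, and that entity_dict is applied after those three. escape_non_ascii is a hypothetical helper, not part of the proposal; it shows one possible answer to the numeric-escaping question, using the standard "xmlcharrefreplace" error handler.

```python
def escape(text, entity_dict=None):
    """Escape &, <, and > in text, then apply any extra replacements."""
    # The ampersand must be handled first; otherwise the "&" in the
    # entities produced by the later steps would be escaped again.
    text = text.replace("&", "&amp;")
    text = text.replace("<", "&lt;")
    text = text.replace(">", "&gt;")
    # Assumption: extra mappings are plain string-to-entity replacements.
    for raw, replacement in (entity_dict or {}).items():
        text = text.replace(raw, replacement)
    return text


def escape_non_ascii(text):
    """Hypothetical helper: also turn non-ASCII characters into &#N; references."""
    # Escape markup characters first, then let the codec error handler
    # substitute a numeric character reference for anything outside ASCII.
    return escape(text).encode("ascii", "xmlcharrefreplace").decode("ascii")


if __name__ == "__main__":
    print(escape("Arts & Crafts <b>"))
    print(escape("caf\u00ea", {"\u00ea": "&ecirc;"}))
    print(escape_non_ascii("caf\u00ea"))
```

The Python standard library's xml.sax.saxutils.escape(data, entities={}) covers the same basic case, so a sketch like this is mainly useful for seeing why the order of replacements matters.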
-- A.M. Kuchling http://starship.skyport.net/crew/amk/ I was honestly very nervous of Constance Wheatcroft. And I wasn't the only one. Her entire family was afraid of her. Dogs were afraid of her. Bindweed in the hedge would wither as she passed; birds would forget their nesting instincts and fly back to north Africa at the sound of her hideous cries. -- Tom Baker, in his autobiography From larsga@ifi.uio.no Fri Sep 18 10:19:45 1998 From: larsga@ifi.uio.no (Lars Marius Garshol) Date: 18 Sep 1998 11:19:45 +0200 Subject: [XML-SIG] XBEL DTD as a meta-dtd References: <199809161502.RAA19327@asterix.urc.tue.nl> <3.0.5.32.19980916094104.00cef710@corp> <13823.63954.24840.806558@amarok.cnri.reston.va.us> Message-ID: <wkbtodu6a6.fsf@ifi.uio.no> * Andrew M. Kuchling | | Thoughts? Should there be a way to specific a character range which | would be escaped numerically, as * or whatever? I think the xml.util module makes perfect sense, as does the escape function. I think we'll also eventually want an XMLWriter class to simplify XML generation as well. I was about to create a module for myself with these two things anyway. 
Here is the escape function I use for element content:

def escape(str):
    return string.replace(string.replace(str,'&',"&amp;"),"<","&lt;")

Here is my XMLWriter (note that it is written for data-oriented documents, not document-like ones):

# A simple XML-generator

import sys,string

class XMLWriter:

    def __init__(self,out=sys.stdout):
        self.out=out
        self.stack=[]        # names of the currently open elements

    def doctype(self,root,pubid,sysid):
        if pubid==None:
            self.out.write("<!DOCTYPE %s SYSTEM '%s'>\n" % (root,sysid))
        else:
            self.out.write("<!DOCTYPE %s PUBLIC '%s' '%s'>\n" % (root,pubid,
                                                                  sysid))

    # Open an element that will contain child elements
    def push(self,elem,attrs={}):
        self.__indent()
        self.out.write("<"+elem)
        for (a,v) in attrs.items():
            self.out.write(" %s='%s'" % (a,self.__escape_attr(v)))
        self.out.write(">\n")
        self.stack.append(elem)

    # Write an element with text content only
    def elem(self,elem,content,attrs={}):
        self.__indent()
        self.out.write("<"+elem)
        for (a,v) in attrs.items():
            self.out.write(" %s='%s'" % (a,self.__escape_attr(v)))
        self.out.write(">%s</%s>\n" % (self.__escape_cont(content),elem))

    # Write an empty element
    def empty(self,elem,attrs={}):
        self.__indent()
        self.out.write("<"+elem)
        for (a,v) in attrs.items():
            self.out.write(" %s='%s'" % (a,self.__escape_attr(v)))
        self.out.write("/>\n")

    # Close the most recently pushed element
    def pop(self):
        elem=self.stack[-1]
        del self.stack[-1]
        self.__indent()
        self.out.write("</%s>\n" % elem)

    def __indent(self):
        self.out.write(" "*(len(self.stack)*2))

    def __escape_cont(self,str):
        return string.replace(string.replace(str,'&',"&amp;"),"<","&lt;")

    def __escape_attr(self,str):
        str=string.replace(str,'&',"&amp;")
        return string.replace(string.replace(str,"'","&apos;"),"<","&lt;")

--Lars M.

From grove@infotek.no Fri Sep 18 13:48:18 1998
From: grove@infotek.no (Geir Ove Gronmo)
Date: Fri, 18 Sep 1998 14:48:18 +0200
Subject: [XML-SIG] Anonymous CVS access, and current status
In-Reply-To: <13823.57389.252087.244902@amarok.cnri.reston.va.us>
Message-ID: <199809181247.OAA24090@mail.infotek.no>

At 12:34 16.09.98 -0400, A.M.
Kuchling wrote:
> * Better-placed people in the XML community, please correct me
>on this: besides DOM, I don't see any XML-related technologies or
>standards that will be finalized any time soon. The first public XSL
>working draft just got released, and there are various XML-Data/DCD,
>XSchema, and other things being worked on, but none of those things
>will be finished within the next 6 months or so. Is my perception
>correct?

There are some XML-related technologies that I would like to see implemented/integrated into the Python environment. I wouldn't expect any of them to be included in version 1.0 of the Python/XML package though.

o XLink ( http://www.w3.org/TR/1998/WD-xlink-19980303 )
o The HyTime modules ( http://www.hytime.org/ )
o The SGML Extended Facilities ( http://www.ornl.gov/sgml/wg8/docs/n1920/html/clause-A.html )
o Topic Navigation Maps ( http://www.hightext.com/tnm/ )
o Resource Description Framework ( http://www.w3.org/RDF/ )
o A Python wrapper module for the SP package written by James Clark. ( http://www.jclark.com/sp/ )
o A SAX driver for nsgmls

Geir O.

================== Geir Ove Grønmo ==================
| STEP Infotek as, Gjerdrumsvei 12, 0486 Oslo, Norway |
| grove@infotek.no http://www.infotek.no/ |
-------------------------------------------------------

From larsga@ifi.uio.no Fri Sep 18 13:59:35 1998
From: larsga@ifi.uio.no (Lars Marius Garshol)
Date: 18 Sep 1998 14:59:35 +0200
Subject: [XML-SIG] Anonymous CVS access, and current status
In-Reply-To: <199809181247.OAA24090@mail.infotek.no>
References: <199809181247.OAA24090@mail.infotek.no>
Message-ID: <wklnnhshjc.fsf@ifi.uio.no>

* Geir Ove Gronmo
|
| o A Python wrapper module for the SP package written by James Clark. (
| http://www.jclark.com/sp/ )

We have this already: <URL:http://itrc.uwaterloo.ca:80/~papresco/pysgml/>

| o A SAX driver for nsgmls

I'm currently working on this. (Or rather, a general ESIS parser, since that can be used with other parsers as well.)
However, I have a problem with os.popen in that error messages are written to the console (on Win95) and not to my Python process. Once I've figured that one out we have a SAX driver for nsgmls. Of course, an OLE-based driver would also be nice, but I'm leaving that for someone else (for now at least). It should be pretty easy do to, since Paul Prescods PySGML (see above) has a module that uses OLE to communicate with nsgmls (well, SP). Other than this I whole-heartedly agree with Geir Ove's suggestions. --Lars M. From akuchlin@cnri.reston.va.us Fri Sep 18 16:01:31 1998 From: akuchlin@cnri.reston.va.us (Andrew Kuchling) Date: Fri, 18 Sep 1998 11:01:31 -0400 (EDT) Subject: [XML-SIG] Anonymous CVS access, and current status In-Reply-To: <199809181247.OAA24090@mail.infotek.no> References: <13823.57389.252087.244902@amarok.cnri.reston.va.us> <199809181247.OAA24090@mail.infotek.no> Message-ID: <13826.30133.623948.638103@newcnri.cnri.reston.va.us> Geir Ove Gronmo writes: > o XLink ( http://www.w3.org/TR/1998/WD-xlink-19980303 ) Some form of XLink support seems reasonable. > o The Hytime modules ( http://www.hytime.org/ ) > o The SGML Extended Facilies ( >http://www.ornl.gov/sgml/wg8/docs/n1920/html/clause-A.html ) I couldn't figure out from the above links exactly what you're suggesting. Is it simply the specified architectural form support described at both the above links? > o Topic Navigation Maps ( http://www.hightext.com/tnm/ ) > o Resource Description Framework ( http://www.w3.org/RDF/ ) Topic Navigation Maps seems to be a meta-DTD, and RDF is a DTD, so I'm not sure that they belong in the basic package, but certainly they could be developed and distributed separately. I don't think specific DTDs or meta-DTDs are suitable for the basic package, unless they turn out to be really, *really* common. They're OK for demos, of course, but I don't want to install lots of code that most people won't use much. 
> o A Python wrapper module for the SP package written by James Clark. ( >http://www.jclark.com/sp/ ) > o A SAX driver for nsgmls The Python code should also work fine in JPython (unless 1.5-isms have crept in), so a driver for Java SAX interfaces would also be useful. Before 1.0 I'll look into that. -- A.M. Kuchling http://starship.skyport.net/crew/amk/ First, you must know what the thing is, and then after learn the use of the same. -- Robert Recorde From akuchlin@cnri.reston.va.us Mon Sep 21 21:04:45 1998 From: akuchlin@cnri.reston.va.us (Andrew M. Kuchling) Date: Mon, 21 Sep 1998 16:04:45 -0400 (EDT) Subject: [XML-SIG] DOM: backward compatibility Message-ID: <13830.45088.437497.816628@amarok.cnri.reston.va.us> Working on the DOM code this weekend, I realized that quite a few things will be broken by going to the most recent draft. How important is backwards compatibility with the current DOM code? I'm not sure how much existing DOM code is out there that will be broken by incompatible changes, because I'm not sure if people were seriously using it or not. So if you're using the current DOM code for something, please let me know so I can gauge how important compatibility is. Thanks! -- A.M. Kuchling http://starship.skyport.net/crew/amk/ And as soon as he was sure that he was dead, he got up and shook himself, and looked around, and there waiting for him on the bed was his wife, with long claws out, and her eyes blazing like a green cat ready to spring. -- Magda's story, in SANDMAN #62: "The Kindly Ones:6" From guido@CNRI.Reston.Va.US Mon Sep 21 21:29:27 1998 From: guido@CNRI.Reston.Va.US (Guido van Rossum) Date: Mon, 21 Sep 1998 16:29:27 -0400 Subject: [XML-SIG] DOM: backward compatibility In-Reply-To: Your message of "Mon, 21 Sep 1998 16:04:45 EDT." 
<13830.45088.437497.816628@amarok.cnri.reston.va.us> References: <13830.45088.437497.816628@amarok.cnri.reston.va.us> Message-ID: <199809212029.QAA07280@eric.CNRI.Reston.Va.US> > Working on the DOM code this weekend, I realized that quite a few > things will be broken by going to the most recent draft. How > important is backwards compatibility with the current DOM code? > > I'm not sure how much existing DOM code is out there that will > be broken by incompatible changes, because I'm not sure if people were > seriously using it or not. So if you're using the current DOM code > for something, please let me know so I can gauge how important > compatibility is. I don't know anything about DOM code or how much it is used, but I would like to relate an anecdote that I once heard about the original Unix "Make" program. This was probably around Unix v6. Someone inside Bell Labs complained to the author about a particular misfeature (I believe it was about requiring an actual tab character, -- instead of any whitespace -- to start a command). The author responded that he agreed that it was a misfeature, but that there were already more than ten users (all inside Bell Labs), so that for reasons of backwards compatibility he couldn't change it. Please, be considerate of the future and make things right! --Guido van Rossum (home page: http://www.python.org/~guido/) From larsga@ifi.uio.no Mon Sep 21 21:30:21 1998 From: larsga@ifi.uio.no (Lars Marius Garshol) Date: 21 Sep 1998 22:30:21 +0200 Subject: [XML-SIG] DOM: backward compatibility In-Reply-To: <13830.45088.437497.816628@amarok.cnri.reston.va.us> References: <13830.45088.437497.816628@amarok.cnri.reston.va.us> Message-ID: <wkvhmh2opu.fsf@ifi.uio.no> * Andrew M. Kuchling | | How important is backwards compatibility with the current DOM code? I'd say that conforming to the final DOM recommendation must have priority over backwards compatibility. 
Those to whom backward compatibility is very important can just keep using the old version anyway. | I'm not sure how much existing DOM code is out there that will be | broken by incompatible changes, because I'm not sure if people were | seriously using it or not. I'm using it, both for personal conversion scripts (the XML tools list, for example) and PyPointers, but I've known right from the start that I would have to update my code to follow later DOM releases. In other words: no complaints from me. --Lars M. From akuchlin@cnri.reston.va.us Mon Sep 21 22:27:32 1998 From: akuchlin@cnri.reston.va.us (Andrew M. Kuchling) Date: Mon, 21 Sep 1998 17:27:32 -0400 (EDT) Subject: [XML-SIG] DOM: backward compatibility In-Reply-To: <199809212029.QAA07280@eric.CNRI.Reston.Va.US> References: <13830.45088.437497.816628@amarok.cnri.reston.va.us> <199809212029.QAA07280@eric.CNRI.Reston.Va.US> Message-ID: <13830.50176.16488.120316@amarok.cnri.reston.va.us> Guido van Rossum writes: >Unix "Make" program. This was probably around Unix v6. Someone >inside Bell Labs complained to the author about a particular >misfeature (I believe it was about requiring an actual tab character, >-- instead of any whitespace -- to start a command). The author >responded that he agreed that it was a misfeature, but that there were >already more than ten users (all inside Bell Labs), so that for >reasons of backwards compatibility he couldn't change it. I've heard that story. I also once read an interview somewhere where Dennis Ritchie, when asked if he would have done anything differently in Unix, answered "Spelled creat() correctly." -- A.M. Kuchling http://starship.skyport.net/crew/amk/ In Rome, in Leningrad, in Darwin. "The door flew open, in he ran, the great, long, red-legged scissorman." -- From DOOM PATROL #20 From akuchlin@cnri.reston.va.us Mon Sep 21 23:08:18 1998 From: akuchlin@cnri.reston.va.us (Andrew M. 
Kuchling)
Date: Mon, 21 Sep 1998 18:08:18 -0400 (EDT)
Subject: [XML-SIG] Utility functions (was: XBEL DTD as a meta-dtd)
In-Reply-To: <wkbtodu6a6.fsf@ifi.uio.no>
References: <199809161502.RAA19327@asterix.urc.tue.nl> <3.0.5.32.19980916094104.00cef710@corp> <13823.63954.24840.806558@amarok.cnri.reston.va.us> <wkbtodu6a6.fsf@ifi.uio.no>
Message-ID: <13830.52482.568860.535654@amarok.cnri.reston.va.us>

Lars Marius Garshol writes:
>def escape(str):
>    return string.replace(string.replace(str,'&',"&amp;"),"<","&lt;")

According to section 2.4 of the spec, > also needs to be escaped as &gt; when it's preceded by ]]; that is, ]]> needs to be written as ]]&gt;. It's probably simplest to always escape > as &gt;, even when it's not necessary.

>Here is my XMLWriter (note that it is written for data-oriented
>documents, not document-like ones):

An interesting class. What do people think: should it be added somewhere? One could obtain similar results by creating a DOM tree and then linearising it, but that's also more complicated to learn, so I don't think the XMLWriter class would be completely redundant. On the other hand, perhaps it should be layered on top of DOM, and if it turns out that most XML users know the DOM API anyway, then XMLWriter really is redundant after all.

--
A.M. Kuchling http://starship.skyport.net/crew/amk/
You played me well, mortal. But I have played me for time out of mind. And I do Robin Goodfellow better than anyone.
-- Robin Goodfellow, in SANDMAN #19: "A Midsummer Night's Dream"

From Jeff.Johnson@stn.siemens.com Wed Sep 23 15:38:00 1998
From: Jeff.Johnson@stn.siemens.com (Jeff.Johnson@stn.siemens.com)
Date: Wed, 23 Sep 1998 10:38:00 -0400
Subject: [XML-SIG] DOM - where can we get the new stuff?
Message-ID: <85256688.00507A97.00@BI01.boca.ssc.siemens.com>

I spoke (via email) to Stefane Fermigier about the parent-child circular references in the old DOM package and she said that someone else was working on it. I've been hoping to get the new package so I could stop leaking memory.
Is there a way to get the updates now? * Andrew M. Kuchling | | How important is backwards compatibility with the current DOM code? As far as backwards compatibility, I'll rewrite my code, no complaints from me and thanks for working on it! From akuchlin@cnri.reston.va.us Wed Sep 23 16:51:51 1998 From: akuchlin@cnri.reston.va.us (Andrew M. Kuchling) Date: Wed, 23 Sep 1998 11:51:51 -0400 (EDT) Subject: [XML-SIG] DOM - where can we get the new stuff? In-Reply-To: <85256688.00507A97.00@BI01.boca.ssc.siemens.com> References: <85256688.00507A97.00@BI01.boca.ssc.siemens.com> Message-ID: <13833.6128.887744.183677@amarok.cnri.reston.va.us> Jeff.Johnson@stn.siemens.com writes: >I spoke (via email) to Stefane Fermigier about the parent-child circular >references in the old DOM package and she said that someone else was >working on it. I've been hoping to get the new package so I could stop >leaking memory. Is there a way to get the updates now? I haven't yet committed any of the updates to the CVS tree, because they're not finished yet. Because the changes are so extensive, I'll send out an announcement to this list when I actually commit them. (Currently I'm still going through the DOM draft, and implementing all the methods and attributes described.) -- A.M. Kuchling http://starship.skyport.net/crew/amk/ It would take days to catalog your sins, Abbé. I simply don't have the time. -- Sebastian, in SEBASTIAN O #2 From Fred L. Drake, Jr." <fdrake@acm.org Wed Sep 23 17:41:45 1998 From: Fred L. Drake, Jr." <fdrake@acm.org (Fred L. Drake) Date: Wed, 23 Sep 1998 12:41:45 -0400 (EDT) Subject: [XML-SIG] XBEL revision Message-ID: <13833.7733.555953.186008@weyr.cnri.reston.va.us> In working on the bookmarks support for Grail, I'm taking a hard look at how I do XML processing. To do everything I want with a well-formed (but not necessarily valid) XBEL instance, a fair amount of special treatment may be needed to avoid being destructive of additional content in the file. 
I'll try and discuss the general issues of programmatic editing of well-formed XML in another message, probably not today.

The issue which immediately concerns me in this message is the <info> element. Marc van Grootel has proposed that it simply contain (meta*), for whatever definition of <meta> is decided on. Greg Stein rightly pointed out that there's a level of silliness to specifying a particular construct for potentially ad-hoc data that can be stored in the <info> element if we use well-formed XML rather than valid XML (which is supposedly one advantage for XML over SGML).

This is not to say that there isn't a need for something that stores ad-hoc information that has some level of structure. It is reasonable to separate information stored about the resource identified by a <bookmark> and application information which relates to the <bookmark>. I'd like to propose that distinct elements be defined for each, and include an attribute on the application-data element which can be used to specify which processing application it pertains to. This allows each application to locate its own data while more easily avoiding contamination of other applications' state.

Specifically, let's define <metadata> and <appdata>, adjusting <bookmark> and <xbel> accordingly:

    <!ELEMENT xbel (title?, (bookmark|folder|url|alias|separator)*)>
    <!ELEMENT bookmark (metadata?, url, desc?, appdata*)>

    <!ELEMENT metadata (#PCDATA)>
    <!ELEMENT appdata (#PCDATA)>
    <!ATTLIST appdata
              application CDATA #REQUIRED
              >

Structuring it this way and documenting our intentions for <metadata> and <appdata> makes processing XBEL a bit more clear for applications which want more than simple hierarchical bookmarks, while maintaining a fairly simple exchange DTD usable for advanced applications as well. (The original application, as I recall! ;)

Note that a name for Jack Jansen's "pass" node still needs to be determined, and the appropriate content-model changes incorporated into the DTD.
Jack, if you can come up with a good name, I'll be glad to integrate it into the DTD. "pass" probably isn't clear enough outside the Python community. (Yes, I think we're getting this into shape to be a very usable document type.) -Fred -- Fred L. Drake, Jr. <fdrake@acm.org> Corporation for National Research Initiatives 1895 Preston White Dr. Reston, VA 20191 From Jack.Jansen@cwi.nl Thu Sep 24 11:56:33 1998 From: Jack.Jansen@cwi.nl (Jack Jansen) Date: Thu, 24 Sep 1998 12:56:33 +0200 Subject: [XML-SIG] XBEL revision In-Reply-To: Message by "Fred L. Drake" <fdrake@cnri.reston.va.us> , Wed, 23 Sep 1998 12:41:45 -0400 (EDT) , <13833.7733.555953.186008@weyr.cnri.reston.va.us> Message-ID: <UTC199809241056.MAA18471.jack@snelboot.cwi.nl> > Note that a name for Jack Jansen's "pass" node still needs to be > determined, and the appropriate content-model changes incorporated > into the DTD. Jack, if you can come up with a good name, I'll be glad > to integrate it into the DTD. "pass" probably isn't clear enough > outside the Python community. (Yes, I think we're getting this into > shape to be a very usable document type.) After a little more thinking I'm not sure whether "pass" is worth it. It can solve the problem of determining whether a node was deleted in one bookmark file or added in the other, but there are various other issues it doesn't solve, such as moves. All in all I'm not sure anymore whether a feature that solves only 90% of the cases is really worth it... 
-- Jack Jansen | ++++ stop the execution of Mumia Abu-Jamal ++++ Jack.Jansen@cwi.nl | ++++ if you agree copy these lines to your sig ++++ http://www.cwi.nl/~jack | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm From Jack.Jansen@cwi.nl Thu Sep 24 14:14:36 1998 From: Jack.Jansen@cwi.nl (Jack Jansen) Date: Thu, 24 Sep 1998 15:14:36 +0200 Subject: [XML-SIG] Converting HTML to XML, advise wanted Message-ID: <UTC199809241314.PAA20323.jack@snelboot.cwi.nl> I have to do a (partial) translation of HTML to an XML-based format (RealText, to be specific), and I'm a bit uncertain as to how to proceed. Half a year ago I would have just used htmllib.py to parse the html and used a formatter.py based class to generate the XML, but nowadays I sort of have the feeling that a DOM based approach might be a better path. But, as I've only glanced at the DOM stuff on this list I'm not 100% convinced that this is indeed the best way to go. It seems PyDOM contains all the needed stuff, but again I'm not completely sure of this. Does anyone have any insights to share? -- Jack Jansen | ++++ stop the execution of Mumia Abu-Jamal ++++ Jack.Jansen@cwi.nl | ++++ if you agree copy these lines to your sig ++++ http://www.cwi.nl/~jack | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm From Fred L. Drake, Jr." <fdrake@acm.org Thu Sep 24 15:50:42 1998 From: Fred L. Drake, Jr." <fdrake@acm.org (Fred L. Drake) Date: Thu, 24 Sep 1998 10:50:42 -0400 (EDT) Subject: [XML-SIG] XBEL revision In-Reply-To: <UTC199809241056.MAA18471.jack@snelboot.cwi.nl> References: <fdrake@cnri.reston.va.us> <13833.7733.555953.186008@weyr.cnri.reston.va.us> <UTC199809241056.MAA18471.jack@snelboot.cwi.nl> Message-ID: <13834.23378.9690.598955@weyr.cnri.reston.va.us> Jack Jansen writes: > solve the problem of determining whether a node was deleted in one bookmark > file or added in the other, but there are various other issues it doesn't > solve, such as moves. 
All in all I'm not sure anymore whether a feature that
> solves only 90% of the cases is really worth it...

Good point. When you figure out how to better approach it, we can devise an XBEL 2.0, or something else that includes the required constructs. I think a version/variant that supports the kind of syncing that you seem to be dealing with should be published to allow other browsers to support it as well.

-Fred

--
Fred L. Drake, Jr. <fdrake@acm.org>
Corporation for National Research Initiatives
1895 Preston White Dr. Reston, VA 20191

From Fred L. Drake, Jr." <fdrake@acm.org Thu Sep 24 19:47:16 1998
From: Fred L. Drake, Jr." <fdrake@acm.org (Fred L. Drake)
Date: Thu, 24 Sep 1998 14:47:16 -0400 (EDT)
Subject: [XML-SIG] XBEL DTD
Message-ID: <13834.37812.512942.169983@weyr.cnri.reston.va.us>

--/kOlW/3UHa
Content-Type: text/plain; charset=us-ascii
Content-Description: message body text
Content-Transfer-Encoding: 7bit

Since I've not heard any comments other than Jack's regarding the changes I've suggested to XBEL, I'm attaching a new (complete) DTD below. I think Andrew really wants to get an updated version into the CVS repository, and I'd like to get it finalized as well. If there are no problems with the DTD over the next several days, I'll start on the documentation.

-Fred

--
Fred L. Drake, Jr. <fdrake@acm.org>
Corporation for National Research Initiatives
1895 Preston White Dr. Reston, VA 20191

--/kOlW/3UHa
Content-Type: text/xml
Content-Description: Proposed XBEL DTD
Content-Disposition: inline; filename="xbel.dtd"
Content-Transfer-Encoding: 7bit

<!ELEMENT xbel (title?, (bookmark|folder|url|alias|separator)*)>
<!ATTLIST xbel
          version CDATA #FIXED "1.0"
          >
<!ELEMENT title (#PCDATA)>

<!--=================== Info blocks ===============================-->

<!-- There's an implicit understanding that metadata and appdata will not just be #PCDATA but will contain application-specific elements.
There may be some need for multiple metadata elements tagged similarly to the appdata elements. --> <!ELEMENT metadata (#PCDATA)> <!ELEMENT appdata (#PCDATA)> <!ATTLIST appdata id ID #IMPLIED application CDATA #REQUIRED > <!--=================== Folder ====================================--> <!ELEMENT folder (title?,info?,desc?,(bookmark|folder|separator|alias|url)*)> <!ATTLIST folder id ID #IMPLIED added CDATA #IMPLIED folded (yes|no) 'yes' > <!--=================== URL ======================================--> <!ELEMENT url (#PCDATA)> <!ATTLIST url id ID #IMPLIED added CDATA #IMPLIED href CDATA #REQUIRED visited CDATA #IMPLIED modified CDATA #IMPLIED response CDATA #IMPLIED checked CDATA #IMPLIED > <!--=================== Bookmark ==================================--> <!-- a wrapper around an url when it has to contain extra info like a description and info blocks --> <!ELEMENT bookmark (metadata?, url, desc?, appdata*)> <!ELEMENT desc (#PCDATA)> <!--=================== Separator =================================--> <!ELEMENT separator EMPTY> <!--=================== Alias =====================================--> <!ELEMENT alias EMPTY> <!ATTLIST alias ref IDREF #REQUIRED > --/kOlW/3UHa-- From akuchlin@cnri.reston.va.us Thu Sep 24 20:50:31 1998 From: akuchlin@cnri.reston.va.us (Andrew M. Kuchling) Date: Thu, 24 Sep 1998 15:50:31 -0400 (EDT) Subject: [XML-SIG] XBEL DTD In-Reply-To: <13834.37812.512942.169983@weyr.cnri.reston.va.us> References: <13834.37812.512942.169983@weyr.cnri.reston.va.us> Message-ID: <13834.40878.343153.718626@amarok.cnri.reston.va.us> Fred L. Drake writes: >changes I've suggested to XBEL, I'm attaching a new (complete) DTD >below. I think Andrew really wants to get an updated version into the >CVS repository, and I'd like to get it finallized as well. If there >are no problems with the DTD over the next several days, I'll start on >the documentation. I'm interested in seeing the XBEL work completed for a few reasons. 
First, it makes a pretty good demonstration program for the XML toolkit, because it does some real work, but it's not so complicated that it's difficult to understand. Second, XBEL is something that could be pretty useful. The conversion scripts can be made really useful with a little work; they simply need to find the current user's bookmark file, and automatically dump it out as XBEL. I'll eventually do this for lynx_parse.py and xbel_parse.py, though it'll be a while before I manage to do that; if anyone wants to grab the CVS tree and update the code, feel free. Once the DTD is settled and the software is updated to match, it would then be nice to publicize the DTD a little bit: list it on schema.net, see if the Mozilla people are interested, etc. -- A.M. Kuchling http://starship.skyport.net/crew/amk/ I do not have a psychiatrist and I do not want one, for the simple reason that if he listened to me long enough, he might become disturbed. -- James Thurber, "Carpe Noctem, If You Can", in _Credos and Curios_ (1962) From larsga@ifi.uio.no Thu Sep 24 23:27:17 1998 From: larsga@ifi.uio.no (Lars Marius Garshol) Date: 25 Sep 1998 00:27:17 +0200 Subject: [XML-SIG] XBEL DTD In-Reply-To: <13834.40878.343153.718626@amarok.cnri.reston.va.us> References: <13834.37812.512942.169983@weyr.cnri.reston.va.us> <13834.40878.343153.718626@amarok.cnri.reston.va.us> Message-ID: <wkyar9xi2i.fsf@ifi.uio.no> * Andrew M. Kuchling | | Once the DTD is settled and the software is updated to match, it | would then be nice to publicize the DTD a little bit: list it on | schema.net, see if the Mozilla people are interested, etc. xml-dev would be a very logical place to publicize it, I think. --Lars M. From bwaumg@urc.tue.nl Fri Sep 25 21:05:34 1998 From: bwaumg@urc.tue.nl (Marc van Grootel) Date: Fri, 25 Sep 1998 22:05:34 +0200 Subject: [XML-SIG] XBEL DTD Message-ID: <199809252005.WAA13905@asterix.urc.tue.nl> Hi, I've got a couple of comments on the latest XBEL DTD. 
The folder element still had a reference to an 'info' element which didn't exist anymore. However I would vote for leaving 'info' in as a container for the appdata and metadata. This makes it easier to skip the whole block of info at once.

Here's some stuff that could go into the beginning of the DTD.

    <!ENTITY lt "&#38;#60;">
    <!ENTITY gt "&#62;">
    <!ENTITY amp "&#38;#38;">
    <!ENTITY apos "&#39;">
    <!ENTITY quot "&#34;">

What other entities should be included? Or should everything be encoded with &# references.

    <!ENTITY % SPAMCANS 'bookmark|folder|url|alias|separator'>

I would like to suggest the following content models:

    xbel (title?,info?,desc?,(&SPAMCANS;)*
    folder (title?,info?,desc?,(&SPAMCANS;)*
    bookmark (url,info?,desc?)

It's not clear to me why there's only one metadata element allowed. How will metadata be used and how should one choose between metadata and appdata? I would guess that appdata is kinda private to a certain app. And metadata is data that one would like to share with other apps (public) like a list of keywords. If there's only one metadata element should new metadata stuff be appended to its contents. I believe a minimal XML structure even for metadata is better than just declaring it #PCDATA.

Marc

--
Marc van Grootel bwaumg@urc.tue.nl

From Fred L. Drake, Jr." <fdrake@acm.org Fri Sep 25 22:03:30 1998
From: Fred L. Drake, Jr." <fdrake@acm.org (Fred L. Drake)
Date: Fri, 25 Sep 1998 17:03:30 -0400 (EDT)
Subject: [XML-SIG] XBEL DTD
In-Reply-To: <199809252005.WAA13905@asterix.urc.tue.nl>
References: <199809252005.WAA13905@asterix.urc.tue.nl>
Message-ID: <13836.1314.478066.340517@weyr.cnri.reston.va.us>

--0d9vCTG2ok
Content-Type: text/plain; charset=us-ascii
Content-Description: message body text
Content-Transfer-Encoding: 7bit

Marc van Grootel writes:
> The folder element still had a reference to an 'info' element which
> didn't exist anymore. However I would vote for leaving 'info' in as a
> container for the appdata and metadata.
This makes it easier to skip

Oops, my fault; I shouldn't drive XEmacs so fast! ;-) I'm fine with using <info> as a container for the <metadata> and <appdata> elements.

> <!ENTITY lt "&#38;#60;">
> <!ENTITY gt "&#62;">
> <!ENTITY amp "&#38;#38;">
> <!ENTITY apos "&#39;">
> <!ENTITY quot "&#34;">
>
> What other entities should be included? Or should everything be
> encoded with &# references.

Lots of stuff would be reasonable, but this is sufficient given that (I expect) most editing will be done by software other than a text editor.

> <!ENTITY % SPAMCANS 'bookmark|folder|url|alias|separator'>
>
> I would like to suggest the following content models:
>
> xbel (title?,info?,desc?,(&SPAMCANS;)*
> folder (title?,info?,desc?,(&SPAMCANS;)*
> bookmark (url,info?,desc?)

Ok, this looks good.

> It's not clear to me why there's only one metadata element
> allowed. How will metadata be used and how should one choose between
> metadata and appdata? I would guess that appdata is kinda private to a
> certain app. And metadata is data that one would like to share with
> other apps (public) like a list of keywords. If there's only one
> metadata element should new metadata stuff be appended to its
> contents. I believe a minimal XML structure even for metadata is
> better than just declaring it #PCDATA.

Hmm, my initial thought was that <metadata> would be (essentially) for things that are provided with the document, perhaps the contents of HTML <meta> and <link> elements, but you bring up a valid point: why the distinction between the two types of "related" data and why just one <metadata>.
After taking another (brief) look at the immediate plans for the Dublin Core and the embedding-in-HTML approach those folks are advocating as a first step, I'll revise this stuff a little: <!ELEMENT info (metadata*)> <!ELEMENT metadata (meta*)> <!ATTLIST metadata id ID #IMPLIED scheme CDATA #IMPLIED lang CDATA #IMPLIED > <!ELEMENT meta (#PCDATA)> <!ATTLIST meta name CDATA #REQUIRED > An application that wants its own area to write in can simply use a private value for the scheme attribute. I've left the attribute #IMPLIED instead of #REQUIRED with the presumption that a <metadata> without a name can be used to stash HTML <meta> elements which don't specify a scheme attribute. Alternately, we can simply specify a scheme for this. (Should we register an owner identifier so we can create new FPIs? I think there's an option to use Internet domain names, so we could be "-//IDN python.org//" or something like that. At least we should make a recommendation regarding what a scheme identifier should be (URL, URN), but I think we still want to be able to assign FPIs to any DTDs that come out of our efforts.) I've attached the complete DTD as found in my emacs buffer below. -Fred -- Fred L. Drake, Jr. <fdrake@acm.org> Corporation for National Research Initiatives 1895 Preston White Dr. 
Reston, VA 20191

--0d9vCTG2ok
Content-Type: text/xml
Content-Description: Proposed XBEL DTD
Content-Disposition: inline; filename="xbel.dtd"
Content-Transfer-Encoding: 7bit

<!ENTITY lt "&#38;#60;">
<!ENTITY gt "&#62;">
<!ENTITY amp "&#38;#38;">
<!ENTITY apos "&#39;">
<!ENTITY quot "&#34;">

<!ENTITY % NODES 'bookmark|folder|url|alias|separator'>

<!ELEMENT xbel (title?, info?, desc?, (%NODES;)*)>
<!ATTLIST xbel
          version CDATA #FIXED "1.0"
          >
<!ELEMENT title (#PCDATA)>

<!--=================== Info blocks ===============================-->

<!ELEMENT info (metadata*)>

<!ELEMENT metadata (meta*)>
<!ATTLIST metadata
          id ID #IMPLIED
          scheme CDATA #IMPLIED
          lang CDATA #IMPLIED
          >

<!ELEMENT meta (#PCDATA)>
<!ATTLIST meta
          name CDATA #REQUIRED
          >

<!--=================== Folder ====================================-->

<!ELEMENT folder (title?,info?,desc?,(%NODES;)*)>
<!ATTLIST folder
          id ID #IMPLIED
          added CDATA #IMPLIED
          folded (yes|no) 'yes'
          >

<!--=================== URL ======================================-->

<!ELEMENT url (#PCDATA)>
<!ATTLIST url
          id ID #IMPLIED
          added CDATA #IMPLIED
          href CDATA #REQUIRED
          visited CDATA #IMPLIED
          modified CDATA #IMPLIED
          response CDATA #IMPLIED -- HTTP response code?
--
          checked CDATA #IMPLIED
          >

<!--=================== Bookmark ==================================-->

<!-- a wrapper around an url when it has to contain extra info like a description and info blocks -->

<!ELEMENT bookmark (url, info?, desc?)>
<!ELEMENT desc (#PCDATA)>

<!--=================== Separator =================================-->

<!ELEMENT separator EMPTY>

<!--=================== Alias =====================================-->

<!ELEMENT alias EMPTY>
<!ATTLIST alias
          ref IDREF #REQUIRED
          >

--0d9vCTG2ok--

From lisarein@finetuning.com Fri Sep 25 22:26:54 1998
From: lisarein@finetuning.com (Lisa Rein)
Date: Fri, 25 Sep 1998 14:26:54 -0700
Subject: [XML-SIG] Re: XML-SIG digest, Vol 1 #107 - 3 msgs
References: <199809251600.MAA22311@python.org>
Message-ID: <360C0A9E.3861F940@finetuning.com>

> | Once the DTD is settled and the software is updated to match, it
> | would then be nice to publicize the DTD a little bit: list it on
> | schema.net, see if the Mozilla people are interested, etc.
>
> xml-dev would be a very logical place to publicize it, I think.
> --Lars M.

i was thinking about writing about xbel in my book --would anyone object?

lisa

From digitome@iol.ie Sat Sep 26 10:20:21 1998
From: digitome@iol.ie (Sean Mc Grath)
Date: Sat, 26 Sep 1998 10:20:21 +0100
Subject: [XML-SIG] XBEL DTD
In-Reply-To: <199809252005.WAA13905@asterix.urc.tue.nl>
Message-ID: <3.0.6.32.19980926102021.0092f5c0@gpo.iol.ie>

[Marc van Grootel]
>Here's some stuff that could go into the beginning of the DTD.
>
> <!ENTITY lt "&#38;#60;">
> <!ENTITY gt "&#62;">
> <!ENTITY amp "&#38;#38;">
> <!ENTITY apos "&#39;">
> <!ENTITY quot "&#34;">
>

These are built in to all conforming XML parsers. There is no need to declare them.

...
>I would like to suggest the following content models:
>
> xbel (title?,info?,desc?,(&SPAMCANS;)*
> folder (title?,info?,desc?,(&SPAMCANS;)*
> bookmark (url,info?,desc?)

This needs to be %SPAMCANS; (A parameter entity rather than a general entity reference.
Cheers,
Sean Mc Grath

def Get_URI_Of_Superlative_Scripting_Language(): return "http://www.python.org"

From bwaumg@urc.tue.nl Sat Sep 26 14:33:39 1998
From: bwaumg@urc.tue.nl (Marc van Grootel)
Date: Sat, 26 Sep 1998 15:33:39 +0200
Subject: [XML-SIG] XBEL DTD
Message-ID: <199809261333.PAA26598@asterix.urc.tue.nl>

>
> [Marc van Grootel]
> >Here's some stuff that could go into the beginning of the DTD.
> >
> > <!ENTITY lt "&#38;#60;">
> > <!ENTITY gt "&#62;">
> > <!ENTITY amp "&#38;#38;">
> > <!ENTITY apos "&#39;">
> > <!ENTITY quot "&#34;">
> >
>
Sean Mc Grath wrote:
> These are built in to all conforming XML parsers. There is no need to
> declare them.
>

Yes I know but the XML Rec. states that:

[4.6 Predefined Entities] For interoperability, valid XML documents should declare these entities, like any others, before using them.

BTW I think it is a good idea to at least include the entities that HTML includes. Netscape's bookmark file is HTML and maybe there are others that store their bookmarks in HTML. It's silly if an application translates something like Ä into &uml;

> ..
> >I would like to suggest the following content models:
> >
> > xbel (title?,info?,desc?,(&SPAMCANS;)*
> > folder (title?,info?,desc?,(&SPAMCANS;)*
> > bookmark (url,info?,desc?)
>
> This needs to be %SPAMCANS; (A parameter entity rather than a general
> entity reference.

Oops.

> Cheers,

Marc

--
Marc van Grootel bwaumg@urc.tue.nl

From bwaumg@urc.tue.nl Sat Sep 26 16:43:04 1998
From: bwaumg@urc.tue.nl (Marc van Grootel)
Date: Sat, 26 Sep 1998 17:43:04 +0200
Subject: [XML-SIG] Re: XML-SIG digest, Vol 1 #107 - 3 msgs.
Message-ID: <199809261543.RAA28183@asterix.urc.tue.nl>

>
> i was thinking about writing about xbel in my book --would anyone
> object?
>
> lisa

Nice. What's it about? The book, I mean.

Marc

--
Marc van Grootel bwaumg@urc.tue.nl

From Fred L. Drake, Jr." <fdrake@acm.org Tue Sep 29 20:24:25 1998
From: Fred L. Drake, Jr." <fdrake@acm.org (Fred L.
Drake) Date: Tue, 29 Sep 1998 15:24:25 -0400 (EDT) Subject: [XML-SIG] ISO 8601 date support Message-ID: <13841.13289.766318.206344@weyr.cnri.reston.va.us> I've just sent Andrew a module that parses and formats ISO 8601 dates, at least as far as the W3C profile supports. See http://www.w3.org/TR/NOTE-datetime for the profile. We're planning on it going into the xml.utils package. -Fred -- Fred L. Drake, Jr. <fdrake@acm.org> Corporation for National Research Initiatives 1895 Preston White Dr. Reston, VA 20191 From larsga@ifi.uio.no Tue Sep 29 22:02:44 1998 From: larsga@ifi.uio.no (Lars Marius Garshol) Date: 29 Sep 1998 22:02:44 +0100 Subject: [XML-SIG] ISO 8601 date support In-Reply-To: <13841.13289.766318.206344@weyr.cnri.reston.va.us> References: <13841.13289.766318.206344@weyr.cnri.reston.va.us> Message-ID: <wk7lymws23.fsf@ifi.uio.no> * Fred L. Drake | | I've just sent Andrew a module that parses and formats ISO 8601 | dates, at least as far as the W3C profile supports. See | http://www.w3.org/TR/NOTE-datetime for the profile. We're planning | on it going into the xml.utils package. Great! Definitely a highly useful piece for the cabal. I can think of several different places where this might be useful in my current XML stuff. --Lars M. From Fred L. Drake, Jr." <fdrake@acm.org Wed Sep 30 15:56:55 1998 From: Fred L. Drake, Jr." <fdrake@acm.org (Fred L. Drake) Date: Wed, 30 Sep 1998 10:56:55 -0400 (EDT) Subject: [XML-SIG] <url> checked and response attributes Message-ID: <13842.18103.510049.341490@weyr.cnri.reston.va.us> I seem to recall that the "checked" and "response" attributes of the <url> element were added to XBEL to support stuff that's tracked by MSIE. Could someone please clarify the purpose of these attributes for me? My current understanding is that "checked" should store the time when the browser last attempted to access the resource, regardless of success. Is the "response" attribute expected to store the HTTP response code? The code and the message? 
Or something else? How is it handled for resources not accessed via HTTP (omitted, perhaps?)? I'm planning to send Andrew the public text of the DTD shortly for updating the repository, but would like some clarification on these attributes before that. The DTD will be assigned the public identifier: -//IDN python.org//DTD XML Bookmark Exchange Language 1.0//EN I'll start working on the documentation this weekend. -Fred -- Fred L. Drake, Jr. <fdrake@acm.org> Corporation for National Research Initiatives 1895 Preston White Dr. Reston, VA 20191 From bwaumg@urc.tue.nl Wed Sep 30 17:48:45 1998 From: bwaumg@urc.tue.nl (Marc van Grootel) Date: Wed, 30 Sep 1998 18:48:45 +0200 Subject: [XML-SIG] <url> checked and response attributes Message-ID: <199809301648.SAA19336@asterix.urc.tue.nl> > > I seem to recall that the "checked" and "response" attributes of the > <url> element were added to XBEL to support stuff that's tracked by > MSIE. I suggested them in the very beginning because I thought they would be useful (not because it is tracked by MSIE - maybe it does but, does it?). > Could someone please clarify the purpose of these attributes > for me? My current understanding is that "checked" should store the > time when the browser last attempted to access the resource, > regardless of success. Is the "response" attribute expected to store > the HTTP response code? That was the idea. > The code and the message? Or something else? > How is it handled for resources not accessed via HTTP (omitted, > perhaps?)? Didn't think of that. Attributes for storing the status of a link is still a good idea but I didn't give the choice of attributes for supporting it much thought. > I'm planning to send Andrew the public text of the DTD shortly for > updating the repository, but would like some clarification on these > attributes before that. 
> The DTD will be assigned the public identifier: > > -//IDN python.org//DTD XML Bookmark Exchange Language 1.0//EN > > I'll start working on the documentation this weekend. > > > -Fred BTW what about the contents of the scheme and language attributes? I don't know the Dublin Core but I understand you got it from there? Could you give an example of an info element? Marc -- Marc van Grootel bwaumg@urc.tue.nl From Fred L. Drake, Jr." <fdrake@acm.org Wed Sep 30 19:07:22 1998 From: Fred L. Drake, Jr." <fdrake@acm.org (Fred L. Drake) Date: Wed, 30 Sep 1998 14:07:22 -0400 (EDT) Subject: [XML-SIG] <url> checked and response attributes In-Reply-To: <199809301648.SAA19336@asterix.urc.tue.nl> References: <199809301648.SAA19336@asterix.urc.tue.nl> Message-ID: <13842.29530.662350.571366@weyr.cnri.reston.va.us> Marc van Grootel writes: > I suggested them in the very beginning because I thought they > would be useful (not because it is tracked by MSIE - maybe it does > but, does it?). Good question; > Attributes for storing the status of a link is still a good idea > but I didn't give the choice of attributes for supporting it I agree that storing this kind of information is potentially very useful, especially if there's any software that can use it. I think a link-update monitor could very easily use such information. I am concerned about adding several "untested" attributes to a primary element in the hope that someone will actually write enough software that uses it (more than one app.). We might want to drop the attributes from <url> and create a <metadata> profile for it. > BTW what about the contents of the scheme and language attributes? I > don't know the Dublin Core but I understand you got it from there? > Could you give an example of an info element? Good idea. Metadata about objects (including documents) is typically given as a set of key/value pairs. 
The keys are usually just strings (like RFC 822 headers), and values may be strings (possibly constrained by the definition of the bit of metadata, i.e., it may be boolean, or numeric, or a date), or it may be structured in some way (XML, SGML, or whatever). The catch with metadata is to understand what it means (no AI here, though). To "understand" metadata, you need to understand what schema it conforms to. As an example, consider a library's cataloging system. To understand what a catalog number means, you need to know what kind of number was assigned: Dewey Decimal, U.S. Library of Congress (LOC), or something else. Since there's no reason not to assign catalog numbers for both Dewey Decimal and LOC use, an <info> for a book (with completely made up numbers; I don't remember either system well enough) might look like this: <info> <metadata scheme="Library of Congress"> <meta name="catalog number">TR567 A45.1</meta> </metadata> <metadata scheme="Dewey Decimal"> <meta name="catalog number">Z567 12</meta> </metadata> </info> The Dublin Core is a specific metadata system; particular bits of data about a resource are defined and given identifying keys. It is being used on the Web by some projects and the working group has dealt with issues related to embedding in HTML as well as semantics. An example of storing Dublin Core data in XBEL: <info> <metadata scheme="http://purl.oclc.org/metadata/dublin_core/" lang="en"> <meta name="Creator">Fred L. Drake, Jr. and Roger E. 
Masse</meta> <meta name="Publisher">Corporation for National Research Initiatives</meta> <meta name="Description">Python interface to the Kerberos V5 security package.</meta> <meta name="Identifier"> URN:hdl:1895.22/1001</meta> <meta name="Identifier"> URL:ftp://ftp.python.org/pub/python/contrib/System/krb5module-0.1.tar.gz</meta> </metadata> </info> Adding the scheme and lang attributes to <metadata> seems to make the most sense; typically, several metadata items from a single scheme will be used, with natural language text in a single language. More information on the Dublin Core is available at <http://purl.oclc.org/metadata/dublin_core/>. I'll try to include some useful examples and links in the XBEL documentation. -Fred -- Fred L. Drake, Jr. <fdrake@acm.org> Corporation for National Research Initiatives 1895 Preston White Dr. Reston, VA 20191 From Fred L. Drake, Jr." <fdrake@acm.org Wed Sep 30 21:47:39 1998 From: Fred L. Drake, Jr." <fdrake@acm.org (Fred L. Drake) Date: Wed, 30 Sep 1998 16:47:39 -0400 (EDT) Subject: [XML-SIG] <url> checked and response attributes In-Reply-To: <199809301648.SAA19336@asterix.urc.tue.nl> References: <199809301648.SAA19336@asterix.urc.tue.nl> Message-ID: <13842.39147.458594.392740@weyr.cnri.reston.va.us> Marc van Grootel writes: > I suggested them in the very beginning because I thought they > would be useful (not because it is tracked by MSIE - maybe it does > but, does it?). I think I sent my message before finishing my response on this. As far as I can tell, it does not. But that doesn't mean MSIE doesn't stash the information somewhere; I could only find the individual files that represent each bookmark. The only information was the title (encoded as the file name, sheesh!), the URL, and the modification time (in some undetermined format). -Fred -- Fred L. Drake, Jr. <fdrake@acm.org> Corporation for National Research Initiatives 1895 Preston White Dr. Reston, VA 20191
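
The <info> / <metadata> / <meta> structure proposed earlier in this thread groups key/value pairs under a scheme, so an application can pick out only the schemes it understands. A minimal sketch of reading that structure back follows. It assumes Python's standard xml.etree.ElementTree module, which is not the toolkit discussed on this list, and the function name metadata_by_scheme is hypothetical; the sample data is the made-up catalog example from the thread.

```python
import xml.etree.ElementTree as ET

# The made-up library catalog example from the thread.
SAMPLE = """\
<info>
  <metadata scheme="Library of Congress">
    <meta name="catalog number">TR567 A45.1</meta>
  </metadata>
  <metadata scheme="Dewey Decimal">
    <meta name="catalog number">Z567 12</meta>
  </metadata>
</info>
"""


def metadata_by_scheme(info_xml):
    """Return {scheme: [(name, value), ...]} for one <info> element.

    Hypothetical helper, not part of any XBEL software.
    """
    info = ET.fromstring(info_xml)
    result = {}
    for metadata in info.findall("metadata"):
        # scheme is #IMPLIED in the DTD, so it may be absent (None here).
        scheme = metadata.get("scheme")
        pairs = result.setdefault(scheme, [])
        for meta in metadata.findall("meta"):
            # A key may repeat (two "Identifier" entries in the Dublin Core
            # example), so values are kept as a list of pairs, not a dict.
            pairs.append((meta.get("name"), (meta.text or "").strip()))
    return result


catalog = metadata_by_scheme(SAMPLE)
```

Keying on the scheme attribute means an application that wants a private area only has to look up its own scheme value and can leave every other <metadata> block untouched.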