From dieter@handshake.de Fri Jan 1 12:39:28 1999
From: dieter@handshake.de (Dieter Maurer)
Date: Fri, 1 Jan 1999 13:39:28 +0100
Subject: [XML-SIG] Namespace support for DOM
In-Reply-To: <13963.51590.804687.473078@amarok.cnri.reston.va.us>
References: <13963.51590.804687.473078@amarok.cnri.reston.va.us>
Message-ID: <199901011239.NAA00946@lindm.dm>

Andrew M. Kuchling writes:
 > Indeed; I'm frightened of adding some sort of clever,
 > invalidate-namespaces-on-a-move, scheme and opening the door to lots
 > of subtle bugs. Also, the PyDOM representation has nodes with a list
 > of their children, and no parent pointers; this makes the traversing
 > of ancestors difficult. I'm somewhat tempted to toss the recently
 > announced WeakDict object into the XML package and add parent
 > pointers, but it may be too late to undertake such a large change to
 > the DOM code. Any opinions?

If we decide to use the WeakDict module, I could help to adapt
the DOM code.

There is, however, a subtle semantic difference between the current
implementation and a WeakDict-based one. The difference shows up when
we get a reference to an internal node of a DOM tree and then delete
the tree while still holding that reference.

In the current implementation, the referenced node contains a
"_document" ("ownerDocument") attribute, which protects the complete
tree from being garbage collected.

In a WeakDict-based implementation, the reference to the
"ownerDocument" is naturally implemented as a weak reference (as are
the parent references). Deleting the DOM tree deletes everything from
its root down to the referenced internal node. Thus, this node loses
its parent and its "ownerDocument" reference. It can only be used
thereafter in a very restricted way.
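
The difference described above can be seen in a small sketch. It does not use the WeakDict module or PyDOM; it assumes the `weakref` module from today's Python standard library, and the `Node` class is a hypothetical stand-in for a DOM node with a weak parent pointer.

```python
import gc
import weakref


class Node:
    """Hypothetical tree node; not PyDOM's actual Node class."""

    def __init__(self, name):
        self.name = name
        self.children = []      # strong references, parent -> child
        self._parent = None     # weak reference, child -> parent

    def append(self, child):
        self.children.append(child)
        child._parent = weakref.ref(self)

    @property
    def parent(self):
        # Dereference the weak reference; None once the parent is gone.
        return self._parent() if self._parent is not None else None


root = Node("doc")
inner = Node("section")
root.append(inner)
parent_before = inner.parent.name   # the parent is still alive here

del root        # drop the only strong reference to the root of the tree
gc.collect()    # not needed on CPython, but harmless elsewhere

parent_after = inner.parent         # the weak reference is now dead
```

With a strong "_document" attribute, holding `inner` would have kept `root` alive. With weak parent pointers, `inner` survives on its own but can no longer reach its ancestors.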
Dieter From spepping@scaprea.hobby.nl Sat Jan 2 19:50:14 1999 From: spepping@scaprea.hobby.nl (Simon Pepping) Date: Sat, 2 Jan 1999 20:50:14 +0100 (MET) Subject: [XML-SIG] Documentation and problems In-Reply-To: <009e01be3367$794b5220$529b90d1@synchrologic.com> Message-ID: On Tue, 29 Dec 1998, Frank McGeough wrote: > Simon, > > In your doc at : > http://www.hobby.nl/~scaprea/XML/t173.html > > I believe the > > 2. Call the parser factory with the name of a known driver module, e.g., > SAXparser=xml.sax.saxexts.make_parser("xml.sax.drivers.drv_xmlproc") > > is incorrect. The saxexts.py has the following code in it: > parser_name = 'xml.sax.drivers.drv_' + parser_name > > therefore you should create the parser with : > > SAXparser=xml.sax.saxexts.make_parser("xmlproc") > > This may have been a recent change. I just started in with > Python XML stuff. I have downloaded the xml-0_5.zip > version. That must indeed be a change from 0.4 to 0.5. I have updated my docs. Thanks for notifying me. > Thanks for putting that doc on-line. I found it very helpful. Good to hear. Simon Pepping email: spepping@scaprea.hobby.nl From dieter@handshake.de Mon Jan 4 19:55:35 1999 From: dieter@handshake.de (Dieter Maurer) Date: Mon, 4 Jan 1999 20:55:35 +0100 Subject: [XML-SIG] Wrong URL: addContentTable Message-ID: <199901041955.UAA00368@lindm.dm> Some days ago, I posted: > Based on our xml-0.5 release, I have made a small tool which adds > a hierarchical content table to HTML documents: > > URL:http://www.handshake.de/~dieter/pyprojects/addContentTable.html Unfortunately, I was unaware that my ISP converts letters in file names to lower case. Thus, the correct URL is: URL:http://www.handshake.de/~dieter/pyprojects/addcontenttable.html Sorry for the inconvenience! 
Dieter

From Jeff.Johnson@icn.siemens.com Thu Jan 7 19:29:59 1999
From: Jeff.Johnson@icn.siemens.com (Jeff.Johnson@icn.siemens.com)
Date: Thu, 7 Jan 1999 14:29:59 -0500
Subject: [XML-SIG] XmlWriter update
Message-ID: <852566F2.006B115C.00@li01.lm.ssc.siemens.com>

--0__=fvpHh2vcDJQ5CEBan2fJcjeWZtCMC0eNns393xvSWedcGNOO0Rg9JzLq
Content-type: text/plain; charset=us-ascii
Content-Disposition: inline

I moved some code from xml.dom.HtmlWriter up to the super class
xml.dom.XmlWriter so that it is easier to specify where new lines should be
inserted when writing XML. I hope you like it and it gets into the XML
package or I'll have to rewrite my code :). This should be fully backwards
compatible too.

The following is an example of how the change allows us to specify that
'tree' elements should get new lines before and after the start tag and end
tag. The 'node' element only gets a new line before the start tag.

     nl_dict = {
          'tree':(1,1,1,1),
          'node':(1,0,0,0),
     }
     w = XmlWriter(sys.stdout,nl_dict)
     w.write(doc)

The new xml.dom.writer.py is attached.
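
To make the four flags concrete, here is a minimal sketch of a serialiser that honours such an nl_dict. It is not the attached writer.py: it assumes `xml.dom.minidom` from today's standard library, the `serialise` function is hypothetical, and it ignores attributes and escaping. The flag order (before start tag, after start tag, before end tag, after end tag) follows the description in the message.

```python
from xml.dom import minidom


def serialise(element, nl_dict, out=None):
    """Return `element` as text, breaking lines as nl_dict requests.

    nl_dict maps a tag name to four flags: newline before the start tag,
    after the start tag, before the end tag, after the end tag.
    Attributes and character escaping are left out to keep the sketch short.
    """
    out = [] if out is None else out

    def new_line():
        # Emit a newline only if we are not already at the start of a line.
        if out and not out[-1].endswith("\n"):
            out.append("\n")

    flags = nl_dict.get(element.tagName, (0, 0, 0, 0))
    if flags[0]:
        new_line()
    out.append("<%s>" % element.tagName)
    if flags[1]:
        new_line()
    for child in element.childNodes:
        if child.nodeType == child.ELEMENT_NODE:
            serialise(child, nl_dict, out)
        elif child.nodeType == child.TEXT_NODE:
            out.append(child.data)
    if flags[2]:
        new_line()
    out.append("</%s>" % element.tagName)
    if flags[3]:
        new_line()
    return "".join(out)


doc = minidom.parseString("<tree><node>a</node><node>b</node></tree>")
nl_dict = {"tree": (1, 1, 1, 1), "node": (1, 0, 0, 0)}
result = serialise(doc.documentElement, nl_dict)
```

Each 'node' starts on its own line, while 'tree' gets a line break around both of its tags.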
(See attached file: writer.py) Thanks --0__=fvpHh2vcDJQ5CEBan2fJcjeWZtCMC0eNns393xvSWedcGNOO0Rg9JzLq Content-type: application/octet-stream; name="writer.py" Content-Disposition: attachment; filename="writer.py" Content-transfer-encoding: base64 IiIid3JpdGVyOiB3cml0ZXIvbGluZWFyaXNlciBjbGFzc2VzIGZvciBkdW1waW5nIERPTSB0cmVl IHRvIGZpbGUuDQoNCiIiIg0KDQpmcm9tIHhtbC5kb20uY29yZSBpbXBvcnQgKg0KZnJvbSB4bWwu ZG9tLndhbGtlciBpbXBvcnQgV2Fsa2VyDQppbXBvcnQgc3RyaW5nLCByZSwgc3lzDQoNCmZyb20g eG1sLnV0aWxzIGltcG9ydCBlc2NhcGUNCgkNCg0KY2xhc3MgT3V0cHV0U3RyZWFtOg0KCWRlZiBf X2luaXRfXyhzZWxmLCBmaWxlKToNCgkJc2VsZi5maWxlID0gZmlsZQ0KCQlzZWxmLm5ld19saW5l ID0gMQ0KDQoJZGVmIHdyaXRlKHNlbGYsIHMpOg0KCQkjcHJpbnQgJ3dyaXRlJywgYHNgDQoJCXNl bGYuZmlsZS53cml0ZShyZS5zdWIoJ1xuKycsICdcbicsIHMpKQ0KCQlpZiBzIGFuZCBzWy0xXSA9 PSAnXG4nOg0KCQkJc2VsZi5uZXdfbGluZSA9IDENCgkJZWxzZToNCgkJCXNlbGYubmV3X2xpbmUg PSAwDQoNCglkZWYgbmV3TGluZShzZWxmKToNCgkJaWYgbm90IHNlbGYubmV3X2xpbmU6DQoJCQlz ZWxmLndyaXRlKCdcbicpDQoNCglkZWYgX19kZWxfXyhzZWxmKToNCgkJc2VsZi5maWxlLmZsdXNo KCkNCg0KDQpjbGFzcyBYbWxXcml0ZXIoV2Fsa2VyKToNCg0KCWRlZiBfX2luaXRfXyhzZWxmLCBz dHJlYW09c3lzLnN0ZG91dCwgbmxfZGljdD17fSk6DQoJCXNlbGYuc3RyZWFtID0gT3V0cHV0U3Ry ZWFtKHN0cmVhbSkNCgkJc2VsZi5lbXB0aWVzID0gW10NCgkJc2VsZi5zdHJpcCA9IFtdDQoJCXNl bGYueG1sX3N0eWxlX2VuZHRhZ3MgPSAxDQoJCXNlbGYubmV3bGluZV9iZWZvcmVfc3RhcnQgPSBb XQ0KCQlzZWxmLm5ld2xpbmVfYWZ0ZXJfc3RhcnQgPSBbXQ0KCQlzZWxmLm5ld2xpbmVfYmVmb3Jl X2VuZCA9IFtdDQoJCXNlbGYubmV3bGluZV9hZnRlcl9lbmQgPSBbXQ0KCQlzZWxmLm1hcF9hdHRy ID0gc2VsZi5tYXBfdGFnID0gbGFtYmRhIHg6IHgNCgkJc2VsZi5fc2V0TmV3TGluZXMobmxfZGlj dCkNCg0KCWRlZiBfc2V0TmV3TGluZXMoc2VsZixubF9kaWN0KToNCgkJZm9yIGssIHYgaW4gbmxf ZGljdC5pdGVtcygpOg0KCQkJaWYgdlswXToNCgkJCQlzZWxmLm5ld2xpbmVfYmVmb3JlX3N0YXJ0 LmFwcGVuZChrKQ0KCQkJCXNlbGYubmV3bGluZV9iZWZvcmVfc3RhcnQuYXBwZW5kKHN0cmluZy51 cHBlcihrKSkNCgkJCWlmIHZbMV06DQoJCQkJc2VsZi5uZXdsaW5lX2FmdGVyX3N0YXJ0LmFwcGVu ZChrKQ0KCQkJCXNlbGYubmV3bGluZV9hZnRlcl9zdGFydC5hcHBlbmQoc3RyaW5nLnVwcGVyKGsp 
KQ0KCQkJaWYgdlsyXToNCgkJCQlzZWxmLm5ld2xpbmVfYmVmb3JlX2VuZC5hcHBlbmQoaykNCgkJ CQlzZWxmLm5ld2xpbmVfYmVmb3JlX2VuZC5hcHBlbmQoc3RyaW5nLnVwcGVyKGspKQ0KCQkJaWYg dlszXToNCgkJCQlzZWxmLm5ld2xpbmVfYWZ0ZXJfZW5kLmFwcGVuZChrKQ0KCQkJCXNlbGYubmV3 bGluZV9hZnRlcl9lbmQuYXBwZW5kKHN0cmluZy51cHBlcihrKSkNCg0KCWRlZiB3cml0ZShzZWxm LCB4KToNCgkJaWYgdHlwZSh4KSA9PSB0eXBlKCcnKToNCgkJCXNlbGYuc3RyZWFtLndyaXRlKHgp DQoJCWVsaWYgdHlwZSh4KSBpbiAodHlwZSgoKSksIHR5cGUoW10pKToNCgkJCWZvciB5IGluIHg6 DQoJCQkJc2VsZi53cml0ZSh5KQ0KCQllbHNlOg0KCQkJc2VsZi53YWxrKHgpDQoNCg0KCWRlZiBz dGFydEVsZW1lbnQoc2VsZiwgZWxlbWVudCkgOg0KCQlhc3NlcnQgZWxlbWVudC5nZXRfbm9kZVR5 cGUoKSA9PSBFTEVNRU5UDQoNCgkJcyA9ICc8JXMnICUgc2VsZi5tYXBfdGFnKGVsZW1lbnQuZ2V0 X25vZGVOYW1lKCkgKQ0KCQkNCgkJZm9yIG5hbWUsIHZhbHVlIGluIGVsZW1lbnQuZ2V0X2F0dHJp YnV0ZXMoKS5pdGVtcygpOg0KCQkJcyA9IHMgKyAnICVzPSIlcyInICUgKHNlbGYubWFwX2F0dHIo bmFtZSksDQoJCQkJCSAgICAgIGVzY2FwZSh2YWx1ZS5nZXRfbm9kZVZhbHVlKCkgKSkNCg0KCQlp ZiBzZWxmLnhtbF9zdHlsZV9lbmR0YWdzIGFuZCBub3QgZWxlbWVudC5nZXRfY2hpbGROb2Rlcygp Og0KCQkJcyA9IHMgKyAnLz4nDQoJCWVsc2U6DQoJCQlzID0gcyArICc+Jw0KDQoJCWlmIGVsZW1l bnQuZ2V0X25vZGVOYW1lKCkgaW4gc2VsZi5uZXdsaW5lX2JlZm9yZV9zdGFydDoNCgkJCXNlbGYu c3RyZWFtLm5ld0xpbmUoKQ0KCQlzZWxmLnN0cmVhbS53cml0ZShzKQ0KCQlpZiBlbGVtZW50Lmdl dF9ub2RlTmFtZSgpIGluIHNlbGYubmV3bGluZV9hZnRlcl9zdGFydDoNCgkJCXNlbGYuc3RyZWFt Lm5ld0xpbmUoKQ0KDQoNCglkZWYgZW5kRWxlbWVudChzZWxmLCBlbGVtZW50KToNCgkJYXNzZXJ0 IGVsZW1lbnQuZ2V0X25vZGVUeXBlKCkgPT0gRUxFTUVOVA0KDQoJCXMgPSAnJw0KCQlpZiBlbGVt ZW50LmdldF9ub2RlTmFtZSgpIGluIHNlbGYuZW1wdGllcyA6DQoJCQlwYXNzDQoJCWVsaWYgbGVu KGVsZW1lbnQuZ2V0X2NoaWxkTm9kZXMoKSApID09IDAgYW5kIHNlbGYueG1sX3N0eWxlX2VuZHRh Z3M6DQoJCQlwYXNzDQoJCWVsc2U6DQoJCQlzID0gcyArICc8LyVzPicgJSBzZWxmLm1hcF90YWco ZWxlbWVudC5nZXRfbm9kZU5hbWUoKSApDQoNCgkJaWYgZWxlbWVudC5nZXRfbm9kZU5hbWUoKSBp biBzZWxmLm5ld2xpbmVfYmVmb3JlX2VuZDoNCgkJCXNlbGYuc3RyZWFtLm5ld0xpbmUoKQ0KCQlz ZWxmLnN0cmVhbS53cml0ZShzKQ0KCQlpZiBlbGVtZW50LmdldF9ub2RlTmFtZSgpIGluIHNlbGYu 
bmV3bGluZV9hZnRlcl9lbmQ6DQoJCQlzZWxmLnN0cmVhbS5uZXdMaW5lKCkNCg0KDQoJZGVmIGRv VGV4dChzZWxmLCB0ZXh0X25vZGUpOg0KCQkjaWYgdGV4dF9ub2RlLmdldFBhcmVudE5vZGUoKS50 YWdOYW1lIGluIHNlbGYuc3RyaXA6DQoJCSMJZGF0YSA9IHN0cmluZy5zdHJpcCh0ZXh0X25vZGUu ZGF0YSkNCgkJI2Vsc2U6DQoJCWRhdGEgPSB0ZXh0X25vZGUuZ2V0X25vZGVWYWx1ZSgpDQoJCXNl bGYuc3RyZWFtLndyaXRlKGVzY2FwZShkYXRhKSkNCg0KCWRlZiBkb0NvbW1lbnQoc2VsZiwgbm9k ZSk6DQoJCXNlbGYuc3RyZWFtLndyaXRlKG5vZGUudG94bWwoKSkNCg0KDQpjbGFzcyBYbWxMaW5l YXJpc2VyKFhtbFdyaXRlcik6DQoNCglkZWYgX19pbml0X18oc2VsZik6DQoJCWltcG9ydCBTdHJp bmdJTw0KCQlzZWxmLmJ1ZmZlciA9IFN0cmluZ0lPLlN0cmluZ0lPKCkNCgkJWG1sV3JpdGVyLl9f aW5pdF9fKHNlbGYsIHNlbGYuYnVmZmVyKQ0KDQoJZGVmIGxpbmVhcmlzZShzZWxmLCBub2RlKToN CgkJc2VsZi53cml0ZShub2RlKQ0KCQlyZXR1cm4gc2VsZi5idWZmZXIuZ2V0dmFsdWUoKQ0KCQ0K DQpjbGFzcyBIdG1sV3JpdGVyKFhtbFdyaXRlcik6DQoJZGVmIF9faW5pdF9fKHNlbGYsIHN0cmVh bT1zeXMuc3Rkb3V0KToNCgkJWG1sV3JpdGVyLl9faW5pdF9fKHNlbGYsIHN0cmVhbSkNCgkJc2Vs Zi5tYXBfYXR0ciA9IHNlbGYubWFwX3RhZyA9IHN0cmluZy51cHBlcg0KCQlzZWxmLnhtbF9zdHls ZV9lbmR0YWdzID0gMA0KDQoJCXNlbGYuZW1wdGllcyA9IFsNCgkJCSdpbWcnLCAnYnInLCAnaHIn LCAnaW5jbHVkZScsICdsaScsICdtZXRhJywgJ2lucHV0JywNCgkJCSdJTUcnLCAnQlInLCAnSFIn LCAnSU5DTFVERScsICdMSScsICdNRVRBJywgJ0lOUFVUJywNCgkJXQ0KCQlzZWxmLnN0cmlwID0g Ww0KCQkJJ2gxJywgJ2gyJywgJ2gzJywgJ2g0JywgJ2g1JywgJ2g2JywgDQoJCQknbGknLCAnYnIn LCAncCcsICdhJywgJ3RpdGxlJywgJ2ZvbnQnLA0KCQkJJ0gxJywgJ0gyJywgJ0gzJywgJ0g0Jywg J0g1JywgJ0g2JywgDQoJCQknTEknLCAnQlInLCAnUCcsICdBJywgJ1RJVExFJywgJ0ZPTlQnLA0K CQldDQoNCgkJbmxfZGljdCA9IHsNCgkJCSdoZWFkJzogKDEsIDEsIDEsIDEpLA0KCQkJJ2JvZHkn OiAoMSwgMSwgMSwgMSksDQoJCQkndGl0bGUnOiAoMSwgMSwgMSwgMSksDQoJCQknbWV0YSc6ICgx LCAxLCAwLCAwKSwNCgkJCSd1bCc6ICgxLCAxLCAxLCAxKSwNCgkJCSdsaSc6ICgxLCAwLCAwLCAw KSwNCgkJCSdoMSc6ICgxLCAwLCAwLCAxKSwNCgkJCSdoMic6ICgxLCAwLCAwLCAxKSwNCgkJCSdo Myc6ICgxLCAwLCAwLCAxKSwNCgkJCSdoNCc6ICgxLCAwLCAwLCAxKSwNCgkJCSdoNSc6ICgxLCAw LCAwLCAxKSwNCgkJCSdoNic6ICgxLCAwLCAwLCAxKSwNCgkJCSdwJzogKDEsIDAsIDAsIDEpLA0K 
CQkJJ2JyJzogKDEsIDEsIDAsIDApLA0KCQl9DQoJCQ0KCQlzZWxmLl9zZXROZXdMaW5lcyhubF9k aWN0KQ0KDQoNCmNsYXNzIEh0bWxMaW5lYXJpc2VyKEh0bWxXcml0ZXIpOg0KDQoJZGVmIF9faW5p dF9fKHNlbGYpOg0KCQlpbXBvcnQgU3RyaW5nSU8NCgkJc2VsZi5idWZmZXIgPSBTdHJpbmdJTy5T dHJpbmdJTygpDQoJCUh0bWxXcml0ZXIuX19pbml0X18oc2VsZiwgc2VsZi5idWZmZXIpDQoNCglk ZWYgbGluZWFyaXNlKHNlbGYsIG5vZGUpOg0KCQlzZWxmLndyaXRlKG5vZGUpDQoJCXJldHVybiBz ZWxmLmJ1ZmZlci5nZXR2YWx1ZSgpDQoJDQoNCmNsYXNzIEFTUFdyaXRlcihYbWxXcml0ZXIpOg0K CWRlZiBfX2luaXRfXyhzZWxmLCByZXBfZmlsZSk6DQoJCXNlbGYucmVwX2RpY3QgPSB7fQ0KCQlz ZWxmLnBhcnNlUmVwRmlsZShyZXBfZmlsZSkNCgkJDQoNCglkZWYgcGFyc2VSZXBGaWxlKHNlbGYs IHJlcF9maWxlKToNCgkJcyA9ICcnDQoJCWZvciBsIGluIG9wZW4ocmVwX2ZpbGUpLnJlYWRsaW5l cygpOg0KCQkJaWYgbFswXSA9PSAnPCc6DQoJCQkJcGx1c19iZWZvcmUgPSAwDQoJCQkJcGx1c19h ZnRlciA9IDANCgkJCQluID0gc3RyaW5nLmluZGV4KGwsICc+JykNCgkJCQl0YWdfbmFtZSA9IGxb MTpuXQ0KCQkJCXJlcCA9IHN0cmluZy5zdHJpcChsW24rMTpdKQ0KCQkJCWlmIHJlcCBhbmQgcmVw WzBdID09ICcrJzoNCgkJCQkJcGx1c19iZWZvcmUgPSAxDQoJCQkJCXJlcCA9IHN0cmluZy5zdHJp cChyZXBbMTpdKQ0KCQkJCWlmIHJlcCBhbmQgcmVwWy0xXSA9PSAnKyc6DQoJCQkJCXBsdXNfYWZ0 ZXIgPSAxDQoJCQkJCXJlcCA9IHN0cmluZy5zdHJpcChyZXBbOi0xXSkNCgkJCQlpZiByZXA6DQoJ CQkJCXNlbGYucmVwX2RpY3RbdGFnX25hbWVdID0gKHBsdXNfYmVmb3JlLCBwbHVzX2FmdGVyLCBl dmFsKHJlcCkpDQoJCQkJZWxzZToNCgkJCQkJc2VsZi5yZXBfZGljdFt0YWdfbmFtZV0gPSAocGx1 c19iZWZvcmUsIHBsdXNfYWZ0ZXIsICcnKQ0KCQkJCQkNCg0KCWRlZiBsaW5lYXJpc2VfZWxlbWVu dChzZWxmLCBlbGVtZW50KSA6DQoJCWFzc2VydCBlbGVtZW50Lk5vZGVUeXBlID09IEVMRU1FTlQN CgkJcyA9ICcnDQoJCQ0KCQkjIFN0YXJ0IHRhZw0KCQlwbHVzX2JlZm9yZSwgcGx1c19hZnRlciwg cmVwbCA9IHNlbGYucmVwX2RpY3RbZWxlbWVudC5nZXRUYWdOYW1lKCldDQoNCgkJaWYgcyBhbmQg c1stMV0gIT0gJ1xuJyBhbmQgcGx1c19iZWZvcmU6DQoJCQlzID0gcyArICdcbicNCgkJcyA9IHMg KyByZXBsDQoJCWlmIHMgYW5kIHNbLTFdICE9ICdcbicgYW5kIHBsdXNfYWZ0ZXI6DQoJCQlzID0g cyArICdcbicNCg0KCQlzMSA9ICcnDQoJCWZvciBjaGlsZCBpbiBlbGVtZW50LmdldENoaWxkcmVu KCk6DQoJCQlpZiBjaGlsZC5Ob2RlVHlwZSBpcyBFTEVNRU5UOg0KCQkJCXMxID0gczEgKyBzZWxm 
LmxpbmVhcmlzZV9lbGVtZW50KGNoaWxkKQ0KCQkJZWxpZiBjaGlsZC5Ob2RlVHlwZSBpcyBURVhU Og0KCQkJCSNzMSA9IHMxICsgZXNjYXBlKGNoaWxkLmRhdGEpDQoJCQkJczEgPSBzMSArIGNoaWxk LmRhdGENCgkJCWVsc2UgOg0KCQkJCXMxID0gczEgKyBzdHIoY2hpbGQpDQoJCQ0KCQlzID0gcyAr IHMxDQoNCgkJIyBFbmQgdGFnLg0KCQlwbHVzX2JlZm9yZSwgcGx1c19hZnRlciwgcmVwbCA9IHNl bGYucmVwX2RpY3RbJy8nICsgZWxlbWVudC5nZXRUYWdOYW1lKCldDQoJCWlmIHMgYW5kIHNbLTFd ICE9ICdcbicgYW5kIHBsdXNfYmVmb3JlOg0KCQkJcyA9IHMgKyAnXG4nDQoJCXMgPSBzICsgcmVw bA0KCQlpZiBzIGFuZCBzWy0xXSAhPSAnXG4nIGFuZCBwbHVzX2FmdGVyOg0KCQkJcyA9IHMgKyAn XG4nDQoNCgkJcmV0dXJuIHMNCg0K --0__=fvpHh2vcDJQ5CEBan2fJcjeWZtCMC0eNns393xvSWedcGNOO0Rg9JzLq-- From larsga@ifi.uio.no Thu Jan 7 20:46:57 1999 From: larsga@ifi.uio.no (Lars Marius Garshol) Date: 07 Jan 1999 21:46:57 +0100 Subject: [XML-SIG] Documentation and problems In-Reply-To: References: Message-ID: * Simon Pepping | | I have spent quite some time with the XML package, mainly with the | SAX interface and xmlproc. As a result I have written a(nother) | document about the interaction of an application and a SAX parser, | and how to write a SAX application. I also wrote a simple | application to demonstrate it. Great! I think this document is something people have wanted for some time, and I think it complements AMKs documentation nicely. | Pr. SAXParseException.__str__ reads: Thanks! This is now fixed. | Pr. pyexpat does not report the document name with the getSystemId | method: Not so strange, since pyexpat does not (as far as I can tell) make this information available. However, I've now changed the driver to remember the sysId passed to it as an argument to parse() and report that. If no sysId is available (parseFile or reset/feed/close were used) "Unknown" is returned. | Pr. XMLValidator does not use my error handlers: Hmmm. The code you cite does not match my current development version nor the version in the public CVS tree. In fact, I suspect this to be from a quite old release. | Pr. 
XMLValidator does not accept spaces around #PCDATA as content in | an element type declaration: I cannot replicate this problem with my version and the error message seems to be from version 0.51 or earlier. Can you please check if you're using version 0.52? (Check the source of xmlproc.py.) If not, can you please install 0.52 and try again? | Pr. drv_xmlproc does not implement a getPublicId method: This is because the parser does not keep track of this information at the moment. I've added it to the todo list and hope to get this into version 0.53. | Pr. XMLValidator does not accept the following construction in an | external DTD: This is correct. Parameter entity references inside markup declarations are not yet supported by xmlproc. | | | | ERROR: Didn't match [A-Za-z_:][\-A-Za-z_:.0-9]* at waarnemingen.dtd:22:38 | TEXT: '%tekst;)> | ' | (the declaration of p is line 22) | | I am not sure whether this is allowed. nsgmls gives the warning: | '#PCDATA in nested model group'. It's not. What you've written is equivalent to which does not match the grammar in the XML spec. (See productions 45-46 and 51.) Remove the parentheses around the PE reference and it should be fine. | I hope this is useful. It most certainly was! This makes it abundantly clear that new releases of both saxlib and saxdrv are needed, and I'm currently working on both. This should probably take 2-3 weeks until the first release. (Anyone who needs the new versions before then can email me.) --Lars M. From larsga@ifi.uio.no Thu Jan 7 21:11:16 1999 From: larsga@ifi.uio.no (Lars Marius Garshol) Date: 07 Jan 1999 22:11:16 +0100 Subject: [XML-SIG] Documentation and problems In-Reply-To: References: Message-ID: * Simon Pepping | | Check it out at http://www.hobby.nl/~scaprea/XML/index.html. I've now read through your document more thoroughly and have some corrections to it: - the application should _not_ register the driver as a locator. 
The drivers that provide location information do this themselves before calling the startDocument method. Those that don't simply do not register a locator. In fact, you have no guarantee that the parser and the locator are the same object... - In the last paragraph repeats this. - In you write: "A SAX application must contain the handler classes DocumentHandler, DTDHandler, EntityResolver, and ErrorHandler, which should implement the methods prescribed by the SAX specification." SAX applications don't have to implement any of these at all, and in fact there exists an application that doesn't (saxtimer.py). So the text should say 'can' instead of 'must'. (A nit, I know, but one that would confuse literal-minded people like myself. :) - In it would be nice if you mentioned a little-known fact: - saxutils.py defines two useful error handlers: ErrorPrinter and ErrorRaiser, both of which can be used directly if their behaviour is what your application needs. - In the same page you write on the last line: "At the end of the parse, your application may stop. Or it may continue, especially if it has stored data in memory." Perhaps it's better to say: "At the end of the parse, the SAX driver returns from the parse or parseFile method and your application is free to do whatever it wants." - In you write: "The saxexts module defines a ParserFactory class. Upon import it makes an instance of it, called XMLParserFactory, which lists all known SAX-compliant XML parsers (actually it lists their driver modules). [It also makes instances with known validating XML parsers, HTML parsers and SGML parsers.]" A consequence of this is that you are wrong when you write on the previous page that "SAXparser=xml.sax.saxexts.make_parser()" is always the best method. It's not if you have special parser requirements. Despite this I think this is a very useful document and that it definitely fills a need. I've linked to it from the saxlib home page. 
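
Two of the points above (the driver, not the application, hands over the locator, and handlers are optional) can be sketched as follows. This assumes the `xml.sax` package in today's Python standard library rather than the 1999 saxlib/saxexts modules, and `PositionRecorder` is a hypothetical handler written only for this illustration.

```python
import xml.sax
from xml.sax.handler import ContentHandler


class PositionRecorder(ContentHandler):
    """Hypothetical handler that records where each element starts."""

    def __init__(self):
        super().__init__()
        self.locator = None      # stays None if the driver offers no locator
        self.seen = []

    def setDocumentLocator(self, locator):
        # Called by the driver before parsing starts, not by the application.
        self.locator = locator

    def startElement(self, name, attrs):
        line = None
        if self.locator is not None:
            line = self.locator.getLineNumber()
        self.seen.append((name, line))


handler = PositionRecorder()
xml.sax.parseString(b"<tree>\n<node/>\n</tree>", handler)
names = [name for name, line in handler.seen]

# Handlers are optional: a bare ContentHandler simply ignores every event.
xml.sax.parseString(b"<tree><node/></tree>", ContentHandler())
```

The application only checks whether a locator was supplied before using it, which also covers drivers that provide no location information.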
(The link may not become visible before tomorrow.)

--Lars M.

From mss@transas.com Fri Jan 8 16:42:33 1999
From: mss@transas.com (Michael Sobolev)
Date: Fri, 8 Jan 1999 19:42:33 +0300
Subject: [XML-SIG] Unicode stuff in XML package
Message-ID: <19990108194233.A4170@anguish.transas.com>

I have a small suggestion. The original package (intl??) contained a nice
utility called process_charmap, which helps to deal with charmap files.
Unfortunately, I could not find it in my python-xml package (under Debian,
version 0.5). I believe it would be a nice addition to the xml.unicode
subpackage.

In case this sounds interesting, I could provide a modified version of the
program (this one can be used as a function).

Cheers,
--
Mike

From michael@graphion.com Mon Jan 11 21:53:12 1999
From: michael@graphion.com (Michael Sanborn)
Date: Mon, 11 Jan 1999 13:53:12 -0800
Subject: [XML-SIG] Getting a slice from a NodeList?
Message-ID: <369A72C8.3A5D659B@graphion.com>

This is probably really basic, but I'm not understanding an error
message.

What I'm trying to do is to alter an attribute from the first two
members of a NodeList (called "wott_list") returned by
getElementsByTagName. But I don't seem to be able to get a slice of it.
The innermost part of the error message is:

  File "fed.py", line 155, in startElement
    for i in wott_list[0:2]:
  File "C:\Program Files\Python\Lib\UserList.py", line 22, in __getslice__
    userlist = self.__class__()
TypeError: not enough arguments; expected 4, got 1

What expected 4 arguments? __class__()?

I'm using Python 1.5.2b1 and 0.5 of the XML package on Win95.

Michael Sanborn
Graphion Typesetting

From akuchlin@cnri.reston.va.us Mon Jan 11 22:04:11 1999
From: akuchlin@cnri.reston.va.us (Andrew M. Kuchling)
Date: Mon, 11 Jan 1999 17:04:11 -0500 (EST)
Subject: [XML-SIG] Getting a slice from a NodeList?
In-Reply-To: <369A72C8.3A5D659B@graphion.com> References: <369A72C8.3A5D659B@graphion.com> Message-ID: <13978.30012.569587.130370@amarok.cnri.reston.va.us> Michael Sanborn writes: >What expected 4 arguments? __class__()? >I'm using Python 1.5.2b1 and 0.5 of the XML package on Win95. Yes; this looks like a bug (might be fixed in the CVS tree). I'll look into it tonight and post a patch. -- A.M. Kuchling http://starship.skyport.net/crew/amk/ Nothing is built on stone; all is built on sand, but we must build as if the sand were stone. -- Jorge Luis Borges From larsga@ifi.uio.no Mon Jan 11 22:42:04 1999 From: larsga@ifi.uio.no (Lars Marius Garshol) Date: 11 Jan 1999 23:42:04 +0100 Subject: [XML-SIG] Documentation and problems In-Reply-To: <009e01be3367$794b5220$529b90d1@synchrologic.com> References: <009e01be3367$794b5220$529b90d1@synchrologic.com> Message-ID: * Frank McGeough | | I believe the | | 2. Call the parser factory with the name of a known driver module, e.g., | SAXparser=xml.sax.saxexts.make_parser("xml.sax.drivers.drv_xmlproc") | | is incorrect. The saxexts.py has the following code in it: | parser_name = 'xml.sax.drivers.drv_' + parser_name | | therefore you should create the parser with : | | SAXparser=xml.sax.saxexts.make_parser("xmlproc") It now turns out that the version of saxexts.py in the CVS tree had this change made by mistake. In other words, the behaviour that is described here is not correct and so the document should remain as it is (or was). I believe the error is fixed in the CVS tree now, but can't check because of a local problem. --Lars M. From spepping@scaprea.hobby.nl Wed Jan 13 19:05:32 1999 From: spepping@scaprea.hobby.nl (Simon Pepping) Date: Wed, 13 Jan 1999 20:05:32 +0100 (MET) Subject: [XML-SIG] Documentation and problems In-Reply-To: Message-ID: On 7 Jan 1999, Lars Marius Garshol wrote: > | Pr. XMLValidator does not use my error handlers: > > Hmmm. 
The code you cite does not match my current development version > nor the version in the public CVS tree. In fact, I suspect this to be > from a quite old release. > > | Pr. XMLValidator does not accept spaces around #PCDATA as content in > | an element type declaration: > > I cannot replicate this problem with my version and the error message > seems to be from version 0.51 or earlier. > > Can you please check if you're using version 0.52? (Check the source > of xmlproc.py.) If not, can you please install 0.52 and try again? Now using version 0.52. The error handlers are still not mine, e.g.: ERROR: Not a valid name at waarnemingen.dtd:22:38 TEXT: '%tekst;)> ' The other problem has gone. Simon Pepping email: spepping@scaprea.hobby.nl From spepping@scaprea.hobby.nl Wed Jan 13 19:05:43 1999 From: spepping@scaprea.hobby.nl (Simon Pepping) Date: Wed, 13 Jan 1999 20:05:43 +0100 (MET) Subject: [XML-SIG] Documentation and problems In-Reply-To: Message-ID: On 7 Jan 1999, Lars Marius Garshol wrote: > I've now read through your document more thoroughly and have some > corrections to it: > > - the application should _not_ register the driver as a locator. > The drivers that provide location information do this themselves > before calling the startDocument method. Those that don't simply > do not register a locator. > > In fact, you have no guarantee that the parser and the locator are > the same object... I had missed that, and I will modify my document as per your suggestion. Note, however, that drv_pyexpat.py does not register a locator, while, if one registers the parser as the locator, it does implement the locator methods (except for the fact that it does not report the document, as noted before). Do I understand correctly that the availability of a locator is not guaranteed, so that the application should test for this? Or should every SAX parser provide at least dummy locator methods so that calls to them do not generate errors, e.g. by inheriting from saxlib.Locator? 
Then drv_pyexpat.py should register the parser as the locator.
Currently it generates an attribute error if one tries to use the
locator methods.

> - In the last
> paragraph repeats this.
>
> - In you write:
>
> "A SAX application must contain the handler classes
> DocumentHandler, DTDHandler, EntityResolver, and ErrorHandler,
> which should implement the methods prescribed by the SAX
> specification."
>
> SAX applications don't have to implement any of these at all, and
> in fact there exists an application that doesn't (saxtimer.py).
>
> So the text should say 'can' instead of 'must'. (A nit, I know, but
> one that would confuse literal-minded people like myself. :)

I mean to say that the application in principle should have such methods,
because the parser expects them and makes calls to them. The following
paragraphs explain that these methods can be provided by inheriting the
dummy methods from the provided sax library. I still feel that I state this
correctly.

I will follow your other suggestions. Thanks for your critical comments.

Simon Pepping
email: spepping@scaprea.hobby.nl

From fredrik@pythonware.com Thu Jan 14 23:26:19 1999
From: fredrik@pythonware.com (Fredrik Lundh)
Date: Fri, 15 Jan 1999 00:26:19 +0100
Subject: [XML-SIG] XML-RPC client library
Message-ID: <016901be4015$4bee1e60$f29b12c2@pythonware.com>

Don't recall if someone else has done something similar,
but I just whipped together a small client library for
Frontier's XML-RPC protocol:

http://www.pythonware.com/madscientist/

This one is completely self-contained (works on top of any
standard 1.5 installation).

I'm sure Dave Winer would be really happy if this made it
into the XML SIG distribution some day ;-)

From akuchlin@cnri.reston.va.us Fri Jan 15 03:51:37 1999
From: akuchlin@cnri.reston.va.us (A.M. Kuchling)
Date: Thu, 14 Jan 1999 22:51:37 -0500
Subject: [XML-SIG] Getting a slice from a NodeList?
In-Reply-To: <369A72C8.3A5D659B@graphion.com>
References: <369A72C8.3A5D659B@graphion.com>
Message-ID: <199901150351.WAA00812@207-172-45-23.s23.tnt5.brd.erols.com>

Michael Sanborn writes:
> What I'm trying to do is to alter an attribute from the first two
> members of a NodeList (called "wott_list") returned by
> getElementsByTagName. But I don't seem to be able to get a slice of it.

The fix is to replace the __getslice__ method of the NodeList
class with this corrected version:

    def __getslice__(self, i, j):
        userlist = NodeList([], self._document, self._parent)
        userlist.data[:] = self.data[i:j]
        return userlist

--
A.M. Kuchling http://starship.skyport.net/crew/amk/
Nature is beneficent. I praise her and all her works. She is silent and wise.
She is cunning, but for good ends. She has brought me here and will also lead
me away. She may scold me, but she will not hate her work. I trust her.
    -- Goethe

From gwachob@findlaw.com Fri Jan 15 08:57:07 1999
From: gwachob@findlaw.com (Gabe Wachob)
Date: Fri, 15 Jan 1999 00:57:07 -0800
Subject: [XML-SIG] XML Product for Zope
Message-ID: <369F02E3.30330C36@findlaw.com>

Python XMLers-

I put together a simple Product for Zope which encapsulates an XML
file and an XSL file and renders the XML into HTML using the XSL file.
Since there is no XSL processor in Python that I could find, I use an
external one written in (gasp) Java. Perhaps writing an XSL engine in
Python should be my next task.

Anyway, you can get it at

http://www.aimnet.com/~gwachob/software.html

It should install like a normal Zope product (fingers crossed - my
first real Product).

-Gabe

P.S. "Product" is Zope's terminology for the encapsulation of an
application (on a small scale). I'm not going marketing-droid-berserk
here....
;-)

From dieter@handshake.de Fri Jan 15 21:57:19 1999
From: dieter@handshake.de (Dieter Maurer)
Date: Fri, 15 Jan 1999 22:57:19 +0100
Subject: [XML-SIG] ANN: XSL-Pattern (and minor DOM patch)
Message-ID: <199901152157.WAA00981@lindm.dm>

This is a multi-part MIME message.

--------------FC5583E803777E8ABB8C4995
Content-Type: text/plain

On top of our PyDom, I have implemented XSL-Pattern, the pattern
sublanguage of the XSL working draft (16-December-1998).

Patterns are used extensively in the XSL transformation language
and its control structures. They can be used outside XSL, too,
for e.g. querying/selecting/matching parts of HTML/SGML/XML documents.

To build the pattern parser, I have used Scott Hassan's PyBison
package.

More information and download at

  URL:http://www.handshake.de/~dieter/pyprojects/xslpattern.html

A small patch to "xml.dom.core" was needed (attached) to fix a
missing "len(...)" in the DOM's attribute handling.

Dieter

--------------FC5583E803777E8ABB8C4995
Content-Type: application/x-patch; name="attr.pat"
Content-Description: Patch to "xml.dom.core" fixing missing "len(...)"

--- :core.py-1 Tue Dec 29 14:59:35 1998
+++ core.py Tue Jan 12 21:38:25 1999
@@ -203,7 +203,7 @@
 
     def values(self):
         L = self.data.values()
-        for i in range(L):
+        for i in range(len(L)):
             n = L[i]
             L[i] = NODE_CLASS[ n.type ](n, None, self._document )
         return L

--------------FC5583E803777E8ABB8C4995--

From larsga@ifi.uio.no Mon Jan 18 14:17:37 1999
From: larsga@ifi.uio.no (Lars Marius Garshol)
Date: 18 Jan 1999 15:17:37 +0100
Subject: [XML-SIG] Documentation and problems
In-Reply-To:
References:
Message-ID:

* Simon Pepping
|
| Note, however, that drv_pyexpat.py does not register a locator,
| while, if one registers the parser as the locator, it does implement
| the locator methods (except for the fact that it does not report the
| document, as noted before).

Ah, that's a bug. Thanks for reporting this!
I'm afraid the pyexpat driver saw very little testing, since I did not have pyexpat available before I released it. (I do now.) I've fixed this now, so the next release of the driver package will have this. | Do I understand correctly that the availability of a locator is not | guaranteed, so that the application should test for this? Yes. Not all parsers provide location information. | Or should every SAX parser provide at least dummy locator methods so | that calls to them do not generate errors, e.g. by inheriting from | saxlib.Locator? I think it's better for the driver to be frank about this and not register a locator if it doesn't actually have any location information. | [Re ] | | I mean to say that the application in principle should have such | methods, because the parser expects them and makes calls to them. | The following paragraphs explain that these methods can be provided | by inheriting the dummy methods from the provided sax library. I | still feel that I state this correctly. I like the way you've written it now better. The only thing I really would like to see changed is that you don't make it clear that registering handlers is not required. It's implied now, but not stated directly. (Yes, I am a nit-chaser. Feel free to ignore this.) --Lars M.
From tismer@appliedbiometrics.com Wed Jan 20 16:47:47 1999 From: tismer@appliedbiometrics.com (Christian Tismer) Date: Wed, 20 Jan 1999 17:47:47 +0100 Subject: [XML-SIG] XmlWriter update References: <852566F2.006B115C.00@li01.lm.ssc.siemens.com> Message-ID: <36A608B3.F5D6F61C@appliedbiometrics.com> Jeff.Johnson@icn.siemens.com wrote: > > I moved some code from xml.dom.HtmlWriter up to the super class > xml.dom.XmlWriter so that it is easier to specify where new lines should be > inserted when writing XML. I hope you like it and it gets into the XML > package or I'll have to rewrite my code :). This should be fully backwards > compatible too. > > The following is an example of how the change allows us to specify that > 'tree' elements should get new lines before and after the start tag and end > tag. The 'node' element only gets a new line before the start tag. > > nl_dict = { > 'tree':(1,1,1,1), > 'node':(1,0,0,0), > } > w = XmlWriter(sys.stdout,nl_dict) > w.write(doc) Hi XMLers! I found this one quite useful. Will it make it into the lib? Furthermore, is anybody interested in a prettyprint mode, (with some indentation), or has that been done already? ciao - chris -- Christian Tismer :^) Applied Biometrics GmbH : Have a break! Take a ride on Python's Kaiserin-Augusta-Allee 101 : *Starship* http://starship.skyport.net 10553 Berlin : PGP key -> http://pgp.ai.mit.edu/ we're tired of banana software - shipped green, ripens at home From akuchlin@cnri.reston.va.us Wed Jan 20 17:03:13 1999 From: akuchlin@cnri.reston.va.us (Andrew M. Kuchling) Date: Wed, 20 Jan 1999 12:03:13 -0500 (EST) Subject: [XML-SIG] XmlWriter update In-Reply-To: <36A608B3.F5D6F61C@appliedbiometrics.com> References: <852566F2.006B115C.00@li01.lm.ssc.siemens.com> <36A608B3.F5D6F61C@appliedbiometrics.com> Message-ID: <13990.3023.577218.223562@amarok.cnri.reston.va.us> Christian Tismer writes about Jeff Johnson's XmlWriter patch: >Hi XMLers! >I found this one quite useful. >Will it make it into the lib? 
I haven't gotten around to looking at the patch, but definitely would
like to include it; Jeff's code submissions have been fine in the
past, so it'll probably wind up applied to the CVS tree. (I've been
thinking of issuing a 0.5.1 updated release, but haven't had time for
XML hacking lately.)

>Furthermore, is anybody interested in a prettyprint mode,
>(with some indentation), or has that been done already?

That would be very useful.

--
A.M. Kuchling			http://starship.skyport.net/crew/amk/
The life of Man is a struggle with Nature and a struggle with the Machine;
when Nature and the Machine link forces against him, Man hasn't a chance.
    -- Robertson Davies, _The Diary of Samuel Marchbanks_

From gwachob@aimnet.com Wed Jan 20 19:49:41 1999
From: gwachob@aimnet.com (Gabe Wachob)
Date: Wed, 20 Jan 1999 11:49:41 -0800 (PST)
Subject: [XML-SIG] XmlWriter update
In-Reply-To: <13990.3023.577218.223562@amarok.cnri.reston.va.us>
Message-ID:

On Wed, 20 Jan 1999, Andrew M. Kuchling wrote:

> Christian Tismer writes about Jeff Johnson's XmlWriter patch:
> >Hi XMLers!
> >I found this one quite useful.
> >Will it make it into the lib?
>
> I haven't gotten around to looking at the patch, but
> definitely would like to include it; Jeff's code submissions have been
> fine in the past, so it'll probably wind up applied to the CVS tree.
> (I've been thinking of issuing a 0.5.1 updated release, but haven't
> had time for XML hacking lately.)
>
> >Furthermore, is anybody interested in a prettyprint mode,
> >(with some indentation), or has that been done already?
>
> That would be very useful.

Quick-n-ugly (and I do mean ugly) prettyprint into HTML -- if someone
wants to make it better:

from xml.sax import saxlib
from xml.sax import saxexts
import sys

class XMLPrettyPrint(saxlib.HandlerBase):
    """ Pretty print an XML source tree in HTML with colors, etc """

    # The HTML markup inside the string literals below did not survive
    # archiving; only the text that is still visible is kept.

    def __init__(self):
        # accumulated output; must live on the instance
        self.totalstring=""

    def startElement(self, name, attrs):
        string= "\n    <"+name
        if attrs.getLength() > 0:
            for key in attrs.keys():
                string=string+ " "+key+"=\""+attrs[key]+"\""
        self.totalstring=self.totalstring+(string+">")

    def characters(self, ch, start, length):
        self.totalstring=self.totalstring+"\n      "+(ch[start:start+length])+"\n    "

    def endElement(self, name):
        self.totalstring=self.totalstring+ "</"+name+">\n"

    def startDocument(self):
        self.totalstring=self.totalstring+ ""

    def endDocument(self):
        self.totalstring=self.totalstring+ ""

    def processingInstruction(self, target, data):
        self.totalstring=self.totalstring+ "<?"+target+" "+data+"?>"

if __name__=="__main__":
    myparser=saxexts.make_parser()
    myxpp=XMLPrettyPrint()
    myparser.setDocumentHandler(myxpp)
    myparser.parseFile(sys.stdin)
    print myxpp.totalstring

------------------------------------------------------------------------
Gabe Wachob - http://www.findlaw.com - http://www.aimnet.com/~gwachob
As of today, the U.S. Constitution has been in force for 76,913 days
When this message was sent, there were 29,851,818 seconds before Y2K

From akuchlin@cnri.reston.va.us Thu Jan 21 03:45:46 1999
From: akuchlin@cnri.reston.va.us (A.M. Kuchling)
Date: Wed, 20 Jan 1999 22:45:46 -0500
Subject: [XML-SIG] Pretty-printing DOM trees
Message-ID: <199901210345.WAA29899@207-172-49-200.s200.tnt14.ann.erols.com>

The format() function below pretty-prints a DOM tree. It strips away
all the whitespace, and then inserts Text nodes containing white
space, producing output like this:

xmlproc: A Python XML parser

xmlproc: A Python XML parser

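For anyone reading this on a current Python: the strip-then-reinsert idea described above can be sketched roughly as follows. This is only a sketch under assumptions. It uses the standard library's xml.dom.minidom instead of the xml.dom.utils and xml.dom.core modules from this message, the helper names strip_whitespace and format_tree are made up for the illustration, and the sample document is invented.

```python
from xml.dom import minidom, Node

def strip_whitespace(node):
    """Remove text nodes that hold only whitespace (hypothetical helper)."""
    for child in list(node.childNodes):
        if child.nodeType == Node.TEXT_NODE and not child.data.strip():
            node.removeChild(child)
        else:
            strip_whitespace(child)

def format_tree(node, indent=4, depth=0):
    """Insert whitespace Text nodes so each element child gets its own line."""
    doc = node.ownerDocument
    element_children = [c for c in node.childNodes
                        if c.nodeType == Node.ELEMENT_NODE]
    for child in element_children:
        # spacing that goes in front of the child's start tag
        spacing = "\n" + " " * ((depth + 1) * indent)
        node.insertBefore(doc.createTextNode(spacing), child)
        format_tree(child, indent, depth + 1)
    if element_children:
        # spacing that goes in front of this element's own end tag
        node.appendChild(doc.createTextNode("\n" + " " * (depth * indent)))

# Invented sample input with stray whitespace between the tags.
source = "<xbel>\n<bookmark>  <title>xmlproc</title></bookmark>\n</xbel>"
dom = minidom.parseString(source)
strip_whitespace(dom)
format_tree(dom.documentElement)
result = dom.documentElement.toxml()
print(result)
```

Elements that contain only text are left on one line, which matches the behaviour described for format(); mixed content (text and elements side by side) would get extra whitespace added, so the sketch is only suitable for data-style documents.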
Should this be left as just a black-box function, or should it be
implemented as a subclass of the writer.XmlWriter() class? I suppose
it depends on the envisioned application for this; if it's just to
make output a little bit more readable for debugging purposes, then
customizability isn't very important. On the other hand, if people
will want to do careful indenting of the output, indenting some tags
and not others, then the XmlWriter solution is the way to go.
My inclination is to the former view, but then, that's also easier for
me. :) Thoughts?

--
A.M. Kuchling			http://starship.skyport.net/crew/amk/
We have first raised a dust and then complain we cannot see.
    -- Bishop Berkeley

from xml.dom import utils, core

d = utils.FileReader()
dom = d.readFile( '/scratch/xsademo.xml' )

def format(node, indent=4):
    """Pretty-print a DOM tree"""

    utils.strip_whitespace( node )
    if node.nodeType == core.DOCUMENT_NODE:
        node = node.documentElement

    stack = [ (0,node) ]
    document = node.get_ownerDocument()

    # Add a newline before the opening and closing tags of the root element
    parent = node.get_parentNode()
    parent.insertBefore( document.createTextNode('\n'), node )
    node.appendChild( document.createTextNode('\n') )

    while (stack):
        # get the top node from the stack
        depth, node = stack[-1]

        # walk this node's list of children, deleting those that are
        # all whitespace and saving the rest to be pushed onto the stack
        children = []
        for child in node.childNodes[:] :
            if child.nodeType == core.ELEMENT_NODE:
                spacing = '\n' + (' '*(depth+1)*indent)
                # Add spacing before the child element; this space goes before
                # the start tag.
                text = document.createTextNode( spacing )
                node.insertBefore( text, child )

                # Check if the child element has any element children; if so,
                # we'll add whitespace before the closing tag.
                has_element_children = 0
                for n in child.get_childNodes():
                    if n.nodeType == core.ELEMENT_NODE:
                        has_element_children=1
                if has_element_children:
                    # Add spacing as the last child of the child element; this
                    # will go before the closing tag.
                    text = document.createTextNode( spacing )
                    child.appendChild( text )

            if child.hasChildNodes():
                children.append ( (depth+1,child) )

        children.reverse()
        stack[-1:] = children
    # end: while stack not empty

format(dom)
print dom.toxml()

From gwachob@aimnet.com Thu Jan 21 05:57:06 1999
From: gwachob@aimnet.com (Gabe Wachob)
Date: Wed, 20 Jan 1999 21:57:06 -0800 (PST)
Subject: [XML-SIG] Pretty-printing DOM trees
In-Reply-To: <199901210345.WAA29899@207-172-49-200.s200.tnt14.ann.erols.com>
Message-ID:

On Wed, 20 Jan 1999, A.M. Kuchling wrote:

> Should this be left as just a black-box function, or should it be
> implemented as a subclass of the writer.XmlWriter() class? I suppose
> it depends on the envisioned application for this; if it's just to
> make output a little bit more readable for debugging purposes, then
> customizability isn't very important. On the other hand, if people
> will want to do careful indenting of the output, indenting some tags
> and not others, then the XmlWriter solution is the way to go.
> My inclination is to the former view, but then, that's also easier for
> me. :) Thoughts?

My feeling is that most purposes of writing out a DOM tree (or tree
representation of an XML tree) will either be 1) for debugging purposes or
2) highly stylized, for a particular purpose (like an editor or
something). In other words, the prettyprint either has to be *really*
flexible or not very useful outside of debugging.

How many applications print out XML directly? Even an XML source browser
would want to add features like highlighting/tagging, hiding/exposing
branches, filtering, etc. Unless you plan on including a lot of these
features (or at least hooks for them), I don't see any reason to do
anything more than a black-box solution.
(I would like to see an HTML rendering like my black-box SAX-driven
script I posted earlier today).

	-Gabe

------------------------------------------------------------------------
Gabe Wachob - http://www.findlaw.com - http://www.aimnet.com/~gwachob
As of today, the U.S. Constitution has been in force for 76,914 days
When this message was sent, there were 29,815,374 seconds before Y2K

From larsga@ifi.uio.no Thu Jan 21 10:36:58 1999
From: larsga@ifi.uio.no (Lars Marius Garshol)
Date: 21 Jan 1999 11:36:58 +0100
Subject: [XML-SIG] SAX status
Message-ID:

As you probably know, I have been fixing bugs, writing new drivers
(including a JPython one) and generally working on improving the SAX
libraries, preparing for a new release, which I hoped should not be
too far into the future.

However, David Megginson (the coordinator of the original SAX design)
has just started the discussion of the next SAX version. This means a
couple of things:

 - Unless someone complains I will wait with issuing new versions of
   the packages until new versions of the Java ones come out.

 - Anyone who has strong opinions about how the SAX design should be
   should participate in the xml-dev discussions. (Email with subject
   "subscribe xml-dev" or "subscribe xml-dev-digest" to
   majordomo@ic.ac.uk.)

 - I will probably translate the Java design by hand, and possibly
   also extend it in some cases (as I did last time, clearly
   separating the extensions from the core). Again, Java and JPython
   compatibility will be considered very important (unless someone
   screams really really loudly.) I will also keep the XML-SIG
   informed and attempt to start discussions here about the design and
   translation.

 - If you don't have time to participate fully, but still want to
   voice your opinion, do so here, and I will bear it in mind in the
   xml-dev discussions.

For those who don't have the time to follow xml-dev, David basically
proposed three new extensions:

 - parser filter facilities
 - lexical events
 - namespace handling

My original proposal for parser filters is still at

The code is really simple and there are a couple of demos, including a
namespace one.

--Lars M.

From tismer@appliedbiometrics.com Fri Jan 22 18:00:55 1999
From: tismer@appliedbiometrics.com (Christian Tismer)
Date: Fri, 22 Jan 1999 19:00:55 +0100
Subject: [XML-SIG] Pretty-printing DOM trees
References: <199901210345.WAA29899@207-172-49-200.s200.tnt14.ann.erols.com>
Message-ID: <36A8BCD7.1F992E61@appliedbiometrics.com>

A.M. Kuchling wrote:
>
> The format() function below pretty-prints a DOM tree. It strips away
> all the whitespace, and then inserts Text nodes containing white
> space, producing output like this:
>
>
>
>
>
> xmlproc: A Python XML parser
>
>
>
>

> xmlproc: A Python XML parser >

>
>
>
> Should this be left as just a black-box function, or should it be
> implemented as a subclass of the writer.XmlWriter() class? I suppose
> it depends on the envisioned application for this; if it's just to
> make output a little bit more readable for debugging purposes, then
> customizability isn't very important. On the other hand, if people
> will want to do careful indenting of the output, indenting some tags
> and not others, then the XmlWriter solution is the way to go.
> My inclination is to the former view, but then, that's also easier for
> me. :) Thoughts?

Well, thank you - this was exactly what I wanted.
Just readable output. I took it as is, named it "format.py",
perfect. I don't think that customization is such an issue.

Maybe it could be a drawback that applying format to a dom was
about three or four times slower than creating the dom at all,
but never mind.

Would this function belong to xml.dom.utils, besides print_tree?
But it is actually a function which happens to use DOM for its
work, so it seems to be a more general function for all xml
modules, so xml.utils may be better.

Then I could also think of recoding it as an sgmlop app.

Thanks again - chris

--
Christian Tismer             :^)
Applied Biometrics GmbH      :     Have a break! Take a ride on Python's
Kaiserin-Augusta-Allee 101   :    *Starship* http://starship.skyport.net
10553 Berlin                 :     PGP key -> http://pgp.ai.mit.edu/
     we're tired of banana software - shipped green, ripens at home

From akuchlin@cnri.reston.va.us Thu Jan 21 23:16:12 1999
From: akuchlin@cnri.reston.va.us (Andrew M. Kuchling)
Date: Thu, 21 Jan 1999 18:16:12 -0500 (EST)
Subject: [XML-SIG] Pretty-printing DOM trees
In-Reply-To: <36A8BCD7.1F992E61@appliedbiometrics.com>
References: <199901210345.WAA29899@207-172-49-200.s200.tnt14.ann.erols.com>
	<36A8BCD7.1F992E61@appliedbiometrics.com>
Message-ID: <13991.45828.86417.505172@amarok.cnri.reston.va.us>

Christian Tismer writes:
>Maybe it could be a drawback that applying format to a dom was
>about three or four times slower than creating the dom at all,
>but never mind.

Hmm... wonder why it's so slow. One reason might be that, for every
element, it checks whether any of its children are also elements, in
order to format the two cases differently. (As in: Text It's not
formatted as Text

>Would this function belong to xml.dom.utils, besides print_tree?
>But it is actually a function which happens to use DOM for its
>work, so it seems to be a more general function for all xml
>modules, so xml.utils may be better.

But it requires that you already have a DOM tree created, so it seems
best left in xml.dom.utils. Indenting a document using SAX or sgmlop
might be best implemented as a specialized handler, not by the
expensive process of creating a DOM tree.

I'll try to recast it into a subclass of XmlWriter, and have
utils.indent_tree() as shorthand to create and use an instance of that
subclass. That gives both flexibility and quick-and-dirty convenience.

--
A.M. Kuchling			http://starship.skyport.net/crew/amk/
It is true greatness to have in one the frailty of a man and the security of a
god.
    -- Seneca

From tismer@appliedbiometrics.com Fri Jan 22 12:41:10 1999
From: tismer@appliedbiometrics.com (Christian Tismer)
Date: Fri, 22 Jan 1999 13:41:10 +0100
Subject: [XML-SIG] Big Bug? (was:Pretty-printing DOM trees)
References: <199901210345.WAA29899@207-172-49-200.s200.tnt14.ann.erols.com>
	<36A8BCD7.1F992E61@appliedbiometrics.com>
	<13991.45828.86417.505172@amarok.cnri.reston.va.us>
Message-ID: <36A871E6.89033191@appliedbiometrics.com>

This is a multi-part message in MIME format.
--------------322B2ABE145DE09B7F9C2D0A
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit

I've tested the formatter with one of my XML workfiles.
Astonishingly, it breaks.
I could not find the error, but it happens when I use
utils.FileReader to build the dom.

>>> d=utils.FileReader()
>>> dom = d.readFile( r'H:\pns\Projekte\SRZ\RoteLi\birgit\SGML\praep.xml')
Traceback (innermost last):
  File "", line 1, in ?
  File "D:\Python\xml\dom\utils.py", line 140, in readFile
    dom = self.readStream(file,type)
  File "D:\Python\xml\dom\utils.py", line 148, in readStream
    dom = self.readXml(stream)
  File "D:\Python\xml\dom\utils.py", line 164, in readXml
    p.feed(stream.read())
  File "i:\cvsroot\xml\sax\drivers\drv_xmlproc.py", line 132, in feed
    self.parser.feed(data)
  File "i:\cvsroot\xml\parsers\xmlproc\xmlutils.py", line 189, in feed
    self.do_parse()
  File "i:\cvsroot\xml\parsers\xmlproc\xmlproc.py", line 278, in do_parse
    self.parse_end_tag()
  File "i:\cvsroot\xml\parsers\xmlproc\xmlproc.py", line 532, in parse_end_tag
    self.app.handle_end_tag(name)
  File "i:\cvsroot\xml\sax\drivers\drv_xmlproc.py", line 64, in handle_end_tag
    self.doc_handler.endElement(name)
  File "xml\dom\builder.py", line 53, in endElement
    assert name == self.current_element.get_nodeName()
AssertionError:
>>>

The XML file is well-formed, so there must be a bug in the
dom builder. When I let builder.py ignore the assertion
error and avoid popping the tree, it works!

I hope this helps the author to find the bug, I don't understand
everything well enough to find this.

ciao - chris

--
Christian Tismer :^) Applied Biometrics GmbH : Have a break!
Take a ride on Python's Kaiserin-Augusta-Allee 101 : *Starship* http://starship.skyport.net 10553 Berlin : PGP key -> http://pgp.ai.mit.edu/ we're tired of banana software - shipped green, ripens at home --------------322B2ABE145DE09B7F9C2D0A Content-Type: text/plain; charset=iso-8859-1; name="praep.xml" Content-Transfer-Encoding: quoted-printable Content-Disposition: inline; filename="praep.xml" Flutide® = / Flutide® NGlaxo Wellcome/Cascan
Flutide® Junior 25 Dosier-Aerosol, Suspension und Treibmittel
1 Dosier-Aero= sol (N1) mind. 120 Sprühstöße, Flutide = Junior 251 Sprühstoß (≈ 85 = ;mg)Fluticason-17-propionat,02= 5 mgTrichlorfluorm= ethanDichlordifluor= methanLecithin
Flu= tide® 250 Rotadisk®, Pulver zum Inhalieren
60 Plv. (N1) Flutide 250 Rotadisk60 Plv. (N1) Flutide 250 Rotadisk + D= iskhaler1 EinzeldosisFluticason-17-propionat0,25 mgin 25 m= g PulverLactose 1H2O=
Flutide® mite 100 Diskus®, Pulver zum Inhalieren60 Plv. (N1) mite 100 Diskus1 EinzeldosisFluticason-17-pr= opionat0,1 mgin 12,5 mg PulverLactose 1H2O
Flutide® fort= e 500 Diskus®, Pulver zum Inhalieren
60 Plv. (N1) forte 500 Diskus1 = EinzeldosisFluticason-17-propionat0,5 mgin 12,5 mg PulverLactose 1H2O
Bronchialasthma aller Schweregrade.Akutbehandl. eines Asthmaanfalles. Behandl. bei aktiver od. inaktiver Lungentuberkulose gleichz. mit einem gegen die Tuberkulose wirksamen AM.Flutide: Kdr. unter 4 J. (zur Zeit keine ausreichenden Erfahrunge= n).`O Flutide N: Kdr. u. Jugendl. unter 16 J.G 14Sehr selten paradoxer Bronchospasmus= mit rasch einsetzender Atemnot. Die Nebennierenrindenfunkt. bleibt im allg. während der Inhalationsbehandl. mit Fluticason-17-propionat im Normalbereich. Bei einzelnen Pat., vor allem wenn sie über längere Zeit mit hohen= Tagesdosen behandelt werden, kann es zu einer Einschränkung der Nebennierenrin= denfunktion kommen. Auch nach Umstellung von and. inhalativen od. oralen Kortikoiden= kann die Nebennierenrindenfunkt. noch für längere Zeit eingesc= hränkt sein. Selten Überempfindlichkeitsreakt. mit Hautbeteiligung. Erhöhte= Blutzuckerspiegel u. in Einzelf. eine Zuckerausscheidung in den Urin.Flutide Junior 25 Dosier-Aerosol: Kdr. über 4=  J., Jugendl. u. Erw.: 2mal tgl. 2 Sprühstöße.`O Flutide N 125 Dosier-Aerosol: = Jugendl. über 16 J. u. Erw.: 2mal tgl. 2 Sprühstöße.`O Flutide N forte 250= Dosier-Aerosol: Jugendl. über 16 J. u. Erw.: 2mal tgl. 2-4 Sprühstöße.`= O Flutide Junior 50 Rotadisk/- Junior 50 Diskus/mite 100 Diskus: Kdr. über 4 J., Jugendl. u. Erw.: 2mal tgl. 1 Pulverinhal= ation.`O Flutide 250 Rotadisk/-250 Diskus: Jugendl. > 16 J. u. Erw.: 2mal tg= l. = 1 Pulverinhalation. forte 500 Diskus: Jugendl. über 16 J.= u. Erw.: 2mal tgl. = 1-2 Pulverinhalationen.`O Die Dosis sollte für jeden Pat. so angepaßt werden, daß eine Kontr. der Beschw. erreicht werden kann. Danach sollte di= e individuelle Erhaltungsdosis durch schrittweise Verringerung der Gesamttagesdosis ermittelt werden. Näheres s. Packungsbeilage.Lagerungshinweis! Verfalldatum!
Gastrosil® / -akut / -10, -20 / -re= tard / -retard mite/ -50Heumann
Gastrosil® Lösung
Gastrosil® Tabletten20 Tbl. (N1)50 Tbl.= (N2)100 Tbl. (N3)1 Tbl.Metoclopramid-H= Cl 1H2O10,54 mg= Metoclopramid-HCl10 mg<= Hilfsstoff code=3D"029345">Mikrokristalline CellulosePoly(O-carboxymethyl)stär= ke-NatriumsalzMagne= siumstearathochdisp= erses Siliciumdioxid<= Darreichung zulassungsnummer=3D"13045.00.01" code=3D"077680" datum=3D"010= 192" status=3DF>
Gastrosil® akut Lösung
15.00 ml (N1) Lsg. akut1 mlMetoclopramid-HCl 1H2O5,97 mgMetoclopramid-HCl5,67 mgSorbitol-Lsg. 70`p(nicht kristallisiert)gereinigtes Wasser= PropylenglycolNatriumchlorid
Gastrosil® retard= Retardkapseln
Gastrosi= l® retard mite Retardkapseln
Gastrosil® Injektionslösung
Gastrosil® 50 Injektionslösung
Z 3 (Lsg.)Motilitätsstö= ;rungen des oberen Magen-Darmtraktes, z. B. Refluxösophagitis, Gastritis, Sodbrennen; Ulcus ventriculi et duodeni; Übelkeit, Brechreiz u. Erbrechen bei M= igräne, Leber- und Nierenerkrankungen, Schädel- und Hirnverletzungen, Arzneimittelunverträglichkeit, Reisekrankheiten; bei anhaltendem Sc= hluckauf = ist ein Therapieversuch angezeigt.`O Gastrosil akut: Motilitätsstörungen des oberen Magen-Darmtrakt= es, funktionell bedingte Pylorusstenose, Übelkeit, Brechreiz und Erbrec= hen, zur unterstützenden, symptomatischen Behandlung bei Magen- u. Zwölffingerdarmgeschwüren. Diabetische Gastroparese.`O Gastrosil 50: Hochdosierte Metoclopramidtherapie bei Übelkeit und E= rbrechen = durch das Zytostatikum Cisplatin.Lsg.: Sorbitolintoleranz. Lsg. akut:= Sgl. u. Kleinkdr. bis zu = 2 J.D 70Lsg. akut: Ältere Kinder. sD 7050 Inj= ektionslsg.: Bradykardie, = Blutdruckanstieg, -abfall.D 70<= /Signatur>Bei gleichzeitiger Einnahme von Sympathikomimetika kann der Blutdruck erh&ou= ml;ht werden. Die Aufnahme von Digoxin aus dem Darm kann vermindert, die Aufnahme von Paracetamol und versch. Antibiotika sowie von Alkohol kann beschleunigt werden. Lsg. akut zusätzl.: Bei gleichzeitiger Gabe von Phenothiazinen und Sympathomimetika kön= nen b. empfindl. Pat. extrapyramidale Reaktionen auftreten.#W(V)Lsg.: Erw. u. Jugendl. ab 14 Jahren 3mal = tgl. 15-30 Tr., Kdr. 7-14 Jahre 10-20 Tr., Kdr. 3-6 Jahre 8-12 Tr. vor den Mahlz.`O Tbl.: Erw. u. Jugendl. ab 14 Jahren 3mal tgl. 1 Tbl. vor den Mahlz.`O Zäpf. f. E.: Erw. u. Kdr. ab 14 Jahren bis zu 3mal tgl. 1 Zäpf= ch.`O Zäpf. f. = K.: Kdr. zwischen 3 u. 14 Jahren bis zu 3mal tgl. 1 Zäpf.`O Lsg. akut: Erw. u. Jugendl. ab 14 J.: 0,1 mg/kg KG als Einzeld= osis; max. Tagesdosis 0,5 mg/kg KG. Bei eingeschränkter Nierenfkt. ist di= e Dosis d. Funktionsstörung anzupassen. Pat. m. einer Kreatininclearance bis 1= 0 ml/min 1mal tgl. 10 mg (30 Tr.). Pat. m. einer Kreatininclearance von= 11-60 ml/min 1mal tgl. 10 mg (30 Tr.) u. 1mal tgl. 
5 mg (15 Tr.).= Einnahme m. etw. Flüssigk. vor den Mahlzeiten.`O Retardkps.: Erw. u. Jugendl. ab 14 Jahren morgens 1 Retardkps. vor dem Essen mit etwas Flüssigke= it. Bei = nächtlichem Aufsteigen von Magensäure in die Speiseröhre = u. Sodbrennen soll die Retardkps. abends eingenommen werden.`ORetard= kps. mite: Erw. u. Jugendl. ab 14 Jahren 2mal tgl. 1 Retardkps. mite vor dem Essen, zur symptomatischen= = Behandlung bei der diabetischen Gastroparese etwa #2 Std. vor dem Essen. Die Einnah= men erfolgen im Abstand von 12 Std. Kdr. von 8-14 Jahren, Patienten mit eingeschränkter Nierenfunktion u. Alterspatienten sowie Patienten m= it einem KG unter 60 kg 1mal tgl. 1 Retardkps. mite morgens od. abends. Einnahmen in= = diesen Fällen im Abstand von 24 Std. Die tgl. Dosis sollte 0,5 mg Met= oclopramid/kg KG = nicht überschreiten.`O Injektionslsg.: Erw. u. Jugendl. ab 14 Jahren 1-3mal tgl. 1 Am= p. i.m. oder = i.v.; Kdr. von 3-14 Jahren eine Tagesdosis von 0,5 mg/kg KG i.m.`O Gastrosil 50 kann nach folgenden 3 Schemata appliziert werden: a) 2 = ;mg = Metoclopramid-HCl (≈ 0,4 ml Gastrosil 50) pro kg KG als Kurzinfu= sion über = 15 Min. Zytostatikum 30 Min. nach Therapiebeginn. Jeweils 2 mg Meto= clopramid- HCl (≈ 0,4 ml Gastrosil 50) pro kg KG werden als weitere Kurzinfusionen über 15 M= in. nach 2, 4, 6 und 9 Std. appliziert.`Ob) 1 mg Meto= clopramid-HCl (≈ 0,2 ml Gastrosil 50) pro kg KG als Kurzinfusion über 15 Min. Zytostatikum 30 Min. nach Therapiebeginn.= Jeweils = 1 mg = Metoclopramid-HCl (≈ 0,2 ml Gastrosil 50) pro kg KG werden als w= eitere = Kurzinfusionen über 15 Min. nach 2, 4, 7, 10 und 13 Std. appliziert= =2E`O c) 2 mg Metoclopramid-HCl (≈ 0,4 ml Gastrosil 50) pro kg KG= als Kurzinfusion über 15 Min. Zytostatikum 2 Std. nach Therapiebeginn (= während der = Dauerinfusion). Anschließend werden 5 mg Metoclopramid-HCl (&= ap; 1 ml Gastrosil 50) pro kg KG als Dauerinfusion über 12 Std. 
appliziert.`= O Bei Niereninsuffizienz sollte die Dosis von Gastrosil 50 Injektionsl&oum= l;sung auf #3 der normalen Dosis = reduziert bzw. das Dosierungsintervall zwischen den einzelnen Gaben entsprechend erhöht werden.`O Die tgl. Tagesdosis von 0,5 mg/kg KG sollte i. a. nicht ü= berschritten werden. Behandlungsdauer 4-6 Wochen. In Einzelfällen kann Gastrosil= = auch über mehrere Monate, wenn erforderlich, angewendet werden.Tramadolor®Hexal
Tramadolor= ®, Kapseln
Tramadolor® tabs, Tabletten
10 Tbl. (N1) tabs30 Tbl. (N2) tabs50 Tbl. (N3) tabs1 Tbl.Tramadol-HCl50 mgCelluloseLactoseMacrogol 4000MagnesiumstearatPovidonSac= charin-NatriumSilic= iumdioxidAromastoff= e
Tramadolor® Zäpfchen10 Zäpf. (N1)20 Zäpf. (N2)1 Zäpf.Tramadol-HCl100 mgHartfett
Tramadolor® 100 ID Retardtable= tten
10 Retardtbl. (N1)
30 Retardtbl. (N2)50 Retardtbl. (N3)<= Zusammensetzung>1 Retardtbl.Tramadol-HCl100 mgCa= -hydrogenphosphatCe= lluloseLactoseMg-stearatMaisstärkeHypromelloseNa-carboxymethylstärkePovidonhydriertes RizinusölSiliciumdioxid<= /Zusammensetzung>
Tramadolo= r® 50/100 Injektionslösung
5 Amp. (N1) 1 ml 50 mg5 Amp. (N1) 2 ml 100 mg10 Amp. (N2) 2 ml 100 mg1 Amp. 1/2 mlTramado= l-HCl50 mg/100 mgNatriumacetatWasser f. Inj.-zweckeMäßig starke bis starke Schmerzen.Akute Intoxikat. durch Alkohol, Schmerz-, Schlafm., = Opioid, Pat., die MAO-Hemmer erhalten od. innerhalb d. letzten 14 Tage angewendet hab= en. -Brause/-100 Brause/-Kps./-tabs: Kdr.; 100 ID: Kdr. #X 12 = ;J.; -Lsg./-50/-100: Kdr. #X 1 J.; -Zäpfchen: Kdr. #X 14 J. Psychopharmaka. Kdr. unter 14 J. Lsg./Inj.-Lsg.: Kdr. #X 1 J. Drogensubst= itution.A 85 a-e, kKopfverletzung, Schock.= #`K Gr 4, Gr 9. (Chron. Anw.). Gabe von Einzeldos= en mögl.A 85 a-l, n-p, s, v, = w, x#`K La 2. Bei 1mal. Anw. Abstillen jedoch nich= t erforderl.A 85 a, b, dEp= ileptische Krampfanfälle, Blutdruckanstieg, Appetitänderungen, allergische Reaktionen bis zum anaphylakt. Schoc= k, Verschlimmerung von Asthma.A 85= Abschwächung der Wirkung bei Verwendung von Agonisten/Ant= agonisten. Hemmung durch CYP3A4-hemmende Substanzen. Das krampfauslösende Pote= ntial von selektiven Serotonin-Reuptake-Inhibitoren, trizykl. Antidepressiva, Antipsychotika u. andere die Krampfschwelle herabsetzende AM wird erh&ou= ml;ht. Neuroleptika: Krampfanfälle, Carbamazepin: vermindert analget. Effe= kt. MAO-Hemmstoffe: innerh. v. 14 Tagen vor Anw. v. Pethidin: lebensbed= roh. Wechselwirkungen (ZNS, Atmungs- u. Kreislauffkt.), die für Tramadol= nicht auszuschließen sind.#W(V) B. Übersc= hreit. d. empf. Dos. u. gleichz. Anw. and. zentraldämpf. Medik. atemdämpfende Wirk. berücksichtigen. Pat. mit Leber- u.= Nierenfunktionsstör.: Dosierungsintervall verlängern! Abhä= ;ngigkeitspotential. B. längerem Gebr.: Toleranz u. Abhängigk. Erfahrungsgem. trete= n Nebenwirkungen starker Analgetika bes. unter körperl. Belastung auf= =2E Weitere Hinw. s. Fachinfo.-Brause/-Kps./-tabs: B. mä&s= zlig;ig starken Schmerzen: Erw. u. Jugendl. ab 12 J. als ED 50 mg, entspr. 1 Brausetbl./Kps./Tbl. Tritt = innerh. v. = 30-60 Min. keine Schmerzbefr. ein, Wiederh. mögl. B. 
starken Schmer= zen als ED 100 mg. Nicht > 400 mg/Tag. -100 Brause: B. mä&sz= lig;ig starken Schmerzen: Erw. u. = Jugendl. ab 12 J. als ED 50 mg, entspr. #2 Brausetbl. Tritt innerh. v. 30-60 Min. = keine Schmerzbefr. ein, Wiederh. mögl. B. starken Schmerzen als ED 100 mg. Nicht = > 400 mg/Tag. -100 ID: Erw. u. Jugendl. ab 12 J. ED 200-400 mg, entspr. 1-2&= nbsp;Retardtbl. 2mal tgl. Nicht > 400 mg/Tag. -Lsg.: B. mäßig st= arken Schmerzen: Erw. u. Jugendl. ab 12 J. als ED 50 mg, entspr. 20 Tr. Tritt inne= rh. v. 30-60 Min. keine Schmerzbefr. ein, Wiederh. mögl. B. starken Schmerzen 100&nbs= p;mg, entspr. 40 Tr. Nicht > 400 mg/Tag. -50/-100: B. mäß= ig starken Schmerzen: Erw. u. Jugendl. ab 14 J. als ED 50 mg, entspr. 1 ml. Tritt inner= h. v. 30-60 Min. keine Schmerzbefr. ein, nochmal 1 ml. B. starken Schmerzen als ED 2=  ml. Nicht > 400 mg/Tag. -Zäpf.: Erw. u. Jugendl. ab 14 J. a= ls ED 100 mg, entspr. 1 Zäpf. Nicht > 400 mg/Tag. Dos. b. Tumorschmerze= n, starken Schmerzen n. Operat., Dos. b. Kdr.je 50 (5x10), 100 (10x= 10)Voltaren®Novartis Pharma23.1= =2EB.1.
Voltaren® 25 magensaftresis= tente Dragees
20 Drg. (N1)= Voltaren 2550 Drg. (N= 2) Voltaren 25100 Drg.= (N3) Voltaren 251 Drg.= Diclofenac-Natrium25 mgEisenoxidgelb (E 172)
Lactose
MacrogolMagnesiumstearatMaisstärkeHypromellose= Poly(methacrylat, ethylacrylat) CopolymerisatPovidonPolysorbat 80Poly(O-carboxymethyl)stärkeTalkumTitandioxid (E 171)=
Voltaren® retard Retarddragees20 Retarddrg. (N1)50 Retarddrg. (N2)100 Retarddrg. (N3)1 Retarddrg.Diclofenac-Natrium100 mgE= isenoxidrot (E 172)<= Name>CetylalkoholMa= crogol 8000Magnesiu= mstearatPovidonHypromellosePolysorbat 80SaccharoseTalkumTitandioxid (E 171)Voltaren® 100 Zäpfchen<= /Form>10 Supp. (N1) Voltaren 100= Zäpfchen50 Supp.= (N3) Voltaren 100 Zäpfchen1 Supp.Diclofenac-Natrium100 mg= HartfettVoltaren® für = Kdr. Zäpfchen10 Supp= =2E (N1) Voltaren f. Kdr.50 Supp. (N3) Voltaren f. Kdr.1 Supp.Diclofenac-Natrium25 mg= HartfettN 30Entz&uum= l;ndliche, entzündl. aktivierte degenerative u. extraartikuläre= rheumatische Erkrankungen. Akuter Gichtanfall. Nichtrheumat. schmerzh. Schwellungen u. Entzündungen. Voltaren 50/-100 Zäpfchen/-retar= d zusätzlich: = primäre Dysmenorrhö, Schmerzen bei akuter u. subakuter Adnexitis, Tumorschm= erzen. Voltaren für Kdr. u. Voltaren für Kleinkdr.: juvenile chron. P= olyarthritis u. nichtrheumatische entzündliche Schmerzzustände.= Analgetika-Intoleranz. = (Suppos.: Proktitis), Z 6 (Amp.). Kdr., Kleinkdr. s. Fachinfo. Voltaren Amp., -ret., -100: Kdr. u. Jugendliche.N 30 a-f, h-jPat. m. Colitis ulceros= a, M. Crohn, Pat. unter Diuretika-Therapie u. n. größ. chirurg. Eingriffen sorgfältig überwach= en. Für Kdr. u. Kleinkdr. nur pädiat. Formen anwenden. sN 30 a-h, k, m-pSelten Alopezie. In Einzelf.: Herzinsuff., Vaskulitis u. Pneumoniti= s, = aphthöse Stomatitis, Glossitis, Ösophagusläsionen, Pankreatitis, Photosensibilisierung, Herzkl= opfen, Schmerzen i. d. Brust, Hypertonie. Vorübergehende Hemmung d. Thrombozytenaggregation.N 30;Nephrotox. v. Cyclosporin erh&oum= l;ht. Chinolon-Antibiotika (Krampfneigung erhöht).s. Fachinfo.#W(V) Bei Langzeitbehdlg. sollen als vorsorgl. Maßnahme Kontrollen des Blutbildes, d. Leber- u. Nierenf= unktion = durchgeführt werden. Weit. Einzelh. s. Fachinfo.Erw. initial 150 mg= =2E Erhaltungsdosis 100 mg, ggf. 75 od. 50 mg. Kleinkdr. ab 1 Lebensjahr 0,5-2 mg/kg KG pro Tag, bei juveniler chr= on. 
= Polyarthritis Erhöh. auf max. 3 mg/kg KG pro Tag. Ältere = Kdr.: 2-3 mg/kg KG pro Tag. Einzelheiten s. Fachinfo.Verfalldatum! (Amp.), Lagerungshinweis!je 6= 00 Drg. Voltaren 25, Voltaren 50 u. Voltaren retard; je 300 Supp. Voltaren 50 u. Voltaren 100; 30 u. 150 Voltaren Amp.gzt, nzt, wzt: Durch Lit. belegt. nzt, wzt: Auflage Stammhaus, z. T. durch Lit. belegt.=
--------------322B2ABE145DE09B7F9C2D0A--

From larsga@ifi.uio.no Fri Jan 22 13:30:09 1999
From: larsga@ifi.uio.no (Lars Marius Garshol)
Date: 22 Jan 1999 14:30:09 +0100
Subject: [XML-SIG] Big Bug? (was:Pretty-printing DOM trees)
In-Reply-To: <36A871E6.89033191@appliedbiometrics.com>
References: <199901210345.WAA29899@207-172-49-200.s200.tnt14.ann.erols.com>
	<36A8BCD7.1F992E61@appliedbiometrics.com>
	<13991.45828.86417.505172@amarok.cnri.reston.va.us>
	<36A871E6.89033191@appliedbiometrics.com>
Message-ID:

* Christian Tismer
|
| The XML file is well-formed, so there must be a bug in the dom
| builder. When I let builder.py ignore the assertion error and avoid
| popping the tree, it works!

The assertion in question is one that compares the element type name
of an end tag to the name of the current element. Looks rather
strange, since xmlproc (which you apparently use) maintains its own
element stack and checks this internally.

Unless xmlproc swallows an event somewhere somehow, the error is
probably in the DOM. Running saxdemo.py and XMLTest.java to get two
canonized versions of the document should show conclusively whether
the problem is xmlproc or the DOM.

--Lars M.

From a.eyre@optichrome.com Fri Jan 22 13:59:27 1999
From: a.eyre@optichrome.com (Adrian Eyre)
Date: Fri, 22 Jan 1999 13:59:27 -0000
Subject: [XML-SIG] Pretty-printing DOM trees
In-Reply-To: <199901210345.WAA29899@207-172-49-200.s200.tnt14.ann.erols.com>
Message-ID: <003501be460f$6d3dc110$2bcbd9c2@mars.optichrome.com>

--MimeMultipartBoundary
Content-Type: text/plain; charset="iso-8859-1"
Content-Transfer-Encoding: 7bit

> from xml.dom import utils, core

Am I using the right XML library here, as mine does not appear to have
a file called utils.py in the xml.dom directory. I'm using:

http://www.python.org/sigs/xml-sig/files/xml-0.5.tgz

> def format(node, indent=4):
>     """Pretty-print a DOM tree"""

I also find passing in an xml.dom.core.Document instance causes the
routine to fall over. What am I doing wrong?

+------------------------------------------+
| BFN: Adrian Eyre                         |
+------------------------------------------+

--MimeMultipartBoundary--

From Fred L. Drake, Jr."
References: <199901210345.WAA29899@207-172-49-200.s200.tnt14.ann.erols.com>
Message-ID: <13992.35977.925124.769186@weyr.cnri.reston.va.us>

A.M. Kuchling writes:
 > Should this be left as just a black-box function, or should it be
 > implemented as a subclass of the writer.XmlWriter() class? I suppose

Andrew,
  I'd actually like a subclassable version. This doesn't mean you
need to write the code, though. ;-) A simple black-box function like
yours can be written on top of the basic pretty-printer.
  I very much like the fact that it operates on a DOM tree, but a
SAX-based version might also be nice, especially for large documents.
(I'd probably only use the DOM version myself, though).

  -Fred

--
Fred L. Drake, Jr.
Corporation for National Research Initiatives
1895 Preston White Dr. Reston, VA 20191

From tismer@appliedbiometrics.com Fri Jan 22 15:00:07 1999
From: tismer@appliedbiometrics.com (Christian Tismer)
Date: Fri, 22 Jan 1999 16:00:07 +0100
Subject: [XML-SIG] Big Bug? (was:Pretty-printing DOM trees)
References: <199901210345.WAA29899@207-172-49-200.s200.tnt14.ann.erols.com>
	<36A8BCD7.1F992E61@appliedbiometrics.com>
	<13991.45828.86417.505172@amarok.cnri.reston.va.us>
	<36A871E6.89033191@appliedbiometrics.com>
Message-ID: <36A89277.ADF9A784@appliedbiometrics.com>

Lars Marius Garshol wrote:
>
> * Christian Tismer
> |
> | The XML file is well-formed, so there must be a bug in the dom
> | builder. When I let builder.py ignore the assertion error and avoid
> | popping the tree, it works!
>
> The assertion in question is one that compares the element type name
> of an end tag to the name of the current element. Looks rather
> strange, since xmlproc (which you apparently use) maintains its own
> element stack and checks this internally.
> > Unless xmlproc swallows an event somewhere somehow, the error is > probably in the DOM. Running saxdemo.py and XMLTest.java to get two > canonized versions of the document should show conclusively whether > the problem is xmlproc or the DOM. I ran the file through saxdemo.py and it works. Further I tried readXml with the sgmlop and sgmllib, giving the same result. Lets me say: it must be DOM. ciao - chris -- Christian Tismer :^) Applied Biometrics GmbH : Have a break! Take a ride on Python's Kaiserin-Augusta-Allee 101 : *Starship* http://starship.skyport.net 10553 Berlin : PGP key -> http://pgp.ai.mit.edu/ we're tired of banana software - shipped green, ripens at home From jim.fulton@digicool.com Fri Jan 22 15:16:47 1999 From: jim.fulton@digicool.com (Jim Fulton) Date: Fri, 22 Jan 1999 10:16:47 -0500 Subject: [XML-SIG] [Fwd: [Zope] - XML-RPC] Message-ID: <36A8965F.A7BE0EEF@digicool.com> This is a multi-part message in MIME format. --------------6EB8E500C25FCF6AB5718FC5 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit I meant to CC the XML SIG mailing list in this message, but forgot to. Jim --------------6EB8E500C25FCF6AB5718FC5 Content-Type: message/rfc822 Content-Transfer-Encoding: 7bit Content-Disposition: inline Received: from albert.digicool.com ([206.156.192.156]) by gandalf.digicool.com with SMTP (Microsoft Exchange Internet Mail Service Version 5.5.1960.3) id D1YQ5JFN; Fri, 22 Jan 1999 09:27:13 -0500 Received: from albert.digicool.com (localhost [127.0.0.1]) by albert.digicool.com (8.9.1/8.9.1) with ESMTP id JAA14284; Fri, 22 Jan 1999 09:19:43 -0500 Received: from digicool.com (glebe.digicool.com [206.156.192.148]) by albert.digicool.com (8.9.1/8.9.1) with ESMTP id JAA14261; Fri, 22 Jan 1999 09:19:24 -0500 Message-ID: <36A88774.DDF32151@digicool.com> Date: Fri, 22 Jan 1999 09:13:08 -0500 From: Jim Fulton Organization: Digital Creations, Inc. 
X-Mailer: Mozilla 4.5 [en] (WinNT; I) X-Accept-Language: en MIME-Version: 1.0 To: Skip Montanaro CC: Pavlos Christoforou , zope@zope.org Subject: Re: [Zope] - XML-RPC References: <13991.34599.787787.461627@dolphin.calendar.com> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: zope-admin@zope.org List-Id: Zope -- The Z Object Publishing Environment Errors-To: zope-admin@zope.org X-BeenThere: zope@zope.org X-Mailman-Version: 1.0b6 I think that XML-RPC would almost certainly be a cool thing to support in Zope, and Zope would be a cool server for XML RPC. IMO, the right way to do it would be to add support for it to ZPublisher. XML-RPC (http://www.scripting.com/frontier5/xml/code/rpc.html) uses POST requests with content type "text/xml". (Does anyone but me think that this content type is a bit too broad?) It would be straightforward for ZPublisher to recognize this case and: - Add the method supplied in the body to the request path, - Get method parameters (positionally) from the body. I'm in favor of this but doubt that anyone here at DC will have time to do this for some time. I'd gladly accept patches though, and would be willing to discuss details with anyone working on such patches. ;) In fact, if anyone does work on this, I'd prefer to discuss it with them before they get too far. Jim -- Jim Fulton mailto:jim@digicool.com Technical Director (888) 344-4332 Python Powered! Digital Creations http://www.digicool.com http://www.python.org Under US Code Title 47, Sec.227(b)(1)(C), Sec.227(a)(2)(B) This email address may not be added to any commercial mail list with out my permission. Violation of my privacy with advertising or SPAM will result in a suit for a MINIMUM of $500 damages/incident, $1500 for repeats. 
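The two steps proposed above (take the method name from the request body and add it to the request path, then read the parameters positionally) can be sketched roughly as follows. This is only an illustration, not ZPublisher code: it assumes the `xmlrpc.client` module from today's Python standard library, and the method name `folder.getTitle`, the base path `/site`, and the rule for joining path and method are all invented for the example.

```python
import xmlrpc.client


def build_call(method, *params):
    """Encode a call as an XML-RPC request body (what a client would POST as text/xml)."""
    return xmlrpc.client.dumps(params, methodname=method)


def dispatch_target(body, request_path):
    """Decode a request body and return (extended path, positional arguments).

    The path-joining rule (dots in the method name become path segments)
    is an assumption made for this sketch only.
    """
    params, method = xmlrpc.client.loads(body)
    path = request_path.rstrip("/") + "/" + method.replace(".", "/")
    return path, params


body = build_call("folder.getTitle", 41, "draft")
path, args = dispatch_target(body, "/site")
print(path, args)
```

A real publisher would additionally have to check the content type and wrap the result (or a fault) in an XML-RPC response; that part is left out here.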
--------------6EB8E500C25FCF6AB5718FC5-- From tismer@appliedbiometrics.com Fri Jan 22 18:14:05 1999 From: tismer@appliedbiometrics.com (Christian Tismer) Date: Fri, 22 Jan 1999 19:14:05 +0100 Subject: [XML-SIG] SAX prettyprinter (was:Pretty-printing DOM trees) References: <199901210345.WAA29899@207-172-49-200.s200.tnt14.ann.erols.com> Message-ID: <36A8BFED.BE6C3EF6@appliedbiometrics.com> This is a multi-part message in MIME format. --------------2FF7655D01AF62D3F1B5AD1E Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit A.M. Kuchling wrote: > > The format() function below pretty-prints a DOM tree. It strips away > all the whitespace, and then inserts Text nodes containing white > space, producing output like this: > > > > > > xmlproc: A Python XML parser > > > >

> xmlproc: A Python XML parser >

> > I wrote something similar for the SAX interface. indenter.py is appended. ciao - chris -- Christian Tismer :^) Applied Biometrics GmbH : Have a break! Take a ride on Python's Kaiserin-Augusta-Allee 101 : *Starship* http://starship.skyport.net 10553 Berlin : PGP key -> http://pgp.ai.mit.edu/ we're tired of banana software - shipped green, ripens at home --------------2FF7655D01AF62D3F1B5AD1E Content-Type: text/plain; charset=us-ascii; name="indenter.py" Content-Transfer-Encoding: 7bit Content-Disposition: inline; filename="indenter.py"

# pretty printer for SAX
# CT990122
# based upon the saxutils.Canonizer code

from xml.sax import saxexts, saxlib, saxutils
import string, sys    # string is needed by write_data

class Indenter(saxlib.HandlerBase):
    "A SAX document handler that produces indented XML output."

    def __init__(self,writer=sys.stdout, indent=2):
        self.elem_level=0
        self.writer=writer
        self.indent=indent
        self.last_level=-1

    def processingInstruction (self,target, remainder):
        if not target=="xml":
            self.writer.write("<?"+target+" "+remainder+"?>\n")

    def startElement(self,name,amap):
        self.writer.write("\n"+self.indent*self.elem_level*" "+"<"+name)
        a_names=amap.keys()
        a_names.sort()
        for a_name in a_names:
            self.writer.write(" "+a_name+"=\"")
            self.write_data(amap[a_name])
            self.writer.write("\"")
        self.writer.write(">")
        self.last_level = self.elem_level
        self.elem_level=self.elem_level+1

    def endElement(self,name):
        self.elem_level=self.elem_level-1
        if self.last_level < self.elem_level:
            self.writer.write("\n"+self.indent*self.elem_level*" "+"</"+name+">")
        else:
            self.writer.write("</"+name+">")
        self.last_level = -1

    def ignorableWhitespace(self,data,start_ix,length):
        # we drop white space here.
        # self.characters(data,start_ix,length)
        pass

    def characters(self,data,start_ix,length):
        if self.elem_level>0:
            self.write_data(data[start_ix:start_ix+length])

    def write_data(self,data):
        "Writes datachars to writer."
        data=string.replace(data,"&","&amp;")
        data=string.replace(data,"<","&lt;")
        data=string.replace(data,"\"","&quot;")
        data=string.replace(data,">","&gt;")
#       data=string.replace(data,chr(9),"&#9;")
#       data=string.replace(data,chr(10),"&#10;")
#       data=string.replace(data,chr(13),"&#13;")
#       data = string.strip(data)
        self.writer.write(data)

    def endDocument(self):
        self.writer.write("\n")
        try:
            pass #self.writer.close()
        except NameError:
            pass # It's OK, if the method isn't there we probably don't need it

--------------2FF7655D01AF62D3F1B5AD1E--
From tismer@appliedbiometrics.com Fri Jan 22 20:26:48 1999 From: tismer@appliedbiometrics.com (Christian Tismer) Date: Fri, 22 Jan 1999 21:26:48 +0100 Subject: [XML-SIG] SAX prettyprinter V2 and SGMLOP References: <199901210345.WAA29899@207-172-49-200.s200.tnt14.ann.erols.com> <36A8BFED.BE6C3EF6@appliedbiometrics.com> Message-ID: <36A8DF08.C0E13776@appliedbiometrics.com> This is a multi-part message in MIME format. --------------76DB1FFD28AC0CB493D29956 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Hi again, the appended version of Indenter.py can use sgmlop to format large XML files. It then processes a few megabytes in a few seconds. BTW - is sgmlop deprecated? It still has some flaws, like not allowing "_" in tagnames. Is Fredrik no longer supporting it, or what is the current preferred fast parser for all platforms? ciao - chris -- Christian Tismer :^) Applied Biometrics GmbH : Have a break!
Take a ride on Python's Kaiserin-Augusta-Allee 101 : *Starship* http://starship.skyport.net 10553 Berlin : PGP key -> http://pgp.ai.mit.edu/ we're tired of banana software - shipped green, ripens at home --------------76DB1FFD28AC0CB493D29956 Content-Type: text/plain; charset=us-ascii; name="indenter.py" Content-Transfer-Encoding: 7bit Content-Disposition: inline; filename="indenter.py"

# pretty printer for SAX
# CT990122
# based upon the saxutils.Canonizer code
# V.0.2 support for sgmlop which doesn't give ignorableWhitespace info

from xml.sax import saxexts, saxlib, saxutils
import string, sys

class Indenter(saxlib.HandlerBase):
    "A SAX document handler that produces indented XML output."

    def __init__(self,writer=sys.stdout, indent=2):
        self.elem_level=0
        self.writer=writer
        self.indent=indent
        self.last_level=-1
        self.buffer = "" # lazy buffer for whitespace stripping

    def processingInstruction (self,target, remainder):
        #if not target=="xml":
        self.writer.write("<?"+target+" "+remainder+"?>\n")

    def startElement(self,name,amap):
        if self.buffer: self.write_buffer()
        self.writer.write("\n"+self.indent*self.elem_level*" "+"<"+name)
        a_names=amap.keys()
        a_names.sort()
        for a_name in a_names:
            self.writer.write(" "+a_name+"=\"")
            self.write_data(amap[a_name], 1)
            self.writer.write("\"")
        self.writer.write(">")
        self.last_level = self.elem_level
        self.elem_level=self.elem_level+1

    def endElement(self,name):
        if self.buffer: self.write_buffer()
        self.elem_level=self.elem_level-1
        if self.last_level < self.elem_level:
            self.writer.write("\n"+self.indent*self.elem_level*" "+"</"+name+">")
        else:
            self.writer.write("</"+name+">")
        self.last_level = -1

    def ignorableWhitespace(self,data,start_ix,length):
        # we drop white space here.
        # self.characters(data,start_ix,length)
        pass

    def characters(self,data,start_ix,length):
        if self.elem_level>0:
            self.put_buffer(data[start_ix:start_ix+length])

    def put_buffer(self, txt):
        self.buffer = self.buffer+txt

    def write_buffer(self):
        if self.buffer:
            self.write_data(string.strip(self.buffer))
            self.buffer = ""

    def write_data(self,data, quotes=0):
        "Writes datachars to writer."
        data=string.replace(data,"&","&amp;")
        data=string.replace(data,"<","&lt;")
        if quotes:
            data=string.replace(data,"\"","&quot;")
        data=string.replace(data,">","&gt;")
        self.writer.write(data)

    def endDocument(self):
        self.write_buffer()
        self.writer.write("\n")
        try:
            pass #self.writer.close()
        except NameError:
            pass # It's OK, if the method isn't there we probably don't need it

"""
Example to format a DOM:

>>> i=Indenter()
>>> p=saxexts.make_parser()
>>> p.setErrorHandler(saxutils.ErrorPrinter())
>>> p.setDocumentHandler(i)
>>> p.parseFile(cStringIO.StringIO(dom.toxml()))

Example to format a file to a file, with sgmlop as parser:

>>> f=open(r'd:\tmp\test.xml',"w")
>>> i=Indenter(f)
>>> p=saxexts.make_parser("xml.sax.drivers.drv_sgmlop")
>>> p.setErrorHandler(saxutils.ErrorPrinter())
>>> p.setDocumentHandler(i)
>>> p.parseFile(r"h:\pns\projekte\srz\roteli\birgit\sgml\praep.sgm.umgebrochen.xml")
>>> f.close()
"""

--------------76DB1FFD28AC0CB493D29956--
From tismer@appliedbiometrics.com Fri Jan 22 20:27:58 1999 From: tismer@appliedbiometrics.com (Christian Tismer) Date: Fri, 22 Jan 1999 21:27:58 +0100 Subject: [XML-SIG] SAX prettyprinter V2 and SGMLOP References: <199901210345.WAA29899@207-172-49-200.s200.tnt14.ann.erols.com> <36A8BFED.BE6C3EF6@appliedbiometrics.com> Message-ID: <36A8DF4E.2D3852D7@appliedbiometrics.com> This is a multi-part message in MIME format. --------------F46600A1D3B1D2BC0AA2B68F Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Hi again, the appended version of Indenter.py can use sgmlop to format large XML files. It then processes a few megabytes in a few seconds.
sgmlop does not support ignorableWhitespace, so I handled this myself, by delayed writing and postprocessing. BTW - is sgmlop deprecated? It still has some flaws, like not allowing "_" in tagnames. Is Fredrik no longer supporting it, or what is the current preferred fast parser for all platforms? ciao - chris -- Christian Tismer :^) Applied Biometrics GmbH : Have a break! Take a ride on Python's Kaiserin-Augusta-Allee 101 : *Starship* http://starship.skyport.net 10553 Berlin : PGP key -> http://pgp.ai.mit.edu/ we're tired of banana software - shipped green, ripens at home --------------F46600A1D3B1D2BC0AA2B68F Content-Type: text/plain; charset=us-ascii; name="indenter.py" Content-Transfer-Encoding: 7bit Content-Disposition: inline; filename="indenter.py"

# pretty printer for SAX
# CT990122
# based upon the saxutils.Canonizer code
# V.0.2 support for sgmlop which doesn't give ignorableWhitespace info

from xml.sax import saxexts, saxlib, saxutils
import string, sys

class Indenter(saxlib.HandlerBase):
    "A SAX document handler that produces indented XML output."

    def __init__(self,writer=sys.stdout, indent=2):
        self.elem_level=0
        self.writer=writer
        self.indent=indent
        self.last_level=-1
        self.buffer = "" # lazy buffer for whitespace stripping

    def processingInstruction (self,target, remainder):
        #if not target=="xml":
        self.writer.write("<?"+target+" "+remainder+"?>\n")

    def startElement(self,name,amap):
        if self.buffer: self.write_buffer()
        self.writer.write("\n"+self.indent*self.elem_level*" "+"<"+name)
        a_names=amap.keys()
        a_names.sort()
        for a_name in a_names:
            self.writer.write(" "+a_name+"=\"")
            self.write_data(amap[a_name], 1)
            self.writer.write("\"")
        self.writer.write(">")
        self.last_level = self.elem_level
        self.elem_level=self.elem_level+1

    def endElement(self,name):
        if self.buffer: self.write_buffer()
        self.elem_level=self.elem_level-1
        if self.last_level < self.elem_level:
            self.writer.write("\n"+self.indent*self.elem_level*" "+"</"+name+">")
        else:
            self.writer.write("</"+name+">")
        self.last_level = -1

    def ignorableWhitespace(self,data,start_ix,length):
        # we drop white space here.
        # self.characters(data,start_ix,length)
        pass

    def characters(self,data,start_ix,length):
        if self.elem_level>0:
            self.put_buffer(data[start_ix:start_ix+length])

    def put_buffer(self, txt):
        self.buffer = self.buffer+txt

    def write_buffer(self):
        if self.buffer:
            self.write_data(string.strip(self.buffer))
            self.buffer = ""

    def write_data(self,data, quotes=0):
        "Writes datachars to writer."
        data=string.replace(data,"&","&amp;")
        data=string.replace(data,"<","&lt;")
        if quotes:
            data=string.replace(data,"\"","&quot;")
        data=string.replace(data,">","&gt;")
        self.writer.write(data)

    def endDocument(self):
        self.write_buffer()
        self.writer.write("\n")
        try:
            pass #self.writer.close()
        except NameError:
            pass # It's OK, if the method isn't there we probably don't need it

"""
Example to format a DOM:

>>> i=Indenter()
>>> p=saxexts.make_parser()
>>> p.setErrorHandler(saxutils.ErrorPrinter())
>>> p.setDocumentHandler(i)
>>> p.parseFile(cStringIO.StringIO(dom.toxml()))

Example to format a file to a file, with sgmlop as parser:

>>> f=open(r'd:\tmp\test.xml',"w")
>>> i=Indenter(f)
>>> p=saxexts.make_parser("xml.sax.drivers.drv_sgmlop")
>>> p.setErrorHandler(saxutils.ErrorPrinter())
>>> p.setDocumentHandler(i)
>>> p.parseFile(r"h:\pns\projekte\srz\roteli\birgit\sgml\praep.sgm.umgebrochen.xml")
>>> f.close()
"""

--------------F46600A1D3B1D2BC0AA2B68F--
From dieter@handshake.de Fri Jan 22 20:19:15 1999 From: dieter@handshake.de (Dieter Maurer) Date: Fri, 22 Jan 1999 21:19:15 +0100 Subject: [XML-SIG] Big Bug? (was:Pretty-printing DOM trees) In-Reply-To: <36A871E6.89033191@appliedbiometrics.com> References: <199901210345.WAA29899@207-172-49-200.s200.tnt14.ann.erols.com> <36A871E6.89033191@appliedbiometrics.com> Message-ID: <199901222019.VAA00392@lindm.dm> Hello Christian Using the PDB, I got the following sequence of parser events: START: Praeparate START: Praeparat START: Name END: /Name START: Firma END: /Firma END: /Name The last event, obviously, is wrong. It seems, "xmlproc" does something wrong. I append the PDB log.
Dieter ---------------------------------------------------------------------------- >>> d.run("p.parse('ct.xml')") > (0)?() (Pdb) b {'/usr/local/lib/python1.5/site-packages/xml/dom/builder.py': [44, 53]} (Pdb) c > (1)?() (Pdb) > /usr/local/lib/python1.5/site-packages/xml/dom/builder.py(44)startElement() -> def startElement(self, name, attrs = {}): (Pdb) p name 'Praeparate' (Pdb) c > /usr/local/lib/python1.5/site-packages/xml/dom/builder.py(44)startElement() -> def startElement(self, name, attrs = {}): (Pdb) p name 'Praeparat' (Pdb) c > /usr/local/lib/python1.5/site-packages/xml/dom/builder.py(44)startElement() -> def startElement(self, name, attrs = {}): (Pdb) p name 'Name' (Pdb) c > /usr/local/lib/python1.5/site-packages/xml/dom/builder.py(53)endElement() -> def endElement(self, name): (Pdb) p name 'Name' (Pdb) c > /usr/local/lib/python1.5/site-packages/xml/dom/builder.py(44)startElement() -> def startElement(self, name, attrs = {}): (Pdb) p name 'Firma' (Pdb) c > /usr/local/lib/python1.5/site-packages/xml/dom/builder.py(53)endElement() -> def endElement(self, name): (Pdb) p name 'Firma' (Pdb) c > /usr/local/lib/python1.5/site-packages/xml/dom/builder.py(53)endElement() -> def endElement(self, name): (Pdb) p name 'Form' From tismer@appliedbiometrics.com Fri Jan 22 20:59:10 1999 From: tismer@appliedbiometrics.com (Christian Tismer) Date: Fri, 22 Jan 1999 21:59:10 +0100 Subject: [XML-SIG] Big Bug? (was:Pretty-printing DOM trees) References: <199901210345.WAA29899@207-172-49-200.s200.tnt14.ann.erols.com> <36A871E6.89033191@appliedbiometrics.com> <199901222019.VAA00392@lindm.dm> Message-ID: <36A8E69E.D90FDB0@appliedbiometrics.com> Dieter Maurer wrote: > > Hello Christian > > Using the PDB, I got the following sequence of parser events: > > START: Praeparate > START: Praeparat > START: Name > END: /Name > START: Firma > END: /Firma > END: /Name > > The last event, obviously, is wrong. > It seems, "xmlproc" does something wrong. > > I append the PDB log. Thank you! 
Actually I claimed that the XML file was right, but it wasn't completely. This one was not closed: But after that change, FileReader still barfs. With my SAX prettyprinter everything works fine, with sgmlop, xmlproc, whatever I used. So I doubt xmlproc is wrong. There must be something deeper. Did you notice the incompatibility of SAX and DOM? After playing with several SAX tools, it was impossible to import xml.dom any longer. Something is wrong, deep in the classes which are already a little complicated for my small brain. ciao - chris -- Christian Tismer :^) Applied Biometrics GmbH : Have a break! Take a ride on Python's Kaiserin-Augusta-Allee 101 : *Starship* http://starship.skyport.net 10553 Berlin : PGP key -> http://pgp.ai.mit.edu/ we're tired of banana software - shipped green, ripens at home From akuchlin@cnri.reston.va.us Fri Jan 22 21:08:01 1999 From: akuchlin@cnri.reston.va.us (Andrew M. Kuchling) Date: Fri, 22 Jan 1999 16:08:01 -0500 (EST) Subject: [XML-SIG] SAX prettyprinter V2 and SGMLOP In-Reply-To: <36A8DF4E.2D3852D7@appliedbiometrics.com> References: <199901210345.WAA29899@207-172-49-200.s200.tnt14.ann.erols.com> <36A8BFED.BE6C3EF6@appliedbiometrics.com> <36A8DF4E.2D3852D7@appliedbiometrics.com> Message-ID: <13992.59381.973689.430970@amarok.cnri.reston.va.us> Christian Tismer writes: >BTW - is sgmlop deprecated? >It still has some flaws, like not allowing "_" in tagnames. >Is Fredrik no longer supporting it, or what is the current >preferred fast parser for all platforms? I haven't heard anything about sgmlop being deprecated; as far as I know it's still being supported, and there is no preferred fast parser; use sgmlop or PyExpat as you wish. A while back Fredrik told me that he still had some small fixes for sgmlop, but he's been busy since then, and I haven't heard anything more; perhaps the _ problem you report is one of them. -- A.M.
Kuchling http://starship.skyport.net/crew/amk/ Not everybody knows that looking at people in 'a funny way' is the commonest cause of sudden murder. I happen to know that because I read a Home Office brochure once. -- Tom Baker, in his autobiography From akuchlin@cnri.reston.va.us Fri Jan 22 21:26:27 1999 From: akuchlin@cnri.reston.va.us (Andrew M. Kuchling) Date: Fri, 22 Jan 1999 16:26:27 -0500 (EST) Subject: [XML-SIG] Big Bug? (was:Pretty-printing DOM trees) In-Reply-To: <36A8E69E.D90FDB0@appliedbiometrics.com> References: <199901210345.WAA29899@207-172-49-200.s200.tnt14.ann.erols.com> <36A871E6.89033191@appliedbiometrics.com> <199901222019.VAA00392@lindm.dm> <36A8E69E.D90FDB0@appliedbiometrics.com> Message-ID: <13992.60573.126910.674312@amarok.cnri.reston.va.us> Christian Tismer writes: >So I doubt xmlproc is wrong. There must be something deeper. >Did you recognize the incompatibility of SAX and DOM? >After playing with several SAX tools, it was impossible >to import xml.dom any longer. That's bizarre, and I don't see how that would be possible in Python. What were the symptoms? What happened when the import failed? (I'll look into the problem with FileReader tonight; no time to do it at work.) -- A.M. Kuchling http://starship.skyport.net/crew/amk/ I spent a busy day today, but got little done. This is because I am at last becoming perfect in the art of seeming busy, even when very little is going on in my head or under my hands. This is an art which every man learns, if he does not intend to work himself to death. -- Robertson Davies, _The Table Talk of Samuel Marchbanks_ From dieter@handshake.de Fri Jan 22 21:44:09 1999 From: dieter@handshake.de (Dieter Maurer) Date: Fri, 22 Jan 1999 22:44:09 +0100 Subject: [XML-SIG] Big Bug? 
(was:Pretty-printing DOM trees) In-Reply-To: <36A871E6.89033191@appliedbiometrics.com> References: <199901210345.WAA29899@207-172-49-200.s200.tnt14.ann.erols.com> <36A871E6.89033191@appliedbiometrics.com> Message-ID: <199901222144.WAA00911@lindm.dm> Hello Christian I have investigated the problem further: "xmlproc" requires *ALL* attribute values to be enclosed in either single or double quotes. The problem is caused by your more precisely, the "status=F", where the "F" is not enclosed in quotes. "xmlproc" sees the problem and reports an error "3016" (you will see it, if you install an error handler). Then it skips beyond the closing '>'. However, it is still in attribute processing for "Darreichung" -- an "xmlproc" bug. In this mode, it cannot understand "
" and its content and keeps skipping until the "
" which is reported as end tag -- an end tag without corresponding start tag. Dieter From larsga@ifi.uio.no Sat Jan 23 10:21:08 1999 From: larsga@ifi.uio.no (Lars Marius Garshol) Date: 23 Jan 1999 11:21:08 +0100 Subject: [XML-SIG] SAX prettyprinter V2 and SGMLOP In-Reply-To: <36A8DF4E.2D3852D7@appliedbiometrics.com> References: <199901210345.WAA29899@207-172-49-200.s200.tnt14.ann.erols.com> <36A8BFED.BE6C3EF6@appliedbiometrics.com> <36A8DF4E.2D3852D7@appliedbiometrics.com> Message-ID: * Christian Tismer | | the appended version of Indenter.py can use sgmlop to format large | XML files. It then processes a few megabytes in a few seconds. How is the performance when you use sgmlop directly compared to when you use it's SAX driver? | BTW - is sgmlop deprecated? If it works with your XML it should be OK, but it does not conform very closely to the standard, unlike expat. --Lars M. From larsga@ifi.uio.no Sat Jan 23 10:43:59 1999 From: larsga@ifi.uio.no (Lars Marius Garshol) Date: 23 Jan 1999 11:43:59 +0100 Subject: [XML-SIG] Big Bug? (was:Pretty-printing DOM trees) In-Reply-To: <199901222144.WAA00911@lindm.dm> References: <199901210345.WAA29899@207-172-49-200.s200.tnt14.ann.erols.com> <36A871E6.89033191@appliedbiometrics.com> <199901222144.WAA00911@lindm.dm> Message-ID: * Dieter Maurer | | I have investigated the problem further: | | "xmlproc" requires *ALL* attribute values to be enclosed in either | single or double quotes. This is correct, simply because the XML standard also requires this. A document that does not quote all it's attribute values is not well-formed and thus not considered an XML document at all. In fact, by continuing to report data events to the client after a well-formedness error, xmlproc violates the standard. Since it appears to be so common to not use error handlers, perhaps I should make it conform. What do people think? | The problem is caused by your | | | | more precisely, the "status=F", where the "F" is not enclosed in | quotes. 
| | "xmlproc" sees the problem and reports an error "3016" Almost correct, it reports 3004: "One of ' or " expected". | (you will see it, if you install an error handler). Just a tip: my experience is that if you don't always install error handlers little nitty problems with your XML will cause you a lot of headaches that you can't figure out at first. xml.sax.saxutils contains two default error handlers that you can plug in and use directly. One prints errors, the other raises exceptions. | Then it skips beyond the closing '>'. This is correct. This is xmlproc in 'panic mode'. Since it doesn't do tokenization it has no clues as to what is coming up next, and tries to skip to the end of the start tag. | However, it is still in attribute processing for "Darreichung" -- an | "xmlproc" bug. So it is. Even though the application has no right to expect correct information about the document any more, it is pointless not to get this right when it is so easy to do it. We'll pay a slight performance penalty for it, though. Thank you very much for diagnosing the problem so clearly. I'll fix this now so that the problem does not occur in 0.60. (0.60, by the way, should have full support for parameter entities.) --Lars M. From fredrik@pythonware.com Sat Jan 23 11:48:10 1999 From: fredrik@pythonware.com (Fredrik Lundh) Date: Sat, 23 Jan 1999 12:48:10 +0100 Subject: [XML-SIG] SAX prettyprinter V2 and SGMLOP Message-ID: <00b701be46c6$40ea7a10$f29b12c2@pythonware.com> >BTW - is sgmlop deprecated? >It still has some flaws, like not allowing "_" in tagnames. >Is Fredrik no longer supporting it, or what is the current >preferred fast parser for all platforms? the XML session on the Houston conference decided to lobby for sgmlop to be included in a future Python release. don't know if anyone is actually doing some- thing about that, though... 
sgmlop was intentionally designed to have a very efficient Python interface, be small enough to ship with the standard distribution without anyone noticing, and to be compatible with both sgmllib and xmllib. it's currently somewhat sloppy (that is, you can use it to parse most xml data, but you shouldn't use it to verify that your xml writing code creates perfectly portable xml). one big problem with it is that it's being ignored by the sgmllib and xmllib maintainers, so keeping things in sync is pretty hard. on the other hand, looks like people don't care about backwards compatibility any more. xmllib in 1.5.2b1 silently broke ALL our existing XML code (including xmlrpclib). I'm seriously considering to just ignore the standard stuff, and stick to our proprietary XML hacks in future applications... patches to sgmlop.c are still welcome, though. Cheers /F From larsga@ifi.uio.no Sat Jan 23 15:03:38 1999 From: larsga@ifi.uio.no (Lars Marius Garshol) Date: 23 Jan 1999 16:03:38 +0100 Subject: [XML-SIG] SAX prettyprinter V2 and SGMLOP In-Reply-To: <00b701be46c6$40ea7a10$f29b12c2@pythonware.com> References: <00b701be46c6$40ea7a10$f29b12c2@pythonware.com> Message-ID: * Fredrik Lundh | | on the other hand, looks like people don't care about backwards | compatibility any more. xmllib in 1.5.2b1 silently broke ALL our | existing XML code (including xmlrpclib). Did you complain? It's still in beta, so they might revert back, no? | I'm seriously considering to just ignore the standard stuff, and | stick to our proprietary XML hacks in future applications... Maybe you could use SAX? It won't break things if it is at all possible to avoid it, and at least you'll have a chance to voice your opinion first here on the XML-SIG. And with mllib you can even get an xmllib-like interface. (I qualify this because two things may change: EntityResolver and setLocale. Hopefully nobody will veto EntityResolver.) --Lars M.
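The advice given earlier in this thread, to always install an error handler so that well-formedness problems such as an unquoted attribute value do not go unnoticed, can be illustrated with a short sketch. It assumes the `xml.sax` module of today's Python standard library (which uses expat) rather than xmlproc, so the error text will differ from xmlproc's 3004 message. The sample element is a guess at the shape of the offending markup, and the names `CollectingErrorHandler` and `check` exist only in this example.

```python
import xml.sax


class CollectingErrorHandler(xml.sax.ErrorHandler):
    """Record every problem the parser reports instead of ignoring it."""

    def __init__(self):
        self.messages = []

    def warning(self, exc):
        self.messages.append(("warning", exc.getMessage()))

    def error(self, exc):
        self.messages.append(("error", exc.getMessage()))

    def fatalError(self, exc):
        # A well-formedness error: record it, then stop parsing.
        self.messages.append(("fatal", exc.getMessage()))
        raise exc


def check(document):
    """Parse `document` (bytes) and return the list of reported problems."""
    errors = CollectingErrorHandler()
    try:
        xml.sax.parseString(document, xml.sax.ContentHandler(), errors)
    except xml.sax.SAXParseException:
        pass  # already recorded by fatalError
    return errors.messages


# Hypothetical reconstruction: the attribute value F is not quoted.
bad = b"<Praeparat><Darreichung status=F>Tablette</Darreichung></Praeparat>"
good = b'<Praeparat><Darreichung status="F">Tablette</Darreichung></Praeparat>'

print(check(bad))
print(check(good))
```

With the handler installed, the unquoted value shows up as a single fatal error at the point where it occurs, instead of as a confusing mismatch further down the event stream.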
From tismer@appliedbiometrics.com Sat Jan 23 15:04:31 1999 From: tismer@appliedbiometrics.com (Christian Tismer) Date: Sat, 23 Jan 1999 16:04:31 +0100 Subject: [XML-SIG] Big Bug? (was:Pretty-printing DOM trees) References: <199901210345.WAA29899@207-172-49-200.s200.tnt14.ann.erols.com> <36A871E6.89033191@appliedbiometrics.com> <199901222144.WAA00911@lindm.dm> Message-ID: <36A9E4FF.76B8E3D5@appliedbiometrics.com> Dieter Maurer wrote: > > Hello Christian > > I have investigated the problem further: > > "xmlproc" requires *ALL* attribute values to be enclosed > in either single or double quotes. > > The problem is caused by your > > > > more precisely, the "status=F", where the "F" is not enclosed in quotes. Aaahh, oh, whow, thanks. Maybe xmlproc should be a little more forgiving for this case and not skip beyond ">" but just skip (or repair) the attribute. XMLers, please take my excuse, it was not DOM but a faulty Python script from my course. My XMLpro viewer didn't complain, so I thought it was correct. thanks for all the support - chris (und besonders an Dieter) -- Christian Tismer :^) Applied Biometrics GmbH : Have a break! Take a ride on Python's Kaiserin-Augusta-Allee 101 : *Starship* http://starship.skyport.net 10553 Berlin : PGP key -> http://pgp.ai.mit.edu/ we're tired of banana software - shipped green, ripens at home From tismer@appliedbiometrics.com Sat Jan 23 15:14:54 1999 From: tismer@appliedbiometrics.com (Christian Tismer) Date: Sat, 23 Jan 1999 16:14:54 +0100 Subject: [XML-SIG] Big Bug? (was:Pretty-printing DOM trees) References: <199901210345.WAA29899@207-172-49-200.s200.tnt14.ann.erols.com> <36A871E6.89033191@appliedbiometrics.com> <199901222019.VAA00392@lindm.dm> <36A8E69E.D90FDB0@appliedbiometrics.com> <13992.60573.126910.674312@amarok.cnri.reston.va.us> Message-ID: <36A9E76E.48EF2A46@appliedbiometrics.com> Andrew M. Kuchling wrote: > > Christian Tismer writes: > >So I doubt xmlproc is wrong. There must be something deeper. 
> >Did you recognize the incompatibility of SAX and DOM? > >After playing with several SAX tools, it was impossible > >to import xml.dom any longer. > > That's bizarre, and I don't see how that would be possible in Python. > What were the symptoms? What happened when the import failed? it was simply impossible to import xml.dom any longer. xml.sax was still working. I closed my PyWin session, started over and it was alive again. I don't know how to reproduce this yet, but it happened the second time. Some combination of trying this driver and that one... I need to turn my session logger on forever, then I can see what I did. > (I'll look into the problem with FileReader tonight; no time > to do it at work.) I think there is something more. When I pass a file object to my Indenter (which is basically similar to Normalizer) and later delete that file, also delete all instances of handlers which I created, the file doesn't get closed. There is something sticky which keeps references alive. ciao - chris -- Christian Tismer :^) Applied Biometrics GmbH : Have a break! Take a ride on Python's Kaiserin-Augusta-Allee 101 : *Starship* http://starship.skyport.net 10553 Berlin : PGP key -> http://pgp.ai.mit.edu/ we're tired of banana software - shipped green, ripens at home From tismer@appliedbiometrics.com Sat Jan 23 15:44:02 1999 From: tismer@appliedbiometrics.com (Christian Tismer) Date: Sat, 23 Jan 1999 16:44:02 +0100 Subject: [XML-SIG] SAX prettyprinter V2 and SGMLOP References: <199901210345.WAA29899@207-172-49-200.s200.tnt14.ann.erols.com> <36A8BFED.BE6C3EF6@appliedbiometrics.com> <36A8DF4E.2D3852D7@appliedbiometrics.com> Message-ID: <36A9EE42.78F166D5@appliedbiometrics.com> Lars Marius Garshol wrote: > > * Christian Tismer > | > | the appended version of Indenter.py can use sgmlop to format large > | XML files. It then processes a few megabytes in a few seconds. > > How is the performance when you use sgmlop directly compared to when > you use it's SAX driver? 
I didn't try yet since I was very happy with the speed. > | BTW - is sgmlop deprecated? > > If it works with your XML it should be OK, but it does not conform > very closely to the standard, unlike expat. I could not use pyexpat yet, since a pyexpat dll is missing. I will build one for Windows (as I also did before with sgmlop, the binary in the CVS was broken). I just wasn't aware that I need to get an extra tar file for that. When I find the time, I will also provide a patch for sgmlop for a couple of things. What I need to find is the fastest acceptable parser which allows me to turn masses of XML data into Python structures. We don't work with complicated but smaller documents, but we are processing XML encoded database records which are quite irregular (useless to use a relational database) and quite simple, but the standard size is some 50MB. This is why I'm after speed, much more than conformance. A general question (comes up because I had to hack my Indenter especially for sgmlop): Is a SAX parser required to report ignorableWhitespace events? Or is it also allowed to never call this method, as sgmlop does? If so, then the interface doesn't make too much sense since I have to collect all data and handle whitespace when the next tag appears. ciao - chris -- Christian Tismer :^) Applied Biometrics GmbH : Have a break!
Take a ride on Python's Kaiserin-Augusta-Allee 101 : *Starship* http://starship.skyport.net 10553 Berlin : PGP key -> http://pgp.ai.mit.edu/ we're tired of banana software - shipped green, ripens at home From larsga@ifi.uio.no Sat Jan 23 15:54:30 1999 From: larsga@ifi.uio.no (Lars Marius Garshol) Date: 23 Jan 1999 16:54:30 +0100 Subject: [XML-SIG] SAX prettyprinter V2 and SGMLOP In-Reply-To: <36A9EE42.78F166D5@appliedbiometrics.com> References: <199901210345.WAA29899@207-172-49-200.s200.tnt14.ann.erols.com> <36A8BFED.BE6C3EF6@appliedbiometrics.com> <36A8DF4E.2D3852D7@appliedbiometrics.com> <36A9EE42.78F166D5@appliedbiometrics.com> Message-ID: * Lars Marius Garshol | | How is the performance when you use sgmlop directly compared to when | you use it's SAX driver? * Christian Tismer | | I didn't try yet since I was very happy with the speed. Would be interesting to know, though, since it will tell us something about what the penalty of using SAX is, compared to doing it directly. | I could no use pyexpat yet, since a pyexpat dll is missing. I will | build one for Windows (as I also did before with sgmlop, the binary | in the CVS was broke). Both the pyexpat and the sgmlop DLLs are in CVS and both of them work for me. Maybe you should try a 'cvs update'? :) | Is a SAX parser required to report ignorableWHitespace events? No, and in fact non-validating parsers cannot tell the difference if they haven't read the DTD. (AElfred reads the DTD to be able to provide this information, but does not validate.) See | Or is it also allowed to never call this method, as sgmlop does? If | so, then the interface doesn't make too much sense since I have to | collect all data and handle whitespace when the next tag appears. I agree that this is suboptimal, but the problem springs from the design of XML itself. Most parsers simply do not have the information required to know when to call this method. --Lars M. 
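The workaround Christian describes (collect all character data and decide about whitespace when the next tag appears) can be sketched roughly as below. This is only an illustration under stated assumptions: it uses the xml.sax module of today's Python standard library rather than the xml-0.5 package discussed in this thread, and the names WhitespaceSplitter and classify are made up for the example. It treats every whitespace-only run between tags as ignorable, which is a guess that only suits data-oriented documents; a parser without the DTD cannot know this for certain.

```python
import xml.sax
from xml.sax.handler import ContentHandler


class WhitespaceSplitter(ContentHandler):
    """Buffer character data and classify it once the next tag arrives.

    Hypothetical helper: whitespace-only runs are labelled "whitespace",
    everything else "text". This is a heuristic, not DTD knowledge.
    """

    def __init__(self):
        super().__init__()
        self._buf = []
        self.events = []

    def _flush(self):
        # Join whatever the parser delivered since the last tag.
        text = "".join(self._buf)
        self._buf = []
        if not text:
            return
        kind = "text" if text.strip() else "whitespace"
        self.events.append((kind, text))

    def characters(self, content):
        # Parsers may split one run of text into several calls.
        self._buf.append(content)

    def ignorableWhitespace(self, whitespace):
        # A parser that has read the DTD may call this directly.
        self._flush()
        self.events.append(("whitespace", whitespace))

    def startElement(self, name, attrs):
        self._flush()
        self.events.append(("start", name))

    def endElement(self, name):
        self._flush()
        self.events.append(("end", name))


def classify(xml_bytes):
    """Parse xml_bytes and return the list of classified events."""
    handler = WhitespaceSplitter()
    xml.sax.parseString(xml_bytes, handler)
    return handler.events
```

Calling classify(b"<a>\n  <b>x</b>\n</a>") yields start/end events for the elements, a "text" event for the x, and "whitespace" events for the indentation, so a pretty-printer can drop the latter and emit its own.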
From tismer@appliedbiometrics.com Sat Jan 23 16:30:05 1999 From: tismer@appliedbiometrics.com (Christian Tismer) Date: Sat, 23 Jan 1999 17:30:05 +0100 Subject: [XML-SIG] SAX prettyprinter V2 and SGMLOP References: <199901210345.WAA29899@207-172-49-200.s200.tnt14.ann.erols.com> <36A8BFED.BE6C3EF6@appliedbiometrics.com> <36A8DF4E.2D3852D7@appliedbiometrics.com> <36A9EE42.78F166D5@appliedbiometrics.com> Message-ID: <36A9F90D.B6759872@appliedbiometrics.com> Lars Marius Garshol wrote: > > * Lars Marius Garshol > | > | How is the performance when you use sgmlop directly compared to when > | you use it's SAX driver? > > * Christian Tismer > | > | I didn't try yet since I was very happy with the speed. > > Would be interesting to know, though, since it will tell us something > about what the penalty of using SAX is, compared to doing it directly. I will provide timings when I have time, also with expat. > | I could no use pyexpat yet, since a pyexpat dll is missing. I will > | build one for Windows (as I also did before with sgmlop, the binary > | in the CVS was broke). > > Both the pyexpat and the sgmlop DLLs are in CVS and both of them work > for me. Maybe you should try a 'cvs update'? :) :)) it is *my* dll which is in the cvs now. But you are right, the (py)expat dlls are all there. I just cannot import pyexpat. The dlls are not found. sgmlop works off-the-shelf. Is it necessary to adjust path variables for pyexpat? If so, then I'll change the layout for Windows a little to make this unnecessary. Until now, I could simply plug the whole package into my Python dir and use it. And thanks about the info concerning whitespace. ciao - chris p.s.: now busy building an ultra-light DOM which needs less memory than its XML string representation. It's becoming fun :-) -- Christian Tismer :^) Applied Biometrics GmbH : Have a break! 
Take a ride on Python's Kaiserin-Augusta-Allee 101 : *Starship* http://starship.skyport.net 10553 Berlin : PGP key -> http://pgp.ai.mit.edu/ we're tired of banana software - shipped green, ripens at home From gstein@lyra.org Sun Jan 24 00:48:52 1999 From: gstein@lyra.org (Greg Stein) Date: Sat, 23 Jan 1999 16:48:52 -0800 Subject: [XML-SIG] Big Bug? (was:Pretty-printing DOM trees) References: <199901210345.WAA29899@207-172-49-200.s200.tnt14.ann.erols.com> <36A871E6.89033191@appliedbiometrics.com> <199901222144.WAA00911@lindm.dm> <36A9E4FF.76B8E3D5@appliedbiometrics.com> Message-ID: <36AA6DF4.662C7ED5@lyra.org> Christian Tismer wrote: > > Dieter Maurer wrote: > > > > Hello Christian > > > > I have investigated the problem further: > > > > "xmlproc" requires *ALL* attribute values to be enclosed > > in either single or double quotes. > > > > The problem is caused by your > > > > > > > > more precisely, the "status=F", where the "F" is not enclosed in quotes. > > Aaahh, oh, whow, thanks. > Maybe xmlproc should be a little more forgiving for this case > and not skip beyond ">" but just skip (or repair) the attribute. It should *NOT* repair the attribute. That will simply encourage poor XML authoring. It should report the error properly (or, alternatively, the error should be responded to properly). Cheers, -g -- Greg Stein, http://www.lyra.org/ From larsga@ifi.uio.no Sun Jan 24 11:28:17 1999 From: larsga@ifi.uio.no (Lars Marius Garshol) Date: 24 Jan 1999 12:28:17 +0100 Subject: [XML-SIG] Big Bug? (was:Pretty-printing DOM trees) In-Reply-To: <36AA6DF4.662C7ED5@lyra.org> References: <199901210345.WAA29899@207-172-49-200.s200.tnt14.ann.erols.com> <36A871E6.89033191@appliedbiometrics.com> <199901222144.WAA00911@lindm.dm> <36A9E4FF.76B8E3D5@appliedbiometrics.com> <36AA6DF4.662C7ED5@lyra.org> Message-ID: * Christian Tismer | | Maybe xmlproc should be a little more forgiving for this case and | not skip beyond ">" but just skip (or repair) the attribute. 
* Greg Stein | | It should *NOT* repair the attribute. That will simply encourage | poor XML authoring. It should report the error properly (or, | alternatively, the error should be responded to properly). The error is reported properly as it is and the attribute is not repaired, but subsequent data events are wrong. That's now fixed (data events, not the attribute), but the question remains whether the parser should follow the XML recommendation and stop reporting data events after a well-formedness bug. I'm inclined to make that default behaviour, but behaviour it is possible to turn off. Opinions are welcome. --Lars M. From gstein@lyra.org Sun Jan 24 11:39:35 1999 From: gstein@lyra.org (Greg Stein) Date: Sun, 24 Jan 1999 03:39:35 -0800 (PST) Subject: [XML-SIG] Big Bug? (was:Pretty-printing DOM trees) In-Reply-To: Message-ID: On 24 Jan 1999, Lars Marius Garshol wrote: > The error is reported properly as it is and the attribute is not > repaired, but subsequent data events are wrong. That's now fixed (data > events, not the attribute), but the question remains whether the > parser should follow the XML recommendation and stop reporting data > events after a well-formedness bug. > > I'm inclined to make that default behaviour, but behaviour it is > possible to turn off. Opinions are welcome. Sounds good -- default is to "abort" on bad input. Cheers, -g -- Greg Stein, http://www.lyra.org/ From larsga@ifi.uio.no Sun Jan 24 12:20:09 1999 From: larsga@ifi.uio.no (Lars Marius Garshol) Date: 24 Jan 1999 13:20:09 +0100 Subject: [XML-SIG] Big Bug? (was:Pretty-printing DOM trees) In-Reply-To: References: Message-ID: * Lars Marius Garshol | | [...] the question remains whether the parser should follow the XML | recommendation and stop reporting data events after a | well-formedness bug. | | I'm inclined to make that default behaviour, but behaviour it is | possible to turn off. Opinions are welcome. * Greg Stein | | Sounds good -- default is to "abort" on bad input. 
I know, but the user might want to know if there are more errors, to avoid having to run the parser n times for n well-formedness errors. So I prefer to stop reporting data events, but keep sending error events. The application can stop the parse at any time by throwing an exception, anyway. Thanks for the opinion. Once I get a couple more of those I'll do the necessary patch. --Lars M. From fredrik@pythonware.com Sun Jan 24 12:37:37 1999 From: fredrik@pythonware.com (Fredrik Lundh) Date: Sun, 24 Jan 1999 13:37:37 +0100 Subject: [XML-SIG] SAX prettyprinter V2 and SGMLOP Message-ID: <006001be4796$54879230$f29b12c2@pythonware.com> >A general question (comes up because I had to hack my Indenter >especially for sgmlop): >Is a SAX parser required to report ignorableWhitespace events? >Or is it also allowed to never call this method, as sgmlop does? >If so, then the interface doesn't make too much sense since I have >to collect all data and handle whitespace when the next tag appears. If I understand things correctly, sgmlop cannot figure out what's ignorable and what's not; you need to have access to the DTD to handle that. Our internal xml libraries allow the user to indicate whether a resource is "xml text" or "xml data". The latter doesn't allow elements to contain both text and other elements, which means that it's easy to figure out what to ignore. Cheers /F fredrik@pythonware.com http://www.pythonware.com From tismer@appliedbiometrics.com Sun Jan 24 12:29:19 1999 From: tismer@appliedbiometrics.com (Christian Tismer) Date: Sun, 24 Jan 1999 13:29:19 +0100 Subject: [XML-SIG] Big Bug? (was:Pretty-printing DOM trees) References: <199901210345.WAA29899@207-172-49-200.s200.tnt14.ann.erols.com> <36A871E6.89033191@appliedbiometrics.com> <199901222144.WAA00911@lindm.dm> <36A9E4FF.76B8E3D5@appliedbiometrics.com> <36AA6DF4.662C7ED5@lyra.org> Message-ID: <36AB121F.7620E5D@appliedbiometrics.com> Greg Stein wrote: > > Christian Tismer wrote: ... > > Aaahh, oh, whow, thanks. 
> > Maybe xmlproc should be a little more forgiving for this case > > and not skip beyond ">" but just skip (or repair) the attribute. > > It should *NOT* repair the attribute. That will simply encourage poor > XML authoring. It should report the error properly (or, alternatively, > the error should be responded to properly). Well, I agree. It should not encourage bad authoring. But I, as a complete newbie to a SIG which is evolving quickly, was kind of struggling with a lot of code, many parsers, and so on. I think others will get into at least as much trouble as I had. Furthermore, the file which I wanted to inspect wasn't mine. What should I do if I'm confronted with foreign XML files which have some flaws, and the parser doesn't make it through them? The argument is fine for me, but in this case I have no chance. For my custom work, it would be best to have a parser which *does* complain about an error, but also repairs easy cases like this. This gives me a chance to work with the file, inspect it and complain to my customer. This is easy after all since I now know enough of the XML package and can help myself. The remaining question is: How should faulty XML be handled at all? There are enough examples where you cannot simply reject the document. You need to read it. Does it make sense to think of a "correcting" parser which turns a bad document into something well-formed which can be inspected with an XML browser, together with some error-annotation tags? cheers - chris -- Christian Tismer :^) Applied Biometrics GmbH : Have a break! Take a ride on Python's Kaiserin-Augusta-Allee 101 : *Starship* http://starship.skyport.net 10553 Berlin : PGP key -> http://pgp.ai.mit.edu/ we're tired of banana software - shipped green, ripens at home From gstein@lyra.org Sun Jan 24 12:41:14 1999 From: gstein@lyra.org (Greg Stein) Date: Sun, 24 Jan 1999 04:41:14 -0800 (PST) Subject: [XML-SIG] Big Bug? 
(was:Pretty-printing DOM trees) In-Reply-To: <36AB121F.7620E5D@appliedbiometrics.com> Message-ID: On Sun, 24 Jan 1999, Christian Tismer wrote: > Well, I agree. It should not encourage bad authoring. > But I, as a complete newbie to a SIG which is very evolving, > was kind of struggling with a lot of code, many parsers, and so > on. I think, others will get into at least as much trouble > as I had. Well, that was simply because the errors weren't reported properly. That can be fixed. > Furthermore, the file which I wanted to inspect wasn't mine. > What should I do if I'm confronted with foreign XML files > which have some flaws, and the parser doesn't make it through > it. The argument is fine for me, but in this case I have > no chance. Push back against where the file came from. What if somebody sent you a bad executable? Do you try to correct it? What if they send a bad MSFT Word file? Do you try to correct it? Makefiles with spaces instead of tabs? crontab files with a missing column? etc. etc. Well, the same for XML. If it is bad, then you ask for a correct one. Why should XML be any different than the multitude of documents that you deal with every day? > For my custom work, it would be best to have a parser which > *does* complain about an error, but also repairs easy cases > like this. This gives me a chance to work with the file, > inspect it and complain to my customer. > This is easy after all since I now know enough of > the XML package and can help myself. By default, it should not correct it. That simply continues to encourage poor XML authoring. As a programmer, if you want to try to auto-correct, then okay, but I would not recommend it. > The remaining qeustion is: How should faulty XML be handled > at all? There are enough examples where you cannot simply > reject the document. You need to read it. 
> Does it make sense to think of a "correcting" > parser which turns a bad document into something well-formed > which can be inspected with an XML browser, together with > some error-annotation tags? No. No. No. No.... HTML is a huge mess because people started writing parsers that were flexible and would correct things for you. Go try to write an HTML parser that works against all the stuff out on the Internet. It is frightening how difficult that is. There is just so much crap out there because people said, "well, we can just correct that for them." Mismatched tags. Missing quotes. Illegal characters. Missing close brackets. Simply crap. With XML, the designers said, "No way. The document has to be correct, or it gets rejected. Tough shit for the authors of bad documents." Yes, I'm rather fascist on this one :-). I simply cannot condone or recommend *any* allowance of flexibility in parsers. That will just lead us back to the horrible situation that we are in now with HTML. Cheers, -g -- Greg Stein, http://www.lyra.org/ From tismer@appliedbiometrics.com Sun Jan 24 13:22:55 1999 From: tismer@appliedbiometrics.com (Christian Tismer) Date: Sun, 24 Jan 1999 14:22:55 +0100 Subject: [XML-SIG] Big Bug? (was:Pretty-printing DOM trees) References: Message-ID: <36AB1EAF.7F0273E5@appliedbiometrics.com> Greg Stein wrote: [but the file was from my course, and I'm correcting their homework] > Push back against where the file came from. What if somebody sent you a > bad executable? Do you try to correct it? What if they send a bad MSFT > Word file? Do you try to correct it? Makefiles with spaces instead of > tabs? crontab files with a missing column? etc. etc. :-) Of course, I usually don't correct them. No exes. Word files: Sometimes, if they come to me, whining about their single copy of a Word file which is broke. I can give them the plain text back in most cases, and this is ok. > Well, the same for XML. If it is bad, then you ask for a correct one. 
Why > should XML be any different than the multitude of documents that you deal > with every day? I'd say, since XML is not binary but very redundant ASCII which I can read, and also most often understand and correct by hand, it is not so simple. You could also throw a faulty C program away since it is not proper C. Instead, I correct it. Well, this was a bit far off, but somewhere between is the truth. ... > By default, it should not correct it. That simply continues to encourage > poor XML authoring. As a programmer, if you want to try to auto-correct, > then okay, but I would not recommend it. 150% agreed. [correcting parser] > No. No. No. No.... > > HTML is a huge mess because people started writing parsers that were > flexible and would correct things for you. Go try to write an HTML parser > that works against all the stuff out on the Internet. It is frightening > how difficult that is. There is just so much crap out there because people > said, "well, we can just correct that for them." Mismatched tags. Missing > quotes. Illegal characters. Missing close brackets. Simply crap. Yes, I also don't want this again. You are right. > With XML, the designers said, "No way. The document has to be correct, or > it gets rejected. Tough shit for the authors of bad documents." > > Yes, I'm rather fascist on this one :-). I simply cannot condone or > recommend *any* allowance of flexibility in parsers. That will just lead > us back to the horrible situation that we are in now with HTML. Ok, let me put it differently, since my thought was different. I don't want bad XML to be corrected automatically. Instead, when it is rejected, I thought of generating a different document, say an "error document" which gives a description of the errors. This is a new (well-formed:) XML document which wraps the source, inserts comments or anything where the parsing broke, leaves correct passages intact so far, but of course does not try to produce correct XML from wrong XML. 
I'd apply this tool to a file after I know it is wrong, for debugging purposes. A little like a compiler listing. Maybe it would suffice to escape the wrong parts and add the XML error code and message to the error doc. This was my reason to write the little indenter - debugging. Thanks for your commitment, we're on the same side - chris -- Christian Tismer :^) Applied Biometrics GmbH : Have a break! Take a ride on Python's Kaiserin-Augusta-Allee 101 : *Starship* http://starship.skyport.net 10553 Berlin : PGP key -> http://pgp.ai.mit.edu/ we're tired of banana software - shipped green, ripens at home From tismer@appliedbiometrics.com Sun Jan 24 13:34:12 1999 From: tismer@appliedbiometrics.com (Christian Tismer) Date: Sun, 24 Jan 1999 14:34:12 +0100 Subject: [XML-SIG] SAX prettyprinter V2 and SGMLOP References: <006001be4796$54879230$f29b12c2@pythonware.com> Message-ID: <36AB2154.658633AD@appliedbiometrics.com> Fredrik Lundh wrote: > > >A general question (comes up because I had to hack my Indenter > >especially for sgmlop): > >Is a SAX parser required to report ignorableWhitespace events? > >Or is it also allowed to never call this method, as sgmlop does? > >If so, then the interface doesn't make too much sense since I have > >to collect all data and handle whitespace when the next tag appears. > > If I understand things correctly, sgmlop cannot figure > out what's ignorable and not; you need to have access > to the DTD to handle that. Well, I understand. Lars also mentioned that without a DTD and a parser which understands it, this event is useless. > Our internal xml libraries allows the user to indicate > whether a resource is "xml text" or "xml data". the > latter doesn't allow elements to contain both text > and other elements, which means that it's easy to > figure out what to ignore. That sounds good, this is exactly what we need to distinguish, too. How do you indicate this without a DTD? A list of tags which are treated as raw data? (kind of a sub-sub-DTD?) 
ciao - chris -- Christian Tismer :^) Applied Biometrics GmbH : Have a break! Take a ride on Python's Kaiserin-Augusta-Allee 101 : *Starship* http://starship.skyport.net 10553 Berlin : PGP key -> http://pgp.ai.mit.edu/ we're tired of banana software - shipped green, ripens at home From fredrik@pythonware.com Sun Jan 24 13:50:02 1999 From: fredrik@pythonware.com (Fredrik Lundh) Date: Sun, 24 Jan 1999 14:50:02 +0100 Subject: [XML-SIG] SAX prettyprinter V2 and SGMLOP Message-ID: <003f01be47a0$71978f10$f29b12c2@pythonware.com> >> Our internal xml libraries allows the user to indicate >> whether a resource is "xml text" or "xml data". the >> latter doesn't allow elements to contain both text >> and other elements, which means that it's easy to >> figure out what to ignore. > >That sounds good, this is exactly what we need to distinguish, >too. How do you indicate this without a DTD? the caller must tell the library what to do based on his/her knowledge of the DTD in question. (in my experience, most data-oriented DTD's are "xml data" in the sense that values are only stored in leaf elements. That's definitely true for every- thing we design). Cheers /F fredrik@pythonware.com http://www.pythonware.com From digitome@iol.ie Sun Jan 24 14:15:52 1999 From: digitome@iol.ie (Sean Mc Grath) Date: Sun, 24 Jan 1999 14:15:52 +0000 Subject: [XML-SIG] Big Bug? (was:Pretty-printing DOM trees) In-Reply-To: References: <36AB121F.7620E5D@appliedbiometrics.com> Message-ID: <3.0.6.32.19990124141552.009262f0@gpo.iol.ie> [Greg Stein] > >Push back against where the file came from. What if somebody sent you a >bad executable? Do you try to correct it? What if they send a bad MSFT >Word file? Do you try to correct it? Makefiles with spaces instead of >tabs? crontab files with a missing column? etc. etc. > >Well, the same for XML. If it is bad, then you ask for a correct one. Why >should XML be any different than the multitude of documents that you deal >with every day? 
> Some "document" types such as C++ source code for example benefit, in my opinion, from error recovery parsing. Nobody wants a C++ compiler to generate executable code in the face of errors but getting a listing of more than one error increases your chances of fixing more than one error in a single edit-compile cycle. I believe an analogy with XML here is valid. In production use, it makes total sense for an XML parser to stop stone dead on error. For development use, an XML parser that can recover from certain types of error is a darned useful thing. To give a concrete example, an XML parser with optional error recovery would be wonderful for XML up-translation work. There are many occasions when you have automated the creation of pseudo-XML and you want to cut code to get it the rest of the way to full XML. Stop-dead parsers are useless for this type of work. So, I would like to see xmlproc having some optional error recovery functionality that I could turn on for up-translation parsing. I realize that this is a contentious opinion:-) From larsga@ifi.uio.no Sun Jan 24 15:00:32 1999 From: larsga@ifi.uio.no (Lars Marius Garshol) Date: 24 Jan 1999 16:00:32 +0100 Subject: [XML-SIG] SAX prettyprinter V2 and SGMLOP In-Reply-To: <36AB2154.658633AD@appliedbiometrics.com> References: <006001be4796$54879230$f29b12c2@pythonware.com> <36AB2154.658633AD@appliedbiometrics.com> Message-ID: * Christian Tismer | | [ignorableWhitespace] | | Well, I understand. Lars also mentioned that without a DTD and a | parser which understands it, this event is useless. Not useless, just impossible to fire as distinguished from the characters event. * Fredrik Lundh | | Our internal xml libraries allows the user to indicate whether a | resource is "xml text" or "xml data". the latter doesn't allow | elements to contain both text and other elements, which means that | it's easy to figure out what to ignore. This sounds like a good approach to me. 
The XML recommendation (sensibly) requires parsers to report all whitespace to the application, but an application-specific layer on top of that sounds good to me. * Christian Tismer | | That sounds good, this is exactly what we need to distinguish, | too. How do you indicate this without a DTD? A list of tags which | are treated as raw data? (kind of a sub-sub-DTD?) Why not make a simple SAX parser filter that reads in such a list of element type names and then filters characters events into characters and ignorableWhitespace, possibly also doing whitespace normalization? Sounds like something that is both simple to develop and eminently reusable. --Lars M. From tismer@appliedbiometrics.com Sun Jan 24 15:18:55 1999 From: tismer@appliedbiometrics.com (Christian Tismer) Date: Sun, 24 Jan 1999 16:18:55 +0100 Subject: [XML-SIG] SAX prettyprinter V2 and SGMLOP References: <003f01be47a0$71978f10$f29b12c2@pythonware.com> Message-ID: <36AB39DF.9AB57131@appliedbiometrics.com> Fredrik, Playing a little more with sgmlop, I realized that it doesn't resolve entities when run under SAX. What's the problem? Is there any but the necessary time? Should I try to add this, or forget about SAX and use sgmlop directly? I'm still very happy with this and would like to work on it, but need advice. If no entityresolver is defined, shouldn't the standard entities &lt; &gt; &amp; be resolved internally? ciao - chris -- Christian Tismer :^) Applied Biometrics GmbH : Have a break! 
Take a ride on Python's Kaiserin-Augusta-Allee 101 : *Starship* http://starship.skyport.net 10553 Berlin : PGP key -> http://pgp.ai.mit.edu/ we're tired of banana software - shipped green, ripens at home From tismer@appliedbiometrics.com Sun Jan 24 15:30:45 1999 From: tismer@appliedbiometrics.com (Christian Tismer) Date: Sun, 24 Jan 1999 16:30:45 +0100 Subject: [XML-SIG] SAX prettyprinter V2 and SGMLOP References: <006001be4796$54879230$f29b12c2@pythonware.com> <36AB2154.658633AD@appliedbiometrics.com> Message-ID: <36AB3CA5.8D95AAD3@appliedbiometrics.com> Lars Marius Garshol wrote: > > * Christian Tismer > | > | [ignorableWhitespace] > | > | Well, I understand. Lars also mentioned that without a DTD and a > | parser which understands it, this event is useless. > > Not useless, just impossible to fire as distinguished from the > characters event. But after all, I'm baffled. I got whitespace events when I didn't specify the parser. It was using xmlproc as it looks like. xmlproc reported whitespace to me I think between a closing tag of a sublevel, before the next closing tag. I.E between these I got witespace, ignored it and handled my own indentation, and everything looked pretty. Is this correct behavior, then? ... > Why not make a simple SAX parser filter that reads in such a list of > element type names and then filters characters events into characters > and ignorableWhitespace, possibly also doing whitespace normalization? > > Sounds like something that is both simple to develop and eminently > reusable. Well, good idea. For many simple data applications, it makes also sense to simply default to keep whitespace at leaf nodes, as Fredrik pointed out. But before, I have to understand that topic above :-) ciao - chris -- Christian Tismer :^) Applied Biometrics GmbH : Have a break! 
Take a ride on Python's Kaiserin-Augusta-Allee 101 : *Starship* http://starship.skyport.net 10553 Berlin : PGP key -> http://pgp.ai.mit.edu/ we're tired of banana software - shipped green, ripens at home From paul@prescod.net Sun Jan 24 16:25:26 1999 From: paul@prescod.net (Paul Prescod) Date: Sun, 24 Jan 1999 10:25:26 -0600 Subject: [XML-SIG] Big Bug? (was:Pretty-printing DOM trees) References: <199901210345.WAA29899@207-172-49-200.s200.tnt14.ann.erols.com> <36A871E6.89033191@appliedbiometrics.com> <199901222144.WAA00911@lindm.dm> <36A9E4FF.76B8E3D5@appliedbiometrics.com> <36AA6DF4.662C7ED5@lyra.org> Message-ID: <36AB4976.1C0983CA@prescod.net> Lars Marius Garshol wrote: > > The error is reported properly as it is and the attribute is not > repaired, but subsequent data events are wrong. That's now fixed (data > events, not the attribute), but the question remains whether the > parser should follow the XML recommendation and stop reporting data > events after a well-formedness bug. > > I'm inclined to make that default behaviour, but behaviour it is > possible to turn off. Opinions are welcome. I think that optional error recovery is a good idea. There are legitimate uses for it and also the potential for serious abuse. If I ever used an XML editor that refused to load half of a document because of missing quotes I would dump it Pretty Damn Quick. Paul Prescod - ISOGEN Consulting Engineer speaking for only himself http://itrc.uwaterloo.ca/~papresco "You have the wrong number." "Eh? Isn't that the Odeon?" "No, this is the Great Theater of Life. Admission is free, but the taxation is mortal. You come when you can, and leave when you must. The show is continuous. Good-night." 
-- Robertson Davies, "The Cunning Man" From larsga@ifi.uio.no Sun Jan 24 17:17:08 1999 From: larsga@ifi.uio.no (Lars Marius Garshol) Date: 24 Jan 1999 18:17:08 +0100 Subject: [XML-SIG] SAX prettyprinter V2 and SGMLOP In-Reply-To: <36AB39DF.9AB57131@appliedbiometrics.com> References: <003f01be47a0$71978f10$f29b12c2@pythonware.com> <36AB39DF.9AB57131@appliedbiometrics.com> Message-ID: * Christian Tismer | | Playing a little more with sgmlop, I realized that it doesn't | resolve entities when run under SAX. | | [...] Should I try to add this, or forget about SAX and use sgmlop | directly? If it's possible, I'd very much like either you or me to add it to the driver. As far as I can see one must set a handle_entity handler that does this somehow. Don't know the exact details, though. | If no entityresolver is defined, shouldn't the standard entities | &lt; &gt; &amp; be resolved internally? Yes. This is part of the XML recommendation. However, EntityResolver is only used for external entities, not internal ones. --Lars M. From tismer@appliedbiometrics.com Sun Jan 24 18:12:06 1999 From: tismer@appliedbiometrics.com (Christian Tismer) Date: Sun, 24 Jan 1999 19:12:06 +0100 Subject: [XML-SIG] SAX prettyprinter V2 and SGMLOP References: <003f01be47a0$71978f10$f29b12c2@pythonware.com> <36AB39DF.9AB57131@appliedbiometrics.com> Message-ID: <36AB6276.6098AD89@appliedbiometrics.com> Lars Marius Garshol wrote: > > * Christian Tismer > | > | Playing a little more with sgmlop, I realized that it doesn't > | resolve entities when run under SAX. > | > | [...] Should I try to add this, or forget about SAX and use sgmlop > | directly? > > If it's possible, I'd very much like either you or me to add it to the > driver. As far as I can see one must set a handle_entity handler that > does this somehow. Don't know the exact details, though. Fredrik handled this differently, he has an extra mode for SAX where he does not use his callback for entities. I have no idea why, must wait for his answer. 
> | If no entityresolver is defined, shouldn't the standard entities > | &lt; &gt; &amp; be resolved internally? > > Yes. This is part of the XML recommendation. However, EntityResolver > is only used for external entities, not internal ones. Aha! And sgmlop didn't do this, so that's the reason why I got &lt; in my attributes which contained "<" encoded as &lt;. So this is funny: If I just do some reformatting and juggling, the process is this: The parser gives me characters and tags and entities and whatsoever, strips the encodings off, and I have to insert them back. What a mess. It appears to me that XML parsers are already doing quite a lot, also in cases where I don't need it. In my case, I would have been comfortable with a kind of XML scanner which just recognizes tokens, makes no attempt to resolve anything, to parse and reorder the parameters (which is ok but I hate it) and gives the plain text to me. From that point of view, my basic simple parser building block would be something which can correctly recognize tags and doesn't change anything, just gives me indices into the text. Marc Lemburg's tagging engine springs to mind... Anyway, if sgmlop doesn't resolve external entities but handles the standard ones internally, this is ok with me. Again, I need advice from /F. ciao - chris -- Christian Tismer :^) Applied Biometrics GmbH : Have a break! Take a ride on Python's Kaiserin-Augusta-Allee 101 : *Starship* http://starship.skyport.net 10553 Berlin : PGP key -> http://pgp.ai.mit.edu/ we're tired of banana software - shipped green, ripens at home From larsga@ifi.uio.no Sun Jan 24 20:48:39 1999 From: larsga@ifi.uio.no (Lars Marius Garshol) Date: 24 Jan 1999 21:48:39 +0100 Subject: [XML-SIG] Big Bug? 
(was:Pretty-printing DOM trees) In-Reply-To: <36AB4976.1C0983CA@prescod.net> References: <199901210345.WAA29899@207-172-49-200.s200.tnt14.ann.erols.com> <36A871E6.89033191@appliedbiometrics.com> <199901222144.WAA00911@lindm.dm> <36A9E4FF.76B8E3D5@appliedbiometrics.com> <36AA6DF4.662C7ED5@lyra.org> <36AB4976.1C0983CA@prescod.net> Message-ID: * Lars Marius Garshol wrote: | | The error is reported properly as it is and the attribute is not | repaired, but subsequent data events are wrong. That's now fixed | (data events, not the attribute), but the question remains whether | the parser should follow the XML recommendation and stop reporting | data events after a well-formedness bug. | | I'm inclined to make that default behaviour, but behaviour it is | possible to turn off. Opinions are welcome. Since Christian, Greg, Paul and Sean all seem to be in agreement that this is a good idea I've now made this change. It will appear in 0.60 together with a lot of other stuff. --Lars M. From Jack.Jansen@cwi.nl Sun Jan 24 20:55:26 1999 From: Jack.Jansen@cwi.nl (Jack Jansen) Date: Sun, 24 Jan 1999 21:55:26 +0100 Subject: [XML-SIG] Big Bug? (was:Pretty-printing DOM trees) In-Reply-To: Message by Lars Marius Garshol , 24 Jan 1999 12:28:17 +0100 , Message-ID: Recently, Lars Marius Garshol said: > The error is reported properly as it is and the attribute is not > repaired, but subsequent data events are wrong. That's now fixed (data > events, not the attribute), but the question remains whether the > parser should follow the XML recommendation and stop reporting data > events after a well-formedness bug. > > I'm inclined to make that default behaviour, but behaviour it is > possible to turn off. Opinions are welcome. This sounds like the right way to go. Most applications should stop on non-well-formed documents, but there are definitely applications that should be able to continue (like applications that try to repair documents). 
It would be a bit silly to have to hand-craft code for these if it could be an optional feature of the standard parser. -- Jack Jansen | ++++ stop the execution of Mumia Abu-Jamal ++++ Jack.Jansen@cwi.nl | ++++ if you agree copy these lines to your sig ++++ http://www.cwi.nl/~jack | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm From uche.ogbuji@fourthought.com Sun Jan 24 21:06:36 1999 From: uche.ogbuji@fourthought.com (uche.ogbuji@fourthought.com) Date: Sun, 24 Jan 1999 14:06:36 -0700 Subject: [XML-SIG] xmlproc, SAX and EntityResolver Message-ID: <199901242106.OAA00826@malatesta.local> According to the (Java) SAX docs, """ public interface EntityResolver Basic interface for resolving entities. If a SAX application needs to implement customized handling for external entities, it must implement this interface and register an instance with the SAX parser using the parser's setEntityResolver method. The parser will then allow the application to intercept any external entities (including the external DTD subset and external parameter entities, if any) before including them. """ And this is how the xmlproc in xml-0.4 used to work. If I implemented entityResolver in a handler, and registered it, I'd get the entity events for the external DTD declaration as well as any other entities declared. This no longer appears to work in xml-0.5. Unfortunately, my current code it pretty complex, and I first of all want to make sure this wasn't an intentional change. I'm pretty sure I've narrowed it to xmlproc, but if I'm told this should _not_ be so, I'll work on a stripped-down test-case. Thanks. 
-- Uche Ogbuji FourThought LLC, IT Consultants uche.ogbuji@fourthought.com (970)481-0805 Software engineering, project management, Intranets and Extranets http://FourThought.com http://OpenTechnology.org From larsga@ifi.uio.no Sun Jan 24 21:14:23 1999 From: larsga@ifi.uio.no (Lars Marius Garshol) Date: 24 Jan 1999 22:14:23 +0100 Subject: [XML-SIG] Re: xmlproc, SAX and EntityResolver In-Reply-To: <199901242106.OAA00826@malatesta.local> References: <199901242106.OAA00826@malatesta.local> Message-ID: * uche ogbuji | | [reporting of external DTD subset and external parameter entities] | | This no longer appears to work in xml-0.5. Unfortunately, my | current code it pretty complex, and I first of all want to make sure | this wasn't an intentional change. I'm pretty sure I've narrowed it | to xmlproc, but if I'm told this should _not_ be so, I'll work on a | stripped-down test-case. As I recall, you fixed the reporting of the external DTD subset in xmlproc (the version in xml-0.5). However, you didn't do it correctly, so in my development code I have the correct patch (which is not released yet). Adding this to drv_xmlproc.py should do the trick for the external subset: def resolve_doctype_pubid(self,pubid,sysid): return self.ent_handler.resolveEntity(pubid,sysid) This will not affect external parameter entities. If you need those as well, let me know. They will be reported by 0.60, but that may still be a couple of weeks into the future. A worse problem is that, as Paul pointed out, Python SAX EntityResolvers return the system identifier of the entity rather than an object from which the entity contents can be read. I intend to fix this when we release SAX 2.0 unless someone screams loudly. --Lars M. From uche.ogbuji@fourthought.com Sun Jan 24 22:24:19 1999 From: uche.ogbuji@fourthought.com (uche.ogbuji@fourthought.com) Date: Sun, 24 Jan 1999 15:24:19 -0700 Subject: [XML-SIG] Re: xmlproc, SAX and EntityResolver In-Reply-To: Your message of "24 Jan 1999 22:14:23 +0100." 
Message-ID: <199901242224.PAA00906@malatesta.local> > > * uche ogbuji > | > | [reporting of external DTD subset and external parameter entities] > | > | This no longer appears to work in xml-0.5. Unfortunately, my > | current code it pretty complex, and I first of all want to make sure > | this wasn't an intentional change. I'm pretty sure I've narrowed it > | to xmlproc, but if I'm told this should _not_ be so, I'll work on a > | stripped-down test-case. > > As I recall, you fixed the reporting of the external DTD subset in > xmlproc (the version in xml-0.5). However, you didn't do it correctly, > so in my development code I have the correct patch (which is not > released yet). Oh yeah. I actually fixed this for xml-0.4, but it was so long ago that I forgot. I recently discovered that I had the wrong sym-link, and I've been using xml-0.4 instead of xml-0.5, even after installing the latter. So when I fixed the link and started using xml-0.5, my patch to report the external DTD subset wasn't there. Duh! You'd mentioned before that this patch of mine was "wrong", but I didn' know how to do it the "right" way, so thanks for the code snippet below. > Adding this to drv_xmlproc.py should do the trick for the external > subset: > > def resolve_doctype_pubid(self,pubid,sysid): > return self.ent_handler.resolveEntity(pubid,sysid) Unfortunately, on preliminary testing it doesn't appear to work. I'll work on an isolated test case and get back to you. > This will not affect external parameter entities. If you need those as > well, let me know. They will be reported by 0.60, but that may still > be a couple of weeks into the future. It looks as if 0.60 will be very helpful to me when it's released. > A worse problem is that, as Paul pointed out, Python SAX > EntityResolvers return the system identifier of the entity rather than > an object from which the entity contents can be read. I intend to fix > this when we release SAX 2.0 unless someone screams loudly. 
For my selfish purposes (constructing DOM trees from SAX events), this doesn't affect me, so it's okay with me to wait until SAX 2.0. The only thing I'd mention is that discussion of SAX 2.0 on XML-DEV appears to be going at a pretty deliberate pace (a good thing!), and so 2.0 might be a ways off. -- Uche Ogbuji FourThought LLC, IT Consultants uche.ogbuji@fourthought.com (970)481-0805 Software engineering, project management, Intranets and Extranets http://FourThought.com http://OpenTechnology.org From larsga@ifi.uio.no Sun Jan 24 22:35:36 1999 From: larsga@ifi.uio.no (Lars Marius Garshol) Date: 24 Jan 1999 23:35:36 +0100 Subject: [XML-SIG] Re: xmlproc, SAX and EntityResolver In-Reply-To: <199901242224.PAA00906@malatesta.local> References: <199901242224.PAA00906@malatesta.local> Message-ID: * uche ogbuji | | Oh yeah. I actually fixed this for xml-0.4, but it was so long ago | that I forgot. I recently discovered that I had the wrong sym-link, | and I've been using xml-0.4 instead of xml-0.5, even after | installing the latter. So when I fixed the link and started using | xml-0.5, my patch to report the external DTD subset wasn't there. | Duh! Ah, that explains it. | Unfortunately, on preliminary testing it doesn't appear to work. | I'll work on an isolated test case and get back to you. Arh! I'm getting confused by all the different versions here. Sorry, you need this in xmlproc.py as well to have it call that method (just replace the existing method with this, hopefully this does not depend on other changes): def parse_doctype(self): "Parses the document type declaration." 
if self.seen_doctype: self.report_error(3032) if self.seen_root: self.report_error(3033) self.skip_ws(1) rootname=self._get_name() self.skip_ws(1) (pub_id,sys_id)=self.parse_external_id() self.skip_ws() if self.now_at("["): self.parse_internal_dtd() elif not self.now_at(">"): self.report_error(3005,">") # External subset must be parsed _after_ the internal one if pub_id!=None or sys_id!=None: # Was there an external id at all? sys_id=self.pubres.resolve_doctype_pubid(pub_id,sys_id) self.app.handle_doctype(rootname,pub_id,sys_id) self.dtd.prepare_for_parsing() self.seen_doctype=1 # Has to be at the end to avoid block trouble | It looks as if 0.60 will be very helpful to me when it's released. Hopefully it will be to a lot of people. And after I decided to delay DDML support (the standard previously known as XSchema) and DTD caching to 0.61 what remains is mainly upgrading the regression test, documentation, running test etc Just the mechanics of getting out a new version. | The only thing I'd mention is that discussion of SAX 2.0 on XML-DEV | appears to be going at a pretty deliberate pace (a good thing!), and | so 2.0 might be a ways off. It may, yes. I guess it all depends on David Megginson and how much time he can devote to this. If the discussion turns out to take too long we'll just have to put out a SAX 1.1 in the meantime. --Lars M. From larsga@ifi.uio.no Sun Jan 24 22:59:47 1999 From: larsga@ifi.uio.no (Lars Marius Garshol) Date: 24 Jan 1999 23:59:47 +0100 Subject: [XML-SIG] XSA 1.0 specification released Message-ID: XSA is an XML-based system that allows anyone who is interested to automatically discover new versions of software products as they are released by polling XML documents describing the products. It is mainly intended to help software index maintainers keep their indexes up to date. I have now finalized the XSA 1.0 specification and XSA is thus ready for use. 
The accompanying software is still being tested, but will be released as soon as possible, probably in a week or so. I will announce it here when it is ready. What this means is that we now have an XML application for publishing structured information on the web that is ready for use. I am using it (via a cron job on my Linux machine) to keep track of new releases on my XML tools list[1], and I'm confident that other software list maintainers will start using the system as well once I release the software. So, to all you developers of XML software: please make yourself an XSA document and publish it on the web. That way we can both keep the software indexes updated and demonstrate that XML can actually be used. The more people who do this, the more useful the system will be. The XSA site contains both a wizard for making documents, an online validator and a form for registering new XSA documents. --Lars M. [1] From uche.ogbuji@fourthought.com Sun Jan 24 23:48:41 1999 From: uche.ogbuji@fourthought.com (uche.ogbuji@fourthought.com) Date: Sun, 24 Jan 1999 16:48:41 -0700 Subject: [XML-SIG] Re: xmlproc, SAX and EntityResolver In-Reply-To: Your message of "24 Jan 1999 23:35:36 +0100." Message-ID: <199901242348.QAA01110@malatesta.local> This is a multipart MIME message. --==_Exmh_-20107729380 Content-Type: text/plain; charset=us-ascii > | Unfortunately, on preliminary testing it doesn't appear to work. > | I'll work on an isolated test case and get back to you. > > Arh! I'm getting confused by all the different versions here. Sorry, > you need this in xmlproc.py as well to have it call that method (just > replace the existing method with this, hopefully this does not depend > on other changes): Thanks, but maybe there is yet a dependency, because it still doesn't work. Here's a small test case. The SAX app, xml file and DTD are attached. I get results from the startElement (of course), but not from any of the other events. 
-- Uche Ogbuji FourThought LLC, IT Consultants uche.ogbuji@fourthought.com (970)481-0805 Software engineering, project management, Intranets and Extranets http://FourThought.com http://OpenTechnology.org --==_Exmh_-20107729380 Content-Type: text/plain ; name="addr_book.dtd"; charset=us-ascii Content-Description: addr_book.dtd Content-Disposition: attachment; filename="addr_book.dtd" --==_Exmh_-20107729380 Content-Type: text/plain ; name="addr_book1.xml"; charset=us-ascii Content-Description: addr_book1.xml Content-Disposition: attachment; filename="addr_book1.xml" ]> Pieter Aaron
404 Error Way
404-555-1234 404-555-4321 404-555-5555 pieter.aaron@inter.net
Emeka Ndubuisi
42 Spam Blvd
767-555-7676 767-555-7642 800-SKY-PAGEx767676 endubuisi@spamtron.com
Vasia Zhugenev
2000 Disaster Plaza
000-987-6543 000-000-0000 vxz@magog.ru
--==_Exmh_-20107729380 Content-Type: text/plain; name="test_doctype.py"; charset=us-ascii Content-Description: test_doctype.py Content-Disposition: attachment; filename="test_doctype.py" Content-Transfer-Encoding: quoted-printable
import sys
from xml.sax import saxlib, saxexts, drivers

class test_doctype(saxlib.HandlerBase):
    def unparsedEntityDecl (self, publicId, systemId, notationName):
        print "unparsedEntityDecl", publicId, systemId, notationName

    def resolveEntity (self, name, publicId, systemId):
        print "entity", name, publicId, systemId

    def startElement(self, name, attribs):
        print "element", name, attribs

    def warning(self, exception):
        raise exception

    def error(self, exception):
        raise exception

    def fatalError(self, exception):
        raise exception

if __name__ == "__main__":
    parser = saxexts.XMLValParserFactory.make_parser()
    handler = test_doctype()
    parser.setDocumentHandler(handler)
    parser.setDTDHandler(handler)
    parser.setEntityResolver(handler)
    parser.setErrorHandler(handler)
    parser.parseFile(open("addr_book1.xml"))
--==_Exmh_-20107729380-- From uche.ogbuji@fourthought.com Sun Jan 24 23:53:21 1999 From: uche.ogbuji@fourthought.com (uche.ogbuji@fourthought.com) Date: Sun, 24 Jan 1999 16:53:21 -0700 Subject: [XML-SIG] xmlproc and parameter entities in external DTD subsets Message-ID: <199901242353.QAA01124@malatesta.local> I know that xmlproc 0.52 doesn't support parameter entities in external DTD subsets within declarations yet, but is there a chance that they will be supported in 0.6? We are working with the xsl.dtd, and it requires _many_ parameter entities to avoid being of near-infinite length. Thanks. 
-- Uche Ogbuji FourThought LLC, IT Consultants uche.ogbuji@fourthought.com (970)481-0805 Software engineering, project management, Intranets and Extranets http://FourThought.com http://OpenTechnology.org From larsga@ifi.uio.no Sun Jan 24 23:53:20 1999 From: larsga@ifi.uio.no (Lars Marius Garshol) Date: 25 Jan 1999 00:53:20 +0100 Subject: [XML-SIG] Re: xmlproc and parameter entities in external DTD subsets In-Reply-To: <199901242353.QAA01124@malatesta.local> References: <199901242353.QAA01124@malatesta.local> Message-ID: * uche ogbuji | | I know that xmlproc 0.52 doesn't support parameter entities in | external DTD subsets within declarations yet, but is there a chance | that they will supported in 0.6? They are already. :) I have implemented this, and it works. Better testing remains, but, yes, I am confident that this will be in 0.6. | We are working with the xsl.dtd, and it requires _many_ parameter | entities to avoid being of near-infinite length. Hmmm. Well, if you want to be a beta tester on xmlproc, just send me an email and I'll put out a zip of my current version with the current SAX driver. (That won't happen until tomorrow, though, since I'm going to sleep now.) --Lars M. From Fred L. Drake, Jr." References: <00b701be46c6$40ea7a10$f29b12c2@pythonware.com> Message-ID: <13996.34860.568582.467756@weyr.cnri.reston.va.us> Fredrik Lundh writes: > being ignored by the sgmllib and xmllib maintainers, so keeping > things in sync is pretty hard. Not ignored; I for one am simply swamped with some other concerns at the moment. I plan to update sgmllib when I can, I just can't promise when that will be. -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives 1895 Preston White Dr. 
Reston, VA 20191 From wunder@infoseek.com Mon Jan 25 18:17:09 1999 From: wunder@infoseek.com (Walter Underwood) Date: Mon, 25 Jan 1999 10:17:09 -0800 Subject: [XML-SIG] SAX prettyprinter V2 and SGMLOP In-Reply-To: <36A9EE42.78F166D5@appliedbiometrics.com> References: <199901210345.WAA29899@207-172-49-200.s200.tnt14.ann.erols.com> <36A8BFED.BE6C3EF6@appliedbiometrics.com> <36A8DF4E.2D3852D7@appliedbiometrics.com> Message-ID: <3.0.5.32.19990125101709.00ca7580@corp> At 04:44 PM 1/23/99 +0100, Christian Tismer wrote: >What I need to find is the fastest acceptable parser which allows >me to turn masses of XML data into Python structures. [...] we are >processing XML encoded database records which are quite irregular >(useless to use a relational database) and quite simple, but the >standard size is some 50MB. This is why I'm after speed, much more than >conformance. I'm using pyexpat for the XML support in our search engine. At this point in development, I'm collecting text and associating it with *every* enclosing element. So this is worst-case for parsing time. Parsing Jon Bosak's tagged "Old Testament" (3.4 megabytes) takes 30 seconds. That document is pretty heavily tagged, with an element for each verse, each chapter, each book, the body, etc. Collecting less information would probably be faster. If you need a lot more speed than this (integer factors faster) you might need to do all the parsing in C. Remember that there is a difference between a paser that implements all of XML and a parser that extracts the data you need from your XML documents. If you can trust the documents to be legal (perhaps they are checked when generated), then a hard-coded parser may be the answer. wunder Walter R. 
Underwood wunder@infoseek.com wunder@best.com (home) http://www.best.com/~wunder/ 1-408-543-6946 From tismer@appliedbiometrics.com Mon Jan 25 20:50:30 1999 From: tismer@appliedbiometrics.com (Christian Tismer) Date: Mon, 25 Jan 1999 21:50:30 +0100 Subject: [XML-SIG] SAX prettyprinter V2 and SGMLOP References: <199901210345.WAA29899@207-172-49-200.s200.tnt14.ann.erols.com> <36A8BFED.BE6C3EF6@appliedbiometrics.com> <36A8DF4E.2D3852D7@appliedbiometrics.com> <3.0.5.32.19990125101709.00ca7580@corp> Message-ID: <36ACD916.2F97E1FF@appliedbiometrics.com> Walter Underwood wrote: > > At 04:44 PM 1/23/99 +0100, Christian Tismer wrote: > >What I need to find is the fastest acceptable parser which allows > >me to turn masses of XML data into Python structures. [...] we are > >processing XML encoded database records which are quite irregular > >(useless to use a relational database) and quite simple, but the > >standard size is some 50MB. This is why I'm after speed, much more than > >conformance. > > I'm using pyexpat for the XML support in our search engine. > At this point in development, I'm collecting text and associating > it with *every* enclosing element. So this is worst-case for > parsing time. > > Parsing Jon Bosak's tagged "Old Testament" (3.4 megabytes) takes > 30 seconds. That document is pretty heavily tagged, with an element > for each verse, each chapter, each book, the body, etc. > > Collecting less information would probably be faster. Interesting. I tested my Indenter with this file (what a nice example). It takes 11.75 seconds to indent this through SAX, using sgmlop. With xmlproc, it takes 30.87 seconds. Running the whole text through sgmlop without any associated events ran in below one second. > If you need a lot more speed than this (integer factors faster) > you might need to do all the parsing in C. Remember that there > is a difference between a paser that implements all of XML and > a parser that extracts the data you need from your XML documents. 
> If you can trust the documents to be legal (perhaps they are > checked when generated), then a hard-coded parser may be the > answer. Well, both is true. I want to validate small amounts of newly added data "records" which are in XML format, but then kept in a special repository, and I want to be able to re-import large amounts of XML which were exported by my app before. This means, I need a validating parser of acceptable speed, where I think xmlproc is very good? And I need something that simply eats large amounts of approved data. But I won't go so far to code this all in C since these imports will not be so frequent. I would even prefer to do it all in Python if possible. There are also cases where even sgmlop does much more than I need. There are applications where I just want to know where the tags start and end, and I don't want substitutions, no parsing and reordering of parameters, just to be able to juggle with unmodified pieces of XML. Therefore I proposed an XML scanner which just provides the tools to build up what you actually need. Maybe I overlooked it and we have that already somewhere. ciao - chris -- Christian Tismer :^) Applied Biometrics GmbH : Have a break! Take a ride on Python's Kaiserin-Augusta-Allee 101 : *Starship* http://starship.skyport.net 10553 Berlin : PGP key -> http://pgp.ai.mit.edu/ we're tired of banana software - shipped green, ripens at home From larsga@ifi.uio.no Mon Jan 25 21:18:29 1999 From: larsga@ifi.uio.no (Lars Marius Garshol) Date: 25 Jan 1999 22:18:29 +0100 Subject: [XML-SIG] SAX prettyprinter V2 and SGMLOP In-Reply-To: <36ACD916.2F97E1FF@appliedbiometrics.com> References: <199901210345.WAA29899@207-172-49-200.s200.tnt14.ann.erols.com> <36A8BFED.BE6C3EF6@appliedbiometrics.com> <36A8DF4E.2D3852D7@appliedbiometrics.com> <3.0.5.32.19990125101709.00ca7580@corp> <36ACD916.2F97E1FF@appliedbiometrics.com> Message-ID: * Christian Tismer | | [About ot.xml] | | Interesting. 
I tested my Indenter with this file (what a nice | example). A rather misleading one, I'm afraid, since it doesn't use entities, comments, PIs, marked sections or attributes, only elements and PCDATA. | It takes 11.75 seconds to indent this through SAX, using sgmlop. | With xmlproc, it takes 30.87 seconds. Interesting. (And pleasing. :) | Running the whole text through sgmlop without any associated events | ran in below one second. It's worth noting that this is just the time for the raw parse. As far as I know, sgmlop will not call handlers if there aren't any and so this entire second will be spent in C source. | I want to validate small amounts of newly added data "records" which | are in XML format, but then kept in a special repository, and I want | to be able to re-import large amounts of XML which were exported by | my app before. This means, I need a validating parser of acceptable | speed, where I think xmlproc is very good? I think the Java parsers are probably faster, but xmlproc should be acceptable, yes. When I release 0.60 the DTD parser and DTD objects are separated from the XML parser. This means that provided you can get the external and internal DTD subsets from expat it's possible to build an expat-based validator using the xmlproc sources. This will require a bit of work, though. With DTD caching (scheduled for 0.61 in my current plans) you won't have to keep reparsing the DTD for each document either, thus saving even more speed. (Parse times for large DTDs such as TEI-XML take substantial amounts of time.) --Lars M. 
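The gap between a raw parse and a parse that dispatches every event to Python handlers, which the timings above turn on, can be sketched with a small harness. This is only an illustration under stated assumptions: it uses the expat-based xml.sax module from a current Python 3 standard library rather than sgmlop or xmlproc, the synthetic document and the names ElementCounter, make_document and timed_parse are made up for the example, and the absolute numbers will not resemble the 1999 figures. Note also that with xml.sax even the do-nothing ContentHandler is still called from Python for each event, so it is not a true handler-free parse of the kind described for sgmlop.

```python
import time
import xml.sax


class ElementCounter(xml.sax.ContentHandler):
    """Hypothetical handler: counts elements and character data."""

    def __init__(self):
        super().__init__()
        self.elements = 0
        self.chars = 0

    def startElement(self, name, attrs):
        self.elements += 1

    def characters(self, content):
        self.chars += len(content)


def make_document(n):
    """Build a synthetic, heavily tagged document with n 'verse' elements."""
    verses = "".join("<v>In the beginning %d</v>" % i for i in range(n))
    return ("<book>%s</book>" % verses).encode("utf-8")


def timed_parse(data, handler):
    """Parse data with the given handler and return elapsed seconds."""
    start = time.perf_counter()
    xml.sax.parseString(data, handler)
    return time.perf_counter() - start


data = make_document(20000)

# Base ContentHandler: every callback is a no-op.
bare = timed_parse(data, xml.sax.ContentHandler())

# Counting handler: a little Python work per event.
counter = ElementCounter()
counted = timed_parse(data, counter)

print("no-op handler: %.3fs, counting handler: %.3fs" % (bare, counted))
print("elements seen:", counter.elements)
```

Swapping in a handler that does more per event (building strings, tracking enclosing elements) should widen the gap, which is the effect both the pyexpat and the sgmlop measurements in this thread describe.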
From tismer@appliedbiometrics.com Mon Jan 25 21:58:04 1999 From: tismer@appliedbiometrics.com (Christian Tismer) Date: Mon, 25 Jan 1999 22:58:04 +0100 Subject: [XML-SIG] SAX prettyprinter V2 and SGMLOP References: <199901210345.WAA29899@207-172-49-200.s200.tnt14.ann.erols.com> <36A8BFED.BE6C3EF6@appliedbiometrics.com> <36A8DF4E.2D3852D7@appliedbiometrics.com> <3.0.5.32.19990125101709.00ca7580@corp> <36ACD916.2F97E1FF@appliedbiometrics.com> Message-ID: <36ACE8EC.6130BD08@appliedbiometrics.com> Lars Marius Garshol wrote: > > * Christian Tismer > | > | [About ot.xml] > | > | Interesting. I tested my Indenter with this file (what a nice > | example). > > A rather misleading one, I'm afraid, since it doesn't use entities, > comments, PIs, marked sections or attributes, only elements and > PCDATA. Right, very simple. > | It takes 11.75 seconds to indent this through SAX, using sgmlop. > | With xmlproc, it takes 30.87 seconds. > > Interesting. (And pleasing. :) And then I wrote a simple plain vanilla indenter in pure Python which does the same in 5 seconds. Just splitting away, finding tags correctly, counting levels, and doing nothing else at all. I think this will not become much faster by using sgmlop, so the test which you mentioned a while ago is obsolete. 5 seconds is the need for indentation, the rest is gymnastics which is useless in this case. > | Running the whole text through sgmlop without any associated events > | ran in below one second. > > It's worth noting that this is just the time for the raw parse. As far > as I know, sgmlop will not call handlers if there aren't any and so > this entire second will be spent in C source. Right, this is the "naked" time. > | I want to validate small amounts of newly added data "records" which > | are in XML format, but then kept in a special repository, and I want > | to be able to re-import large amounts of XML which were exported by > | my app before. 
This means, I need a validating parser of acceptable > | speed, where I think xmlproc is very good? > > I think the Java parsers are probably faster, but xmlproc should be > acceptable, yes. > > When I release 0.60 the DTD parser and DTD objects are separated from > the XML parser. This means that provided you can get the external and > internal DTD subsets from expat it's possible to build an expat-based > validator using the xmlproc sources. This will require a bit of work, > though. > > With DTD caching (scheduled for 0.61 in my current plans) you won't > have to keep reparsing the DTD for each document either, thus saving > even more speed. (Parse times for large DTDs such as TEI-XML take > substantial amounts of time.) I'm happy to hear this. cheers - chris -- Christian Tismer :^) Applied Biometrics GmbH : Have a break! Take a ride on Python's Kaiserin-Augusta-Allee 101 : *Starship* http://starship.skyport.net 10553 Berlin : PGP key -> http://pgp.ai.mit.edu/ we're tired of banana software - shipped green, ripens at home From jae@kavi.com Tue Jan 26 02:12:40 1999 From: jae@kavi.com (John Eikenberry) Date: Mon, 25 Jan 1999 18:12:40 -0800 (PST) Subject: [XML-SIG] Bug or Delusion Message-ID: Hello, I'm in the process of writing my first DTD, and am having bit of a problem. I'm attempting to create valueless attributes (like
in html). Now my XML book has this statement: For an XML document to be valid, whenever an element type with an #IMPLIED attribute appears and does not have a value, the XML procesor must report the missing value and continue processing. In addition, in the ibtwsh.dtd (Itsy Bitsy Teeny Weeny Simple Hypertext DTD), they have the 'compact' attribute defined like this: When I try something like this in my DTD... And run the xvcmd over a test xml document. I get these errors: xmysql.xml:4:10: Document root element 'package' does not match declared root element xmysql.xml:40:9: '=' expected xmysql.xml:40:11: One of '>' or '/>' expected Parse complete, 3 error(s) and 0 warning(s) The first error I've been getting, and just haven't gotten around to tracking it down (the package element seems fine to me... but I don't think this is relevant to the problem at hand). Are these errors the systems way of reporting the missing value (as the paragraph from my book states)? I thought that errors were fatal, and things to be avoided. I was expecting mabey a warning. BTW, Here's line 40: This seems to either be a mistake in xmlproc, or I'm not understanding this very well (probably the latter). If this is a mistake on my part, I'd appreciate any tips/advice. Thanks, --- John Eikenberry [jae@taos.kavi.com - http://taos.kavi.com/~jae/] ______________________________________________________________ "A society that will trade a little liberty for a little order will deserve neither and lose both." --B. Franklin From hiren@infoseek.com Tue Jan 26 04:42:59 1999 From: hiren@infoseek.com (Hirendra Hindocha) Date: Mon, 25 Jan 1999 20:42:59 -0800 (PST) Subject: [XML-SIG] ampersand in name how to parse Message-ID: Hi, I've just started working with the xml package and I was trying to parse a document which looks like this - the & in the name above seems to cause an exception . 
I have the following code fragment -
class TaxonomyHandler(saxlib.DocumentHandler):
    def startElement(self, name, attrs):
        nodename = attrs['name']
        id = attrs['id']
        print nodename,id
If I use the BaseHandler to inherit from , the second node is silently ignored. When I use the DocumentHandler as above, an exception is generated. If I drop the "&" then everything works. What do I need to do to be able to accept the & in the name ? Any help is appreciated, Thanks, Hiren -------------------------------------------------------- USER ERROR: replace user and press any key to continue. From larsga@ifi.uio.no Tue Jan 26 07:30:11 1999 From: larsga@ifi.uio.no (Lars Marius Garshol) Date: 26 Jan 1999 08:30:11 +0100 Subject: [XML-SIG] ampersand in name how to parse In-Reply-To: References: Message-ID: * Hirendra Hindocha | | | | | | | the & in the name above seems to cause an exception . As it should, since the document above is not well-formed. (XML is much stricter than HTML.) | What do I need to do to be able to accept the & in the name ? Write it as &amp; instead. :) --Lars M. From larsga@ifi.uio.no Tue Jan 26 07:36:48 1999 From: larsga@ifi.uio.no (Lars Marius Garshol) Date: 26 Jan 1999 08:36:48 +0100 Subject: [XML-SIG] Bug or Delusion In-Reply-To: References: Message-ID: * John Eikenberry | | For an XML document to be valid, whenever an element type with an | #IMPLIED attribute appears and does not have a value, the XML procesor | must report the missing value and continue processing. This quote is rather misleading. What it's trying to say (or should be trying to say) is that #IMPLIED attributes are optional. Not the value, but the whole attribute. | In addition, in the ibtwsh.dtd (Itsy Bitsy Teeny Weeny Simple | Hypertext DTD), they have the 'compact' attribute defined like this: | | | | | | or
| And run the xvcmd over a test xml document. I get these errors: | | xmysql.xml:4:10: Document root element 'package' does not match declared | root element This means that you have ' or '/>' expected | Parse complete, 3 error(s) and 0 warning(s) | | Are these errors the systems way of reporting the missing value (as the | paragraph from my book states)? I thought that errors were fatal, and | things to be avoided. Not really, this is the systems way of reporting that your document is not well-formed. All XML attributes _must_ have a value if they are present in the start tag. The XML grammar shows this clearly: | This seems to either be a mistake in xmlproc, or I'm not | understanding this very well (probably the latter). If this is a | mistake on my part, I'd appreciate any tips/advice. Set instead and it will work. --Lars M. From jae@kavi.com Tue Jan 26 08:10:08 1999 From: jae@kavi.com (John Eikenberry) Date: Tue, 26 Jan 1999 00:10:08 -0800 (PST) Subject: [XML-SIG] Bug or Delusion In-Reply-To: Message-ID: On 26 Jan 1999, Lars Marius Garshol wrote: > | And run the xvcmd over a test xml document. I get these errors: > | > | xmysql.xml:4:10: Document root element 'package' does not match declared > | root element > > This means that you have > > > in your document. Cool. I thought you just needed this in the dtd. I'd been spending all my time trying to figure out the other problem. I hopefully would have figured this out after looking at the top of an xbel document. :) > Not really, this is the systems way of reporting that your document is > not well-formed. All XML attributes _must_ have a value if they are > present in the start tag. > > The XML grammar shows this clearly: > > Thanks for the clarification Lars. I guess I just assumed that you could reproduce html in xml, and therefor (assumed) there had to be a way to have a valueless attribute. 
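The two well-formedness rules that came up in these threads (an attribute that is present must have a value, and a literal "&" must be written as &amp;) can be checked with a few lines. This is a minimal sketch, assuming a current Python 3 and the standard-library xml.dom.minidom (which sits on expat) rather than the xmlproc and xvcmd of the time; the helper is_well_formed and the element and attribute names are hypothetical, since the original documents did not survive in the archive.

```python
import xml.dom.minidom
from xml.parsers.expat import ExpatError


def is_well_formed(text):
    """Return True if text parses as well-formed XML, False otherwise."""
    try:
        xml.dom.minidom.parseString(text)
        return True
    except ExpatError:
        return False


# HTML-style minimized attribute: not allowed in XML.
print(is_well_formed('<ul compact><li>x</li></ul>'))             # False
# The XML spelling: the attribute carries an explicit value.
print(is_well_formed('<ul compact="compact"><li>x</li></ul>'))   # True

# A bare ampersand in an attribute value is not well-formed.
print(is_well_formed('<node name="Arts & Crafts" id="2"/>'))     # False
# Escaped as &amp; it parses, and the application sees a plain "&".
good = '<node name="Arts &amp; Crafts" id="2"/>'
print(is_well_formed(good))                                      # True
doc = xml.dom.minidom.parseString(good)
print(doc.documentElement.getAttribute("name"))                  # Arts & Crafts
```

Both failures are reported by the parser as well-formedness errors rather than validity errors, so they occur whether or not a DTD is involved.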
Thanks again, --- John Eikenberry [jae@taos.kavi.com - http://taos.kavi.com/~jae/] ______________________________________________________________ "A society that will trade a little liberty for a little order will deserve neither and lose both." --B. Franklin From fredrik@pythonware.com Tue Jan 26 10:25:45 1999 From: fredrik@pythonware.com (Fredrik Lundh) Date: Tue, 26 Jan 1999 11:25:45 +0100 Subject: [XML-SIG] Python 1.5.2b1's xmllib.py Considered Harmful Message-ID: <01f901be4916$3d3fc490$f29b12c2@pythonware.com> xmllib.py currently got a completely new interface in 1.5.2b1. The new interface silently breaks all existing implementations (it no longer calls start and end handlers), something that has caused us a LOT of trouble lately. For example, our highly successful xmlrpclib.py implementation doesn't work at all under 1.5.2b1. I hereby propose that the old implementation of xmllib.py should be put back in Python 1.5.2 final, and that the new incompatible version is shipped under a new name (e.g. xmllib2). I don't mind if the old version is deprecated, just don't remove it from Python before 2.0. Regards /F fredrik@pythonware.com http://www.pythonware.com From fredrik@pythonware.com Tue Jan 26 11:45:32 1999 From: fredrik@pythonware.com (Fredrik Lundh) Date: Tue, 26 Jan 1999 12:45:32 +0100 Subject: [XML-SIG] Python 1.5.2b1's xmllib.py Considered Harmful Message-ID: <028401be4921$625727e0$f29b12c2@pythonware.com> >xmllib.py currently got a completely new interface in 1.5.2b1. duh. s/current/sudden/g /F From gherman@darwin.in-berlin.de Wed Jan 27 17:48:00 1999 From: gherman@darwin.in-berlin.de (Dinu C. Gherman) Date: Wed, 27 Jan 1999 18:48:00 +0100 Subject: [XML-SIG] XML package as RPM anywhere? Message-ID: <36AF5150.8E7A178D@darwin.in-berlin.de> Are the various versions of the XML add-ons distributed also in the popular RPM format? If so, where can they be found? It seems Oliver Andrich does not provide them. Thanks, Dinu -- Dinu C. 
Gherman : Mit Berlin kannste mir jagen! ................................................................ LHS International AG : http://www.lhsgroup.com 8050 Zurich : http://www.zurich.ch Switzerland : http://pgp.ai.mit.edu : mobile://49.172.3060751 From tismer@appliedbiometrics.com Thu Jan 28 08:29:43 1999 From: tismer@appliedbiometrics.com (Christian Tismer) Date: Thu, 28 Jan 1999 09:29:43 +0100 Subject: [XML-SIG] XML package as RPM anywhere? References: <36AF5150.8E7A178D@darwin.in-berlin.de> Message-ID: <36B01FF7.7E016A85@appliedbiometrics.com> Dinu C. Gherman wrote: > > Are the various versions of the XML add-ons distributed also > in the popular RPM format? If so, where can the be found? > It seems Oliver Andrich does not provide them. I don't think that this makes sense, already. The XML SIG has made great progress but is still very evolving. The current snapshot releases are very easy to install, since you just need to unpack the archive into a dir which is in the Python path, and you can import the modules instantly. If you want to follow the latest releases, I'd recommend to use CVS. RPM seems to be a little early. ciao - chris -- Christian Tismer :^) Applied Biometrics GmbH : Have a break! Take a ride on Python's Kaiserin-Augusta-Allee 101 : *Starship* http://starship.skyport.net 10553 Berlin : PGP key -> http://pgp.ai.mit.edu/ we're tired of banana software - shipped green, ripens at home From olli@rhein-zeitung.de Thu Jan 28 09:05:03 1999 From: olli@rhein-zeitung.de (Oliver Andrich) Date: Thu, 28 Jan 1999 10:05:03 +0100 Subject: [XML-SIG] XML package as RPM anywhere? 
In-Reply-To: <36B01FF7.7E016A85@appliedbiometrics.com>; from Christian Tismer on Thu, Jan 28, 1999 at 09:29:43AM +0100 References: <36AF5150.8E7A178D@darwin.in-berlin.de> <36B01FF7.7E016A85@appliedbiometrics.com> Message-ID: <19990128100502.C2267@rwpc.rhein-zeitung.de> Hi, I am using the xml stuff very much at work myself, but because it is changing so fast, I kept the xml 0.5 packages to myself. If someone needs them, then I can send them to him/her. I think chris is right in his opinion, but if you think that xml 0.5 should be released precompiled then this is no problem. Bye, Oliver On Thu, Jan 28, 1999 at 09:29:43AM +0100, Christian Tismer wrote: > Dinu C. Gherman wrote: > > > > Are the various versions of the XML add-ons distributed also > > in the popular RPM format? If so, where can they be found? > > It seems Oliver Andrich does not provide them. > > I don't think that this makes sense, already. The XML SIG has > made great progress but is still very evolving. The current > snapshot releases are very easy to install, since you > just need to unpack the archive into a dir which is in the > Python path, and you can import the modules instantly. > > If you want to follow the latest releases, I'd recommend > to use CVS. RPM seems to be a little early. > > ciao - chris > > -- > Christian Tismer :^) > Applied Biometrics GmbH : Have a break! Take a ride on Python's > Kaiserin-Augusta-Allee 101 : *Starship* http://starship.skyport.net > 10553 Berlin : PGP key -> http://pgp.ai.mit.edu/ > we're tired of banana software - shipped green, ripens at home > > _______________________________________________ > XML-SIG maillist - XML-SIG@python.org > http://www.python.org/mailman/listinfo/xml-sig -- Oliver Andrich, RZ-Online, Schlossstrasse Str. 
42, D-56068 Koblenz Telefon: 0261-3921027 / Fax: 0261-3921033 / Web: http://rhein-zeitung.de Private Homepage: http://andrich.net/ From akuchlin@cnri.reston.va.us Fri Jan 29 14:43:52 1999 From: akuchlin@cnri.reston.va.us (Andrew M. Kuchling) Date: Fri, 29 Jan 1999 09:43:52 -0500 (EST) Subject: [XML-SIG] RE: [Zope] - XML-style DTML code In-Reply-To: <613145F79272D211914B0020AFF64019049E8C@gandalf.digicool.com> References: <613145F79272D211914B0020AFF64019049E8C@gandalf.digicool.com> Message-ID: <14001.50784.335761.791226@amarok.cnri.reston.va.us> Paul Everitt writes: > >Chris wrote: >> > >Ahh, tis nuthin better than seeing a patch accompany a proposal :^) > >Here's my main beef with this. The ostensible goal of the XML syntax is >to make it parse-able by new tools. Unfortunately, a valid use of the >current syntax: > > > >which is legal, would become: > > > >which *not* valid XML...is it? That is, can you have markup inside >markup? I don't believe so, but have CC'ed this to the XML-SIG where the real experts hang out. PIs have to be outside other markup; I suspect the XML way of handling your second case would be to define an entity: This is unfortunate for the application of HTML templating, because it collides with the use of entities in HTML. It also makes things difficult because the entity would have to be declared at the beginning of the file in the DOCTYPE declaration. Making the templating identical to XML, while keeping it conveniently human-editable, may not be possible. -- A.M. Kuchling http://starship.skyport.net/crew/amk/ "You? What are you?" "Me? Lady, I'm your worst nightmare -- a pumpkin with a gun." -- The Furies and Mervyn, in SANDMAN #66: "The Kindly Ones:10" From Fred L. Drake, Jr." 
References: <613145F79272D211914B0020AFF64019049E8C@gandalf.digicool.com> <14001.50784.335761.791226@amarok.cnri.reston.va.us> Message-ID: <14001.52924.23614.735389@weyr.cnri.reston.va.us> Paul Everitt writes: > > >which is legal, would become: This is legal: The "" is the CDATA value of the size attribute, not a comment. > > >which *not* valid XML...is it? That is, can you have markup inside The "" is a perfectly valid string value of the size attribute, just as before. > I don't believe so, but have CC'ed this to the XML-SIG where > the real experts hang out. PIs have to be outside other markup; I > suspect the XML way of handling your second case would be to define an > entity: > > In neither SGML nor XML can markup be nested like this. The use of entities is the proper way to do this in either case. Perhaps a processing tool needs to be available which can perform "entity expansion" for specified entity names only? -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives 1895 Preston White Dr. Reston, VA 20191 From petrilli@amber.org Fri Jan 29 15:26:52 1999 From: petrilli@amber.org (Christopher G. Petrilli) Date: Fri, 29 Jan 1999 10:26:52 -0500 Subject: [XML-SIG] Re: [Zope] - Re: [XML-SIG] RE: [Zope] - XML-style DTML code In-Reply-To: <14001.52924.23614.735389@weyr.cnri.reston.va.us>; from Fred L. Drake on Fri, Jan 29, 1999 at 10:07:40AM -0500 References: <613145F79272D211914B0020AFF64019049E8C@gandalf.digicool.com> <14001.50784.335761.791226@amarok.cnri.reston.va.us> <14001.52924.23614.735389@weyr.cnri.reston.va.us> Message-ID: <19990129102652.06893@amber.org> On Fri, Jan 29, 1999 at 10:07:40AM -0500, Fred L. Drake wrote: > > Paul Everitt writes: > > > > > >which is legal, would become: > > This is legal: The "" is the CDATA value of > the size attribute, not a comment. Right this is the current scheme (note that this is one use of the DTML command set that is embedded in an HTML tag, a lot aren't). And this is also how I read the sstandard. 
> > > > > >which *not* valid XML...is it? That is, can you have markup inside > > The "" is a perfectly valid string value of the > size attribute, just as before. Wouldn't the DTD restrict the use of < inside? I thought the spec required that except inside a couple things ... like PIs... that the < and & characters must be escaped? > > > > In neither SGML nor XML can markup be nested like this. The use of > entities is the proper way to do this in either case. Perhaps a > processing tool needs to be available which can perform "entity > expansion" for specified entity names only? I'm confused by what you mean here, being a newbie to XMLish things. Chris -- | Christopher Petrilli | petrilli@amber.org From Fred L. Drake, Jr." References: <613145F79272D211914B0020AFF64019049E8C@gandalf.digicool.com> <14001.50784.335761.791226@amarok.cnri.reston.va.us> <14001.52924.23614.735389@weyr.cnri.reston.va.us> <19990129102652.06893@amber.org> Message-ID: <14001.54886.704665.224130@weyr.cnri.reston.va.us> Christopher G. Petrilli writes: > Wouldn't the DTD restrict the use of < inside? I thought the spec > required that except inside a couple things ... like PIs... that the < > and & characters must be escaped? Hmm... not the DTD, but you got me: the XML spec may well restrict the use of < and & in quoted attribute values. While avoiding some of the delimiter-in-context rules from SGML for the benefit of parser implementors, we end up with some ugly markup. ;-( > > > > > > > In neither SGML nor XML can markup be nested like this. The use of > > entities is the proper way to do this in either case. Perhaps a > > processing tool needs to be available which can perform "entity > > expansion" for specified entity names only? > > I'm confused by what you mean here, being a newbie to XMLish things. I meant that '' (SGML this time!) did not contain nested markup. (Same for the PI in an attribute value.) '' does contain nested markup, but not nested structure. 
My thought was that a tool could be written which would convert: ]> &frob; & into this: replacement text & Such a tool could perform expansion on either all the entities defined in the internal subset (the stuff in [ ... ] in the DOCTYPE declaration), or allow the user to specify a list of names (and possibly values) from another source. -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives 1895 Preston White Dr. Reston, VA 20191 From petrilli@amber.org Fri Jan 29 15:43:23 1999 From: petrilli@amber.org (Christopher G. Petrilli) Date: Fri, 29 Jan 1999 10:43:23 -0500 Subject: [XML-SIG] Re: [Zope] - Re: [XML-SIG] RE: [Zope] - XML-style DTML code In-Reply-To: <14001.54886.704665.224130@weyr.cnri.reston.va.us>; from Fred L. Drake on Fri, Jan 29, 1999 at 10:40:22AM -0500 References: <613145F79272D211914B0020AFF64019049E8C@gandalf.digicool.com> <14001.50784.335761.791226@amarok.cnri.reston.va.us> <14001.52924.23614.735389@weyr.cnri.reston.va.us> <19990129102652.06893@amber.org> <14001.54886.704665.224130@weyr.cnri.reston.va.us> Message-ID: <19990129104323.11790@amber.org> On Fri, Jan 29, 1999 at 10:40:22AM -0500, Fred L. Drake wrote: > > I meant that '' (SGML this time!) did not > contain nested markup. (Same for the PI in an attribute value.) So my implementation of the is acceptable under XML guidelines? That was how I interpreted it, but gods only know! > '' does contain nested markup, but not nested > structure. > My thought was that a tool could be written which would convert: My head just exploded Thank you Very Much :-) Chris -- | Christopher Petrilli | petrilli@amber.org From Fred L. Drake, Jr." 
References: <613145F79272D211914B0020AFF64019049E8C@gandalf.digicool.com> <14001.50784.335761.791226@amarok.cnri.reston.va.us> <14001.52924.23614.735389@weyr.cnri.reston.va.us> <19990129102652.06893@amber.org> <14001.54886.704665.224130@weyr.cnri.reston.va.us> <19990129104323.11790@amber.org> Message-ID: <14001.55410.760735.96752@weyr.cnri.reston.va.us> Christopher G. Petrilli writes: > So my implementation of the is acceptable under XML > guidelines? That was how I interpreted it, but gods only know! (I didn't catch any of the discussion before Andrew CC'd the XML-SIG, so I think I'm missing some of the context here.) What you probably want to do is to pass an example that uses the PI syntax in all situations that you intend to support (including in attribute values if you want that), and pass it through a validating parser. If it complains, you'll know what's broken. If it doesn't, then go ahead and use it. I'd rather see PI syntax used over comment syntax for either SGML or XML for this sort of processing. > My head just exploded Thank you Very Much :-) You're welcome. ;-) -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives 1895 Preston White Dr. Reston, VA 20191 From petrilli@amber.org Fri Jan 29 15:53:40 1999 From: petrilli@amber.org (Christopher G. Petrilli) Date: Fri, 29 Jan 1999 10:53:40 -0500 Subject: [XML-SIG] Re: [Zope] - Re: [XML-SIG] RE: [Zope] - XML-style DTML code In-Reply-To: <14001.55410.760735.96752@weyr.cnri.reston.va.us>; from Fred L. Drake on Fri, Jan 29, 1999 at 10:49:06AM -0500 References: <613145F79272D211914B0020AFF64019049E8C@gandalf.digicool.com> <14001.50784.335761.791226@amarok.cnri.reston.va.us> <14001.52924.23614.735389@weyr.cnri.reston.va.us> <19990129102652.06893@amber.org> <14001.54886.704665.224130@weyr.cnri.reston.va.us> <19990129104323.11790@amber.org> <14001.55410.760735.96752@weyr.cnri.reston.va.us> Message-ID: <19990129105340.15248@amber.org> On Fri, Jan 29, 1999 at 10:49:06AM -0500, Fred L. 
Drake wrote: > > Christopher G. Petrilli writes: > > So my implementation of the is acceptable under XML > > guidelines? That was how I interpreted it, but gods only know! > > (I didn't catch any of the discussion before Andrew CC'd the > XML-SIG, so I think I'm missing some of the context here.) > What you probably want to do is to pass an example that uses the PI > syntax in all situations that you intend to support (including in > attribute values if you want that), and pass it through a validating > parser. If it complains, you'll know what's broken. If it doesn't, > then go ahead and use it. I'd rather see PI syntax used over comment > syntax for either SGML or XML for this sort of processing. Well, then I'll sit down and write a test suite as soon as I get the brainpower back from your exploding my head, and we can see what explodes and what doesn't. Also, I'm going to tweak the syntax since everyone seems to want to get rid of some vestigial old pieces... Also, I'm not sure it's intended to BE XML, more accurately it's intended to LOOK like XML to an XML editor, the move to full XML for this could be troublesome... for example: Print something here. Yes I realise that XML has its own constructs for doing things like this, BUT ... what I'm trying to do is create a migration path, and move it to something that starts to LOOK like XML, so that people using can use it, and not have to worry about the troublesome side-effects of putting logic in comment code. Please no religious responses though :-) Chris -- | Christopher Petrilli | petrilli@amber.org From larsga@ifi.uio.no Fri Jan 29 16:01:23 1999 From: larsga@ifi.uio.no (Lars Marius Garshol) Date: 29 Jan 1999 17:01:23 +0100 Subject: [XML-SIG] RE: [Zope] - XML-style DTML code In-Reply-To: <14001.50784.335761.791226@amarok.cnri.reston.va.us> References: <613145F79272D211914B0020AFF64019049E8C@gandalf.digicool.com> <14001.50784.335761.791226@amarok.cnri.reston.va.us> Message-ID: * Paul Everitt | | | [...] 
| | | which *not* valid XML...is it? Neither of these are well-formed XML, since '<'s are not allowed in attribute values. The spec is less clear than it ought to be on this[1], perhaps, but xmlproc, XP, Lark and the Sun XML parser are all in agreement that this isn't allowed. AElfred allows it, but then some checks have been left out of AElfred, ostensibly for class file size reasons. | That is, can you have markup inside markup? No. Even if you write the PI in the attribute won't be recognized as one. However, not knowing Zope I don't think this is fatal if Zope substitutes this before any XML/HTML parsers see the result. If you're trying to use XML/HTML/SGML syntax for a preprocessor then maybe that isn't the way to go. * Andrew M. Kuchling | | I don't believe so, but have CC'ed this to the XML-SIG where the | real experts hang out. PIs have to be outside other markup; I | suspect the XML way of handling your second case would be to define | an entity: | | This is right, yes. --Lars M. [1] The relevant part is a WFC to production 41 in section 3.1. From Fred L. Drake, Jr." References: <613145F79272D211914B0020AFF64019049E8C@gandalf.digicool.com> <14001.50784.335761.791226@amarok.cnri.reston.va.us> <14001.52924.23614.735389@weyr.cnri.reston.va.us> <19990129102652.06893@amber.org> <14001.54886.704665.224130@weyr.cnri.reston.va.us> <19990129104323.11790@amber.org> <14001.55410.760735.96752@weyr.cnri.reston.va.us> <19990129105340.15248@amber.org> Message-ID: <14001.57143.185047.67235@weyr.cnri.reston.va.us> Christopher G. Petrilli writes: > Also, I'm not sure it's intended to BE XML, more accurately it's > intended to LOOK like XML to an XML editor, the move to full XML for > this could be troublesome... for exmaple: If an XML editor is going to handle it, it better be XML! If it looks like XML, someone will want to use an editor for it, so.... -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives 1895 Preston White Dr. 
Reston, VA 20191 From petrilli@amber.org Fri Jan 29 16:21:23 1999 From: petrilli@amber.org (Christopher G. Petrilli) Date: Fri, 29 Jan 1999 11:21:23 -0500 Subject: [XML-SIG] Re: [Zope] - Re: [XML-SIG] RE: [Zope] - XML-style DTML code In-Reply-To: <14001.57143.185047.67235@weyr.cnri.reston.va.us>; from Fred L. Drake on Fri, Jan 29, 1999 at 11:17:59AM -0500 References: <613145F79272D211914B0020AFF64019049E8C@gandalf.digicool.com> <14001.50784.335761.791226@amarok.cnri.reston.va.us> <14001.52924.23614.735389@weyr.cnri.reston.va.us> <19990129102652.06893@amber.org> <14001.54886.704665.224130@weyr.cnri.reston.va.us> <19990129104323.11790@amber.org> <14001.55410.760735.96752@weyr.cnri.reston.va.us> <19990129105340.15248@amber.org> <14001.57143.185047.67235@weyr.cnri.reston.va.us> Message-ID: <19990129112123.13500@amber.org> On Fri, Jan 29, 1999 at 11:17:59AM -0500, Fred L. Drake wrote: > > Christopher G. Petrilli writes: > > Also, I'm not sure it's intended to BE XML, more accurately it's > > intended to LOOK like XML to an XML editor, the move to full XML for > > this could be troublesome... for exmaple: > > If an XML editor is going to handle it, it better be XML! If it > looks like XML, someone will want to use an editor for it, so.... Well, my goal has not been to convert DTML to an XML, but to make it more LIKE XML, something that is familiar, and something that wouldn't look out of place. Honestly, I do not expect people to try and write HTML+ZTML with an XML tool, were such a beast to actually end up existing in the hands of a normal human... Chris -- | Christopher Petrilli | petrilli@amber.org From petrilli@amber.org Fri Jan 29 16:23:32 1999 From: petrilli@amber.org (Christopher G. 
Petrilli) Date: Fri, 29 Jan 1999 11:23:32 -0500 Subject: [XML-SIG] RE: [Zope] - XML-style DTML code In-Reply-To: ; from Lars Marius Garshol on Fri, Jan 29, 1999 at 05:01:23PM +0100 References: <613145F79272D211914B0020AFF64019049E8C@gandalf.digicool.com> <14001.50784.335761.791226@amarok.cnri.reston.va.us> Message-ID: <19990129112332.41792@amber.org> On Fri, Jan 29, 1999 at 05:01:23PM +0100, Lars Marius Garshol wrote: > > However, not knowing Zope I don't think this is fatal if Zope > substitutes this before any XML/HTML parsers see the result. If you're > trying to use XML/HTML/SGML syntax for a preprocessor then maybe that > isn't the way to go. Currently, and I can't speak for the future of this, Zope is designed to parse DTML (the current syntax, using comments) into pure raw HTML, and nothing else... it's not intended to go to XML/SGML, and quite honestly, I don't think it would be a good fit for that. What it is, quite honestly, is a tiny little scripting ability (like PHP), not a full blown mark-up language. I believe PHP also uses as its syntax, and I've not seen any huge explosions of fire from that one. Chris -- | Christopher Petrilli | petrilli@amber.org From pharris@forfree.at Fri Jan 29 16:23:03 1999 From: pharris@forfree.at (Phil Harris) Date: Fri, 29 Jan 1999 16:23:03 -0000 Subject: [XML-SIG] Re: [Zope] - Re: [XML-SIG] RE: [Zope] - XML-style DTML code Message-ID: <01ec01be4ba3$bc4abe40$5c773fc1@ml.uwcm.ac.uk> Surely, xml would allow <'s and >'s within quoted strings? if not, boy is that weird! ----- Original Message ----- From: Lars Marius Garshol To: ; Sent: Friday, January 29, 1999 4:01 PM Subject: [Zope] - Re: [XML-SIG] RE: [Zope] - XML-style DTML code > >* Paul Everitt >| >| >| [...] >| >| >| which *not* valid XML...is it? > >Neither of these are well-formed XML, since '<'s are not allowed in >attribute values. 
The spec is less clear than it ought to be on this[1], >perhaps, but xmlproc, XP, Lark and the Sun XML parser are all in >agreement that this isn't allowed. > >AElfred allows it, but then some checks have been left out of AElfred, >ostensibly for class file size reasons. > >| That is, can you have markup inside markup? > >No. Even if you write > > > >the PI in the attribute won't be recognized as one. > >However, not knowing Zope I don't think this is fatal if Zope >substitutes this before any XML/HTML parsers see the result. If you're >trying to use XML/HTML/SGML syntax for a preprocessor then maybe that >isn't the way to go. > >* Andrew M. Kuchling >| >| I don't believe so, but have CC'ed this to the XML-SIG where the >| real experts hang out. PIs have to be outside other markup; I >| suspect the XML way of handling your second case would be to define >| an entity: >| >| > >This is right, yes. > >--Lars M. > >[1] The relevant part is a WFC to production 41 in section 3.1. > > >_______________________________________________ >Zope maillist - Zope@zope.org >http://www2.zope.org/mailman/listinfo/zope > From larsga@ifi.uio.no Fri Jan 29 16:35:39 1999 From: larsga@ifi.uio.no (Lars Marius Garshol) Date: 29 Jan 1999 17:35:39 +0100 Subject: [XML-SIG] Re: [Zope] - Re: [XML-SIG] RE: [Zope] - XML-style DTML code In-Reply-To: <01ec01be4ba3$bc4abe40$5c773fc1@ml.uwcm.ac.uk> References: <01ec01be4ba3$bc4abe40$5c773fc1@ml.uwcm.ac.uk> Message-ID: * Phil Harris | | Surely, xml would allow <'s and >'s within quoted strings? It does not, unfortunately. Well, you can have them in entities, but if you use those entities in the wrong places then you're not well-formed. | if not, boy is that weird! It might be to keep people from thinking that inside an attribute value is an element instead of just a string that looks like an element. --Lars M. 
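The well-formedness rules discussed in this thread (no bare "&", no raw "<" inside an attribute value, no attribute without a value) can be checked directly against a parser. The following is only a minimal sketch: it assumes a present-day Python with the standard library's xml.etree.ElementTree rather than the 1999 xml-0.5 package the posters were using, and the helper name is_well_formed and the sample documents are made up for illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if the XML parser accepts `text`, False otherwise.

    Hypothetical helper for illustration; it is not part of any package
    mentioned in the thread.
    """
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


# Illustrative documents only; each pair shows a rejected form and its fix.
samples = [
    '<node name="A & B"/>',                   # bare ampersand
    '<node name="A &amp; B"/>',               # ampersand written as an entity
    '<font size="<x>">t</font>',              # raw '<' inside an attribute value
    '<font size="&lt;x&gt;">t</font>',        # the same value, escaped
    '<ul compact><li>a</li></ul>',            # attribute without a value
    '<ul compact="compact"><li>a</li></ul>',  # attribute with a value
]

for sample in samples:
    print(is_well_formed(sample), sample)
```

A non-validating check like this only covers well-formedness; whether an attribute such as compact is declared at all is a question for a validating parser and the DTD.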
From paul@prescod.net Fri Jan 29 16:34:12 1999 From: paul@prescod.net (Paul Prescod) Date: Fri, 29 Jan 1999 10:34:12 -0600 Subject: [XML-SIG] RE: [Zope] - XML-style DTML code References: <613145F79272D211914B0020AFF64019049E8C@gandalf.digicool.com> <14001.50784.335761.791226@amarok.cnri.reston.va.us> <14001.52924.23614.735389@weyr.cnri.reston.va.us> Message-ID: <36B1E304.8E5165CE@prescod.net> "Fred L. Drake" wrote: > > In neither SGML nor XML can markup be nested like this. The use of > entities is the proper way to do this in either case. Perhaps a > processing tool needs to be available which can perform "entity > expansion" for specified entity names only? I discussed this a couple of months ago on the Zope list. I suggested that they use XSL template syntax. It's more verbose but it separates the levels more cleanly. The syntax for doing attributes would look something like this: 6 blah blah blah The Zope equivalent would be: 6 blah blah blah IMHO, the current Zope syntax cannot survive into the "XML age." People will want to author their templates in XML editors and Zope's illegal syntax will prevent this. Paul Prescod - ISOGEN Consulting Engineer speaking for only himself http://itrc.uwaterloo.ca/~papresco Don't you know that the smart bombs are so clever, they only kill bad people." - http://www.boingo.com/lyrics/WarAgain.html From co@daisybytes.su.uunet.de Fri Jan 29 17:00:29 1999 From: co@daisybytes.su.uunet.de (Carsten Oberscheid) Date: Fri, 29 Jan 1999 18:00:29 +0100 Subject: [XML-SIG] AW: [Zope] - XML-style DTML code Message-ID: <01BE4BB1.42EC9C90.co@daisybytes.su.uunet.de> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 > > Paul Everitt writes: > > > >Chris wrote: > >> > > > >Ahh, tis nuthin better than seeing a patch accompany a proposal :^) > > > >Here's my main beef with this. The ostensible goal of the XML syntax is > >to make it parse-able by new tools. 
Unfortunately, a valid use of the > >current syntax: > > > > > > > >which is legal, would become: Sorry, I think this ain't legal, too. It's ok with sgml (at least nsgmls doesn't complain), but the XML specs say you can't use "<" inside attribute values at all. > > > > > > > >which *not* valid XML...is it? That is, can you have markup inside > >markup? > > I don't believe so, but have CC'ed this to the XML-SIG where > the real experts hang out. PIs have to be outside other markup; I > suspect the XML way of handling your second case would be to define an > entity: > > > > This is unfortunate for the application of HTML templating, because it > collides with the use of entities in HTML. It also makes things > difficult because the entity would have to be declared at the > beginning of the file in the DOCTYPE declaration. Making the > templating identical to XML, while keeping it conveniently > human-editable, may not be possible. > What about this: where &ztml; is a dummy entity declared once in the DTD. This should be valid XML. The DTML engine then interprets the PI as "I store this string as a DTML command, then next time I encounter &ztml; I replace it with the results of the DTML command". I admit that this is less editable/readable than the current DTML syntax, but it's quite close, especially if the "store" PI is kept close to the &ztml; "placeholder". For the "simple" case of DTML commands within character data Chris' proposal still works:

...plain text ... ...

without the "cmd" assignment can be "executed" and replaced immediately without the entity stunt, and it is valid XML. Regards .co. +------------------------------------------------------- daisy bytes! --------+ Carsten Oberscheid co@daisybytes.su.uunet.de digital document processing http://www.pweb.de/daisybytes.su electronic publishing -----BEGIN PGP SIGNATURE----- Version: PGPfreeware 5.5.3i for non-commercial use iQA/AwUBNrHbHowjR4jmR8/DEQKZpgCguMJhCDXh/sHIcP+uCeqz3PpF/PMAoP4U btpPwlkRa66yQC9vahx904oU =ibSb -----END PGP SIGNATURE----- From tismer@appliedbiometrics.com Fri Jan 29 17:00:56 1999 From: tismer@appliedbiometrics.com (Christian Tismer) Date: Fri, 29 Jan 1999 18:00:56 +0100 Subject: [XML-SIG] Please stop the cross posting Message-ID: <36B1E948.42CE8FE3@appliedbiometrics.com> Friends, it is ok to cross post things which belong to two mailing lists. But can we *please* take care about the subject lines? I get crazy when I have to read Re: [Zope] - Re: [XML-SIG] RE: [Zope] - XML-style DTML code Is it possible to always reply from the sig where the origin was? I will also propose to change Mailman to handle this in a better way. My own patched version on Starship never prepends the list name if it can be matched in the "re" already. ciao - chris -- Christian Tismer :^) Applied Biometrics GmbH : Have a break! Take a ride on Python's Kaiserin-Augusta-Allee 101 : *Starship* http://starship.skyport.net 10553 Berlin : PGP key -> http://pgp.ai.mit.edu/ we're tired of banana software - shipped green, ripens at home From paul@prescod.net Fri Jan 29 16:35:59 1999 From: paul@prescod.net (Paul Prescod) Date: Fri, 29 Jan 1999 10:35:59 -0600 Subject: [XML-SIG] RE: [Zope] - XML-style DTML code References: <613145F79272D211914B0020AFF64019049E8C@gandalf.digicool.com> <14001.50784.335761.791226@amarok.cnri.reston.va.us> <14001.52924.23614.735389@weyr.cnri.reston.va.us> Message-ID: <36B1E36F.9A5AE072@prescod.net> "Fred L. 
Drake" wrote: > > Paul Everitt writes: > > > > > >which is legal, would become: > > This is legal: The "" is the CDATA value of > the size attribute, not a comment. That is legal SGML but not XML. > > > > > >which *not* valid XML...is it? That is, can you have markup inside > > The "" is a perfectly valid string value of the > size attribute, just as before. Ditto. > In neither SGML nor XML can markup be nested like this. The use of > entities is the proper way to do this in either case. Perhaps a > processing tool needs to be available which can perform "entity > expansion" for specified entity names only? In a valid XML document, all entities must be defined in the DTD. XML does not provide for them to be supplied by the containing application. SGML did, but XML does not. The usual way to do this is with elements, as described in XSL. Paul Prescod - ISOGEN Consulting Engineer speaking for only himself http://itrc.uwaterloo.ca/~papresco Don't you know that the smart bombs are so clever, they only kill bad people." - http://www.boingo.com/lyrics/WarAgain.html From bwarsaw@python.org Fri Jan 29 17:16:48 1999 From: bwarsaw@python.org (Barry A. Warsaw) Date: Fri, 29 Jan 1999 12:16:48 -0500 (EST) Subject: [XML-SIG] Re: Please stop the cross posting References: <36B1E948.42CE8FE3@appliedbiometrics.com> Message-ID: <14001.60672.211755.185830@anthem.cnri.reston.va.us> >>>>> "CT" == Christian Tismer writes: CT> I will also propose to change Mailman to handle this CT> in a better way. My own patched version on Starship CT> never prepends the list name if it can be matched in the CT> "re" already. I thought Mailman already does this too. I'll double check. 
Chris, you might want to send your patches to mailman-developers@python.org -Barry From dieter@handshake.de Sun Jan 31 16:48:41 1999 From: dieter@handshake.de (Dieter Maurer) Date: Sun, 31 Jan 1999 17:48:41 +0100 Subject: [XML-SIG] minor BUG and Patch for "html_builder" Message-ID: <199901311648.RAA01338@lindm.dm> This is a multi-part MIME message. --------------FC5583E803777E8ABB8C4995 Content-Type: text/plain; charset=iso-8859-1 "HtmlBuilder" (from the xml-0.5 distribution) goes into an infinite loop when it encounters an empty tag explicitly closed, e.g.: "HtmlWriter" generates such constructs. A patch is appended. Dieter --------------FC5583E803777E8ABB8C4995 Content-Type: application/x-patch; name="html_builder.pat"
--- :html_builder.py Tue Dec 29 10:45:25 1998
+++ html_builder.py Sun Jan 31 16:22:35 1999
@@ -72,7 +72,7 @@
         while self.stack:
             if tag in self.empties:
-                continue
+                break
             start_tag = self.stack[-1]
             del self.stack[-1]
             Builder.endElement(self, start_tag)
--------------FC5583E803777E8ABB8C4995--
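Why the one-word change matters can be seen in a stripped-down version of the loop. This is a sketch under stated assumptions, not the actual html_builder code: close_tag, its arguments, and the "stop when the matching tag has been popped" condition are hypothetical, since the patch shows only part of the method. With "continue" in place of "break", the stack would never shrink for an empty tag and the while loop would never end.

```python
def close_tag(stack, tag, empties):
    """Pop open elements until `tag` is closed; return the tags that were closed.

    Hypothetical stand-in for the patched loop: `stack` is the list of
    currently open tags and `empties` the set of tags that never have content.
    """
    closed = []
    while stack:
        if tag in empties:
            # Nothing to unwind for an empty tag. A `continue` here would
            # leave the stack untouched and spin forever; `break` exits.
            break
        start_tag = stack.pop()
        closed.append(start_tag)
        if start_tag == tag:
            # Assumed stop condition; this part is not shown in the patch.
            break
    return closed


open_tags = ["html", "body", "p"]
print(close_tag(open_tags, "br", {"br", "hr"}), open_tags)
print(close_tag(open_tags, "body", {"br", "hr"}), open_tags)
```

The first call returns immediately without touching the stack; the second unwinds "p" and "body" and leaves only "html" open.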