From D.Hoeppner@tu-bs.de Tue Aug 3 13:22:04 1999
From: D.Hoeppner@tu-bs.de (Dierk Höppner)
Date: Tue, 3 Aug 1999 14:22:04 +0200
Subject: [XML-SIG] SAX and HTML
Message-ID: <5D650DE026C@buch.biblio.etc.tu-bs.de>

Hello,

I want to use SAX to extract data from HTML. I began by modifying the example saxstats.py, but it did not get very far because my HTML sources are not well-formed XML documents.

Then I forced the parser to use drv_htmllib, but this failed because the HTMLParser of htmllib wants a formatter. drv_htmllib passes None, which of course doesn't work.

Any hints what to do? Even RTFM is welcome, but please give a hint to a good page ;-)

greetings
Dierk Hoeppner

Braunschweig University Library
Pockelsstr. 13
D-38106 Braunschweig
Germany
Tel: +49-531-391-5066 Fax: -5836
E-Mail: d.hoeppner@tu-bs.de

From: "Fred L. Drake, Jr."
References: <5D650DE026C@buch.biblio.etc.tu-bs.de>
Message-ID: <14246.61826.383361.367841@weyr.cnri.reston.va.us>

Dierk Höppner writes:
 > I want to use SAX to extract data from HTML. I began with
 > modifying the example saxstats.py but it did not come very far
 > because my html-sources are not well constructed xml-documents.
 > Then I forced the parser to use drv_htmllib but this failed because
 > HTMLParser of htmllib wants a formatter. drv_htmllib gives None
 > which doesn't work of course.

Dierk,

  Try changing drv_htmllib to use a formatter.NullFormatter instance. Let us know how that works; if a simple fix to drv_htmllib does the trick, I think we can do that!

  -Fred

--
Fred L. Drake, Jr.
Corporation for National Research Initiatives

From kantel@mpiwg-berlin.mpg.de Tue Aug 3 16:38:04 1999
From: kantel@mpiwg-berlin.mpg.de (Jörg Kantel)
Date: Tue, 3 Aug 1999 17:38:04 +0200
Subject: [XML-SIG] XML to XML Conversation via SAX
Message-ID:

(maybe a very stupid question ;-)

We have a collection of (very) large XML files that we have to convert to XML files.
That sounds stupid, but we have to insert or update attributes in various tags, depending on the contents of the files. I thought I could do that with Python and the saxlib, but I ran into a problem writing the attributes back (in other words: I'm too stupid to use the saxlib.AttributeList methods). I tried the following (mostly inspired by the saxlib tutorial ;-)

#!/usr/local/bin/python

from xml.sax import saxlib
import string

class WriteTags(saxlib.HandlerBase, saxlib.AttributeList):

    def makeStartTag(self, name):
        tagText = "<" + name
        numbers = saxlib.AttributeList().getLength()
        print numbers
        if numbers:
            for i in numbers:
                tagName = saxlib.AttributeList().getName(i)
                tagValue = saxlib.AttributeList().getValue(i)
                print tagName
                print tagValue
                tagText = tagText + " " + tagName + "=\"" + tagValue + "\" "
        tagText = tagText + ">"
        return tagText

(...)

makeStartTag was called in the startElement method, but numbers (and therefore tagName and tagValue too) always returns "None". I'm really not sure how to connect the AttributeList methods with the HandlerBase.

Any hints are welcome.

TIA
J"org
--
--------------------------------------------------------------------------
J"org Kantel
Max-Planck-Institute for the History of Science
Computer-Department
kantel@mpiwg-berlin.mpg.de
Wilhelmstr. 44
http://www.mpiwg-berlin.mpg.de/staff/kantel/kantel.html
D-10117 Berlin
fon: +4930-22667-220
fax: +4930-22667-299
--------------------------------------------------------------------------

From paul@prescod.net Tue Aug 3 21:05:33 1999
From: paul@prescod.net (Paul Prescod)
Date: Tue, 03 Aug 1999 15:05:33 -0500
Subject: [XML-SIG] Python DOM Unification -- level
References: <3724CC49.AAB857A5@prescod.net> <14116.55422.189139.235663@amarok.cnri.reston.va.us> <3724E2A1.62223458@prescod.net> <14117.52230.551462.836651@weyr.cnri.reston.va.us> <37A019E2.B334709D@FourThought.com>
Message-ID: <37A74B8D.9C7E38C0@prescod.net>

Mike Olson wrote:
>
> We're gonna have some free time in August to do some major work on 4DOM and
> 4XSLT, including getting 4XSLT up to date with the latest XSLT draft and
> breaking out the patterns into 4XPath (or some clever name).

Cool!

> I wanted to bring up the DOM interface unification topic again as we will be
> working on 4DOM this month and may have time to experiment with some Lit/ python
> interfaces. Last we left off, we couldn't decide how many and how lit the
> interfaces should be. Is anyone still doing work to come up with a unified
> interface(s)? Is it something we still want to consider? Should we
> (Fourthought) just produce a lit interface as pythonic as possible and then
> mold/wrap pydom and 4dom to meet it?

Well, I think that the main issues for the pythonic interface are:

 * mappings should act as Python mappings. (In fact the only standardized
   interfaces should probably be the __getitem__ stuff.)

 * node lists should act as Python sequences. (Ditto.)

 * namespace properties should be modelled on the relevant operators in
   XPath. (I think that the real DOM will be copying XPath.)

XPath support should probably be available both as a module and as methods on the DOM. The module is cool because it could be made available for any DOM. The methods are cool because they could be really optimized for *this* DOM.
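The first two bullet points ask for attribute maps that behave like Python mappings and node lists that behave like Python sequences. Purely for comparison, here is a minimal sketch against xml.dom.minidom, the lightweight DOM that later shipped with Python's standard library (it is not one of the 1999 packages discussed in this thread); the sample document is invented for illustration:

```python
from xml.dom.minidom import parseString

# Hypothetical sample document, invented for this illustration.
doc = parseString('<item id="42" lang="de"><a/><b/></item>')
item = doc.documentElement

# The attribute map (a NamedNodeMap) supports len(), [] and items() like a mapping.
attrs = item.attributes
print(len(attrs))             # 2
print(attrs["id"].value)      # 42
print(sorted(attrs.items()))  # [('id', '42'), ('lang', 'de')]

# The child node list is a list subclass, so ordinary sequence operations work.
print([child.nodeName for child in item.childNodes])  # ['a', 'b']
print(item.childNodes[-1].nodeName)                    # b
```

The W3C interfaces reach the same data through item(i) and length; the mapping and sequence behaviour shown here is the kind of Pythonic layer the bullets argue for.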
Microsoft calls the XPath-using methods "selectNodes" and "selectSingleNode". They also have "transformNode". Any DOM could add "simple" support by redirecting the methods to a DOM-generic method. They could add optimized support by writing code for the methods themselves. They could even use a mix where they call the generic method for complex queries!

Many of the methods we discussed before, like getChild, getText and so forth, can be done easily as queries like node.selectNodes( "text()" ), node.selectNodes( "//text()" ) and so forth.

One issue with selectNodes is how to count nodes. XSL mandates that adjacent text nodes must be merged. The DOM does not (but probably should!).

> For my 2 cents worth, I guess I see a need for 2 interfaces. The one
> defined by W3C and a totally pythonic interface. Then a wrapper that can be
> used to turn a DOM compliant interface implementation into the pythonic
> interface. ORB/ORBless I think we decided is orthogonal to this decision.

That all sounds right to me.

 Paul Prescod

From dieter@handshake.de Tue Aug 3 22:57:15 1999
From: dieter@handshake.de (Dieter Maurer)
Date: Tue, 3 Aug 1999 23:57:15 +0200 (CEST)
Subject: [XML-SIG] [Ann] PyXPath 0.1 -- Implementation of the XPath July working draft on top of PyDom
Message-ID: <14247.25863.875238.893246@lindm.dm>

I have just released PyXPath 0.1, an implementation of the XPath July working draft on top of PyDOM.

For more information and download, see
  URL:http://www.handshake.de/~dieter/pyprojects/pyxpath.html

- Dieter

From D.Hoeppner@tu-bs.de Wed Aug 4 09:03:33 1999
From: D.Hoeppner@tu-bs.de (Dierk Höppner)
Date: Wed, 4 Aug 1999 10:03:33 +0200
Subject: [XML-SIG] SAX and HTML - success!??
Message-ID: <5EA03460720@buch.biblio.etc.tu-bs.de>

Fred,

you mentioned pylibs.py. I played around a little (far from understanding the whole thing). Perhaps I found a solution: In pylibs.SGMLParsers I just added

    def handle_starttag(self,tag,method,attributes): ####
        "Handles start tags."
        attrs={}
        for (a,v) in attributes:
            attrs[a]=v
        self.doc_handler.startElement(tag,saxutils.AttributeMap(attrs))

The demo xml/demo/saxsaxstats.py worked for me with one little change. I changed the line

    p=saxexts.make_parser()

to

    p=saxexts.make_parser("xml.sax.drivers.drv_htmllib")

I don't know if this was the solution, but perhaps it is a hint for your search.

Thanks for your help
Dierk

Braunschweig University Library
Pockelsstr. 13
D-38106 Braunschweig
Germany
Tel: +49-531-391-5066 Fax: -5836
E-Mail: d.hoeppner@tu-bs.de

From: "Fred L. Drake, Jr."
References: <5EA03460720@buch.biblio.etc.tu-bs.de>
Message-ID: <14248.20301.683337.109525@weyr.cnri.reston.va.us>

Dierk Höppner writes:
 > you mentioned pylibs.py. I played around a little (far from
 > understanding the whole thing). Perhaps I found a solution: In
 > pylibs.SGMLParsers I just added

Dierk,

  That would have been my first thing to try! Does this solve your immediate problems? If you can send along a test script (you mentioned a modified demo; were the modifications relevant to the problem?) and example data, I can still play with it some. If this still looks like a good fix, I'll commit it to the repository.

  -Fred

--
Fred L. Drake, Jr.
Corporation for National Research Initiatives

From michel.plu@cnet.francetelecom.fr Thu Aug 5 14:14:34 1999
From: michel.plu@cnet.francetelecom.fr (PLU Michel CNET/DSM/LAN)
Date: Thu, 5 Aug 1999 15:14:34 +0200
Subject: [XML-SIG] string encoding translater
Message-ID:

Here is my problem: I want to parse an ISO-8859-1 (ISO Latin 1) encoded XML file. But when I parse it with the Python SAX parser (saxexts.make_parser), all strings in the resulting DOM tree (attribute values or node data) are stored as UTF-8 (Unicode) encoded strings.
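Turning such UTF-8 strings back into ISO-8859-1 amounts to decoding the UTF-8 bytes and re-encoding them as Latin-1. A minimal sketch in present-day Python 3, where the parsers already return str objects so usually only the final encode step is needed; utf8_to_latin1 is a hypothetical helper, not part of any XML package:

```python
def utf8_to_latin1(data: bytes, errors: str = "strict") -> bytes:
    """Re-encode UTF-8 bytes as ISO-8859-1 (Latin-1) bytes.

    With errors="strict", characters that have no Latin-1 code point raise
    UnicodeEncodeError; errors="xmlcharrefreplace" writes them as &#NNNN;.
    """
    return data.decode("utf-8").encode("iso-8859-1", errors)


raw = "Matières".encode("utf-8")
print(raw)                  # b'Mati\xc3\xa8res'  (two bytes for the accented e)
print(utf8_to_latin1(raw))  # b'Mati\xe8res'      (one byte in Latin-1)
print(utf8_to_latin1("5 €".encode("utf-8"), "xmlcharrefreplace"))  # b'5 &#8364;'
```

The xmlcharrefreplace variant is handy when the result is written back out as XML text, since a numeric character reference stays valid in a Latin-1 encoded document.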
For example, the XML line

produces a node where the value of attribute name is: Matières

Is there a way in Python to translate the UTF-8 encoded string back to the original ISO-8859-1?

Thanks for answers,
Michel

From sean@digitome.com Thu Aug 12 15:26:21 1999
From: sean@digitome.com (Sean Mc Grath)
Date: Thu, 12 Aug 1999 15:26:21 +0100
Subject: [XML-SIG] pyDOM NamedNodeMap - bug report and problem
Message-ID: <3.0.6.32.19990812152621.00965710@gpo.iol.ie>

I am trying to print out attribute name/value pairs using pyDOM and am having some problems. Here is the relevant part of my code:

    for n in doc.documentElement.childNodes:
        if n.nodeType == core.ELEMENT_NODE:
            attrs = n.attributes
            for i in range (0,attrs.get_length()):
                attr = attrs.item(i)
                print attr.name
                print attr.value

The item() method initially did not work, returning an "unsubscriptable object" error. This is a buglet in NamedNodeMap:

Before fix:

    # Additional methods specified in the DOM Recommendation
    def item(self, index):
        return self.data.values[ index ]

After fix (parentheses in the call to the values method of the data dictionary):

    # Additional methods specified in the DOM Recommendation
    def item(self, index):
        return self.data.values()[ index ]

I am now getting my attribute names through just fine, but all my attribute values are None. They are definitely there in the DOM structure, because toxml puts 'em out just fine. Ideas?

regards,

Developers Day Co-Chair, 9th International World Wide Web Conference
16-19, May, 2000, Amsterdam, The Netherlands
http://www9.org

From akuchlin@mems-exchange.org Fri Aug 13 03:44:40 1999
From: akuchlin@mems-exchange.org (Andrew M. Kuchling)
Date: Thu, 12 Aug 1999 22:44:40 -0400 (EDT)
Subject: [XML-SIG] pyDOM NamedNodeMap - bug report and problem
In-Reply-To: <3.0.6.32.19990812152621.00965710@gpo.iol.ie>
References: <3.0.6.32.19990812152621.00965710@gpo.iol.ie>
Message-ID: <14259.34456.576862.738920@amarok.cnri.reston.va.us>

Sean Mc Grath writes:
>I am now getting my attribute names through just fine but all my attribute
>values are None. There are definitely there in the DOM structure because
>toxml puts 'em out just fine. Ideas?

Try this patch; without the NODE_CLASS stuff in the patch, the item() method returns a _node instance, which shouldn't be exposed to the user.

--
A.M. Kuchling			http://starship.python.net/crew/amk/
Welcome, one and all, to the far-flung future of -- 1965!
    -- Zot, in ZOT! #1

Index: core.py
===================================================================
RCS file: /home/cvsroot/xml/dom/core.py,v
retrieving revision 1.46
diff -C2 -r1.46 core.py
*** core.py	1999/05/08 20:18:18	1.46
--- core.py	1999/08/13 02:26:34
***************
*** 259,265 ****
  
      # Additional methods specified in the DOM Recommendation
      def item(self, index):
!         return self.data.values[ index ]
  
      getNamedItem = UserDict.UserDict.__getitem__
--- 259,266 ----
  
      # Additional methods specified in the DOM Recommendation
      def item(self, index):
!         n = self.data.values()[ index ]
!         return NODE_CLASS[ n.type ](n, self._document )
  
      getNamedItem = UserDict.UserDict.__getitem__

From dieter@handshake.de Thu Aug 12 18:06:22 1999
From: dieter@handshake.de (Dieter Maurer)
Date: Thu, 12 Aug 1999 19:06:22 +0200
Subject: [XML-SIG] XML 0.5.1 bug: 'amp' character reference not handled correctly by "HtmlBuilder/HtmlWriter"
Message-ID: <199908121706.TAA00810@lindm.dm>

"HtmlBuilder" translates '&amp;' into an entity reference.
This does not follow the DOM spec. It specifies that
character references are expected to be expanded by the
HTML/XML processor.

"XmlWriter/HtmlWriter" does not output the 'amp' entity reference.
This, obviously, is a bug in "XmlWriter/HtmlWriter".

By the way, processing instructions are not output, either.

I have fixed my "&" problem by adding "amp" to the "expand_entities" tuple in "HtmlBuilder". This, however, is not a general solution.

- Dieter

From dieter@handshake.de Fri Aug 13 07:41:24 1999
From: dieter@handshake.de (Dieter Maurer)
Date: Fri, 13 Aug 1999 08:41:24 +0200 (CEST)
Subject: [XML-SIG] pyDOM NamedNodeMap - bug report and problem
In-Reply-To: <3.0.6.32.19990812152621.00965710@gpo.iol.ie>
References: <3.0.6.32.19990812152621.00965710@gpo.iol.ie>
Message-ID: <14259.47828.462367.393783@lindm.dm>

Hello Sean

Sean Mc Grath writes:
 > After fix: (parenthesis in call to values method of data dictionary)
 >     # Additional methods specified in the DOM Recommendation
 >     def item(self, index):
 >         return self.data.values()[ index ]
 >
 > I am now getting my attribute names through just fine but all my attribute
 > values are None. There are definitely there in the DOM structure because
 > toxml puts 'em out just fine. Ideas?

For some unknown reason (a bug, I think), the real attribute information is in the "children[0]" attribute of the returned "item". You may try:

 >     def item(self, index):
 >         return self.data.values()[ index ].children[0]

But I am not sure whether this will work for all NamedNodeMaps. And it is probably not the correct solution, because it returns a "_nodeData" instance rather than an "Attr" instance.

Almost surely, the correct implementation is:

 >     def item(self, index):
 >         return Attr(self.data.values()[ index ],self._document)

- Dieter

From D.Hoeppner@tu-bs.de Fri Aug 13 07:53:26 1999
From: D.Hoeppner@tu-bs.de (Dierk Höppner)
Date: Fri, 13 Aug 1999 08:53:26 +0200
Subject: [XML-SIG] entity munching monster tracked down!
Message-ID: <6C0E0697493@buch.biblio.etc.tu-bs.de>

Dear SIGgers,

when playing around with the xml package I sent an ordinary HTML file through a slightly modified xml/demo/dom/html2html.py. The output was HTML, too.
Almost, because except '<', '&' and '>' all other entities vanished :-(( You can see it in the output of the original html2html. The data contains the word 'trouvés', which in the HTML output becomes 'trouvs'.

My solution (the experts among you will have to decide whether this is all right):

xml.dom.writer.HtmlWriter derives from xml.dom.writer.XmlWriter, which has a method doText. The last line says

    self.stream.write(escape(data))

xml.utils.escape() just 'escapes' those three entities mentioned above. But it may be called with an extra table of entities to be converted. I modified XmlWriter a little: I added

    self.escapes={}

to __init__(), and in doText the last line now is

    self.stream.write(escape(data, self.escapes))

In html2html I now build an almost inverse version of htmlentitydefs.entitydefs, but leave out <, &, and >. (My routine MakeEscapes().) The lines

    w = HtmlWriter()
    w.write(b.document)

became

    w = HtmlWriter()
    w.escapes = MakeEscapes()
    w.write(b.document)

It works, but not perfectly. In another text I had an image Nächster which becomes N&auml;chster. I haven't found the solution for this problem yet :-(

Greetings
Dierk Hoeppner

Universitaetsbibliothek
Pockelsstr. 13
D-38106 Braunschweig
Germany
Tel: +49-531-391-5066 Fax: -5836
E-Mail: d.hoeppner@tu-bs.de

From akuchlin@mems-exchange.org Fri Aug 13 14:41:20 1999
From: akuchlin@mems-exchange.org (Andrew M. Kuchling)
Date: Fri, 13 Aug 1999 09:41:20 -0400 (EDT)
Subject: [XML-SIG] pyDOM NamedNodeMap - bug report and problem
In-Reply-To: <14259.47828.462367.393783@lindm.dm>
References: <3.0.6.32.19990812152621.00965710@gpo.iol.ie> <14259.47828.462367.393783@lindm.dm>
Message-ID: <14260.8320.385791.157498@amarok.cnri.reston.va.us>

Dieter Maurer writes:
>For some unknown reason (a bug, I think),
>the real attribute information is in the "children[0]" attribute
>of the returned "item".
The DOM implementation builds an internal tree of objects of class _node; however, users of the implementation never see a _node instance, but instead an instance of Element, Text, or whatever, that acts as a proxy for the _node instance. The user-visible proxy holds the parent pointer for the node, thus avoiding a cycle of references and the resulting memory leak. To create the proxy for a _node n, you use NODE_CLASS[ n.type ](n, self._document ).

As a note to users, if you *ever* get returned a _node instance, that is a bug and should be reported.

--
A.M. Kuchling			http://starship.python.net/crew/amk/
The story so far: In the beginning the Universe was created. This has made a lot of people very angry and has been widely regarded as a bad move.
    -- Douglas Adams, _The Restaurant at the End of the Universe_

From: "Fred L. Drake, Jr."
References: <199908121706.TAA00810@lindm.dm>
Message-ID: <14260.9408.396713.728418@weyr.cnri.reston.va.us>

--Apu33M+PUU
Content-Type: text/plain; charset=us-ascii
Content-Description: message body text
Content-Transfer-Encoding: 7bit

Dieter Maurer writes:
 > "HtmlBuilder" translates '&amp;' into an entity reference.
 > This does not follow the DOM spec. It specifies that
 > character references are expected to be expanded by the
 > HTML/XML processor.
 >
 > "XmlWriter/HtmlWriter" does not output the 'amp' entity reference.
 > This, obviously, is a bug in "XmlWriter/HtmlWriter".

  No, but if & is present as data, it writes out &amp;, so I think that's OK.

 > By the way, processing instructions are not output, too.

  Are you sure they're in your tree? What I see is that they are output, but using the XML-style syntax (closing with '?>') instead of the SGML style (closing with '>').
  I've checked in a fix that allows HtmlWriter to produce SGML-style PIs. This *doesn't* do anything to change the handling of PIs as (target, value) tuples; this was a concept introduced in some of the XML APIs (not even XML itself as I understand it).
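To make the two PI syntaxes concrete: XML closes a processing instruction with '?>', while SGML (and HTML 4, which is an SGML application) closes it with a plain '>'. A minimal sketch using Python's present-day xml.dom.minidom, not the 1999 xml.dom.writer code; the PI target and data are made up, and sgml_pi is a hypothetical helper mirroring what an SGML-style writer would emit:

```python
from xml.dom.minidom import Document


def sgml_pi(pi):
    """Render a processing instruction node in SGML style: <?target data>.

    Hypothetical helper for illustration; XML would close it with '?>' instead.
    """
    return "<?%s %s>" % (pi.target, pi.data)


doc = Document()
pi = doc.createProcessingInstruction("page", "break")  # made-up target and data

print(pi.toxml())   # <?page break?>  (XML style, as minidom serialises it)
print(sgml_pi(pi))  # <?page break>   (SGML/HTML style)
```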
  The patch to xml/dom/writer.py is attached; it also teaches the *Lineariser classes to use cStringIO when available.

  -Fred

--
Fred L. Drake, Jr.
Corporation for National Research Initiatives

--Apu33M+PUU
Content-Type: text/plain
Content-Description: xml/dom/writer.py patch
Content-Disposition: inline; filename="PATCH"
Content-Transfer-Encoding: 7bit

Index: writer.py
===================================================================
RCS file: /home/cvsroot/xml/dom/writer.py,v
retrieving revision 1.9
retrieving revision 1.10
diff -c -r1.9 -r1.10
*** writer.py	1999/04/28 02:42:19	1.9
--- writer.py	1999/08/13 13:50:18	1.10
***************
*** 124,131 ****
  class XmlLineariser(XmlWriter):
  
      def __init__(self):
!         import StringIO
!         self.buffer = StringIO.StringIO()
          XmlWriter.__init__(self, self.buffer)
  
      def linearise(self, node):
--- 124,134 ----
  class XmlLineariser(XmlWriter):
  
      def __init__(self):
!         try:
!             from cStringIO import StringIO
!         except ImportError:
!             from StringIO import StringIO
!         self.buffer = StringIO()
          XmlWriter.__init__(self, self.buffer)
  
      def linearise(self, node):
***************
*** 169,180 ****
          self._setNewLines(nl_dict)
  
  
  class HtmlLineariser(HtmlWriter):
  
      def __init__(self):
!         import StringIO
!         self.buffer = StringIO.StringIO()
          HtmlWriter.__init__(self, self.buffer)
  
      def linearise(self, node):
--- 172,192 ----
          self._setNewLines(nl_dict)
  
+     def doOtherNode(self, node):
+         if node.get_nodeType() == PROCESSING_INSTRUCTION_NODE:
+             self.stream.write("<?%s %s>" % (node.target, node.value))
+         else:
+             XmlWriter.doOtherNode(self, node)
+ 
  
  class HtmlLineariser(HtmlWriter):
  
      def __init__(self):
!         try:
!             from cStringIO import StringIO
!         except ImportError:
!             from StringIO import StringIO
!         self.buffer = StringIO()
          HtmlWriter.__init__(self, self.buffer)
  
      def linearise(self, node):

--Apu33M+PUU--

From akuchlin@mems-exchange.org Fri Aug 13 17:11:19 1999
From: akuchlin@mems-exchange.org (Andrew M. Kuchling)
Date: Fri, 13 Aug 1999 12:11:19 -0400 (EDT)
Subject: [XML-SIG] CVS tree reorg imminent
Message-ID: <199908131611.MAA13940@amarok.cnri.reston.va.us>

A while back we discussed tidying up the directory structure of the XML-SIG's code. I'd like to do the rearrangement sometime this weekend. This may disrupt the tree for a bit while things settle down after the rearrangement, so people who follow the CVS tree should be aware of the impending changes. On the bright side, this should make things much neater and allow us to simplify the installation process.

--
A.M. Kuchling			http://starship.python.net/crew/amk/
My early and invincible love of reading, which I would not exchange for the treasures of India...
    -- Edward Gibbon

From Jeff.Johnson@icn.siemens.com Fri Aug 13 17:46:02 1999
From: Jeff.Johnson@icn.siemens.com
Date: Fri, 13 Aug 1999 12:46:02 -0400
Subject: [XML-SIG] XML 0.5.1 bug: 'amp' character reference not handled correctly by "HtmlBuilder/HtmlWriter"
Message-ID: <852567CC.005BDFE0.00@li01.lm.ssc.siemens.com>

I had similar problems a while back and came up with the following hack (nobody seemed to think it was a problem, so I had to fix it myself)... I have no idea if this is a good fix, but it seemed to fix most of my problems...

class MyHtmlBuilder(HtmlBuilder):

    def handle_charref(self, name):
        #print name
        try:
            n = string.atoi(name)
        except string.atoi_error:
            self.unknown_charref(name)
            return
        # JCJ 1999-06-11: This turns &#181; into chr(181) which when saved
        # back as HTML, is no good.
        #if not 0 <= n <= 255:
        if not 0 <= n <= 127:
            self.unknown_charref(name)
            return
        self.handle_data(chr(n))

    def unknown_charref(self, ref):
        #gLog.Warning('unknown_charref %s' % ref)
        Builder.entityref(self, '#' + ref)

    def unknown_entityref(self, ref):
        gLog.Error('unknown_entityref %s' % ref)

Dieter Maurer on 08/12/99 01:06:22 PM

To: xml-sig@python.org
cc: (bcc: Jeff Johnson/Service/ICN)
Subject: [XML-SIG] XML 0.5.1 bug: 'amp' character reference not handled correctly by "HtmlBuilder/HtmlWriter"

"HtmlBuilder" translates '&amp;' into an entity reference.
This does not follow the DOM spec. It specifies that
character references are expected to be expanded by the
HTML/XML processor.

"XmlWriter/HtmlWriter" does not output the 'amp' entity reference.
This, obviously, is a bug in "XmlWriter/HtmlWriter".

By the way, processing instructions are not output, either.

I have fixed my "&" problem by adding "amp" to the "expand_entities" tuple in "HtmlBuilder". This, however, is not a general solution.

- Dieter

_______________________________________________
XML-SIG maillist  -  XML-SIG@python.org
http://www.python.org/mailman/listinfo/xml-sig

From dieter@handshake.de Fri Aug 13 17:59:35 1999
From: dieter@handshake.de (Dieter Maurer)
Date: Fri, 13 Aug 1999 18:59:35 +0200 (CEST)
Subject: [XML-SIG] XML 0.5.1 bug: 'amp' character reference not handled correctly by "HtmlBuilder/HtmlWriter"
In-Reply-To: <14260.9408.396713.728418@weyr.cnri.reston.va.us>
References: <199908121706.TAA00810@lindm.dm> <14260.9408.396713.728418@weyr.cnri.reston.va.us>
Message-ID: <14260.19671.741803.54779@lindm.dm>

Fred L. Drake, Jr. writes:
 > Dieter Maurer writes:
 >  > "HtmlBuilder" translates '&amp;' into an entity reference.
 >  > This does not follow the DOM spec. It specifies that
 >  > character references are expected to be expanded by the
 >  > HTML/XML processor.
 >  >
 >  > "XmlWriter/HtmlWriter" does not output the 'amp' entity reference.
 >  > This, obviously, is a bug in "XmlWriter/HtmlWriter".
 >
 >   No, but if & is present as data, it writes out &amp;, so I think
 > that's OK.

I do not think it is correct. HTML input files should contain '&amp;' rather than '&', because '&' may yield invalid HTML code. "HtmlBuilder" translates "&amp;" into something "HtmlWriter" ignores. I think this is a bug.

 >  > By the way, processing instructions are not output, too.
 >
 >   Are you sure they're in your tree? What I see is that they are
 > output, but using the XML-style syntax (closing with '?>') instead
 > of the SGML style (closing with '>').

In fact, I did not test it at all -- sorry! I looked at the sources and did not see a definition for Entity and Processing Instruction output. Private mail with Dierk Hoeppner suggests that some magic in XMLWriter processes entity references. You now tell me that processing instructions are magically processed. It seems that I must have a closer look at this code.

By the way, my copy of "writer.py" (from the distribution tar) has an empty "XmlWriter.doOtherNode".

Thank you for your comment

- Dieter
Basically, this is what you do: As with all multi-level businesses, we build our business by recruiting new partners and selling our products. Every state in the USA allows you to recruit new multi-level business partners, and we offer a product for EVERY dollar sent. YOUR ORDERS COME BY MAIL AND ARE FILLED BY E-MAIL, so you are not involved in personal selling. You do it privately in your own home, store or office. This is the GREATEST Multi-Level Mail Order Marketing anywhere. This is what you MUST do: 1. Order all 4 reports shown on the list below (you can't sell them if youdon't order them). -- For each report, send $5.00 CASH, the NAME & NUMBER OF THE REPORT YOU ARE ORDERING, YOUR E-MAIL ADDRESS, and YOUR NAME & RETURN ADDRESS (in case of a problem) to the person whose name appears on the list next to the report. MAKE SURE YOUR RETURN ADDRESS IS ON YOUR ENVELOPE IN CASE OF ANY MAIL PROBLEMS! -- When you place your order, make sure you order each of the four reports. You will need all four reports so that you can save them on your computer and resell them. -- Within a few days you will receive, via e-mail, each of the four reports. Save them on your computer so they will be accessible for you to send to the 1,000's of people who will order them from you. 2. IMPORTANT DO NOT alter the names of the people who are listed next to each report, or their sequence on the list, in any way other than is instructed below in steps "a" through "f" or you will lose out on the majority of your profits. Once you understand the way this works, you'll also see how it doesn't work if you change it. Remember, this method has been tested,and if you alter it, it will not work. a. Look below for the listing of available reports. b. After you've ordered the four reports, take this advertisement and remove the name and address under REPORT #4. This person has made it through the cycle and is no doubt counting their $50,000! c. 
Move the name and address under REPORT #3 down to REPORT #4. d. Move the name and address under REPORT #2 down to REPORT #3. e. Move the name and address under REPORT #1 down to REPORT #2. f. Insert your name/address in the REPORT #1 position. Please make sure you COPY ALL INFORMATION, every name and address, ACCURATELY! 3. Take this entire letter, including the modified list of names, and save it to your computer. Make NO changes to the instruction portion of this letter. Your cost to participate in this is practically nothing (surely you can afford $20). You obviously already have an Internet connection and e-mail is FREE! There are two primary methods of building your downline: METHOD #1: SENDING BULK E-MAIL Let's say that you decide to start small, just to see how it goes, and we'll assume you and all those involved send out only 2,000 programs each. Let's also assume that the mailing receives a 0.5% response. Using a good list the response could be much better. Also, many people will send out hundreds of thousands of programs instead of 2,000. But continuing with this example, you send out only 2,000 programs. With a 0.5% response, that is only 10 orders for REPORT #1. Those 10 people respond by sending out 2,000 programs each for a total of 20,000. Out of those 0.5%, 100 people respond and order REPORT #2. Those 100 mail out 2,000 programs each for a total of 200,000. The 0.5% response to that is 1,000 orders for REPORT #3. Those 1,000 send out 2,000 programs each for a 2,000,000 total. The 0.5% response to that is 10,000 orders for REPORT #4. That's 10,000 $5 bills for you. CASH!!! Your total income in this example is $50 + $500 + $5,000 + $50,000 for a total of $55,550!!! REMEMBER FRIEND, THIS IS ASSUMING 1,990 OUT OF THE 2,000 PEOPLE YOU MAIL TO WILL DO ABSOLUTELY NOTHING AND TRASH THIS PROGRAM! DARE TO THINK FOR A MOMENT WHAT WOULD HAPPEN IF EVERYONE, OR HALF SENT OUT 100,000 PROGRAMS INSTEAD OF 2,000. Believe me, many people will do justthat, and more! 
By the way, your cost to participate in this is practically nothing. You obviously already have an Internet connection and e-mail is FREE!!! REPORT #2 will show you the best methods for bulk e-mailing, tell you where to obtain free bulk e-mail software and where to obtain e-mail lists. METHOD #2 - PLACING FREE ADS ON THE INTERNET Advertising on the internet is very, very inexpensive, and there are HUNDREDS of FREE places to advertise. Let's say you decide to start small just to see how well it works. Assume your goal is to get ONLY 10 people to participate on your first level. (Placing a lot of FREE ads on the Internet will EASILY get a larger response.) Also assume that everyone else in YOUR ORGANIZATION gets ONLY 10 downline members. Follow this example to achieve the STAGGERING results below: 1st level--your 10 members with $5.......................................$50 2nd level--10 members from those 10 ($5 x 100)..................$500 3rd level--10 members from those 100 ($5 x 1,000)...........$5,000 4th level--10 members from those 1,000 ($5 x 10,000).....$50,000 THIS TOTALS ----------$55,550 Remember friends, this assumes that the people who participate only recruit 10 people each. Think for a moment what would happen if they got 20 people to participate! Most people get 100's of participants! THINK ABOUT IT! For every $5.00 you receive, all you must do is e-mail them the report they ordered. THAT'S IT! ALWAYS PROVIDE SAME-DAY SERVICE ON ALL ORDERS! This will guarantee that the e-mail THEY send out with YOUR name and address on it will be prompt because they can't advertise until they receive the report! AVAILABLE REPORTS *** Order Each REPORT by NUMBER and NAME *** Notes: -- ALWAYS SEND $5 CASH (U.S. CURRENCY) FOR EACH REPORT. CHECKS NOT ACCEPTED. -- ALWAYS SEND YOUR ORDER VIA FIRST CLASS MAIL. -- Make sure the cash is concealed by wrapping it in at least two sheets of paper. 
On one of those sheets of paper, include: (a) the number & name of the report you are ordering, (b) your e-mail address, and (c) your name & postal address. PLACE YOUR ORDER FOR THESE REPORTS NOW: REPORT #1 "The Insider's Guide to Advertising for Free on the Internet" ORDER REPORT #1 FROM: David Jonsson Helperwestsingel 53A1 9721 BC Groningen NL REPORT #2 "The Insider's Guide to Sending Bulk E-mail on the Internet" ORDER REPORT #2 FROM: Ed Turpin 1577 C.R. 236 Clyde, OH 43410 REPORT #3 "The Secrets to Multilevel Marketing on the Internet" ORDER REPORT #3 FROM: D. Cross 365 N. Abbe Rd. Elyria, OH 44035 REPORT #4 "How to become a Millionaire utilizing the Power of Multilevel Marketing and the Internet" ORDER REPORT #4 FROM: J. Hansen P.O. Box 93055 19705 Fraser Hwy Langley, BC. Canada, V3A 8H2 About 50,000 new people get online every month! ******* TIPS FOR SUCCESS ******* -- TREAT THIS AS YOUR BUSINESS! Be prompt, professional, and follow the directions accurately. -- Send for the four reports IMMEDIATELY so you will have them when the orders start coming in because: When you receive a $5 order, you MUST send out the requested product/report. -- ALWAYS PROVIDE SAME-DAY SERVICE ON THE ORDERS YOU RECEIVE. -- Be patient and persistent with this program. If you follow the instructions exactly, your results WILL BE SUCCESSFUL! -- ABOVE ALL, HAVE FAITH IN YOURSELF AND KNOW YOU WILL SUCCEED! ******* YOUR SUCCESS GUIDELINES ******* Follow these guidelines to guarantee your success: If you don't receive 20 orders for REPORT #1 within two weeks, continue advertising or sending e-mails until you do. Then, a couple of weeks later you should receive at least 100 orders for REPORT#2. If you don 't, continue advertising or sending e-mails until you do. Once you have received 100 or more orders for REPORT #2, YOU CAN RELAX, because the system is already working for you, and the cash will continue to roll in! 
THIS IS IMPORTANT TO REMEMBER: Every time your name is moved down on the list, you are placed in front of a DIFFERENT report. You can KEEP TRACK of your PROGRESS by watching which report people are ordering from you. If you want to generate more income, send another batch of e-mails or continue placing ads and start the whole process again! There is no limit to the income you will generate from this business! Before you make your decision as to whether or not you participate in this program. Please answer one question. DO YOU WANT TO CHANGE YOUR LIFE? If the answer is yes, please look at the following facts about this program: 1. You are selling a product which does not Cost anything to PRODUCE, SHIP OR ADVERTISE. 2. All of your customers pay you in CASH! 3. E-mail is without question the most powerful method of distributing information on earth. This program combines the distribution power of e-mail together with the revenue generating power of multi-level marketing. 4. Your only expense--other than your initial $20 investment--is your time! 5. Virtually all of the income you generate from this program is PURE PROFIT! 6. This program will change your LIFE FOREVER. ACT NOW!Take your first step toward achieving financial independence. Orderthe reports and follow the program outlined above--SUCCESSwill be yourreward. Thank you for your time and consideration. PLEASE NOTE: If you need help with starting a business, registering a business name, learning how income tax is handled, etc., contact your localoffice of the Small Business Administration (a Federal Agency) 1-800-827-5722 for free help and answers to questions. Also, the InternalRevenue Service offers free help via telephone and free seminars aboutbusiness tax requirements. Your earnings are highly dependant on youractivities and advertising. The information contained on this site and in the report constitutes no guarantees stated nor implied. 
From akuchlin@mems-exchange.org Mon Aug 16 02:04:20 1999
From: akuchlin@mems-exchange.org (A.M. Kuchling)
Date: Sun, 15 Aug 1999 21:04:20 -0400
Subject: [XML-SIG] CVS tree reorganized
Message-ID: <199908160104.VAA03692@207-172-146-60.s60.tnt3.ann.va.dialup.rcn.com>

I've completed the rearrangement of the XML-SIG's CVS tree, though
there are still some things left to tidy up. (For example, some test
suite failures haven't yet been looked into.) The goal was to clean
out the root directory of the distribution, and simplify the
installation process. The important points are:

     * You now want to use the '-P' option to CVS to prune empty
       directories; otherwise, you'll get lots of obsolete directories
       that are all empty.

     * Python modules that need to be installed are now in the 'xml'
       subdirectory; for example, the 'dom', 'arch', and 'sax'
       subdirectories have all moved down into 'xml'. Python files
       that don't get installed, like those in 'demo' and 'test', are
       still where they were.

     * C extensions are now in the 'extensions' subdirectory.

     * Binaries for Windows and MacOS should go in the 'windows' and
       'mac' directories.

     * Installation has been changed to follow the procedures set by
       the Distutils-SIG. An end user will run a Python script,
       setup.py. It can be given one of three arguments: 'build',
       'test', and 'install'. (Note that it doesn't actually use any
       Distutils code, but simply tries to present a similar user
       interface.)
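A script with the build/test/install interface from the last point might look roughly like the minimal sketch below. It assumes the 'build' step is plain copying of the xml/ package into build/xml/ and leaves out compiling C extensions and copying Windows/Mac binaries. The 'test' target is only a placeholder, and the function names are hypothetical, not taken from the XML-SIG's actual setup.py:

```python
import os
import shutil
import sysconfig

BUILD_DIR = "build"   # assumed layout: build/xml/ mirrors the source xml/ tree
PACKAGE = "xml"


def _replace_tree(src, dst):
    """Copy src to dst, removing any stale copy of dst first."""
    if os.path.exists(dst):
        shutil.rmtree(dst)
    shutil.copytree(src, dst)   # creates missing parent directories such as build/
    return dst


def build(root="."):
    """'build': copy the pure-Python xml/ package into build/xml/.

    Compiling C extensions (Unix) or copying DLL/PYD files (Windows, Mac)
    would also belong here; this sketch leaves that step out.
    """
    return _replace_tree(os.path.join(root, PACKAGE),
                         os.path.join(root, BUILD_DIR, PACKAGE))


def install(target=None, root="."):
    """'install': copy the already-built build/xml/ tree into target."""
    built = os.path.join(root, BUILD_DIR, PACKAGE)
    if not os.path.isdir(built):
        raise SystemExit("nothing built yet; run 'setup.py build' first")
    if target is None:
        target = sysconfig.get_paths()["purelib"]   # the site-packages directory
    return _replace_tree(built, os.path.join(target, PACKAGE))


def main(argv):
    """Dispatch on the single command-line argument: build, test or install."""
    if len(argv) != 2 or argv[1] not in ("build", "test", "install"):
        print("usage: setup.py build|test|install")
        return 2
    if argv[1] == "build":
        print("built", build())
    elif argv[1] == "test":
        print("test: not implemented in this sketch")
    else:
        print("installed", install())
    return 0
```

A real setup.py would end with sys.exit(main(sys.argv)). Because 'install' only copies a finished tree, a compilerless machine needs nothing beyond file copying. That is the property the message is after. On a modern Python a top-level package named xml would also collide with the standard library's, so this sketch only illustrates the shape of the scheme.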
The 'build' target will create a subdirectory named 'build', copy the
'xml/' subdirectory into 'build', and then copy compiled C extensions
into build/ at the proper locations. On Unix it will also compile the
C extensions; on Windows and Mac, it should copy binary files like
DLLs and PYDs into the build/ subdirectory. (Volunteers to implement
that are needed.) 'install' is then a simple matter of copying the
build/xml/ tree to the installation location.

The point of the setup.py scheme is to simplify installation on
compilerless systems, because the build process is reduced to some
file copying. However, we need someone to write the relevant copying
bits for Windows and Mac, because I'm not sure what's legal.

In any case, I'm sure there are inadvertent breakages from this
change; please try out the CVS tree and report problems.

-- 
A.M. Kuchling			http://starship.python.net/crew/amk/
When cryptography is outlawed, bayl bhgynjf jvyy unir cevinpl.
    -- Anonymous

From paul@prescod.net Tue Aug 24 21:13:43 1999
From: paul@prescod.net (Paul Prescod)
Date: Tue, 24 Aug 1999 16:13:43 -0400
Subject: [XML-SIG] CVS tree reorganized
References: <199908160104.VAA03692@207-172-146-60.s60.tnt3.ann.va.dialup.rcn.com>
Message-ID: <37C2FCF7.91398525@prescod.net>

A.M. Kuchling wrote:
> 
> * Binaries for Windows and MacOS should go in the 'windows' and
> 'mac' directories.

That should include xmlparse.dll and xmltok.dll. In the current
distribution those go into "expat/bin", which is never on anyone's
path.

Also drv_expat expects pyexpat to be in xml.parsers which it isn't in
the current distribution.

 Paul Prescod

From Fred L. Drake, Jr."
References: <199908160104.VAA03692@207-172-146-60.s60.tnt3.ann.va.dialup.rcn.com> <37C2FCF7.91398525@prescod.net>
Message-ID: <14275.4031.17751.874877@weyr.cnri.reston.va.us>

Paul Prescod writes:
 > Also drv_expat expects pyexpat to be in xml.parsers which it isn't in
 > the current distribution.
  The pyexpat project files for the Mac development environment should
probably go under extensions/ as well, instead of in a top-level
pyexpat/ directory.

  -Fred

--
Fred L. Drake, Jr.
Corporation for National Research Initiatives

From paul@prescod.net Tue Aug 24 21:22:49 1999
From: paul@prescod.net (Paul Prescod)
Date: Tue, 24 Aug 1999 16:22:49 -0400
Subject: [XML-SIG] drv_htmlllib
Message-ID: <37C2FF19.A2B464F8@prescod.net>

In pylibs.py, there is a comment that says:

#handle_starttag is never called!

In accordance with the comment, there is no definition for
handle_starttag. There is a (seemingly correct) definition for
unknown_starttag but it doesn't seem to ever get called.

This seems to fix it:

def handle_starttag( self, tag, method, attributes ):
    self.unknown_starttag( tag, attributes )

I don't know why that wasn't in there to begin with. The mysterious
comment probably has something to do with it.

 Paul Prescod

From lmariusg@ifi.uio.no Wed Aug 25 07:43:07 1999
From: lmariusg@ifi.uio.no (Lars Marius Garshol)
Date: 25 Aug 1999 08:43:07 +0200
Subject: [XML-SIG] drv_htmlllib
In-Reply-To: <37C2FF19.A2B464F8@prescod.net>
References: <37C2FF19.A2B464F8@prescod.net>
Message-ID:

* Paul Prescod
| 
| In pylibs.py, there is a comment that says:
| 
| #handle_starttag is never called!

That was put in by me. I seem to recall that when I first wrote the
*mllib drivers that method was for some reason never called, and so I
just left it empty.

| In accordance with the comment, there is no definition for
| handle_starttag. There is a (seemingly correct) definition for
| unknown_starttag but it doesn't seem to ever get called.

Hmmm. Maybe something to do with version mismatches?

| This seems to fix it:
| 
| def handle_starttag( self, tag, method, attributes ):
|     self.unknown_starttag( tag, attributes )

Yup, this is correct (and I have it in my CVS tree already). I have to
get my act together soon and put out a new set of releases for SAX.
The time when I can do that is getting much closer, but is not there
yet.

--Lars M.

From c.evans@clear.net.nz Wed Aug 25 12:37:33 1999
From: c.evans@clear.net.nz (Carey Evans)
Date: 25 Aug 1999 23:37:33 +1200
Subject: [XML-SIG] PyDOM performance
Message-ID: <87ogfwm6pu.fsf@psyche.evansnet>

--=-=-=

Hi.

I've been rather disappointed with the speed when trying out the DOM
support in the XML 0.5.1 package. To construct a tree of the fairly
simple document at

    http://home.clear.net.nz/pages/c.evans/diary/hols199901.xml

took about 45 seconds. I tried out the CVS tree and got this down to
17.8 seconds, which is quite an impressive improvement by itself, when
PyDOM doesn't seem to have changed much.

Looking at this with the profiler, dom/core.py spends a *lot* of time
in __getattr__ and __setattr__. I didn't have anything better to do,
so I rewrote these methods and got the time down to 11.7 seconds.
I've attached the patch to do this below.

My questions are:

  Is what I'm doing in this patch actually working, or am I on the
  wrong track?

  And, is it worth doing anything to PyDOM, or would I be better off
  looking at 4DOM, for example?

Thanks.

-- 
Carey Evans  http://home.clear.net.nz/pages/c.evans/

"This is where your sanity gives in..."

--=-=-=
Content-Type: text/x-patch
Content-Disposition: attachment; filename=dom-core.diff

--- core.py.dist	Fri Aug 13 14:33:42 1999
+++ core.py	Wed Aug 25 23:03:37 1999
@@ -323,16 +323,18 @@
     # to attributes such as .parentNode are redirected into calls to
     # get_parentNode or set_parentNode.
     def __getattr__(self, key):
-        if key[0:4] == 'get_' or key[0:4] == 'set_':
-            raise AttributeError, repr(key[4:])
-        func = getattr(self, 'get_'+key)
-        return func()
+        method = self._get_dict.get(key)
+        if method is not None:
+            return method(self)
+        else:
+            raise AttributeError, key
 
     def __setattr__(self, key, value):
-        if hasattr(self, 'set_'+key):
-            func = getattr(self, 'set_'+key)
-            func( value )
-        self.__dict__[key] = value
+        method = self._set_dict.get(key)
+        if method is not None:
+            method(self, value)
+        else:
+            self.__dict__[key] = value
 
     def __cmp__(self, other):
         if isinstance(other, Node):
@@ -637,6 +639,19 @@
                       "%s is an ancestor of %s" % (repr(child), repr(parent) )
             p = p.get_parentNode()
 
+    # Dictionaries of allowed get/set properties.
+    _get_dict = {
+        'nodeName': get_nodeName, 'name': get_name,
+        'nodeValue': get_nodeValue, 'value': get_value,
+        'nodeType': get_nodeType, 'attributes': get_attributes,
+        'childNodes': get_childNodes, 'parentNode': get_parentNode,
+        'firstChild': get_firstChild, 'lastChild': get_lastChild,
+        'previousSibling': get_previousSibling,
+        'nextSibling': get_nextSibling,
+        'ownerDocument': get_ownerDocument,
+    }
+    _set_dict = {}
+
 class CharacterData(Node):
 
     # Attributes
@@ -733,7 +748,14 @@
         d.name = "#text"
         d.value = value
         return Text(d, self._document)
-
+
+    # Dictionaries of allowed get/set properties.
+    _get_dict = Node._get_dict.copy()
+    _get_dict.update({ 'data': get_data, 'length': get_length })
+    _set_dict = Node._set_dict.copy()
+    _set_dict.update({ 'data': set_data, 'nodeValue': set_nodeValue })
+
+
 class Attr(Node):
     childNodeTypes = [TEXT_NODE, ENTITY_REFERENCE_NODE]
@@ -789,7 +811,23 @@
     def get_parentNode(self): return None
     def get_previousSibling(self): return None
     def get_nextSibling(self): return None
-
+
+    # Dictionaries of allowed get/set properties.
+    _get_dict = Node._get_dict.copy()
+    _get_dict.update({
+        'nodeName': get_nodeName, 'name': get_name,
+        'nodeValue': get_nodeValue, 'value': get_value,
+        'specified': get_specified,
+        'parentNode': get_parentNode,
+        'previousSibling': get_previousSibling,
+        'nextSibling': get_nextSibling,
+    })
+    _set_dict = Node._set_dict.copy()
+    _set_dict.update({
+        'nodeValue': set_nodeValue, 'value': set_value,
+    })
+
+
 class Element(Node):
     childNodeTypes = [ELEMENT_NODE, PROCESSING_INSTRUCTION_NODE,
                       COMMENT_NODE, TEXT_NODE, CDATA_SECTION_NODE, ENTITY_REFERENCE_NODE]
@@ -971,6 +1009,11 @@
             if L[i].type == ELEMENT_NODE:
                 n = NODE_CLASS[ L[i].type ] (L[i], self._document)
                 n.normalize()
+
+    # Dictionaries of allowed get/set properties.
+    _get_dict = Node._get_dict.copy()
+    _get_dict.update({ 'tagName': get_tagName, 'attributes': get_attributes })
+
 class Text(CharacterData):
     childNodeTypes = []
@@ -1040,6 +1083,13 @@
     def toxml(self):
         return '\n' % (self._node.name,)
+
+    # Dictionaries of allowed get/set properties.
+    _get_dict = Node._get_dict.copy()
+    _get_dict.update({
+        'name': get_name, 'entities': get_entities,
+        'notations': get_notations })
+
 class Notation(Node):
     readonly = 1 # This is a read-only class
@@ -1061,7 +1111,11 @@
         return '' % (self._node.name, self._node.publicId, self._node.systemId)
-
+
+    # Dictionaries of allowed get/set properties.
+    _get_dict = Node._get_dict.copy()
+    _get_dict.update({ 'publicId': get_publicId, 'systemId': get_systemId })
+
 class Entity(Node):
     readonly = 1 # This is a read-only class
@@ -1077,6 +1131,14 @@
     def get_notationName(self):
         return self._node.notationName
 
+    # Dictionaries of allowed get/set properties.
+    _get_dict = Node._get_dict.copy()
+    _get_dict.update({
+        'publicId': get_publicId, 'systemId': get_systemId,
+        'notationName': get_notationName
+    })
+
+
 class EntityReference(Node):
     childNodeTypes = [ELEMENT_NODE, PROCESSING_INSTRUCTION_NODE,
                       COMMENT_NODE, TEXT_NODE, CDATA_SECTION_NODE,
@@ -1106,6 +1168,12 @@
             raise NoModificationAllowedException("Read-only object")
         self._node.value = data
 
+    # Dictionaries of allowed get/set properties.
+    _get_dict = Node._get_dict.copy()
+    _get_dict.update({ 'target': get_target, 'data': get_data })
+    _set_dict = Node._set_dict.copy()
+    _set_dict.update({ 'data': set_data })
+
 class Document(Node):
     childNodeTypes = [ELEMENT_NODE, PROCESSING_INSTRUCTION_NODE,
@@ -1325,6 +1393,17 @@
         Node.replaceChild(self, newChild, oldChild)
 
+    # Dictionaries of allowed get/set properties.
+    _get_dict = Node._get_dict.copy()
+    _get_dict.update({
+        'doctype': get_doctype,
+        'implementation': get_implementation,
+        'childNodes': get_childNodes,
+        'documentElement': get_documentElement,
+        'ownerDocument': get_ownerDocument,
+    })
+
+
 class DocumentFragment(Node):
     childNodeTypes = [ELEMENT_NODE, PROCESSING_INSTRUCTION_NODE,
                       COMMENT_NODE, TEXT_NODE, CDATA_SECTION_NODE,
@@ -1341,7 +1420,12 @@
             n = NODE_CLASS[ child.type ] (child, self._document)
             L.append(n.toxml())
         return string.join(L, "")
-
+
+    # Dictionaries of allowed get/set properties.
+    _get_dict = Node._get_dict.copy()
+    _get_dict.update({ 'parentNode': get_parentNode })
+
+
 # Dictionary mapping types to the corresponding class object
 NODE_CLASS = {

--=-=-=--

From gstein@lyra.org Wed Aug 25 16:31:21 1999
From: gstein@lyra.org (Greg Stein)
Date: Wed, 25 Aug 1999 08:31:21 -0700 (PDT)
Subject: [XML-SIG] PyDOM performance
In-Reply-To: <87ogfwm6pu.fsf@psyche.evansnet>
Message-ID:

If the DOM is not a specific requirement, and you simply need to
translate XML into a usable form for Python, then you may want to look
at my qp_xml module at http://www.lyra.org/greg/python/

Cheers,
-g

--
Greg Stein, http://www.lyra.org/

On 25 Aug 1999, Carey Evans wrote:

> Hi.
> 
> I've been rather disappointed with the speed when trying out the DOM
> support in the XML 0.5.1 package. To construct a tree of the fairly
> simple document at
> 
>     http://home.clear.net.nz/pages/c.evans/diary/hols199901.xml
> 
> took about 45 seconds. I tried out the CVS tree and got this down to
> 17.8 seconds, which is quite an impressive improvement by itself, when
> PyDOM doesn't seem to have changed much.
> 
> Looking at this with the profiler, dom/core.py spends a *lot* of time
> in __getattr__ and __setattr__. I didn't have anything better to do,
> so I rewrote these methods and got the time down to 11.7 seconds.
> I've attached the patch to do this below.
> 
> My questions are:
> 
>   Is what I'm doing in this patch actually working, or am I on the
>   wrong track?
> 
>   And, is it worth doing anything to PyDOM, or would I be better off
>   looking at 4DOM, for example?
> 
> Thanks.
> 
> -- 
> Carey Evans  http://home.clear.net.nz/pages/c.evans/
> 
> "This is where your sanity gives in..."
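Stripped of PyDOM's details, the technique in Carey's patch is a per-class table that maps DOM attribute names to getter and setter functions. __getattr__ and __setattr__ consult that table instead of building 'get_' + key strings and probing with getattr/hasattr on every access. The minimal Python 3 sketch below shows the idea on its own. Node and Text are hypothetical stand-ins, not PyDOM's classes:

```python
class Node:
    """Toy node: reads such as node.nodeName are routed to getter methods."""

    def __init__(self, name, value=None):
        # Write straight into __dict__ so __setattr__'s table isn't consulted here.
        self.__dict__["_name"] = name
        self.__dict__["_value"] = value

    def get_nodeName(self):
        return self._name

    def get_nodeValue(self):
        return self._value

    def __getattr__(self, key):
        # Only called when normal lookup fails. _get_dict is a class
        # attribute, so looking it up here cannot recurse into __getattr__.
        method = self._get_dict.get(key)
        if method is None:
            raise AttributeError(key)
        return method(self)

    def __setattr__(self, key, value):
        method = self._set_dict.get(key)
        if method is not None:
            method(self, value)          # property with a setter
        else:
            self.__dict__[key] = value   # ordinary instance attribute

    # Built once per class, at class-creation time.
    _get_dict = {"nodeName": get_nodeName, "nodeValue": get_nodeValue}
    _set_dict = {}


class Text(Node):
    def get_data(self):
        return self._value

    def set_data(self, value):
        self.__dict__["_value"] = value

    # Each subclass copies its parent's tables and adds its own entries.
    _get_dict = Node._get_dict.copy()
    _get_dict.update({"data": get_data})
    _set_dict = Node._set_dict.copy()
    _set_dict.update({"data": set_data, "nodeValue": set_data})
```

The tables hold the function objects themselves. As a result, a subclass that overrides a getter must re-register it, or the inherited entry will keep calling the parent's version. That is why the patch repeats entries such as 'nodeName' and 'parentNode' in Attr's table.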
From Mike.Olson@FourThought.com Wed Aug 25 19:07:49 1999
From: Mike.Olson@FourThought.com (Mike Olson)
Date: Wed, 25 Aug 1999 13:07:49 -0500
Subject: [XML-SIG] PyDOM performance
References: <87ogfwm6pu.fsf@psyche.evansnet>
Message-ID: <37C430F5.4488E2E2@FourThought.com>

Carey Evans wrote:

> 
> And, is it worth doing anything to PyDOM, or would I be better off
> looking at 4DOM, for example?

I don't think you will get much speed increase (in fact it may be slower) with
4DOM. We wrote 4DOM more concerned with meeting the W3C spec to the letter
than with speed.

One note: we are going to rewrite all of the tree stuff in 4DOM as red-black or
AVL trees in C by the end of the month or early next month, which should give us
some speed increases. At that point we will do some serious benchmarks between
the two and work out a Pythonic interface.

Mike

> 
> Thanks.
> 
> -- 
> Carey Evans  http://home.clear.net.nz/pages/c.evans/
> 
> "This is where your sanity gives in..."
> 
> ------------------------------------------------------------------------
> 
> dom-core.diff    Name: dom-core.diff
>                  Type: text/x-patch

-- 
----------------
Mike Olson
Consulting Member
FourThought LLC
http://www.fourthought.com http://opentechnology.org

From Mike.Olson@FourThought.com Wed Aug 25 19:49:32 1999
From: Mike.Olson@FourThought.com (Mike Olson)
Date: Wed, 25 Aug 1999 13:49:32 -0500
Subject: [XML-SIG] PyDOM performance
References: <87ogfwm6pu.fsf@psyche.evansnet> <37C430F5.4488E2E2@FourThought.com>
Message-ID: <37C43ABB.8230DD30@FourThought.com>

Out of curiosity I did a quick benchmark on the file you referenced, and I
was rather surprised by the results.

Both tests were run on a Celeron 366 with 128 MB, running Linux. Neither
test performed XML validation, as I did not have a DTD.

4DOM was built with the orbless option and used the Ext.Builder.FromXmlFile
method. The 4DOM version was from our CVS tree, which will be available by
week's end.
Non-validating 4DOM
Start: 935605414.575
End: 935605415.11
Delta: 0.535288095474

PyDOM is version 0.5.1 from the RPMs and used utils.FileReader().

Non-validating PyDOM
Start: 935605415.148
End: 935605418.212
Delta: 3.0643119812

I was quite surprised by this. I don't know enough about the pydom
internals to explain why it is slower. I just always assumed it
was faster. We will still be looking to speed up 4DOM with the C
implementation of the trees.

Later
Mike

Mike Olson wrote:

> Carey Evans wrote:
> 
> >
> > And, is it worth doing anything to PyDOM, or would I be better off
> > looking at 4DOM, for example?
> 
> I don't think you will get much speed increase (in fact it may be slower) with
> 4DOM. We wrote 4DOM more concerned with meeting the W3C spec to the letter
> than with speed.
> 
> One note: we are going to rewrite all of the tree stuff in 4DOM as red-black or
> AVL trees in C by the end of the month or early next month, which should give us
> some speed increases. At that point we will do some serious benchmarks between
> the two and work out a Pythonic interface.
> 
> Mike
> 
> >
> > Thanks.
> >
> > -- 
> > Carey Evans  http://home.clear.net.nz/pages/c.evans/
> >
> > "This is where your sanity gives in..."
> >
> > ------------------------------------------------------------------------
> >
> > dom-core.diff    Name: dom-core.diff
> >                  Type: text/x-patch
> 
> -- 
> ----------------
> Mike Olson
> Consulting Member
> FourThought LLC
> http://www.fourthought.com http://opentechnology.org
> 
> _______________________________________________
> XML-SIG maillist  -  XML-SIG@python.org
> http://www.python.org/mailman/listinfo/xml-sig

-- 
----------------
Mike Olson
Consulting Member
FourThought LLC
http://www.fourthought.com http://opentechnology.org

From Fred L. Drake, Jr."
References: <87ogfwm6pu.fsf@psyche.evansnet> <37C430F5.4488E2E2@FourThought.com> <37C43ABB.8230DD30@FourThought.com>
Message-ID: <14276.16503.954788.160580@weyr.cnri.reston.va.us>

Mike Olson writes:
 > I was quite surprised by this. I don't know enough about the pydom
 > internals to explain why it is slower. I just always assumed it

  PyDOM pays a *huge* penalty in two places. First, the proxies used to
avoid circular references cause a lot of object creation/destruction
when using the document, though I'm not sure it affects construction
time so much. Second, it uses instances for the internal data format,
where perhaps only lists, tuples and dictionaries are really needed (at
the expense of making the code more obscure).
  I'd love to see the proxies disappear, and just require explicit
calls to a .destroy() method, but that means another massive code
change.

  -Fred

--
Fred L. Drake, Jr.
Corporation for National Research Initiatives

From dieter@handshake.de Thu Aug 26 19:03:58 1999
From: dieter@handshake.de (Dieter Maurer)
Date: Thu, 26 Aug 1999 20:03:58 +0200 (CEST)
Subject: [XML-SIG] PyDOM performance
In-Reply-To: <14276.16503.954788.160580@weyr.cnri.reston.va.us>
References: <37C43ABB.8230DD30@FourThought.com> <14276.16503.954788.160580@weyr.cnri.reston.va.us>
Message-ID: <14277.32729.249378.680126@lindm.dm>

Fred L. Drake, Jr. writes:
 > 
 > I'd love to see the proxies disappear, and just require explicit
 > calls to a .destroy() method, but that means another massive code
 > change.

Marc-Andre Lemburg recently released a new version of mxProxy.
It supports weak references and thus allows for circular structures
(with somewhat unintuitive behaviour when the root element is released
while references to internal tree nodes are held, as with "weakdicts").

I expect the changes to be rather local if mxProxy is used.
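Dieter's suggestion can be illustrated with the standard library's weakref module standing in for mxProxy. This is a substitution; mxProxy's own interface is not shown. In the minimal sketch below, a hypothetical TreeNode keeps strong references to its children but only a weak reference to its parent. The tree therefore contains no reference cycles and needs neither proxies nor an explicit .destroy() call:

```python
import weakref


class TreeNode:
    """Hypothetical DOM-like node: strong links down, a weak link up."""

    def __init__(self, name):
        self.name = name
        self.children = []      # strong references keep the subtree alive
        self._parent = None     # weakref.ref to the parent, or None for a root

    def append(self, child):
        child._parent = weakref.ref(self)
        self.children.append(child)
        return child

    @property
    def parentNode(self):
        # Calling a dead weak reference returns None rather than raising.
        return self._parent() if self._parent is not None else None
```

In CPython, deleting the last outside reference to the root frees it immediately. A child that is still held elsewhere then reports parentNode as None. That is the "somewhat unintuitive behaviour" described above.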
- Dieter

From paul@prescod.net Fri Aug 27 18:43:06 1999
From: paul@prescod.net (Paul Prescod)
Date: Fri, 27 Aug 1999 13:43:06 -0400
Subject: [XML-SIG] Python Tools Make a Strong Showing
Message-ID: <37C6CE2A.BDAB1FD0@prescod.net>

http://www.xml.com/pub/1999/08/excelon/montreal.html#python

 Paul Prescod

From hoel@germanlloyd.org Mon Aug 30 11:47:52 1999
From: hoel@germanlloyd.org (Berthold Hoellmann)
Date: Mon, 30 Aug 1999 12:47:52 +0200
Subject: [XML-SIG] problem processing XML files
Message-ID: <37CA6158.11623E2D@GermanLloyd.org>

Hello,

I just downloaded and installed "xml-0.5.1.tgz". I want to process the
ScientificPython documentation with it. My first test was a file like

--- snip ---
import sys
from xml.dom.utils import FileReader

class DomDumper(FileReader):
    def __init__(self,filename):
        FileReader.__init__(self,filename)
        print self.document
        print self.getFileType(filename)
        self.document.dump()

d = DomDumper(sys.argv[1])
print d
--- snip ---

Calling this with "ScientificPython.xml" as argument only returns

>python dumper.py ScientificPython.xml
XML
<__main__.DomDumper instance at bbc00>

but not the XML structure as expected. Does the parser silently ignore
syntax errors? Running

>python dumper.py sample.xml

with "sample.xml" copied from the "xml-0.5.1/demo/quotes" directory
gives the expected result. How do I check the file's syntax using Python?

Thanks
Berthold
--
email: hoel@GermanLloyd.org
   )
  (
C[_]  These opinions might be mine, but never those of my employer.

From paul@prescod.net Mon Aug 30 14:03:55 1999
From: paul@prescod.net (Paul Prescod)
Date: Mon, 30 Aug 1999 09:03:55 -0400
Subject: [XML-SIG] problem processing XML files
References: <37CA6158.11623E2D@GermanLloyd.org>
Message-ID: <37CA813B.9D1334E3@prescod.net>

Berthold Hoellmann wrote:
>
> but not the XML structure as expected. Does the parser silently ignore
> syntax errors?
The problem code is in FileReader:

    p = saxexts.make_parser(parserName)
    dh = SaxBuilder()
    p.setDocumentHandler(dh)
    p.feed(stream.read())
    doc = dh.document

It doesn't set up an error handler. We haven't decided what the base SAX
module should do when there is no error handler. It's pretty clear that
it should *either* output error messages to stderr (what XML/SGML tools
have traditionally done) or throw an exception. In this case, though,
dom.utils should probably set up an explicit error handler until we
figure out a good default.

 Paul Prescod

From hinsen@cnrs-orleans.fr Tue Aug 31 15:55:26 1999
From: hinsen@cnrs-orleans.fr (Konrad Hinsen)
Date: Tue, 31 Aug 1999 16:55:26 +0200
Subject: [XML-SIG] problem processing XML files
In-Reply-To: <199908310505.BAA13359@python.org> (xml-sig-admin@python.org)
References: <199908310505.BAA13359@python.org>
Message-ID: <199908311455.QAA31772@chinon.cnrs-orleans.fr>

Berthold Hoellmann wrote:
> but not the XML structure as expected. Does the parser silently ignore
> syntax errors? Running

Don't know (but I'd be interested in the answer myself!), but I can
tell you what the error in ScientificPython.xml is: there's no filename
(or "system identifier") for the DTD.

This is one of my favourite quarrels with XML, because I find it highly
inconvenient to be forced to put machine-dependent data into my
documents, especially since the DocBook DTD is not at the same location
on the two machines that I use regularly.

Fortunately nsgmls is more tolerant; it lets me specify the filename
in a catalog entry and continues parsing after reporting the error.
I wish other parsers would do the same, at least optionally.
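Konrad's catalog approach can be sketched with the modern standard-library `xml.sax` API rather than the 1999 saxlib: an `EntityResolver` subclass that maps public identifiers to local DTD paths, so each machine supplies its own location. XML still requires a system identifier after `PUBLIC`, so the document keeps one (possibly a placeholder) and the resolver overrides it. `CatalogResolver` and `make_catalog_parser` are hypothetical names, the plain dict stands in for a real SGML Open or XCatalog file, and we assume, without having checked every Python version, that the expat-based reader consults the resolver for the external DTD subset once `feature_external_ges` is switched on.

```python
import xml.sax
from xml.sax.handler import ContentHandler, EntityResolver, feature_external_ges


class CatalogResolver(EntityResolver):
    """Hypothetical resolver mapping public identifiers to local files.

    The catalog is a plain dict here, not a parser for SGML Open or
    XCatalog files.
    """

    def __init__(self, catalog):
        self.catalog = dict(catalog)

    def resolveEntity(self, publicId, systemId):
        # Prefer this machine's copy, looked up by public identifier; fall
        # back to the system identifier written in the document itself.
        return self.catalog.get(publicId, systemId)


def make_catalog_parser(catalog, content_handler=None):
    """Build a SAX parser that resolves DTDs through `catalog`."""
    parser = xml.sax.make_parser()
    # Since Python 3.7.1 the SAX parser no longer fetches external entities
    # (including DTDs) by default; opt back in so the resolver is used.
    parser.setFeature(feature_external_ges, True)
    parser.setEntityResolver(CatalogResolver(catalog))
    parser.setContentHandler(content_handler or ContentHandler())
    return parser
```

Usage would then be along the lines of `make_catalog_parser({public_id: local_path}).parse("ScientificPython.xml")`, with a different dict on each machine and no machine-dependent path in the document.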
--
-------------------------------------------------------------------------------
Konrad Hinsen                            | E-Mail: hinsen@cnrs-orleans.fr
Centre de Biophysique Moleculaire (CNRS) | Tel.: +33-2.38.25.55.69
Rue Charles Sadron                       | Fax:  +33-2.38.63.15.17
45071 Orleans Cedex 2                    | Deutsch/Esperanto/English/
France                                   | Nederlands/Francais
-------------------------------------------------------------------------------

From lmariusg@ifi.uio.no Tue Aug 31 20:04:16 1999
From: lmariusg@ifi.uio.no (Lars Marius Garshol)
Date: 31 Aug 1999 21:04:16 +0200
Subject: [XML-SIG] problem processing XML files
In-Reply-To: <199908311455.QAA31772@chinon.cnrs-orleans.fr>
References: <199908310505.BAA13359@python.org> <199908311455.QAA31772@chinon.cnrs-orleans.fr>
Message-ID:

* Konrad Hinsen
|
| Fortunately nsgmls is more tolerant; it lets me specify the filename
| in a catalog entry and continues parsing after reporting the error.
| I wish other parsers would do the same, at least optionally.

xmlproc supports catalog files, both SGML Open ones and XCatalog ones.
You need to specify both the public identifier (pubid) and the system
identifier (sysid), but if xmlproc can resolve the pubid it will use it.

xmlproc will also continue after errors, but will not pass data to the
application anymore. (This is required by the XML recommendation.)
However, this is optional, and you can change it with the
set_data_after_wf_error method.

--Lars M.
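Paul's suggestion of an explicit error handler can be sketched with the modern standard-library `xml.sax` API rather than the 1999 `saxexts`/`saxlib` modules. The hypothetical `ReportingErrorHandler` records every problem and re-raises fatal ones, so a syntax error in a file like ScientificPython.xml surfaces as a failure instead of an empty result; `check_well_formed` is likewise a hypothetical helper. As far as we know, the stdlib's expat-based parser, unlike xmlproc as described above, cannot continue past a well-formedness error at all.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, ErrorHandler


class ReportingErrorHandler(ErrorHandler):
    """Hypothetical handler: collect problems instead of passing silently.

    Fatal (well-formedness) errors are recorded and then re-raised, since
    the parser cannot continue meaningfully after them.
    """

    def __init__(self):
        self.messages = []

    def warning(self, exc):
        self.messages.append(("warning", str(exc)))

    def error(self, exc):
        # Recoverable errors, e.g. validity errors from a validating parser.
        self.messages.append(("error", str(exc)))

    def fatalError(self, exc):
        self.messages.append(("fatal", str(exc)))
        raise exc


def check_well_formed(data):
    """Parse a bytes document; return (ok, list of (kind, message))."""
    handler = ReportingErrorHandler()
    parser = xml.sax.make_parser()
    parser.setContentHandler(ContentHandler())  # we only care about errors
    parser.setErrorHandler(handler)
    try:
        parser.parse(io.BytesIO(data))
    except xml.sax.SAXParseException:
        return False, handler.messages
    return True, handler.messages
```

For example, `check_well_formed(b"<doc><p>broken</doc>")` returns `False` with a single `"fatal"` entry describing the mismatched tag, while a well-formed document returns `True` and an empty list.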