From chrish at cryptocard.com Mon Dec 1 13:51:07 2003
From: chrish at cryptocard.com (Chris Herborth)
Date: Mon Dec 1 13:49:38 2003
Subject: [XML-SIG] Provide your own SAX parser to the DOM?
Message-ID: <3FCB8D9B.9080900@cryptocard.com>

I've got PyXML 0.8.3 installed here, and I'm generating the DOM for some documents thusly:

    reader = xml.dom.ext.reader.Sax2.Reader()

    # snipped: setting up an external entity resolver and error handler

    dom = reader.fromStream( file( an_xml_filename ) )

Is it possible to use a different SAX parser and still get the advantages of using the PyXML DOM goodness? I'm thinking ahead to when I want to use a validating parser, although the xml.dom.ext.reader.Sax2.Reader() appears to already dig through my DTD...

The reason I'm asking is that I'm using the resulting DOM to generate HTML 3.2 for JavaHelp. My DTD uses XHTML 1.0 entities and, for the most part, I'd like to _not_ have the Sax2.Reader() translating the entities into their Unicode characters (I've referenced the XHTML 1.0 entities from my DTD)... I want to be able to leave the entities in place and/or translate them into something myself. For example, JavaHelp 2.0 implements (most of) the Latin-1 accented character entities, but almost none of the others, so I'll have to handle &trade; (for example) "by hand".

-- 
Chris Herborth chrish@cryptocard.com
Documentation Overlord, CRYPTOCard Corp. http://www.cryptocard.com/
Never send a monster to do the work of an evil scientist.

From dieter at handshake.de Tue Dec 2 13:41:22 2003
From: dieter at handshake.de (Dieter Maurer)
Date: Tue Dec 2 14:45:45 2003
Subject: [XML-SIG] Provide your own SAX parser to the DOM?
In-Reply-To: <3FCB8D9B.9080900@cryptocard.com>
References: <3FCB8D9B.9080900@cryptocard.com>
Message-ID: <16332.56530.567033.265903@gargle.gargle.HOWL>

Chris Herborth wrote at 2003-12-1 13:51 -0500:
> I've got PyXML 0.8.3 installed here, and I'm generating the DOM for some
> documents thusly:
>
>     reader = xml.dom.ext.reader.Sax2.Reader()
>
>     # snipped: setting up an external entity resolver and error handler
>
>     dom = reader.fromStream( file( an_xml_filename ) )
>
> Is it possible to use a different SAX parser and still get the advantages of
> using the PyXML DOM goodness?

The "Reader" class has an optional "parser" argument. Look at its source...

-- 
Dieter

From juhtolv at cc.jyu.fi Mon Dec 8 10:38:45 2003
From: juhtolv at cc.jyu.fi (Juhapekka Tolvanen)
Date: Mon Dec 8 10:38:49 2003
Subject: [XML-SIG] Any XBEL to OPML converters out there?
Message-ID: <20031208153844.GA11878@heresy.ainola.jyu.fi>

A universal format for outline editors has been developed. It is called OPML:

http://www.opml.org/

I'd like to find a way to convert my XBEL bookmarks to OPML, too. Do you know of any software for that purpose? Or could you write it right now? It had better be free (in the sense of freedom) software.

If I could convert my bookmarks to OPML format, I could participate in this:

http://www.superopendirectory.com/

But hey, how about creating a system that is just like SuperOpenDirectory, but uses the XBEL format?

Here is some information on outline editors:

http://www.troubleshooters.com/tpromag/199911/199911.htm
http://www.outliners.com/

P.S.: I don't subscribe to this list. I am smart enough to read the archives on the WWW, but please Cc: me.

-- 
Juhapekka "naula" Tolvanen * http colon slash slash iki dot fi slash juhtolv
"Rakkaudesta ruikuttajat, halusta ulvojat kiertää kaupungin sydäntä vaanien verta. Omiin synkkiin linnoihinsa vallitusten taa pelokkaammat piilee hautomaan haamujaan."
CMX

From walter at livinglogic.de Mon Dec 8 15:47:27 2003
From: walter at livinglogic.de (Walter Dörwald)
Date: Mon Dec 8 15:47:32 2003
Subject: [XML-SIG] ANN: XIST 2.3
Message-ID: <3FD4E35F.5020403@livinglogic.de>

XIST 2.3 has been released!

What is it?
===========

XIST is an XML-based extensible HTML generator written in Python. XIST is also a DOM parser (built on top of SAX2) with a very simple and Pythonesque tree API. Every XML element type corresponds to a Python class, and these Python classes provide a conversion method to transform the XML tree (e.g., into HTML). XIST can be considered "object oriented XSL".

What's new in version 2.3?
==========================

* Namespace handling has been rewritten to be more standards compliant (no more namespace prefixes for entity references or processing instructions).
* Global attributes will now always generate the appropriate xmlns attributes.
* Support for uTidylib has been added, and arguments can now be passed to tidy.
* The HTMLParser can handle global attributes now.
* When parsing from a URL, the base URL will now be correct even if the request gets redirected (thanks to ll-url 0.11.6).
* Various other small bugfixes and enhancements.

For changes in older versions see:
http://www.livinglogic.de/Python/xist/History.html

Where can I get it?
===================

XIST can be downloaded from http://ftp.livinglogic.de/xist/ or ftp://ftp.livinglogic.de/pub/livinglogic/xist/

Web pages are at http://www.livinglogic.de/Python/xist/

ViewCVS access is available at http://www.livinglogic.de/viewcvs/

Bye,
   Walter Dörwald

From tpassin at comcast.net Tue Dec 9 22:29:28 2003
From: tpassin at comcast.net (Thomas B. Passin)
Date: Tue Dec 9 22:28:27 2003
Subject: [XML-SIG] Any XBEL to OPML converters out there?
In-Reply-To: <20031208153844.GA11878@heresy.ainola.jyu.fi>
References: <20031208153844.GA11878@heresy.ainola.jyu.fi>
Message-ID: <3FD69318.3050000@comcast.net>

Juhapekka Tolvanen wrote:
> Some universal format for outline editors has been developed. It is called
> OPML:
>
> http://www.opml.org/
>
> I'd like to find a way to convert my XBEL-bookmarks to OPML, too. Do you
> know any software for that purpose? Or could you write it right now? It
> would better be free (in the sense of freedom) software.
>

That should be fairly easy to do by means of an XSLT stylesheet. I do not know of an existing one, but that is the way I would do it. This has actually been the subject of a homework assignment - see http://cscisl.dce.harvard.edu/assignments/2

OPML is not a particularly well-designed format, so I would not recommend it unless you plan to use it with some system that requires it (which it seems you do).

Cheers,

Tom P

From lalleman at mfps.com Wed Dec 10 16:02:29 2003
From: lalleman at mfps.com (Alleman, Lowell)
Date: Wed Dec 10 16:03:27 2003
Subject: [XML-SIG] Working with non-compliant XML utilities
Message-ID: <2F7747120C62D211AD4100805FA78E1AE697AC@mail2.mfps.com>

Hi,

I'm working with an application that is very picky about the XML it accepts (basically, it's non-compliant). The company's support team isn't giving me many options. Certain things that the XML spec says the parser shouldn't care about, this utility cares about. Things like the order of attributes and whether an empty element is written as "<tag/>" or "<tag></tag>" need to be presented in a specific way.

Any ideas on how to work around some of these issues? Python XML tools would be preferred, but at this point all ideas and/or tools are welcome. All I need is to be able to dictate the order in which the attributes appear and whether or not empty elements should be written using the shortcut ('<tag/>') form. The changes I am making to the XML document are rather trivial.
I've considered simply using a slew of string.replace() calls and a few regular expressions to get the job done, but there may be a few cases where the DOM approach would be preferable to raw text manipulation.

FYI: So far I have tried using minidom and 4DOM (the one from PyXML 0.8.2). I haven't found the flexibility I require, but I'm not very familiar with either parser. minidom would be my preference, since it is installed as part of the standard library.

Thanks in advance,

- Lowell Alleman

From rsalz at datapower.com Wed Dec 10 16:15:59 2003
From: rsalz at datapower.com (Rich Salz)
Date: Wed Dec 10 16:10:12 2003
Subject: [XML-SIG] Working with non-compliant XML utilities
In-Reply-To: <2F7747120C62D211AD4100805FA78E1AE697AC@mail2.mfps.com>
References: <2F7747120C62D211AD4100805FA78E1AE697AC@mail2.mfps.com>
Message-ID: <3FD78D0F.9010304@datapower.com>

> Any ideas on how to work around some of these issues

You might take a look at the c14n code in dom/ext/c14n.py; it does more than what you want, but it shows how to walk a DOM, sort attributes, etc.

/r$

-- 
Rich Salz, Chief Security Architect
DataPower Technology http://www.datapower.com
XS40 XML Security Gateway http://www.datapower.com/products/xs40.html
XML Security Overview http://www.datapower.com/xmldev/xmlsecurity.html

From fredrik at pythonware.com Thu Dec 11 02:31:52 2003
From: fredrik at pythonware.com (Fredrik Lundh)
Date: Thu Dec 11 02:40:22 2003
Subject: [XML-SIG] Re: Working with non-compliant XML utilities
References: <2F7747120C62D211AD4100805FA78E1AE697AC@mail2.mfps.com>
Message-ID:

Lowell Alleman wrote:
> I'm working with an application that is very picky about the XML it accepts
> (basically it's non-compliant). The company's support team isn't giving me
> many options. Certain things that the XML spec say the parser shouldn't
> care about, this utility cares about.
> Things like the order of attributes
> and whether an empty element is written as "<tag/>" or "<tag></tag>" need to be
> presented in a specific way.
>
> Any ideas on how to work around some of these issues. Python XML tools
> would be preferred, but at this point all ideas and/or tools are welcome.
> All I need is to be able to dictate the order in which the attributes appear
> and whether or not empty elements should be written using the shortcut
> ('<tag/>') form.

sounds like you need a custom XML writer.

a quick solution is to take a copy of the writexml() method from the minidom's Element class and make it into a function (i.e. operate on element nodes instead of self, change the recursive writexml method call to a recursive function call, and use the _write_data from the minidom module).

    from xml.dom import minidom
    from xml.dom import Node

    def writexml(node, writer, indent="", addindent="", newl=""):
        if node.nodeType != Node.ELEMENT_NODE:
            # use standard serializer for everything but elements
            node.writexml(writer, indent, addindent, newl)
            return
        writer.write(indent+"<" + node.tagName)
        attrs = node._get_attributes()
        # write the attributes in sorted (alphabetical) order
        for a_name in sorted(attrs.keys()):
            writer.write(" %s=\"" % a_name)
            minidom._write_data(writer, attrs[a_name].value)
            writer.write("\"")
        if node.childNodes:
            writer.write(">%s" % newl)
            # separate loop variable, so "node" still names this element
            for child in node.childNodes:
                writexml(child, writer, indent+addindent, addindent, newl)
            writer.write("%s</%s>%s" % (indent, node.tagName, newl))
        else:
            writer.write("/>%s" % newl)

usage example:

    import sys
    doc = minidom.parseString("<doc b='2' a='1'><item>hello</item></doc>")
    # pass the root element; the Document node itself is not an element
    writexml(doc.documentElement, sys.stdout)

when this works, tweak the code (it's trivial) until it does exactly what you want.

hope this helps!
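The same tree-walking idea can be stretched to cover both requirements at once. The sketch below is an illustration, not Fredrik's code: `write_custom`, `attr_order` (element name mapped to a list of attribute names) and `long_form` (element names that must be written as `<x></x>` when empty) are hypothetical names. It escapes with the public `xml.sax.saxutils` helpers rather than minidom's private `_write_data`, and it writes any attributes missing from `attr_order` afterwards in alphabetical order, which is an assumption.

```python
import io
from xml.dom import minidom, Node
from xml.sax.saxutils import escape, quoteattr


def write_custom(node, out, attr_order=None, long_form=frozenset()):
    """Serialize a minidom node, controlling attribute order and empty-tag form.

    attr_order: hypothetical dict, element name -> attribute names in the
        order they must appear; unlisted attributes follow alphabetically.
    long_form: element names to write as <x></x> even when they are empty.
    """
    attr_order = attr_order or {}
    if node.nodeType == Node.TEXT_NODE:
        out.write(escape(node.data))
        return
    if node.nodeType != Node.ELEMENT_NODE:
        # comments, PIs, CDATA sections: use minidom's own serializer
        node.writexml(out)
        return
    name = node.tagName
    present = list(node.attributes.keys())
    wanted = [a for a in attr_order.get(name, []) if a in present]
    rest = sorted(a for a in present if a not in wanted)
    out.write("<" + name)
    for a in wanted + rest:
        out.write(" %s=%s" % (a, quoteattr(node.getAttribute(a))))
    if node.childNodes or name in long_form:
        out.write(">")
        for child in node.childNodes:
            write_custom(child, out, attr_order, long_form)
        out.write("</%s>" % name)
    else:
        out.write("/>")


doc = minidom.parseString('<rec kind="y" id="1"><a/><b/></rec>')
buf = io.StringIO()
write_custom(doc.documentElement, buf, {"rec": ["id", "kind"]}, {"b"})
print(buf.getvalue())  # <rec id="1" kind="y"><a/><b></b></rec>
```

Because the rules are passed in as plain data, the per-element table can be written by hand or generated from the DTD.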
From and-xml at doxdesk.com Thu Dec 11 12:46:05 2003
From: and-xml at doxdesk.com (Andrew Clover)
Date: Thu Dec 11 13:04:46 2003
Subject: [XML-SIG] Working with non-compliant XML utilities
In-Reply-To: <2F7747120C62D211AD4100805FA78E1AE697AC@mail2.mfps.com>
References: <2F7747120C62D211AD4100805FA78E1AE697AC@mail2.mfps.com>
Message-ID: <20031211174605.GA4930@doxdesk.com>

Lowell Alleman wrote:
> Certain things that the XML spec say the parser shouldn't
> care about, this utility cares about. Things like the order of attributes

Urgh. Nasty. Well, you could try pxdom:

http://www.doxdesk.com/software/py/pxdom.html

A special feature of this DOM implementation is that it will maintain a fixed order of attributes, so you can rely on the output being in the order you want.

> and whether an empty element is written as "<tag/>" or "<tag></tag>" need to be
> presented in a specific way.

Is it always one way or always the other, or a mix?

pxdom will use the short form where possible, unless you ask it to do canonicalisation (using the DOM Level 3 'canonical-form' parameter). Unfortunately, if you did canonicalisation, the attribute order would be changed. I might add a separate option as a non-standard extension to turn off short forms in 1.0 if anyone else would find it useful - alternatively, hack line 4193 in version 0.9.

If you need to output short forms in some cases but not in others, that's a bit more work. What you could do to fool the serialiser is put a Text node containing an empty string inside every element that you want to be output in the longer form, e.g.:

    element.appendChild(element.ownerDocument.createTextNode(''))

Just don't normalise it before you serialise, or the empty text nodes will disappear! Actually, it looks like this trick works in minidom, too.
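Here is a minimal sketch of the empty-text-node trick using the standard library's minidom. The helper name `force_long_form` and the set of element names passed to it are hypothetical. The trick works because minidom writes `<x/>` only when an element has no child nodes at all.

```python
from xml.dom import minidom


def force_long_form(doc, names):
    """Give every empty element whose tag is in `names` an empty Text child.

    minidom only writes <x/> for elements with no child nodes, so the empty
    text child makes it write <x></x> instead. doc.normalize() discards
    empty text nodes, which brings the short form back.
    """
    for name in names:
        for el in doc.getElementsByTagName(name):
            if not el.hasChildNodes():
                el.appendChild(doc.createTextNode(""))


doc = minidom.parseString("<root><a/><b/><b>kept</b></root>")
force_long_form(doc, {"b"})
print(doc.documentElement.toxml())  # <root><a/><b></b><b>kept</b></root>
```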
-- 
Andrew Clover
mailto:and@doxdesk.com
http://www.doxdesk.com/

From lalleman at mfps.com Thu Dec 11 14:04:25 2003
From: lalleman at mfps.com (Alleman, Lowell)
Date: Thu Dec 11 14:05:22 2003
Subject: [XML-SIG] Working with non-compliant XML utilities
Message-ID: <2F7747120C62D211AD4100805FA78E1AE697B7@mail2.mfps.com>

> -----Original Message-----
> From: Andrew Clover [mailto:and-xml@doxdesk.com]
> Sent: Thursday, December 11, 2003 12:46 PM
> To: xml-sig@python.org
> Subject: Re: [XML-SIG] Working with non-compliant XML utilities
>
> > and whether an empty element is written as "<tag/>" or
> "<tag></tag>" need to be
> > presented in a specific way.
>
> Is it always one way or always the other, or a mix?

It is per-element. For example, element 'a' would always be <a/>, but 'b' would have to be shown as '<b></b>'. If 'a' was written as '<a></a>' or 'b' as <b/>, the application chokes. It's pretty annoying.

The good news is that when it comes down to actuality, only a few elements need to be tweaked. It's always in the form of forcing "<tag/>" to be written as "<tag></tag>", but never the other way around.

Thanks for your suggestions.

- Lowell

From Alexandre.Fayolle at logilab.fr Thu Dec 11 15:21:22 2003
From: Alexandre.Fayolle at logilab.fr (Alexandre Fayolle)
Date: Thu Dec 11 15:21:27 2003
Subject: [XML-SIG] Working with non-compliant XML utilities
In-Reply-To: <2F7747120C62D211AD4100805FA78E1AE697B7@mail2.mfps.com>
References: <2F7747120C62D211AD4100805FA78E1AE697B7@mail2.mfps.com>
Message-ID: <20031211202122.GE30399@calvin>

On Thu, Dec 11, 2003 at 02:04:25PM -0500, Alleman, Lowell wrote:
> It is per-element. For example element 'a' would always be <a/>, but 'b'
> would have to be shown as '<b></b>'. If 'a' was written as '<a></a>' or 'b' as
> <b/>, the application chokes. It's pretty annoying.
>
> The good news is that when it comes down to actuality, only a few elements
> need to be tweaked. It's always in the form of forcing "<tag/>" to be written
> as "<tag></tag>", but never the other way around.
This reminds me of DTD validation of EMPTY elements: if an element is declared EMPTY in a DTD, then it has to use the shortcut notation, otherwise the document is not valid. Now I agree that mandating the long start-tag/end-tag notation for some elements denotes a severely broken parser.

-- 
Alexandre Fayolle
LOGILAB, Paris (France).
http://www.logilab.com http://www.logilab.fr http://www.logilab.org
Développement logiciel avancé - Intelligence Artificielle - Formations

From lalleman at mfps.com Thu Dec 11 15:54:58 2003
From: lalleman at mfps.com (Alleman, Lowell)
Date: Thu Dec 11 15:55:57 2003
Subject: [XML-SIG] Working with non-compliant XML utilities
Message-ID: <2F7747120C62D211AD4100805FA78E1AE697B8@mail2.mfps.com>

Unfortunately, it looks like I have to do the exact opposite. Most XML writers automatically condense <tag></tag> to the <tag/> form. I need to tell the writer not to do so for certain elements.

The sad part about all of this really is that the tool I'm having these issues with is a data translation tool (sometimes called data mapping). Its primary job is converting and processing data in various formats.

Speaking of DTDs... I have some new questions:

The order in which the attributes should appear happens to be the same order in which they are listed in the <!ATTLIST> in the DTD. I've tried to pull out the DTD info using 4DOM and minidom, but haven't had much success. (I confess that I didn't spend too much time trying to find the appropriate documentation.) If I can pull out the information in the <!ATTLIST>, I can quickly build a dictionary mapping each element to a list of its ordered attributes. (I've tested this idea by building a small dictionary manually, but it would be nice to do this using the DTD.)

FYI: I tried pulling in the DTD info using an external reference as well as placing it inline. (I tried the inline DTD with minidom. I assumed that minidom wouldn't pick it up automatically, as it is not a validating parser. But I wasn't sure if it would simply ignore the DTD.)
I did notice that 4DOM seemed to choke on ENTITY references ( %entity_ref; ) when the DTD was inline. Can anyone confirm that? Feel free to send URLs.

Thanks again,

- Lowell

From martin at v.loewis.de Thu Dec 11 15:59:37 2003
From: martin at v.loewis.de (Martin v. Löwis)
Date: Thu Dec 11 16:00:00 2003
Subject: [XML-SIG] Working with non-compliant XML utilities
In-Reply-To: <20031211202122.GE30399@calvin>
References: <2F7747120C62D211AD4100805FA78E1AE697B7@mail2.mfps.com> <20031211202122.GE30399@calvin>
Message-ID:

Alexandre Fayolle writes:

> This reminds me of DTD validation of EMPTY elements:
> if an element is declared EMPTY in a DTD, then it has to use the
> shortcut notation, otherwise the document is not valid.
That is not the case. In XML 1.0 (second edition), after clause 43, we find the definitions

  [Definition: An element with no content is said to be empty.] The representation of an empty element is either a start-tag immediately followed by an end-tag, or an empty-element tag.

So <tag></tag> is also an empty element. After clause 44, we find

  For interoperability, the empty-element tag should be used, and should only be used, for elements which are declared EMPTY.

where "For interoperability" is defined as

  for interoperability
  [Definition: Marks a sentence describing a non-binding recommendation included to increase the chances that XML documents can be processed by the existing installed base of SGML processors which predate the WebSGML Adaptations Annex to ISO 8879.]

So this is really "should", not "must".

Regards,
Martin

From martin at v.loewis.de Thu Dec 11 16:07:25 2003
From: martin at v.loewis.de (Martin v. Löwis)
Date: Thu Dec 11 16:07:51 2003
Subject: [XML-SIG] Working with non-compliant XML utilities
In-Reply-To: <2F7747120C62D211AD4100805FA78E1AE697B8@mail2.mfps.com>
References: <2F7747120C62D211AD4100805FA78E1AE697B8@mail2.mfps.com>
Message-ID:

"Alleman, Lowell" writes:

> The order that the attributes should appear happens to be the same order
> that they are listed in the <!ATTLIST> in the DTD. I've tried to pull out
> the DTD info using 4DOM and minidom, but haven't had much success.

You should explicitly use xmlproc and install a DTDListener. The add_attribute callbacks will come in the order of attribute declaration.

> I did notice that 4DOM seemed to choke on ENTITY references ( %entity_ref; )
> when the DTD was inline. Can anyone confirm that?

No. 4DOM only uses some underlying parser, so it will never choke itself - if something chokes, it is the underlying parser.
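xmlproc shipped with PyXML. The standard library's pyexpat offers a comparable hook, `AttlistDeclHandler`, which is called once for each declared attribute. The minimal sketch below uses it to build the element-to-ordered-attributes dictionary Lowell describes. It assumes the `<!ATTLIST>` declarations sit in the internal subset; reading an external DTD would also need `SetParamEntityParsing` plus an `ExternalEntityRefHandler`, which are not shown. The function name `attlist_order` and the sample document are hypothetical.

```python
import xml.parsers.expat


def attlist_order(xml_text):
    """Map each element name to its attribute names in ATTLIST declaration order."""
    order = {}

    def on_attlist(elname, attname, type_, default, required):
        # expat calls this once per declared attribute, as it reads the DTD
        names = order.setdefault(elname, [])
        if attname not in names:  # XML 1.0: the first declaration is binding
            names.append(attname)

    parser = xml.parsers.expat.ParserCreate()
    parser.AttlistDeclHandler = on_attlist
    parser.Parse(xml_text, True)
    return order


SAMPLE = """<?xml version="1.0"?>
<!DOCTYPE rec [
  <!ELEMENT rec (item*)>
  <!ATTLIST rec id CDATA #REQUIRED
                name CDATA #IMPLIED
                kind CDATA "x">
  <!ELEMENT item EMPTY>
  <!ATTLIST item z CDATA #IMPLIED a CDATA #IMPLIED>
]>
<rec id="1" kind="y" name="n"><item a="2" z="1"/></rec>"""

print(attlist_order(SAMPLE))
# {'rec': ['id', 'name', 'kind'], 'item': ['z', 'a']}
```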
Regards,
Martin

From Alexandre.Fayolle at logilab.fr Fri Dec 12 03:05:35 2003
From: Alexandre.Fayolle at logilab.fr (Alexandre Fayolle)
Date: Fri Dec 12 03:05:41 2003
Subject: [XML-SIG] Working with non-compliant XML utilities
In-Reply-To:
References: <2F7747120C62D211AD4100805FA78E1AE697B7@mail2.mfps.com> <20031211202122.GE30399@calvin>
Message-ID: <20031212080535.GA3080@calvin>

On Thu, Dec 11, 2003 at 09:59:37PM +0100, Martin v. Löwis wrote:
> Alexandre Fayolle writes:
>
> > This reminds me of DTD validation of EMPTY elements:
> > if an element is declared EMPTY in a DTD, then it has to use the
> > shortcut notation, otherwise the document is not valid.
>
> That is not the case. In XML 1.0 (second edition), after clause 43, we
> find the definitions
> So this is really "should", not "must".

Thanks a lot for the clarification, Martin. I don't remember where I got the impression that it was a 'must'. I guess I should read XML 1.0 again -- this is also really a 'should' ;-)

-- 
Alexandre Fayolle
LOGILAB, Paris (France).
http://www.logilab.com http://www.logilab.fr http://www.logilab.org
Développement logiciel avancé - Intelligence Artificielle - Formations

From and-xml at doxdesk.com Fri Dec 12 04:56:13 2003
From: and-xml at doxdesk.com (Andrew Clover)
Date: Fri Dec 12 05:14:52 2003
Subject: [XML-SIG] Working with non-compliant XML utilities
In-Reply-To: <2F7747120C62D211AD4100805FA78E1AE697B8@mail2.mfps.com>
References: <2F7747120C62D211AD4100805FA78E1AE697B8@mail2.mfps.com>
Message-ID: <20031212095613.GA26268@doxdesk.com>

Lowell Alleman wrote:
> I need to tell the writer not to do so for certain elements.

(Speaking of which: the empty-text-node trick seems to work with 4DOM too. Yay!)

> The sad part about all of this really is that the tool that I'm having these
> issues with is a data translation tool

Aye, that's a pretty poor data translation tool.

> The order that the attributes should appear happens to be the same order
> that they are listed in the <!ATTLIST> in the DTD.
> I've tried to pull out
> the DTD info using 4DOM and minidom, but haven't had much success.

No, they don't make this available; as Martin says, you'll need to fiddle with a processor to get at this info.

Alternatively, in another tiresome plug for my own imp, pxdom does give one access to the ATTLIST declarations, and it guarantees the declarations will be in document order. To get a list of attr names, you could say:

    decls= document.doctype.pxdomAttlists.getNamedItem('tagName').declarations
    attrNames= [decl.nodeName for decl in decls]

Or to sort an element's attributes in one go:

    def sortAttributesByAttlistOrder(element):
        doctype= element.ownerDocument.doctype
        if doctype is not None:
            attlist= doctype.pxdomAttlists.getNamedItem(element.tagName)
            if attlist is not None:
                for attdecl in attlist.declarations:
                    attr= element.getAttributeNode(attdecl.nodeName)
                    if attr is not None:
                        # remove and re-add: each attribute moves to the end,
                        # leaving them in ATTLIST declaration order
                        element.removeAttributeNode(attr)
                        element.setAttributeNode(attr)

The drawback is that pxdom doesn't (currently) use external entities, including the DTD external subset, so you'd have to cram the <!ATTLIST>s into the internal subset for it to work.

> (I tried the inline DTD when using for minidom. I assumed that minidom
> wouldn't pick it up automatically, as it is not a validating parser.

Yes, minidom also does not use external entities.

> I did notice that 4DOM seemed to choke on ENTITY references ( %entity_ref; )
> when the DTD was inline.

Hmm. Using expat, it (and minidom) seem to ignore parameter entities, but I can't get it to choke as such. If you are getting an 'Illegal parameter entity reference' error, that'll be because XML is stricter about where it allows parameter entities in the internal subset than in an external DTD.
-- 
Andrew Clover
mailto:and@doxdesk.com
http://www.doxdesk.com/

From Alexandre.Fayolle at logilab.fr Fri Dec 12 07:27:14 2003
From: Alexandre.Fayolle at logilab.fr (Alexandre Fayolle)
Date: Fri Dec 12 07:27:18 2003
Subject: [XML-SIG] Working with non-compliant XML utilities
In-Reply-To: <2F7747120C62D211AD4100805FA78E1AE697AC@mail2.mfps.com>
References: <2F7747120C62D211AD4100805FA78E1AE697AC@mail2.mfps.com>
Message-ID: <20031212122713.GF3080@calvin>

On Wed, Dec 10, 2003 at 04:02:29PM -0500, Alleman, Lowell wrote:
> FYI: So far I have tried using minidom and 4DOM (the one from PyXML 0.8.2).
> I haven't seen the flexibility that I require so far, but I'm not very
> familiar with either parser. minidom would be my preference, since it is
> installed as part of the standard library.

One way to get what you need could be to use SAX to translate the document you have into something your application will understand. Getting the content handler to produce the text representation of the contents read by the parser seems feasible. Some code to start from can be found in xml.sax.writer. The startElement and endElement methods should be customized to produce attributes in the right order and to close elements correctly. The complexity of the task will depend, of course, on how generic you want the solution to be.

-- 
Alexandre Fayolle
LOGILAB, Paris (France).
http://www.logilab.com http://www.logilab.fr http://www.logilab.org
Développement logiciel avancé - Intelligence Artificielle - Formations

From nhs at llnl.gov Fri Dec 12 12:51:42 2003
From: nhs at llnl.gov (Norman Samuelson)
Date: Fri Dec 12 12:51:51 2003
Subject: [XML-SIG] Re: Working with non-compliant XML utilities
In-Reply-To:
References:
Message-ID: <6.0.0.22.2.20031212094847.048637f8@popeye.llnl.gov>

One way you may be able to do what you want with minimal effort would be to write the XML as usual, with whatever tool you care about, then process it with XSL to produce the strange results you need.
- Norm -

From tpassin at comcast.net Fri Dec 12 18:26:04 2003
From: tpassin at comcast.net (Thomas B. Passin)
Date: Fri Dec 12 18:24:59 2003
Subject: [XML-SIG] Re: Working with non-compliant XML utilities
In-Reply-To: <6.0.0.22.2.20031212094847.048637f8@popeye.llnl.gov>
References: <6.0.0.22.2.20031212094847.048637f8@popeye.llnl.gov>
Message-ID: <3FDA4E8C.3010604@comcast.net>

Norman Samuelson wrote:
> One way you may be able to do what you want with minimal effort would be
> to write the XML as usual, with whatever tool you care about, then
> process it with XSL to produce the strange results you need.
>

He can't do that - XSLT will only produce normal XML, not the "strange results" - there is no control over attribute order or empty-element form unless he writes his own serializer.

Cheers,

Tom P

From zhaoxinzhi at hotmail.com Sat Dec 13 05:22:26 2003
From: zhaoxinzhi at hotmail.com (Xinzhi Zhao)
Date: Sat Dec 13 05:22:32 2003
Subject: [XML-SIG] Parsing the XML file which has encoding 'gb2312' .
Message-ID:

Hi,

My XML files have to use an encoding other than the default one, namely 'gb2312'. When I parsed my XML files with DOM or SAX, some errors occurred. Can't the Python XML packages handle this yet? Is there any way to get this done? How shall I do it? Please help me.

Thanks,
Xinzhi Zhao
zhaoxinzhi@hotmail.com

---------------------------------------------------------------------------------
My XML file is shown below:
----------------------------------------------
简单的 XML
December 12, 2003
Xinzhi Zhao
Parsing XML
This XML can be opened in IE6. However, parsing it in Python with DOM or SAX fails. How shall I do it?
From mike at skew.org Sat Dec 13 08:14:13 2003
From: mike at skew.org (Mike Brown)
Date: Sat Dec 13 08:14:17 2003
Subject: [XML-SIG] Parsing the XML file which has encoding 'gb2312' .
In-Reply-To: "from Xinzhi Zhao at Dec 13, 2003 10:22:26 am"
Message-ID: <200312131314.hBDDEDmi021838@chilled.skew.org>

Xinzhi Zhao wrote:
> Hi,
> My XML files have to use other encoding instead of the default one, i.e.
> 'gb2312'. When I was parsing my XML files by dint of DOM or SAX , some
> errors occurred. The Python xml packages can't do it now? Is there any way
> can finish my job? How shall I do it? Please help me.

Limitations of the underlying parser, Expat, prevent certain encodings from being supported without an additional layer of code. GB2312 is among them.

I think you will have to transcode your document to one of the encodings that Expat supports (UTF-16, UTF-16LE, UTF-16BE, UTF-8, ISO-8859-1, or US-ASCII; you probably want UTF-8 or UTF-16), and then either rewrite the encoding declaration in the XML or find a way to supply the declaration externally. Expat does support external declaration of encoding, but I don't know offhand how to do it from Python.

From martin at v.loewis.de Sat Dec 13 08:45:12 2003
From: martin at v.loewis.de (Martin v. Löwis)
Date: Sat Dec 13 08:45:34 2003
Subject: [XML-SIG] Parsing the XML file which has encoding 'gb2312' .
In-Reply-To: <200312131314.hBDDEDmi021838@chilled.skew.org>
References: <200312131314.hBDDEDmi021838@chilled.skew.org>
Message-ID:

Mike Brown writes:

> I think you will have to transcode your document to one of the encodings that
> is supported by Expat (UTF-16, UTF-16LE, UTF-16BE, UTF-8, ISO-8859-1, or
> US-ASCII

Alternatively, you can use xmlproc, which supports any encoding for which you have a Python codec.
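Below is a minimal sketch of the transcoding route Mike describes, written for today's Python 3 standard library. Here `bytes.decode` stands in for wrapping the stream with `codecs`. The first function decodes the GB2312 bytes, rewrites the encoding declaration, and hands UTF-8 to minidom; the regular expression only handles a simple XML declaration at the very start of the document, which is an assumption about the input. The second function covers the "external declaration" question: pyexpat's `ParserCreate()` takes an `encoding` argument that, according to its documentation, overrides the encoding the document declares. The function names are hypothetical.

```python
import re
import xml.dom.minidom
import xml.parsers.expat

# Encoding pseudo-attribute of a simple XML declaration (assumed input shape).
_ENC_DECL = re.compile(r'^(<\?xml[^>]*?\bencoding\s*=\s*)(["\'])[^"\']*\2')


def parse_gb2312(raw):
    """Decode GB2312 bytes, relabel the declaration as UTF-8, parse with minidom."""
    text = raw.decode("gb2312")
    text = _ENC_DECL.sub(r"\1\2utf-8\2", text, count=1)
    return xml.dom.minidom.parseString(text.encode("utf-8"))


def text_via_external_encoding(raw):
    """Transcode to UTF-8 but keep the declaration; tell expat the encoding instead."""
    parser = xml.parsers.expat.ParserCreate(encoding="UTF-8")
    chunks = []
    parser.CharacterDataHandler = chunks.append  # collects character data
    parser.Parse(raw.decode("gb2312").encode("utf-8"), True)
    return "".join(chunks)


raw = '<?xml version="1.0" encoding="gb2312"?><title>简单的 XML</title>'.encode("gb2312")
print(parse_gb2312(raw).documentElement.firstChild.data)  # 简单的 XML
print(text_via_external_encoding(raw))  # 简单的 XML
```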
Regards,
Martin

From zhaoxz at founder.com Thu Dec 11 09:07:37 2003
From: zhaoxz at founder.com (赵新志)
Date: Sat Dec 13 09:41:54 2003
Subject: [XML-SIG] Parsing XML
Message-ID:

Skipped content of type multipart/alternative

From fredrik at pythonware.com Sat Dec 13 09:56:17 2003
From: fredrik at pythonware.com (Fredrik Lundh)
Date: Sat Dec 13 09:56:21 2003
Subject: [XML-SIG] Re: Parsing XML
References:
Message-ID:

zhaoxz@founder.com wrote:
> My XML files have to use encoding 'iso-8859-1',which is different
> from the default encoding 'utf-8'.
>
> When I was using the package from 4DOM(pyxml.souceforge.net)
> to parse my XML files,errors occured. The package for parsing xml
> only supports encoding 'utf-8', right?

if your XML files use ISO-8859-1 encoding, they should contain an encoding declaration in the <?xml header; see

http://www.w3.org/TR/2000/REC-xml-20001006#NT-EncodingDecl

From mike at skew.org Sat Dec 13 10:11:32 2003
From: mike at skew.org (Mike Brown)
Date: Sat Dec 13 10:11:39 2003
Subject: [XML-SIG] Parsing XML
In-Reply-To: "from 赵新志 at Dec 11, 2003 10:07:37 pm"
Message-ID: <200312131511.hBDFBW88022355@chilled.skew.org>

> My XML files have to use encoding 'iso-8859-1',which is different
> from the default encoding 'utf-8'.

Technically, there is no default, but conforming parsers assume UTF-16 until they see there's no byte-order mark (BOM) at the beginning, and then assume UTF-8 until they see something else declared in the prolog.

> When I was using the package from 4DOM(pyxml.souceforge.net)
> to parse my XML files,errors occured.

What errors, specifically? Are you sure your XML files are actually iso-8859-1 encoded?
Note: it is the XML author's responsibility to ensure that the encoding declaration in the prolog accurately reflects the actual encoding of the document. If you had a gb2312 file and just changed the declaration to say iso-8859-1, you didn't change the actual encoding of the document; you just made the declaration wrong, which an XML parser is required to treat as a fatal error.

> The package for parsing xml
> only supports encoding 'utf-8', right?

No, the parser that 4DOM uses (Expat) supports other encodings, as I mentioned in my other message today. iso-8859-1 should work just fine.

If you are still trying to parse gb2312-encoded XML, you need to do more than just replace 'gb2312' with 'iso-8859-1' in the encoding declaration. Use Python's codecs module to wrap your gb2312 stream, decoding from gb2312 to Unicode. At that point you can safely rewrite the declaration in the prolog if necessary, and then wrap the stream again, encoding from Unicode to UTF-8 (or UTF-16). This is what I meant by 'transcode'.

You won't need to rewrite the declaration if you can figure out how to make Expat accept the external encoding declaration from Python. I was hoping a PyExpat expert would suggest the answer.

-Mike

From KSBeattie at lbl.gov Mon Dec 22 22:00:02 2003
From: KSBeattie at lbl.gov (Keith Beattie)
Date: Mon Dec 22 22:00:17 2003
Subject: [XML-SIG] binding an unbound namespace prefix
Message-ID: <3FE7AFB2.50407@lbl.gov>

Hi all,

I'm trying to parse a string that is a segment of XML (in order to canonicalize it), and not all of its namespaces are bound within the segment itself. How do I pass the namespaces into minidom.parseString() or Domlette.NonvalidatingReader.parseString() so that they'll accept the 'unbound prefix'? I hoped to see an nsdict keyword argument or some such, but no luck. Is building the DOM myself the only way to do this?
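One common workaround is offered here as an illustrative assumption, not something proposed in the thread. Wrap the fragment in a dummy element that declares the missing prefixes, parse the wrapped text, and take the wrapper's first element child. The function `parse_fragment`, its `nsmap` argument (prefix mapped to namespace URI) and the `_wrapper` element name are all hypothetical. Note that for canonicalization the wrapper's declarations become in-scope namespaces of the fragment, just as declarations on an ancestor would.

```python
from xml.dom import minidom
from xml.sax.saxutils import quoteattr


def parse_fragment(fragment, nsmap):
    """Parse an XML fragment whose prefixes are bound only through nsmap.

    nsmap: dict of prefix -> namespace URI. The fragment is wrapped in a
    hypothetical <_wrapper> element carrying the xmlns declarations, and
    the wrapper's first element child is returned.
    """
    decls = "".join(" xmlns:%s=%s" % (prefix, quoteattr(uri))
                    for prefix, uri in nsmap.items())
    doc = minidom.parseString("<_wrapper%s>%s</_wrapper>" % (decls, fragment))
    for child in doc.documentElement.childNodes:
        if child.nodeType == child.ELEMENT_NODE:
            return child
    raise ValueError("fragment contains no element")


el = parse_fragment('<ds:Sig><ds:Val>x</ds:Val></ds:Sig>',
                    {"ds": "http://www.w3.org/2000/09/xmldsig#"})
print(el.namespaceURI, el.localName)  # http://www.w3.org/2000/09/xmldsig# Sig
```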
Thanks, ksb From walter at livinglogic.de Tue Dec 23 04:18:43 2003 From: walter at livinglogic.de (Walter Dörwald) Date: Tue Dec 23 04:19:02 2003 Subject: [XML-SIG] binding an unbound namespace prefix In-Reply-To: <3FE7AFB2.50407@lbl.gov> References: <3FE7AFB2.50407@lbl.gov> Message-ID: <3FE80873.2020102@livinglogic.de> Keith Beattie wrote: > Hi all, > > I'm trying to parse a string which is a segment of xml (in order to > canonicalize it) which doesn't have all its namespaces bound in the > segment I'm trying to parse. How do I pass the namespaces into > minidom.parseString(), or > Domlette.NonvalidatingReader.parseString(), such that they'll be happy > with the 'unbound prefix'? I hoped to see an nsdict kw arg or some > such, but no luck. Is building the dom myself the only way to do this? You could try XIST (http://www.livinglogic.de/Python/xist/), which supports passing a prefix mapping to the parser: from ll.xist import xsc, parsers from ll.xist.ns import html, svg, fo e = parsers.parseString( "", prefixes=xsc.Prefixes(fo, s=svg, h=html) ) Unfortunately this doesn't return a standard DOM, but of course you could convert it into one. Bye, Walter Dörwald From fdrake at acm.org Tue Dec 23 08:35:05 2003 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Tue Dec 23 08:35:26 2003 Subject: [XML-SIG] binding an unbound namespace prefix In-Reply-To: <3FE7AFB2.50407@lbl.gov> References: <3FE7AFB2.50407@lbl.gov> Message-ID: <16360.17545.295816.495961@sftp.fdrake.net> Keith Beattie writes: > I'm trying to parse a string which is a segment of xml (in order to > canonicalize it) which doesn't have all its namespaces bound in > the segment I'm trying to parse. How do I pass the namespaces into > minidom.parseString(), or > Domlette.NonvalidatingReader.parseString(), such that they'll be > happy with the 'unbound prefix'? I hoped to see an nsdict kw arg > or some such, but no luck. Is building the dom myself the only way > to do this?
No, but working around the current API to do this is pretty painful at the moment. Please file a feature request for better fragment support; you can assign it to me if you like. There is some code in xml.dom.expatbuilder that shows how to do this; it may be a bit difficult to decipher. The code is mine, so feel free to ask questions about it here on the XML-SIG mailing list. -Fred -- Fred L. Drake, Jr. PythonLabs at Zope Corporation From csad7 at t-online.de Tue Dec 23 12:27:02 2003 From: csad7 at t-online.de (c.) Date: Tue Dec 23 12:28:49 2003 Subject: [XML-SIG] empty EntityResolver for SAX Message-ID: <3FE87AE6.6070501@cdot.de> hi, (the following description is a bit convoluted, sorry about that. i hope you understand it anyway...) i thought of providing an empty EntityResolver to my parse function so that if i encounter xml files with DTDs in them, these will not be processed. class EmptyEntityResolver(xml.sax.handler.EntityResolver): def resolveEntity(self, publicId, systemId): return "http://localhost/empty.txt" p = xml.sax.make_parser() p.setContentHandler(handler) p.setEntityResolver(EmptyEntityResolver()) i could use p.setFeature('http://xml.org/sax/features/external-general-entities',False) of course, but i thought something like the above might be better for my purpose. my problem now is that something like return None does not work. only the above works, and then the dummy empty.txt file has to be present. is there a simpler way of returning an empty InputSource? thanks a lot chris From shunting at etopicality.com Tue Dec 23 16:32:42 2003 From: shunting at etopicality.com (Sam Hunting) Date: Tue Dec 23 16:32:58 2003 Subject: [XML-SIG] Which version of PyXML do I install? Message-ID: Here are the first few lines from dmesg: Linux version 2.4.23-xfs-031204 (...@...) (gcc version 2.95.4 20011002 (Debian prerelease)) #1 SMP Thu Dec 4 17:08:50 CET 2003 I'd prefer to use an rpm if possible. Sam Hunting eTopicality, Inc.
--------------------------------------------------------------------------- Co-editor: ISO Reference Model for Topic Maps Topic map consulting and training: www.etopicality.com Free open source topic map tools: www.gooseworks.org XML Topic Maps: Creating and Using Topic Maps for the Web. Addison-Wesley, ISBN 0-201-74960-2. --------------------------------------------------------------------------- From and-xml at doxdesk.com Wed Dec 24 05:22:03 2003 From: and-xml at doxdesk.com (Andrew Clover) Date: Wed Dec 24 05:41:18 2003 Subject: [XML-SIG] binding an unbound namespace prefix In-Reply-To: <3FE7AFB2.50407@lbl.gov> References: <3FE7AFB2.50407@lbl.gov> Message-ID: <20031224102203.GA29545@doxdesk.com> Keith Beattie wrote: > How do I pass the namespaces into minidom.parseString(), or > Domlette.NonvalidatingReader.parseString(), such that they'll be happy > with the 'unbound prefix'? I know of no convenient way of doing this with either minidom or domlette. Probably the quickest solution is to hack the input content so it's surrounded with an element declaring all the known namespaces, then ignore the root element of the result. Alternatively, the DOM Level 3 method parseWithContext would let you insert directly into the relevant part of the document (with namespaces declared above). pxdom supports this method and the domConfig parameter 'canonical-form', so that might be a possibility too. -- Andrew Clover mailto:and@doxdesk.com http://www.doxdesk.com/ From list-matt at reprocessed.org Wed Dec 24 06:52:19 2003 From: list-matt at reprocessed.org (Matt Patterson) Date: Wed Dec 24 06:52:25 2003 Subject: [XML-SIG] testing a document for validity against a schema, not a DTD Message-ID: Hello all, I'm looking for a way to validate an XML document against a schema: nothing fancy, just a simple yes/no response from the parser would probably do. I can do it several ways with DTDs, but I'm unsure about XML Schema support in Python. Can anyone enlighten me? 
Many thanks, Matt Patterson From chrish at cryptocard.com Wed Dec 24 11:06:28 2003 From: chrish at cryptocard.com (Chris Herborth) Date: Wed Dec 24 11:03:28 2003 Subject: [XML-SIG] Validating parser Message-ID: <3FE9B984.9040600@cryptocard.com> I'm upgrading my XML application to use the validating parser; I've been fixing previously-hidden bugs in my DTD and my document instances as I go... but now I've gotten to one that is baffling me... must be the seasonal distraction. ;-) Here's the error: Invalid XML, unable to continue. book.xml, line 11, column 3: Not a valid name And here are the first 11 lines of book.xml: %book.entities; ]> If I remove the book.ent bit, it still complains at the end of the DOCTYPE declaration, so I'm guessing there's an invalid name somewhere in my DTD. I'm not sure, though, why this error wouldn't be reported until the end of the declaration, instead of during DTD parsing like my other DTD-related errors... Any help is greatly appreciated, thanks! -- Chris Herborth chrish@cryptocard.com Documentation Overlord, CRYPTOCard Corp. http://www.cryptocard.com/ Never send a monster to do the work of an evil scientist. From xml-sig at thewrittenword.com Mon Dec 29 18:04:20 2003 From: xml-sig at thewrittenword.com (Albert Chin) Date: Mon Dec 29 18:04:28 2003 Subject: [XML-SIG] 4suite 1.0a3/PyXML 1.0a3 on HP-UX with Python 2.3.2]\ Message-ID: <20031229230420.GA56939@spuckler.il.thewrittenword.com> I've installed PyXML 0.8.3 and 4Suite 1.0a3 on HP-UX 11.x and Solaris 2.x with GCC 3.3.2. The following program causes a failure on HP-UX but works on Solaris: $ cat a.xml $ cat a.py #!/opt/TWWfsw/python232/bin/python from xml.dom.ext.reader import PyExpat from Ft.Xml.XPath import Evaluate fd = open('a.xml', 'r') reader = PyExpat.Reader() dom = reader.fromStream(fd) $ python a.py Traceback (most recent call last): File "./a.py", line 8, in ?
dom = reader.fromStream(fd) File "/opt/TWWfsw/python232p/lib/python2.3/site-packages/_xmlplus/dom/ext/reader/PyExpat.py", line 65, in fromStream success = self.parser.ParseFile(stream) File "/opt/TWWfsw/python232p/lib/python2.3/site-packages/_xmlplus/dom/ext/reader/PyExpat.py", line 120, in startElement self._completeTextNode() File "/opt/TWWfsw/python232p/lib/python2.3/site-packages/_xmlplus/dom/ext/reader/PyExpat.py", line 104, in _completeTextNode if self._currText and len(self._nodeStack) and self._nodeStack[-1].nodeType != Node.DOCUMENT_NODE: AttributeError: 'NoneType' object has no attribute 'nodeType' I posted to the 4Suite-dev mailing list but the problem appears to be a PyXML one. Any ideas? -- albert chin (china@thewrittenword.com) From zhaoxinzhi at hotmail.com Mon Dec 29 21:36:44 2003 From: zhaoxinzhi at hotmail.com (Xinzhi Zhao) Date: Mon Dec 29 23:03:09 2003 Subject: [XML-SIG] Does Python support XQuery? Message-ID: Does Python support XQuery? If it does, would you please show me an example? Many thanks. --Xinzhi Zhao From scout104 at comcast.net Wed Dec 31 04:06:42 2003 From: scout104 at comcast.net (Janna) Date: Wed Dec 31 04:06:23 2003 Subject: [XML-SIG] Buy Vicodin online today, overnight shipping xyiz kccg v Message-ID: <3FF291A2.7080200@comcast.net> can you give me more info on buying vicodin? Janna Kneale scout104@comcast.net thanks
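On Chris's EntityResolver question earlier in this archive ("is there a simpler way of returning an empty InputSource?"): with the `xml.sax` package of current Python 3 (not PyXML 0.8.3), `resolveEntity` can return an `InputSource` whose byte stream is an empty in-memory buffer, so no dummy `empty.txt` file is needed. In today's `xml.sax`, returning `None` appears to end in a parse error because the reader cannot turn `None` into an input source. This is a minimal sketch; the class and function names are hypothetical. Recent Python 3 releases switch the external-general-entities feature off by default, so the sketch turns it on, since otherwise the resolver is not consulted at all.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, EntityResolver, feature_external_ges
from xml.sax.xmlreader import InputSource


class EmptyEntityResolver(EntityResolver):
    """Answer every external-entity request with an empty in-memory stream."""

    def __init__(self):
        self.requested = []  # system IDs the parser asked for (for inspection)

    def resolveEntity(self, publicId, systemId):
        self.requested.append(systemId)
        source = InputSource(systemId)
        source.setByteStream(io.BytesIO(b""))  # nothing to read, no file needed
        return source


class TextCollector(ContentHandler):
    """Collect all character data reported by the parser."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def characters(self, content):
        self.chunks.append(content)


def parse_without_external_entities(data: bytes):
    """Parse `data`, replacing external entities with nothing.

    Returns (character data, list of system IDs the resolver was asked for).
    """
    parser = xml.sax.make_parser()
    handler = TextCollector()
    resolver = EmptyEntityResolver()
    parser.setContentHandler(handler)
    parser.setEntityResolver(resolver)
    # Off by default in recent Python 3; without it the resolver is never called.
    parser.setFeature(feature_external_ges, True)
    parser.parse(io.BytesIO(data))
    return "".join(handler.chunks), resolver.requested
```

The design choice is to keep the resolver in charge (as Chris wanted) rather than disabling the feature: every external reference is still reported to `resolveEntity`, which makes it easy to log or selectively allow some system IDs later.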