From jeremy.kloth at gmail.com Tue Jun 1 20:41:31 2010
From: jeremy.kloth at gmail.com (Jeremy Kloth)
Date: Tue, 1 Jun 2010 12:41:31 -0600
Subject: [Expat-discuss] Expat on 64 bit Linux
Message-ID: <201006011241.32047.jeremy.kloth@gmail.com>

On Wednesday, March 24, 2010 10:09:29 am you wrote:
> On Fri, Feb 5, 2010 at 10:28 AM, Jeremy Kloth wrote:
> > It was done to allow Expat output to be mapped directly to Python's
> > unicode objects (which can be either UCS-2 or UCS-4).
> >
> > If desired, I can produce the patches required to add that support to the
> > Expat mainline.
>
> Hey Jeremy!
>
> Would the output type be controlled at compile time or at run time?
>
> This definitely is interesting to me. Do you also have a patched
> pyexpat that consumes the new output, or are you using a new Python
> extension to use this?

Sorry for the delay, it's been "in the queue".

The output type is matched at compile time to Python's Py_UNICODE type.
The extension itself is 4Suite's cDomlette extension, which layers an
Infoset (DOM-like) on top of the raw Expat callbacks. It has been in
production for a very long time without issue, so I would call it quite
stable.

--
Jeremy Kloth

From vijay.t.nikam at gmail.com Thu Jun 3 14:51:34 2010
From: vijay.t.nikam at gmail.com (Vijay Nikam)
Date: Thu, 3 Jun 2010 18:21:34 +0530
Subject: [Expat-discuss] XML Parse into Structure
Message-ID:

Dear All,

I am new to XML parsing and to this list.
Just a couple of days ago I started to work with the Expat parser.
I am parsing XML files in C (libexpat.so, Linux) and I tried to
parse an XML file using the Expat parser
and was successful. So it was a great start, thanks to the information
provided at the following link:
   http://www.xml.com/pub/a/1999/09/expat/index.html#useparser

The output of the parsed XML file is dumped to the console. Based on
this I have two queries:
1. Is it possible to dump the parsed file into a text file? If yes,
then please let me know (any pointers/ideas), thanks.
2.
Is it possible to create a structure with the Expat parser from an XML file?
   - Is there any function available to achieve this in C, like the
function available for the PHP XML Expat parser
(xml_parse_into_struct)?

Please let me know any pointers regarding the two queries above.
Any information provided will be important and helpful.
Kindly acknowledge, thank you.

Kind Regards,
Vijay Nikam

From marco.maggi-ipsu at poste.it Fri Jun 4 16:35:18 2010
From: marco.maggi-ipsu at poste.it (Marco Maggi)
Date: Fri, 04 Jun 2010 16:35:18 +0200
Subject: [Expat-discuss] XML Parse into Structure
In-Reply-To: marco@localhost (Vijay Nikam's message of "Thu, 3 Jun 2010 18:21:34 +0530")
References:
Message-ID: <87ocfqzx09.fsf@rapitore.luna>

"Vijay Nikam" wrote:
> Please let me know any pointers regarding the two queries above.

You write that you are on Linux, so you can download this documentation
file:

in Texinfo format and compile it to HTML or Info with the "makeinfo"
program; it should have instructions on how to do what you want.

HTH
--
Marco Maggi

From nickmacd at gmail.com Fri Jun 4 20:13:32 2010
From: nickmacd at gmail.com (Nick MacDonald)
Date: Fri, 4 Jun 2010 14:13:32 -0400
Subject: [Expat-discuss] XML Parse into Structure
In-Reply-To:
References:
Message-ID:

Vijay:

If you want an "in memory" representation of your XML file, you
probably don't want or need to use a SAX based parser like eXpat, which
is event based. You'd probably rather find a DOM based parser, which
is expressly designed to build a Document Object Model (the DOM in the
name) in memory. You could of course layer a DOM module on top of
eXpat, but I suspect that's a fair amount of work that has already
been done many times before. If you do some searching on SourceForge
and other open source repositories, I'm sure you'll find a lot of
examples. The whole idea of using SAX is to be able to process a
document with minimal memory overhead...
and thus to be able to handle exceptionally large documents that would
use too much memory if they were loaded into memory all at once. (And
SAX would be faster if you were just searching quickly inside a
document... no overhead loading into memory the parts you'd never use.)

Good luck,
Nick

On Thu, Jun 3, 2010 at 8:51 AM, Vijay Nikam wrote:
> I am new to XML parsing and to this list.
> Just a couple of days ago I started to work with the Expat parser.
> I am parsing XML files in C (libexpat.so, Linux) and I tried to
> parse an XML file using the Expat parser
> and was successful. So it was a great start, thanks to the information
> provided at the following link:
>    http://www.xml.com/pub/a/1999/09/expat/index.html#useparser
>
> The output of the parsed XML file is dumped to the console. Based on
> this I have two queries:
> 1. Is it possible to dump the parsed file into a text file? If yes,
> then please let me know (any pointers/ideas), thanks.
> 2. Is it possible to create a structure with the Expat parser from an XML file?
>    - Is there any function available to achieve this in C, like the
> function available for the PHP XML Expat parser
> (xml_parse_into_struct)?
>
> Please let me know any pointers regarding the two queries above.
> Any information provided will be important and helpful.
> Kindly acknowledge, thank you.

--
Nick MacDonald
NickMacD at gmail.com

From aleix at member.fsf.org Fri Jun 4 22:19:31 2010
From: aleix at member.fsf.org (Aleix Conchillo Flaqué)
Date: Fri, 4 Jun 2010 22:19:31 +0200
Subject: [Expat-discuss] XML Parse into Structure
In-Reply-To:
References:
Message-ID:

You can use SCEW (Simple C Expat Wrapper). I think it does what you
need, and it also allows you to create in-memory XML trees and dump
them to files/memory/...
http://www.nongnu.org/scew/

On Fri, Jun 4, 2010 at 20:13, Nick MacDonald wrote:
> Vijay:
>
> If you want an "in memory" representation of your XML file, you
> probably don't want or need to use a SAX based parser like eXpat, which
> is event based. You'd probably rather find a DOM based parser, which
> is expressly designed to build a Document Object Model (the DOM in the
> name) in memory. You could of course layer a DOM module on top of
> eXpat, but I suspect that's a fair amount of work that has already
> been done many times before. If you do some searching on SourceForge
> and other open source repositories, I'm sure you'll find a lot of
> examples. The whole idea of using SAX is to be able to process a
> document with minimal memory overhead... and thus to be able to handle
> exceptionally large documents that would use too much memory if they
> were loaded into memory all at once. (And SAX would be faster if you
> were just searching quickly inside a document... no overhead loading
> into memory the parts you'd never use.)
>
> Good luck,
> Nick
>
>

From erg at research.att.com Wed Jun 9 21:14:31 2010
From: erg at research.att.com (Emden R. Gansner)
Date: Wed, 09 Jun 2010 15:14:31 -0400
Subject: [Expat-discuss] libexpat & URIs
Message-ID: <4C0FE817.9020403@research.att.com>

Is there a setting or some technique to get libexpat to accept the
ampersand character within an attribute value? For example, I would like it
to parse

Thanks.

Emden

From karl at waclawek.net Wed Jun 9 22:19:50 2010
From: karl at waclawek.net (Karl Waclawek)
Date: Wed, 09 Jun 2010 16:19:50 -0400
Subject: [Expat-discuss] libexpat & URIs
In-Reply-To: <4C0FE817.9020403@research.att.com>
References: <4C0FE817.9020403@research.att.com>
Message-ID: <4C0FF766.2070601@waclawek.net>

On 09/06/2010 3:14 PM, Emden R. Gansner wrote:
> Is there a setting or some technique to get libexpat to accept the
> ampersand character within an attribute value? For example, I would like it
> to parse
>
>
>
> Thanks.
That is not well-formed XML, which means it is not XML.
XHTML does not allow this either, since XHTML is XML.
See http://www.w3.org/TR/xhtml1/#C_12

Karl

From jseyster at cs.stonybrook.edu Thu Jun 17 23:12:28 2010
From: jseyster at cs.stonybrook.edu (Justin Seyster)
Date: Thu, 17 Jun 2010 17:12:28 -0400
Subject: [Expat-discuss] File not found error in the parser
Message-ID: <1276809148.32549.19.camel@crossroads>

I'm running into a really weird issue parsing a document with Expat: I
keep getting file not found errors on files that exist.

The weirdest part is that if I parse files from one particular
directory, it works without a problem. With any other directory,
however, I get an IOError exception. (All the directories and files
I've tried have standard Unix permissions, and I am their owner.)

My code looks like this:

    # I verified that this does indeed return an Expat parser.
    parser = make_parser()
    parser.setFeature(feature_namespaces, 0)
    dh = BlankHandler()
    parser.setContentHandler(dh)
    try:
        xmlhandle = open(filename, 'r')
        # Attempting a read here succeeds
        parser.parse(xmlhandle)  # This line throws the IOError
        xmlhandle.close()
    except IOError as e:
        print e.strerror
        sys.exit(1)

Running this on files in most directories gives me the error:
"No such file or directory"

I know the file exists, however, because attempts to read it after
opening it but before parsing it succeed. I also know that my code is
at least semi-valid because it correctly parses files placed in one
particular directory.

Has anybody heard of this kind of problem before? Thanks.
--Justin

From fdrake at acm.org Tue Jun 22 14:50:46 2010
From: fdrake at acm.org (Fred Drake)
Date: Tue, 22 Jun 2010 08:50:46 -0400
Subject: [Expat-discuss] File not found error in the parser
In-Reply-To: <1276809148.32549.19.camel@crossroads>
References: <1276809148.32549.19.camel@crossroads>
Message-ID:

On Thu, Jun 17, 2010 at 5:12 PM, Justin Seyster wrote:
> The weirdest part is that if I parse files from one particular
> directory, it works without a problem.

This is strange. Does the one directory that works for you happen to be
the current directory?

What version of Python are you using, and on what platform?

-Fred

--
Fred L. Drake, Jr.
"A storm broke loose in my mind." --Albert Einstein

From fdrake at acm.org Wed Jun 30 00:43:03 2010
From: fdrake at acm.org (Fred Drake)
Date: Tue, 29 Jun 2010 18:43:03 -0400
Subject: [Expat-discuss] File not found error in the parser
In-Reply-To: <1277850473.5738.9.camel@crossroads>
References: <1276809148.32549.19.camel@crossroads> <1277850473.5738.9.camel@crossroads>
Message-ID:

On Tue, Jun 29, 2010 at 6:27 PM, Justin Seyster wrote:
> The one directory that works is not the current directory. In fact, it
> seems that the magic directory stays the same regardless of what the
> current directory is (and whether I use an absolute or relative path).

Can you reproduce this with a short script that just does the XML parsing?

If so, please post that.

> I'm using the current Python from Ubuntu Karmic, which is 2.6.4.
>
> (Somebody kindly let me know that I sent this problem to the wrong list.
> Sorry about that, and let me know if I should take this discussion off
> the list.)

I'm not too worried about that; I'm unlikely to see this elsewhere.

-Fred

--
Fred L. Drake, Jr.
"A storm broke loose in my mind."
--Albert Einstein

From jseyster at cs.stonybrook.edu Wed Jun 30 00:27:53 2010
From: jseyster at cs.stonybrook.edu (Justin Seyster)
Date: Tue, 29 Jun 2010 18:27:53 -0400
Subject: [Expat-discuss] File not found error in the parser
In-Reply-To:
References: <1276809148.32549.19.camel@crossroads>
Message-ID: <1277850473.5738.9.camel@crossroads>

The one directory that works is not the current directory. In fact, it
seems that the magic directory stays the same regardless of what the
current directory is (and whether I use an absolute or relative path).

I'm using the current Python from Ubuntu Karmic, which is 2.6.4.

(Somebody kindly let me know that I sent this problem to the wrong list.
Sorry about that, and let me know if I should take this discussion off
the list.)
--Justin

On Tue, 2010-06-22 at 08:50 -0400, Fred Drake wrote:
> On Thu, Jun 17, 2010 at 5:12 PM, Justin Seyster
> wrote:
> > The weirdest part is that if I parse files from one particular
> > directory, it works without a problem.
>
> This is strange. Does the one directory that works for you happen to be
> the current directory?
>
> What version of Python are you using, and on what platform?
>
>
> -Fred
>

From jseyster at cs.stonybrook.edu Wed Jun 30 01:16:14 2010
From: jseyster at cs.stonybrook.edu (Justin Seyster)
Date: Tue, 29 Jun 2010 19:16:14 -0400
Subject: [Expat-discuss] File not found error in the parser
In-Reply-To:
References: <1276809148.32549.19.camel@crossroads> <1277850473.5738.9.camel@crossroads>
Message-ID: <1277853374.5738.19.camel@crossroads>

Hmm, I wrote a script that just opens and parses an XML file, and it
gives me the same problem. It looks like something is definitely wrong
either with my machine's configuration or with the particular version of
Python (and its Expat wrappers) that I'm using.

I put the script below.
--Justin

#!/usr/bin/env python

import sys
from xml.sax import handler
from xml.sax import make_parser
from xml.sax.handler import feature_namespaces


class BlankHandler(handler.ContentHandler):
    # Minimal handler: just prints each element as it opens and closes.
    def __init__(self):
        pass

    def startElement(self, name, attrs):
        print "Start: ", name

    def endElement(self, name):
        print "End: ", name


if __name__ == '__main__':
    if len(sys.argv) != 2:
        sys.exit(1)

    xmlfile = sys.argv[1]

    parser = make_parser()
    parser.setFeature(feature_namespaces, 0)
    dh = BlankHandler()
    parser.setContentHandler(dh)

    try:
        xmlhandle = open(xmlfile, 'r')
        # Uncommenting the line below shows that xmlhandle can be read
        # successfully.
        #print xmlhandle.readline()
        parser.parse(xmlhandle)
        xmlhandle.close()
    except IOError as e:
        print "IOError: ",
        print e.strerror
        sys.exit(1)

On Tue, 2010-06-29 at 18:43 -0400, Fred Drake wrote:
> On Tue, Jun 29, 2010 at 6:27 PM, Justin Seyster
> wrote:
> > The one directory that works is not the current directory. In fact, it
> > seems that the magic directory stays the same regardless of what the
> > current directory is (and whether I use an absolute or relative path).
>
> Can you reproduce this with a short script that just does the XML parsing?
>
> If so, please post that.
>
> > I'm using the current Python from Ubuntu Karmic, which is 2.6.4.
> >
> > (Somebody kindly let me know that I sent this problem to the wrong list.
> > Sorry about that, and let me know if I should take this discussion off
> > the list.)
>
> I'm not too worried about that; I'm unlikely to see this elsewhere.
>
>
> -Fred
>
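
The thread above ends without a diagnosis. One possible explanation, not
confirmed anywhere in these messages, is that the XML files carry a
DOCTYPE pointing at an external DTD by relative path: the SAX Expat
reader may then try to open that DTD next to the XML file, which would
succeed only in the one directory that happens to contain it. The sketch
below assumes that cause and shows how external entity loading can be
switched off with the feature_external_ges flag from xml.sax.handler. It
is written for Python 3 rather than the 2.6.4 used in the thread, and
NameCollector, parse_without_external_entities, SAMPLE and missing.dtd
are made-up names for illustration only.

```python
import io
from xml.sax import make_parser
from xml.sax.handler import (
    ContentHandler,
    feature_external_ges,
    feature_namespaces,
)


class NameCollector(ContentHandler):
    """Hypothetical handler that records each start tag it sees."""

    def __init__(self):
        ContentHandler.__init__(self)
        self.names = []

    def startElement(self, name, attrs):
        self.names.append(name)


def parse_without_external_entities(stream):
    """Parse a byte stream while refusing to load external entities."""
    parser = make_parser()
    parser.setFeature(feature_namespaces, False)
    # Assumption: the IOError comes from the reader trying to open an
    # external DTD. With this feature off, the reader skips that step.
    parser.setFeature(feature_external_ges, False)
    collector = NameCollector()
    parser.setContentHandler(collector)
    parser.parse(stream)
    return collector.names


# Illustrative document: its DOCTYPE names a DTD file that does not exist.
SAMPLE = (
    b'<?xml version="1.0"?>\n'
    b'<!DOCTYPE root SYSTEM "missing.dtd">\n'
    b'<root><child/></root>'
)

names = parse_without_external_entities(io.BytesIO(SAMPLE))
print(names)
```

If the parse error disappears once this feature is disabled, the missing
DTD is the likely culprit; if it persists, the cause lies elsewhere and
this sketch does not apply.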