From boris at codesynthesis.com Tue May 1 22:47:36 2007
From: boris at codesynthesis.com (Boris Kolpackov)
Date: Tue, 1 May 2007 20:47:36 +0000 (UTC)
Subject: [Expat-discuss] expat Memory Footprint?
References: <005d01c78a1c$9c4603a0$6402a8c0@dell4700>
Message-ID:

Hi Steve,

"Steve Vernon" writes:

> I would like to know what is the starting size, then what "variables"
> it depends upon and what is the factor of expansion for each such
> variable.

I don't think anybody will be able to provide you with an exhaustive
list. I would imagine the only factor that can result in a serious
growth in memory usage is the nesting of elements in your XML. But the
only sure way to find out is to try some real code on some real XML.

I did such measurements for XSD/e -- an XML parser generator for
embedded systems -- that uses Expat underneath. Thought you might be
interested:

http://www.codesynthesis.com/pipermail/xsde-users/2007-March/000002.html

hth,
-boris

--
Boris Kolpackov
Code Synthesis Tools CC
http://www.codesynthesis.com
Open-Source, Cross-Platform C++ XML Data Binding

From karl at waclawek.net Fri May 11 19:16:13 2007
From: karl at waclawek.net (Karl Waclawek)
Date: Fri, 11 May 2007 13:16:13 -0400
Subject: [Expat-discuss] New release comping up
Message-ID: <4644A4DD.5000202@waclawek.net>

There is a new Expat release (2.0.1) coming up. As always, once there
is activity, we get even more suggestions for fixes and changes. Most
of them concerned the build system this time.

As we have committed a few patches recently, one of them actually
being a code change, I would urge everyone to check out from CVS and
run Expat through the build and through any Expat-dependent
applications you may have, so that we can avoid leaving Expat
unreleased for a while just to get these changes tested sufficiently.

Karl

From fdrake at acm.org Sat May 12 05:04:43 2007
From: fdrake at acm.org (Fred L. Drake, Jr.)
Date: Fri, 11 May 2007 23:04:43 -0400 Subject: [Expat-discuss] Parser benchmarks article Message-ID: <200705112304.43386.fdrake@acm.org> O'Reilly's "XML.com" site just published an article comparing several popular parsers for performance: http://www.xml.com/pub/a/2007/05/09/xml-parser-benchmarks-part-1.html The article is heavily biased toward Java implementations (reflecting a sad state of the entire industry), but it did show that the only C implementation tested performed much better than the Java implementations for the event-based parsing. The second benchmark measured performance of the StAX parsers, which is a Java specification; there are non-Java equivalents (as I understand it), but I don't know if there's a C implementation of a similar API. I pretty much ignored that, since I've no interest in Java-based parsers. What disappointed me about the article was that only one C implementation was considered, and it wasn't Expat. I'd love to see a good comparison of Expat with the libxml2 library. -Fred -- Fred L. Drake, Jr. From karl at waclawek.net Sat May 12 06:00:34 2007 From: karl at waclawek.net (Karl Waclawek) Date: Sat, 12 May 2007 00:00:34 -0400 Subject: [Expat-discuss] Parser benchmarks article In-Reply-To: <200705112304.43386.fdrake@acm.org> References: <200705112304.43386.fdrake@acm.org> Message-ID: <46453BE2.7030107@waclawek.net> Fred L. Drake, Jr. wrote: > O'Reilly's "XML.com" site just published an article comparing several popular > parsers for performance: > > http://www.xml.com/pub/a/2007/05/09/xml-parser-benchmarks-part-1.html > > > > What disappointed me about the article was that only one C implementation was > considered, and it wasn't Expat. I'd love to see a good comparison of Expat > with the libxml2 library. > > Fred, why don't you add a comment to that article (if you are a member of xml.com). There are 3 year old benchmarks at http://xmlbench.sourceforge.net/. Expat is somewhat faster than libxml there. 
Karl From bkeitch at googlemail.com Tue May 15 18:02:52 2007 From: bkeitch at googlemail.com (Ben Keitch) Date: Tue, 15 May 2007 17:02:52 +0100 Subject: [Expat-discuss] Large data sets (Expat v2.0.0; compiled cygwin) Message-ID: Can someone help me with this code. It is trying to convert an XML file of book data to tab-deliminated. Should be simple, but it seems to mangle about 200 of the 10000 records I give it. Supplying each record by itself, it works fine. I don't understand why, but not being a C programmer, I dare say I am mangling pointers, or there is a multithread issue I don't understand. here is a typical error: given lines 3380-3383 in a 917682 long XML file (it is well-formed according to xmlwf): 0816044384 9780816044382 9780816044382 ... the data given to the data handler (and printed to stderr) is: Data: 9780816 Data: 044382 Error : isbn10: 0816044384 isbn: 382 isbn13: 044382 Data: Data: 9780816044382 So in this case, ISBN10 was correct, but ISBN13 only got the last 6 digits on the first call, but managed to get all the data on the third call (the second call gives a blank line! why?) If you give just this XML record to the program, it works fine. Any help greatly appreciated -------------- next part -------------- A non-text attachment was scrubbed... Name: processfile.c Type: application/octet-stream Size: 6765 bytes Desc: not available Url : http://mail.libexpat.org/pipermail/expat-discuss/attachments/20070515/e5f90c54/attachment.obj -------------- next part -------------- A non-text attachment was scrubbed... Name: Makefile Type: application/octet-stream Size: 457 bytes Desc: not available Url : http://mail.libexpat.org/pipermail/expat-discuss/attachments/20070515/e5f90c54/attachment-0001.obj -------------- next part -------------- A non-text attachment was scrubbed... 
Name: test.xml Type: text/xml Size: 3084 bytes Desc: not available Url : http://mail.libexpat.org/pipermail/expat-discuss/attachments/20070515/e5f90c54/attachment.bin From andrelsm at iname.com Tue May 15 19:46:07 2007 From: andrelsm at iname.com (Andre Luis Monteiro) Date: Tue, 15 May 2007 12:46:07 -0500 Subject: [Expat-discuss] Large data sets (Expat v2.0.0; compiled cygwin) Message-ID: <20070515174607.995AC478088@ws1-5.us4.outblaze.com> Keitch take a look at expat-2.0.0/examples/outline.c in your distro (some previous Expat versions bring this example too). [] andrelsm > ----- Original Message ----- > From: "Ben Keitch" > To: expat-discuss at libexpat.org > Subject: [Expat-discuss] Large data sets (Expat v2.0.0; compiled cygwin) > Date: Tue, 15 May 2007 17:02:52 +0100 > > > Can someone help me with this code. It is trying to convert an XML file of > book data to tab-deliminated. Should be simple, but it seems to mangle about > 200 of the 10000 records I give it. Supplying each record by itself, it > works fine. I don't understand why, but not being a C programmer, I dare say > I am mangling pointers, or there is a multithread issue I don't understand. > > here is a typical error: > given lines 3380-3383 in a 917682 long XML file (it is well-formed according > to xmlwf): > > > 0816044384 > 9780816044382 > 9780816044382 > ... > > > the data given to the data handler (and printed to stderr) is: > > Data: 9780816 Data: 044382 > Error : isbn10: 0816044384 isbn: 382 isbn13: 044382 > Data: > Data: 9780816044382 > > So in this case, ISBN10 was correct, but ISBN13 only got the last 6 digits > on the first call, but managed to get all the data on the third call (the > second call gives a blank line! why?) > > If you give just this XML record to the program, it works fine. 
>
> Any help greatly appreciated
> << processfile.c >>
> << Makefile >>
>
> _______________________________________________
> Expat-discuss mailing list
> Expat-discuss at libexpat.org
> http://mail.libexpat.org/mailman/listinfo/expat-discuss
>

Regards,
André Luís

From webmaster at hartwork.org Tue May 15 19:54:10 2007
From: webmaster at hartwork.org (Sebastian Pipping)
Date: Tue, 15 May 2007 19:54:10 +0200
Subject: [Expat-discuss] Large data sets (Expat v2.0.0; compiled cygwin)
In-Reply-To:
References:
Message-ID: <4649F3C2.5040703@hartwork.org>

Ben Keitch wrote:
> Can someone help me with this code. It is trying to convert an XML file of
> book data to tab-deliminated. Should be simple, but it seems to mangle
> about
> 200 of the 10000 records I give it. Supplying each record by itself, it
> works fine. I don't understand why, but not being a C programmer, I dare
> say
> I am mangling pointers, or there is a multithread issue I don't understand.

------------------------------------------------------------
I don't see any threads in your code, so I don't think this could
be the case. To me it seems to be the way you talk to Expat.
------------------------------------------------------------

>
> 0816044384
> 9780816044382
> 9780816044382
> ...
>
>
> the data given to the data handler (and printed to stderr) is:
>
> Data: 9780816 Data: 044382
> Error : isbn10: 0816044384 isbn: 382 isbn13: 044382
> Data:
> Data: 9780816044382

------------------------------------------------------------
Expat reports (and must do so) all text in the XML file
*including whitespace*. In your code I found this line
in the character data handler:

fprintf(stderr,"Data: %s\t",temp);

Let me use square brackets to visualize what Expat is passing you:

[9780816][044382][ ][9780816044382]

That also makes sense to me, since that's the only place
where the newline can come from.
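The chunking visualized above can be reproduced with a small sketch. It is only an illustration under stated assumptions: it uses Python's xml.parsers.expat binding rather than the C API from the thread, the record and its element names (book, isbn13) are made up because the original markup did not survive in the archive, and the input is deliberately fed in two pieces to mimic a read-buffer boundary falling inside the text.

```python
import xml.parsers.expat


def collect_chunks(pieces):
    """Feed the document piece by piece; return every text chunk the handler receives."""
    chunks = []
    parser = xml.parsers.expat.ParserCreate()
    parser.buffer_text = False  # the default: pass text on as Expat delivers it
    parser.CharacterDataHandler = chunks.append
    for piece in pieces[:-1]:
        parser.Parse(piece, False)   # more input follows
    parser.Parse(pieces[-1], True)   # final piece
    return chunks


# Hypothetical record, split in the middle of the ISBN on purpose.
pieces = ["<book>\n<isbn13>9780816", "044382</isbn13>\n</book>"]
chunks = collect_chunks(pieces)

# Print each chunk in square brackets, newlines made visible.
print("".join("[%s]" % c.replace("\n", "\\n") for c in chunks))
```

Where exactly the text is cut depends on the Expat version and on how the input happens to be buffered, so a handler should rely only on the concatenation of the chunks, never on their boundaries.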
So to solve this you will have to

* Concatenate the chunks passed to the char handler
* Keep track of which element you are in

But there is more you might want to reconsider
in your current code:

* If you use a fixed buffer of byte you have to
  make sure you don't write bytes in if its value
  is greater (-> buffer overflows!).

* is pure waste of memory in case you can
  have books with only two numbers, not all. I would
  suggest to switch to pointers and dynamic allocation then.

* In you do many calls to strcmp but forgot to put
  "else" before the "if". Currently a tag will be matched
  against "ISBN10", "ISBN13" and so on, even if it
  matched "ISBN10" already.

Sebastian

From regis.st-gelais at laubrass.com Tue May 15 19:55:48 2007
From: regis.st-gelais at laubrass.com (Régis St-Gelais)
Date: Tue, 15 May 2007 13:55:48 -0400
Subject: [Expat-discuss] Large data sets (Expat v2.0.0; compiled cygwin)
References:
Message-ID: <033001c7971a$464b91d0$6f01a8c0@LaubrassSag2LT>

The data between your tags can be split and sent to your handler in
separate calls. This is stated in the docs.

Your handler can be called with 08160 and then with 44384.

---
Regis St-Gelais
Laubrass inc.

----- Original Message -----
From: Ben Keitch
To: expat-discuss at libexpat.org
Sent: Tuesday, May 15, 2007 12:02 PM
Subject: [Expat-discuss] Large data sets (Expat v2.0.0; compiled cygwin)

Can someone help me with this code. It is trying to convert an XML file of
book data to tab-deliminated. Should be simple, but it seems to mangle about
200 of the 10000 records I give it. Supplying each record by itself, it
works fine. I don't understand why, but not being a C programmer, I dare say
I am mangling pointers, or there is a multithread issue I don't understand.

here is a typical error:
given lines 3380-3383 in a 917682 long XML file (it is well-formed according
to xmlwf):

0816044384
9780816044382
9780816044382
...
the data given to the data handler (and printed to stderr) is: Data: 9780816 Data: 044382 Error : isbn10: 0816044384 isbn: 382 isbn13: 044382 Data: Data: 9780816044382 So in this case, ISBN10 was correct, but ISBN13 only got the last 6 digits on the first call, but managed to get all the data on the third call (the second call gives a blank line! why?) If you give just this XML record to the program, it works fine. Any help greatly appreciated ------------------------------------------------------------------------------ _______________________________________________ Expat-discuss mailing list Expat-discuss at libexpat.org http://mail.libexpat.org/mailman/listinfo/expat-discuss From lee at novomail.net Tue May 15 19:31:55 2007 From: lee at novomail.net (Lee Passey) Date: Tue, 15 May 2007 11:31:55 -0600 Subject: [Expat-discuss] Large data sets (Expat v2.0.0; compiled cygwin) In-Reply-To: References: Message-ID: <4649EE8B.5070206@novomail.net> Ben Keitch wrote: > Can someone help me with this code. It is trying to convert an XML file of > book data to tab-deliminated. Should be simple, but it seems to mangle > about 200 of the 10000 records I give it. Supplying each record by itself, > it works fine. I don't understand why, but not being a C programmer, I > dare say I am mangling pointers, or there is a multithread issue I don't > understand. > > here is a typical error: > given lines 3380-3383 in a 917682 long XML file (it is well-formed > according to xmlwf): > > > 0816044384 > 9780816044382 > 9780816044382 > ... > > > the data given to the data handler (and printed to stderr) is: > > Data: 9780816 Data: 044382 > Error : isbn10: 0816044384 isbn: 382 isbn13: 044382 > Data: > Data: 9780816044382 > > So in this case, ISBN10 was correct, but ISBN13 only got the last 6 digits > on the first call, but managed to get all the data on the third call (the > second call gives a blank line! why?) > > If you give just this XML record to the program, it works fine. 
>
> Any help greatly appreciated

Be aware of two things: 1. in XML, whitespace /is/ significant, and 2.
in Expat the character data handler may be called multiple times,
sequentially, with partial data.

In your case, you haven't indicated the definition of BUFSIZ. Let's
assume that BUFSIZ is 256. If you read a file in 256-byte chunks, in all
likelihood at some point you're going to split a chunk of CData. In this
case, Expat will call the character data handler (or text handler) with
the partial data, return to the main method to get more data, then call
the handler with the remainder of the CData.

Try this:

Set a StartElement handler. When the handler is called, save the name of
the element and set the start of the output buffer to zero. Now, every
time the CharacterData handler is called, _and_ we are inside an element
which can contain CData, /add/ the data to the output buffer. When the
EndElement handler is called, check to make sure that it matches the
start element (just in case the XML is badly formed) /then/ store the
element name (or some translation thereof) and the output buffer you
have accumulated.

Of course, you may need to add code to deal with potential nested
elements, but that is left as an exercise for the reader.

--
Nothing of significance below this line.

From bkeitch at googlemail.com Wed May 16 00:24:00 2007
From: bkeitch at googlemail.com (Ben Keitch)
Date: Tue, 15 May 2007 23:24:00 +0100
Subject: [Expat-discuss] Large data sets (Expat v2.0.0; compiled cygwin)
In-Reply-To: <4649EE8B.5070206@novomail.net>
References: <4649EE8B.5070206@novomail.net>
Message-ID:

Brilliant, Lee, that would explain things. I did wonder if something like
that was going on. I will do as you suggest and add chunks of text together.
Strangely I managed to have more success using Perl's wrapper around expat,
and perl. However, that takes 2 hours to run instead of 30 seconds!
Thanks again for you help, Ben On 15/05/07, Lee Passey wrote: > > Ben Keitch wrote: > > Can someone help me with this code. It is trying to convert an XML file > of > > book data to tab-deliminated. Should be simple, but it seems to mangle > > about 200 of the 10000 records I give it. Supplying each record by > itself, > > it works fine. I don't understand why, but not being a C programmer, I > > dare say I am mangling pointers, or there is a multithread issue I don't > > understand. > > > > here is a typical error: > > given lines 3380-3383 in a 917682 long XML file (it is well-formed > > according to xmlwf): > > > > > > 0816044384 > > 9780816044382 > > 9780816044382 > > ... > > > > > > the data given to the data handler (and printed to stderr) is: > > > > Data: 9780816 Data: 044382 > > Error : isbn10: 0816044384 isbn: 382 isbn13: 044382 > > Data: > > Data: 9780816044382 > > > > So in this case, ISBN10 was correct, but ISBN13 only got the last 6 > digits > > on the first call, but managed to get all the data on the third call > (the > > second call gives a blank line! why?) > > > > If you give just this XML record to the program, it works fine. > > > > Any help greatly appreciated > > Be aware of two things: 1. in XML, whitespace /is/ significant, and 2. > in Expat the character data handler may be called multiple times, > sequentially, with partial data. > > In your case, you haven't indicated the definition of BUFSIZ. Let's > assume that BUFSIZ is 256. If you read a file in 256-byte chunks, in all > likelihood at some point you're going to split a chunk of CData. In this > case, Expat will call the character data handler (or text handler) with > the partial data, return to the main method to get more data, then call > the handler with the remainder of the CData. > > Try this: > > Set a StartElement handler. When the handler is called, save the name of > the element and set the start of the output buffer to zero. 
Now, every
> time the CharacterData handler is called, _and_ we are inside an element
> which can contain CData, /add/ the data to the output buffer. When the
> EndElement handler is called, check to make sure that it matches the
> start element (just in case the XML is badly formed) /then/ store the
> element name (or some translation thereof) and the output buffer you
> have accumulated.
>
> Of course, you may need to add code to deal with potential nested
> elements, but that is left as an exercise for the reader.
>
> --
> Nothing of significance below this line.
>
> _______________________________________________
> Expat-discuss mailing list
> Expat-discuss at libexpat.org
> http://mail.libexpat.org/mailman/listinfo/expat-discuss
>

From didyeah971 at yahoo.fr Wed May 16 12:26:05 2007
From: didyeah971 at yahoo.fr (Didier MOUNIEN)
Date: Wed, 16 May 2007 12:26:05 +0200
Subject: [Expat-discuss] Install Expat on Mac os X tiger
Message-ID:

I just installed Expat on Mac OS X Tiger, and when I tried to build a simple
program with Xcode, it can't find expat.h. The same happens on the command
line. Expat seems to be installed with no problem, though (I can access the
man page ...). Is there a specific way to install it on Mac OS?
Furthermore, what about a C++ wrapper? I am looking for an easy-to-understand
one, because I am not a C++ guy.
Thanks for the help.
Did

From Mark.Williams at techop.co.uk Wed May 16 10:10:24 2007
From: Mark.Williams at techop.co.uk (Mark Williams)
Date: Wed, 16 May 2007 09:10:24 +0100
Subject: [Expat-discuss] Large data sets (Expat v2.0.0; compiled cygwin)
In-Reply-To:
Message-ID:

Hi Ben,

> Can someone help me with this code. It is trying to convert
> an XML file of
> book data to tab-deliminated. Should be simple, but it seems
> to mangle about
> 200 of the 10000 records I give it.

This is a very common mistake. Your character data handler is not guaranteed
to get all the data between element tags in one go.
You need to concatenate the data until you see the end element tag.

To the whole list: Is there a FAQ for v2.0? This really ought to go in one.

Mark

From bkeitch at cactusdata.co.uk Wed May 16 18:09:42 2007
From: bkeitch at cactusdata.co.uk (Ben Keitch)
Date: Wed, 16 May 2007 17:09:42 +0100 (BST)
Subject: [Expat-discuss] Large data sets (Expat v2.0.0; compiled cygwin)
Message-ID:

Thanks to Mark and Lee for helping me out; it was indeed a problem of not
concatenating data. Yes, this certainly should be in an FAQ.

Thanks to everyone for helping,
Ben

From moncef.mezghani at free.fr Thu May 17 10:54:54 2007
From: moncef.mezghani at free.fr (Moncef Mezghani)
Date: Thu, 17 May 2007 10:54:54 +0200
Subject: [Expat-discuss] Install Expat on Mac os X tiger
In-Reply-To:
References:
Message-ID: <95AFE1AA-9C51-45B6-80C0-8EEE860691A5@free.fr>

There is another way to install it: use 'fink' and 'apt-get install' ...

Now, if you are sure the installation completed correctly, you can run this
command in a terminal:

locate expat.h

(see 'man locate') to find where the installation put the include file,
and then add that directory to the include search path.

Moncef.

On 16 May 07 at 12:26, Didier MOUNIEN wrote:

> I just Installed expat on mac os X tiger and when i tried to build
> a simple
> prog with XCODE, it can't retrieve expat.h. On command line
> neither. Though
> expat seems to be installed we no problem (i can access the
> man ...). Is
> there a specific way to install it on mac os?
> Further more, what about a C++ wrapper, i am looking an easy to
> understand
> one, cause i am not a C++ guy .
> thanks for help ?
> Did
> _______________________________________________
> Expat-discuss mailing list
> Expat-discuss at libexpat.org
> http://mail.libexpat.org/mailman/listinfo/expat-discuss
>

From rameshs at remoba.com Thu May 17 13:48:36 2007
From: rameshs at remoba.com (rameshs)
Date: Thu, 17 May 2007 17:18:36 +0530
Subject: [Expat-discuss] How to install Expat library
Message-ID: <00ad01c79879$4f33fb60$5100a8c0@silokablr.com>

Hi Karl,

I am new to XML parsing. Can you help a beginner like me with this Expat
library? Basically, I am looking for a parser which writes all the data and
tags to a file. Can you help me with this? Thanks in advance.

Thanks & Regards,
Ramesh S,

From crackeur at comcast.net Fri May 18 05:19:57 2007
From: crackeur at comcast.net (Jimmy Zhang)
Date: Thu, 17 May 2007 20:19:57 -0700
Subject: [Expat-discuss] libxml2 vs expat
Message-ID: <00c401c798fb$6d9ce300$0d02a8c0@ximpleware>

Hi, does anyone have any references comparing Expat with libxml2 in
parsing throughput? In general, should I expect a similar level of
performance?

Thanks,
jz

From ramamurthy.suresh at wipro.com Fri May 18 12:16:32 2007
From: ramamurthy.suresh at wipro.com (ramamurthy.suresh at wipro.com)
Date: Fri, 18 May 2007 15:46:32 +0530
Subject: [Expat-discuss] Expat-discuss Digest, Vol 86, Issue 6
In-Reply-To:
References:
Message-ID: <438662DA48DCAA41B1DF648BD4BD76C006F5CD9B@CHN-SNR-MBX01.wipro.com>

Ramesh,

You can look into the example provided by the Expat package. It's enough
to get the data and tags from an XML file.

Suresh.
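As a rough idea of what such a tags-and-data dump can look like, here is a minimal sketch. It uses Python's xml.parsers.expat binding rather than the C examples shipped with Expat; the function name dump_tags_and_data, the START/TEXT/END line format, and the sample document are all made up for illustration, and dropping whitespace-only text is an assumption that will not suit every format.

```python
import io
import xml.parsers.expat


def dump_tags_and_data(xml_text, out):
    """Write one line per start tag, end tag, and non-blank text run to `out`."""
    depth = 0
    text = []  # character data chunks collected since the last tag

    def flush():
        # Join the chunks first, then decide whether there is anything to write.
        data = "".join(text).strip()
        text.clear()
        if data:
            out.write("%sTEXT  %s\n" % ("  " * depth, data))

    def start(name, attrs):
        nonlocal depth
        flush()
        out.write("%sSTART %s %r\n" % ("  " * depth, name, attrs))
        depth += 1

    def end(name):
        nonlocal depth
        flush()
        depth -= 1
        out.write("%sEND   %s\n" % ("  " * depth, name))

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = start
    parser.EndElementHandler = end
    parser.CharacterDataHandler = text.append
    parser.Parse(xml_text, True)


# Hypothetical input; a file opened with open(path, "w") can replace StringIO.
out = io.StringIO()
dump_tags_and_data('<note id="1"><to>list</to></note>', out)
print(out.getvalue())
```

Text is written only when the next tag arrives, so data that Expat delivers in several pieces still ends up on one line.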
From karl at waclawek.net Fri May 18 15:19:39 2007
From: karl at waclawek.net (Karl Waclawek)
Date: Fri, 18 May 2007 09:19:39 -0400
Subject: [Expat-discuss] libxml2 vs expat
In-Reply-To: <00c401c798fb$6d9ce300$0d02a8c0@ximpleware>
References: <00c401c798fb$6d9ce300$0d02a8c0@ximpleware>
Message-ID: <464DA7EB.1000806@waclawek.net>

Jimmy Zhang wrote:
> Hi, Does anyone have any references comparing expat with libXML2 in
> parsing throughput? In general, should I expect the similar level of performance?
>

http://xmlbench.sourceforge.net/ has some benchmarks (a couple of years old).
According to these, Expat is slightly faster than libxml, but not by much.

Karl

From zlgodguy at 163.com Mon May 21 09:02:18 2007
From: zlgodguy at 163.com (zlgodguy)
Date: Mon, 21 May 2007 15:02:18 +0800 (CST)
Subject: [Expat-discuss] about expat2-0-0
Message-ID: <114060206.490011179730938843.JavaMail.root@bj163app127.163.com>

Hi all,

I need your help with expat2-0-0. I want to know how to modify or add a
node in an XML file using the Expat API or by other means. Please give me
an example.

Thanks

From binary.chen at gmail.com Thu May 24 08:36:33 2007
From: binary.chen at gmail.com (Bin Chen)
Date: Thu, 24 May 2007 14:36:33 +0800
Subject: [Expat-discuss] How to get the value of an element
Message-ID: <1179988593.18952.1.camel@binch-desktop>

Hi,

For an XML document like:

abc

The value of element f2 is abc. How can I get this value?

Thanks a lot.
Bin

From binary.chen at gmail.com Thu May 24 09:22:08 2007
From: binary.chen at gmail.com (Bin Chen)
Date: Thu, 24 May 2007 15:22:08 +0800
Subject: [Expat-discuss] How to get the value of an element
In-Reply-To: <7C83A8A6B56D3A478333B1DF47E18586A6C39F@MPBABGEX01.corp.mphasis.com>
References: <1179988593.18952.1.camel@binch-desktop> <7C83A8A6B56D3A478333B1DF47E18586A6C39F@MPBABGEX01.corp.mphasis.com>
Message-ID: <1179991328.19416.3.camel@binch-desktop>

Thanks, but that's not what I want. I translated the code below to C and
ran it, and found that many useless characters are captured as well. I don't
need any characters other than the value of f2. Is there a better way to do
this?

The CharacterDataHandler seems to be a handler for every character that is
not processed by the element handlers, is that right? So things like '\t'
and '\n' are handled too, and it's totally context free.

I want the content of f2, which is context sensitive. Is there no API for
this?

On Thu, 2007-05-24 at 12:28 +0530, Mukesh Kumar wrote:
>
> Hi,
>
> Try this
>
> import xml.parsers.expat
>
> # 3 handler functions
> def start_element(name, attrs):
>     print 'Start element:', name, attrs
> def end_element(name):
>     print 'End element:', name
> def char_data(data):
>     print 'Character data:', repr(data)
>
> p = xml.parsers.expat.ParserCreate()
>
> p.StartElementHandler = start_element
> p.EndElementHandler = end_element
> p.CharacterDataHandler = char_data
>
> p.Parse("""
> Text goes here
> More text
> """)
>
> You can you a file to read the xml contents and read the xml document
> and in a while loop , and extract the contents.
>
> Here is my small web-page:
> http://www.geocities.com/muki_champs
>
> Regards,
> Mukesh Srivastav,
> Sr.Software Engineer,
> India,
> Bangalore.
>
> -----Original Message-----
> From: expat-discuss-bounces at libexpat.org
> [mailto:expat-discuss-bounces at libexpat.org] On Behalf Of Bin Chen
> Sent: Thursday, May 24, 2007 12:07 PM
> To: expat-discuss at libexpat.org
> Subject: [Expat-discuss] How to get the value of an element
>
> Hi,
>
> For a xml like:
>
> abc
>
> The vaule of element f2 is abc, now I want to get this value, how can?
>
> Thanks a lot.
>
> Bin
>
> _______________________________________________
> Expat-discuss mailing list
> Expat-discuss at libexpat.org
> http://mail.libexpat.org/mailman/listinfo/expat-discuss
>

From Mukesh.S at mphasis.com Thu May 24 08:58:39 2007
From: Mukesh.S at mphasis.com (Mukesh Kumar)
Date: Thu, 24 May 2007 12:28:39 +0530
Subject: [Expat-discuss] How to get the value of an element
In-Reply-To: <1179988593.18952.1.camel@binch-desktop>
References: <1179988593.18952.1.camel@binch-desktop>
Message-ID: <7C83A8A6B56D3A478333B1DF47E18586A6C39F@MPBABGEX01.corp.mphasis.com>

Hi,

Try this

import xml.parsers.expat

# 3 handler functions
def start_element(name, attrs):
    print 'Start element:', name, attrs
def end_element(name):
    print 'End element:', name
def char_data(data):
    print 'Character data:', repr(data)

p = xml.parsers.expat.ParserCreate()

p.StartElementHandler = start_element
p.EndElementHandler = end_element
p.CharacterDataHandler = char_data

p.Parse("""
Text goes here
More text
""")

You can use a file to hold the XML contents, read the document in a
while loop, and extract the contents.

Here is my small web-page:
http://www.geocities.com/muki_champs

Regards,
Mukesh Srivastav,
Sr.Software Engineer,
India,
Bangalore.
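Building on the three handlers above, the content of one particular element can be picked out by remembering which elements are currently open and collecting character data only inside the wanted one. The following is a minimal sketch under stated assumptions, not an Expat facility: it is written for Python 3, the helper name text_of is hypothetical, and only the element name f2 and the value abc come from the thread (the surrounding root and f1 elements are invented, since the original markup did not survive in the archive).

```python
import xml.parsers.expat


def text_of(xml_text, wanted):
    """Return the text content of every element named `wanted`, in document order."""
    results = []
    stack = []   # names of the elements that are currently open
    pieces = []  # text chunks seen so far inside the current `wanted` element

    def start_element(name, attrs):
        stack.append(name)
        if name == wanted:
            pieces.clear()

    def end_element(name):
        if name == wanted:
            # The handler may have fired several times; join the chunks here.
            results.append("".join(pieces))
        stack.pop()

    def char_data(data):
        # Keep text only while the innermost open element is the wanted one,
        # so tabs and newlines between other elements are ignored.
        if stack and stack[-1] == wanted:
            pieces.append(data)

    p = xml.parsers.expat.ParserCreate()
    p.StartElementHandler = start_element
    p.EndElementHandler = end_element
    p.CharacterDataHandler = char_data
    p.Parse(xml_text, True)
    return results


# Hypothetical document around the f2 element from the question.
sample = "<root>\n\t<f1>\n\t\t<f2>abc</f2>\n\t</f1>\n</root>"
print(text_of(sample, "f2"))
```

The sketch ignores text in child elements of the wanted element and does not handle a wanted element nested inside another one of the same name; both would need a little more bookkeeping on the stack.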
-----Original Message-----
From: expat-discuss-bounces at libexpat.org
[mailto:expat-discuss-bounces at libexpat.org] On Behalf Of Bin Chen
Sent: Thursday, May 24, 2007 12:07 PM
To: expat-discuss at libexpat.org
Subject: [Expat-discuss] How to get the value of an element

Hi,

For a xml like:

abc

The vaule of element f2 is abc, now I want to get this value, how can?

Thanks a lot.

Bin

_______________________________________________
Expat-discuss mailing list
Expat-discuss at libexpat.org
http://mail.libexpat.org/mailman/listinfo/expat-discuss

From webmaster at hartwork.org Thu May 24 23:44:42 2007
From: webmaster at hartwork.org (Sebastian Pipping)
Date: Thu, 24 May 2007 23:44:42 +0200
Subject: [Expat-discuss] How to get the value of an element
In-Reply-To: <1179991328.19416.3.camel@binch-desktop>
References: <1179988593.18952.1.camel@binch-desktop> <7C83A8A6B56D3A478333B1DF47E18586A6C39F@MPBABGEX01.corp.mphasis.com> <1179991328.19416.3.camel@binch-desktop>
Message-ID: <4656074A.70702@hartwork.org>

Bin Chen wrote:
> Thanks, but that's not I want, I translate below code to C and run, then
> find many useless char is also captured. I don't need any other useless
> char other than the value of f2, is there any better way to do this?
>
> The CharacterDataHandler seems a handler for every char that not
> processed by "element handler", is it right? So something like '\t' and
> '\n' are also be handled, and it's totally context free.
>
> I want the content of f2, it is context sensitive, there are no API for
> this?

-----------------------------------------------------------
No. Expat is a "low level" parser so you trade in more work
for more speed and flexibility. You will have to track
tag starts and endings yourself (or use a DOM parser instead).

Expat has to report all the whitespace since it by default
is significant. Before throwing the whitespace away make sure
the language/schema you parse tells you to. Are you implementing
your own or a known format?
Sebastian

From ramprasad.i82 at gmail.com Fri May 25 17:12:52 2007
From: ramprasad.i82 at gmail.com (Ramprasad B)
Date: Fri, 25 May 2007 20:42:52 +0530
Subject: [Expat-discuss] Parsing CDATA section
Message-ID: <7c75739b0705250812h22647102ie12b0ccc01108a1f@mail.gmail.com>

Hello expat`ers,

I am stuck on parsing the CDATA section in an XML file. I could not even
find example source code for this.

I would like to parse the following XML file and extract the `Hello World'
string:

Please help me with this.

Thanks in advance for your help!

--
Ramprasad B

From ramprasad.i82 at gmail.com Fri May 25 17:36:31 2007
From: ramprasad.i82 at gmail.com (Ramprasad B)
Date: Fri, 25 May 2007 21:06:31 +0530
Subject: [Expat-discuss] Parsing CDATA section
Message-ID: <7c75739b0705250836m149de394w6e24456bacb96692@mail.gmail.com>

Hello expat`ers,

I am stuck on parsing the CDATA section in an XML file. I could not even
find example source code for this.

I would like to parse the following XML file and extract the `Hello World'
string:

Please help me with this.

Thanks in advance for your help!

--
Ramprasad B

From crazybob at crazybob.org Fri May 25 17:57:20 2007
From: crazybob at crazybob.org (Bob Lee)
Date: Fri, 25 May 2007 08:57:20 -0700
Subject: [Expat-discuss] Parsing CDATA section
In-Reply-To: <7c75739b0705250812h22647102ie12b0ccc01108a1f@mail.gmail.com>
References: <7c75739b0705250812h22647102ie12b0ccc01108a1f@mail.gmail.com>
Message-ID:

Have you tried SetCdataSectionHandler? (
http://www.xml.com/pub/a/1999/09/expat/index.html?page=3#cdatahandler)

I haven't myself--I just assumed Expat would treat CDATA sections just like
any other XML text.

Bob

On 5/25/07, Ramprasad B wrote:
>
> Hello expat`ers,
>
> I am stuck on parsing the CDATA section in an XML file. I could not
> even find example source code for this.
>
> I would like to parse the following XML file and extract the `Hello World'
> string:
>
> Please help me with this.
>
> Thanks in advance for your help!
>
> --
> Ramprasad B

From ramprasad.i82 at gmail.com Sat May 26 08:00:05 2007
From: ramprasad.i82 at gmail.com (Ramprasad B)
Date: Sat, 26 May 2007 11:30:05 +0530
Subject: [Expat-discuss] Parsing CDATA section
In-Reply-To:
References: <7c75739b0705250812h22647102ie12b0ccc01108a1f@mail.gmail.com>
Message-ID: <7c75739b0705252300y126ac243p15244e595df20a71@mail.gmail.com>

On 5/25/07, Bob Lee wrote:

> Have you tried SetCdataSectionHandler? (
> http://www.xml.com/pub/a/1999/09/expat/index.html?page=3#cdatahandler)

Well, I went through that, but could not figure out how to proceed.

> I haven't myself--I just assumed Expat would treat CDATA sections just like
> any other XML text.

That's true.

Ramprasad B

From marco.forberg at gmx.net Sat May 26 08:51:52 2007
From: marco.forberg at gmx.net (Marco Forberg)
Date: Sat, 26 May 2007 08:51:52 +0200
Subject: [Expat-discuss] Parsing CDATA section
In-Reply-To: <7c75739b0705252300y126ac243p15244e595df20a71@mail.gmail.com>
Message-ID: <20070526064716.357621E4007@bag.python.org>

>> Have you tried SetCdataSectionHandler? (
>> http://www.xml.com/pub/a/1999/09/expat/index.html?page=3#cdatahandler)
>
> Well, i went through that, but could not figure out how i can proceed .
>
>> I haven't myself--I just assumed Expat would treat CDATA sections just like
>> any other XML text.
>
> That's true.
>
> Ramprasad B

Expat treats CDATA like any other text, BUT it calls the CDATA start and end
handlers before and after it passes your text to the text handler.
So the order of the calls would be the following:

CDATA section start handler
Text handler <-- this is your CDATA
CDATA section end handler

From ramprasad.i82 at gmail.com Sun May 27 13:26:53 2007
From: ramprasad.i82 at gmail.com (Ramprasad B)
Date: Sun, 27 May 2007 16:56:53 +0530
Subject: [Expat-discuss] Parsing CDATA section
In-Reply-To: <4657d7f1.14ead4ab.0184.ffff97feSMTPIN_ADDED@mx.google.com>
References: <7c75739b0705252300y126ac243p15244e595df20a71@mail.gmail.com> <4657d7f1.14ead4ab.0184.ffff97feSMTPIN_ADDED@mx.google.com>
Message-ID: <7c75739b0705270426t5b35a58dp96392e59ca68cbe3@mail.gmail.com>

On 5/26/07, Marco Forberg wrote:

> Expat treats CDATA as any other text BUT calls the cdata start and end
> handlers before and after it passes your text to the text handler. So the
> order of the calls would be the following:
>
> CDATA section start handler
> Text handler <-- this is your CDATA
> CDATA section end handler

Here's my code:

static void XMLCALL
startcData(void *userData){
  int *depthPtr = (int *)userData;
  /*I am unable to get cdata here. userData contains a single byte here*/
}

static void XMLCALL
endcData(void *userData){
  ;
}

int main(){
  XML_SetCdataSectionHandler(parser, startcData, endcData);
}

XML file:

I see that when the CDATA section starts, the startcData function is
called.

It's still not working! BTW, which pointer points to the CDATA section when
startcData is called?

Thanks!
-
Ramprasad B

From crazybob at crazybob.org Sun May 27 17:16:06 2007
From: crazybob at crazybob.org (Bob Lee)
Date: Sun, 27 May 2007 08:16:06 -0700
Subject: [Expat-discuss] Parsing CDATA section
In-Reply-To: <7c75739b0705270426t5b35a58dp96392e59ca68cbe3@mail.gmail.com>
References: <7c75739b0705252300y126ac243p15244e595df20a71@mail.gmail.com> <4657d7f1.14ead4ab.0184.ffff97feSMTPIN_ADDED@mx.google.com> <7c75739b0705270426t5b35a58dp96392e59ca68cbe3@mail.gmail.com>
Message-ID:

CdataSectionHandler just tells you when the section starts and ends.
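The call order described in this thread (CDATA start handler, character data handler, CDATA end handler) can be sketched with Python's xml.parsers.expat bindings. The function name cdata_sections and the sample document are hypothetical, since the XML file from the original post is not preserved in this archive. The start handler opens a buffer, the character data handler fills it only while a CDATA section is open, and the end handler stores the result.

```python
import xml.parsers.expat


def cdata_sections(document):
    """Return the text of every CDATA section in `document`."""
    sections = []
    chunks = None   # None outside CDATA; a list while inside a section

    def start_cdata():
        nonlocal chunks
        chunks = []

    def char_data(data):
        # Called for all text; keep it only while a CDATA section is open.
        if chunks is not None:
            chunks.append(data)

    def end_cdata():
        nonlocal chunks
        sections.append("".join(chunks))
        chunks = None

    parser = xml.parsers.expat.ParserCreate()
    parser.StartCdataSectionHandler = start_cdata
    parser.EndCdataSectionHandler = end_cdata
    parser.CharacterDataHandler = char_data
    parser.Parse(document, True)
    return sections


# Hypothetical document containing one CDATA section.
SAMPLE = "<greeting>intro <![CDATA[Hello World]]> outro</greeting>"

print(cdata_sections(SAMPLE))
```

In the C API the corresponding registration calls are XML_SetCdataSectionHandler and XML_SetCharacterDataHandler; the CDATA handlers themselves receive only the user data pointer, not the text.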
I assume you need to use CharacterDataHandler to get the actual text.

Bob

On 5/27/07, Ramprasad B wrote:
>
> Here's my code:
>
> static void XMLCALL
> startcData(void *userData){
>   int *depthPtr = (int *)userData;
>   /*I am unable to get cdata here. userData contains a single byte here*/
> }
>
> static void XMLCALL
> endcData(void *userData){
>   ;
> }
>
> int main(){
>   XML_SetCdataSectionHandler(parser, startcData, endcData);
> }
>
> XML file:
>
> I see that when the CDATA section starts, the startcData function is
> called.
>
> It's still not working! BTW, which pointer points to the CDATA section
> when startcData is called?
>
> Thanks!
> -
> Ramprasad B

From JRancier at penntraffic.com Tue May 29 18:46:17 2007
From: JRancier at penntraffic.com (JRancier at penntraffic.com)
Date: Tue, 29 May 2007 16:46:17 +0000 (UTC)
Subject: [Expat-discuss] Compiling using Metaware HighC compiler
Message-ID:

Greetings,

I need to use the parser on a platform which has no GNU tools or
libraries. Has anyone tried to compile expat using the Metaware HighC
compiler?

TIA.

Jeff

From boris at codesynthesis.com Tue May 29 21:06:20 2007
From: boris at codesynthesis.com (Boris Kolpackov)
Date: Tue, 29 May 2007 19:06:20 +0000 (UTC)
Subject: [Expat-discuss] Compiling using Metaware HighC compiler
References:
Message-ID:

Hi Jeff,

JRancier at penntraffic.com writes:

> I need to use the parser on a platform which has no GNU tools or
> libraries. Has anyone tried to compile expat using the Metaware HighC
> compiler?

While I haven't tried to build Expat with Metaware HighC, I have built
it for several embedded systems that don't have any GNU tools. The idea
is to create a config file for your platform and manually compile
xmlparse.c, xmlrole.c, and xmltok.c, found in expat-X.Y.Z/lib. The
winconfig.h file is a good starting point for creating your own config
file. Also search for winconfig.h in the above C files to find the places
where you will need to plug in your custom config file.

hth,
-boris

--
Boris Kolpackov
Code Synthesis Tools CC
http://www.codesynthesis.com
Open-Source, Cross-Platform C++ XML Data Binding

From JRancier at penntraffic.com Wed May 30 14:52:13 2007
From: JRancier at penntraffic.com (JRancier at penntraffic.com)
Date: Wed, 30 May 2007 12:52:13 +0000 (UTC)
Subject: [Expat-discuss] Compiling using Metaware HighC compiler
References:
Message-ID:

Thanks Boris,

I didn't pay attention to the winconfig.h file, but I will. I did start
down the same path, but the compiler is really choking on the syntax.
I'm using the expat.dsp file with an older version of Visual Studio to
export an nmake.mak makefile, simply as a starting point.

Here's a snippet of the compile errors. This is with expat 2.0; perhaps
I should go to an older revision:

xmlparse.286: xmlparse.obj: xmlparse.c: expat.mak: xmlparse.obj is non-existent.
xmlparse.a86: xmlparse.c:
hcdrv xmlparse.c -c -Iptk:inc/ -Iipinc: -DHAVE_MEMMOVE -I../inc
MetaWare High C Compiler 1.7 Copyright (C) 1983-91 MetaWare Incorporated.
Serial 2-600434.
E "xmlparse.c",L299/C18:
| A functionality typedef cannot be used in a function definition.
E "xmlparse.c",L300/C18:
| A functionality typedef cannot be used in a function definition.
E "xmlparse.c",L301/C18:
| A functionality typedef cannot be used in a function definition.
E "xmlparse.c",L302/C18:
| A functionality typedef cannot be used in a function definition.
E "xmlparse.c",L310/C18:
| A functionality typedef cannot be used in a function definition.
E "xmlparse.c",L311/C18:
| A functionality typedef cannot be used in a function definition.
E "xmlparse.c",L312/C18:
| A functionality typedef cannot be used in a function definition.
E "xmlparse.c",L313/C18:
| A functionality typedef cannot be used in a function definition.
E "xmlparse.c",L314/C18:
| A functionality typedef cannot be used in a function definition.
E "xmlparse.c",L315/C18:
| A functionality typedef cannot be used in a function definition.
E "xmlparse.c",L316/C18:
| A functionality typedef cannot be used in a function definition.
E "xmlparse.c",L761/C28:
| Type `void *' is not assignment compatible with type `XML_UnknownEncodingHandler' (at "expat.h",L528/C13).
E "xmlparse.c",L805/C25:
| Type `void *' is not assignment compatible with type `XML_StartElementHandler' (at "expat.h",L252/C14).
E "xmlparse.c",L806/C23:
| Type `void *' is not assignment compatible with type `XML_EndElementHandler' (at "expat.h",L256/C14).
E "xmlparse.c",L807/C26:
| Type `void *' is not assignment compatible with type `XML_CharacterDataHandler' (at "expat.h",L261/C14).
E "xmlparse.c",L808/C34:
| Type `void *' is not assignment compatible with type `XML_ProcessingInstructionHandler' (at "expat.h",L266/C14).
E "xmlparse.c",L809/C20:
| Type `void *' is not assignment compatible with type `XML_CommentHandler' (at "expat.h",L272/C14).
E "xmlparse.c",L810/C30:
| Type `void *' is not assignment compatible with type `XML_StartCdataSectionHandler' (at "expat.h",L275/C14).
E "xmlparse.c",L811/C28:
| Type `void *' is not assignment compatible with type `XML_EndCdataSectionHandler' (at "expat.h",L276/C14).
E "xmlparse.c",L812/C20:
| Type `void *' is not assignment compatible with type `XML_DefaultHandler' (at "expat.h",L291/C14).
E "xmlparse.c",L813/C29:
| Type `void *' is not assignment compatible with type `XML_StartDoctypeDeclHandler' (at "expat.h",L298/C14).
E "xmlparse.c",L814/C27:
| Type `void *' is not assignment compatible with type `XML_EndDoctypeDeclHandler' (at "expat.h",L309/C14).
E "xmlparse.c",L815/C31:
| Type `void *' is not assignment compatible with type `XML_UnparsedEntityDeclHandler' (at "expat.h",L353/C14).
E "xmlparse.c",L816/C25:
| Type `void *' is not assignment compatible with type `XML_NotationDeclHandler' (at "expat.h",L365/C14).
E "xmlparse.c",L817/C31:
| Type `void *' is not assignment compatible with type `XML_StartNamespaceDeclHandler' (at "expat.h",L378/C14).
E "xmlparse.c",L818/C29:
| Type `void *' is not assignment compatible with type `XML_EndNamespaceDeclHandler' (at "expat.h",L383/C14).
E "xmlparse.c",L819/C26:
| Type `void *' is not assignment compatible with type `XML_NotStandaloneHandler' (at "expat.h",L396/C13).
E "xmlparse.c",L820/C30:
| Type `void *' is not assignment compatible with type `XML_ExternalEntityRefHandler' (at "expat.h",L432/C13).
E "xmlparse.c",L822/C26:
| Type `void *' is not assignment compatible with type `XML_SkippedEntityHandler' (at "expat.h",L449/C14).
E "xmlparse.c",L823/C24:
| Type `void *' is not assignment compatible with type `XML_ElementDeclHandler' (at "expat.h",L150/C14).
E "xmlparse.c",L824/C24:
| Type `void *' is not assignment compatible with type `XML_AttlistDeclHandler' (at "expat.h",L166/C14).
E "xmlparse.c",L825/C23:
| Type `void *' is not assignment compatible with type `XML_EntityDeclHandler' (at "expat.h",L329/C14).
E "xmlparse.c",L826/C20:
| Type `void *' is not assignment compatible with type `XML_XmlDeclHandler' (at "expat.h",L186/C14).
E "xmlparse.c",L854/C28:
MetaWare High C Compiler 1.7 29-May-:7 15:32:55 xmlparse.c Page 1
| Type `void *' is not assignment compatible with type `void (*)( void *)'.
E "xmlparse.c",L1469/C12:

From boris at codesynthesis.com Wed May 30 20:52:14 2007
From: boris at codesynthesis.com (Boris Kolpackov)
Date: Wed, 30 May 2007 18:52:14 +0000 (UTC)
Subject: [Expat-discuss] Compiling using Metaware HighC compiler
References:
Message-ID:

Hi Jeff,

JRancier at penntraffic.com writes:

> Here's a snippet of the compile errors, this is with expat 2.0, perhaps
> I should go to an older revision:

The compiler seems to be rather old, so I am not sure how much older a
release you would have to go to in order to get it compiling. I think a
more fruitful approach would be to try to fix the compile errors. For
the first error, adding a pointer to the Processor typedef might help.

hth, -boris -- Boris Kolpackov Code Synthesis Tools CC http://www.codesynthesis.com Open-Source, Cross-Platform C++ XML Data Binding