From jzhang at ximpleware.com Tue Mar 3 03:43:23 2009
From: jzhang at ximpleware.com (jzhang at ximpleware.com)
Date: Mon, 02 Mar 2009 20:43:23 -0600
Subject: [Expat-discuss] [ANN]VTD-XML 2.5
In-Reply-To:
References:
Message-ID:

VTD-XML 2.5 is now released. Please go to
https://sourceforge.net/project/showfiles.php?group_id=110612&package_id=120172&release_id=661376
to download the latest version.

Changes from Version 2.4 (2/2009)

* Added separate VTD index generation and loading (see http://vtd-xml.sf.net/persistence.html for further info).
* Integrated extended VTD supporting 256 GB documents (in Java only).
* Added duplicateNav() for replicating multiple VTDNav instances sharing XML, VTD and LC buffers (available in Java and C#).
* Various bug fixes and enhancements.

From howachen at gmail.com Sat Mar 7 10:54:53 2009
From: howachen at gmail.com (howard chen)
Date: Sat, 7 Mar 2009 17:54:53 +0800
Subject: [Expat-discuss] Process XML, using multi-thread
Message-ID:

Sometimes I need to parse a very large XML file, e.g. a Wikipedia dump.

My code is currently quite optimized, but it is CPU-bound.

So is it possible to split the file and parse it using multiple threads?

E.g. on an SMP quad-core CPU, maybe the speed can be boosted by 5-6 times.

From david.gregorczyk at googlemail.com Tue Mar 10 14:07:57 2009
From: david.gregorczyk at googlemail.com (David Gregorczyk)
Date: Tue, 10 Mar 2009 14:07:57 +0100
Subject: [Expat-discuss] Why is expat extremely fast?
Message-ID: <8b6fc2f00903100607g1f3b8c76r5e641d0888956a5f@mail.gmail.com>

Hey guys,

I'm a student from the University of Luebeck in Germany. For my diploma thesis I've just created a small XML parser of ca. 1000 lines of code and compared its speed to the expat parser. My results are poor: my parser's speed is only half of expat's speed. I worked with an endless loop performing switch and if structures. Sometimes a callback function requests more buffer.
Once the loop detects a final flag which is set to 1, it will parse the rest of the input stream and exit. The parser works fine but is so lame. What's the reason expat reaches such great performance?
I've tried to understand the source code and failed :-) It's just confusing me. There is so much functionality I do not perform... but my application is still slower.

Is there any trick which makes expat so fast?

I've searched the internet for an answer, but no one could help me. Could you?

Best regards and many thanks for your replies!

From aleix at member.fsf.org Tue Mar 10 14:15:55 2009
From: aleix at member.fsf.org (Aleix Conchillo Flaqué)
Date: Tue, 10 Mar 2009 14:15:55 +0100
Subject: [Expat-discuss] Why is expat extremely fast?
In-Reply-To: <8b6fc2f00903100607g1f3b8c76r5e641d0888956a5f@mail.gmail.com>
References: <8b6fc2f00903100607g1f3b8c76r5e641d0888956a5f@mail.gmail.com>
Message-ID: <49B6680B.6000806@member.fsf.org>

David Gregorczyk wrote:
> Hey guys,
>
> I'm a student from the University of Luebeck in Germany. For my diploma
> thesis I've just created a small xml parser of ca. 1000 lines of code and
> compared its speed to the expat parser. My results are poor: my parser's
> speed is only a half of expat's speed. I worked with an endless loop
> performing switch and if structures. Sometimes a callback function requests
> more buffer. Once the loop detects a final flag which is set to 1, it will
> parse the rest of the input stream and exit. The parser works fine but is so
> lame. What's the reason expat reaches such a great performance?
> I've tried to understand the source code and failed :-) It's just confusing
> me. There is so much functionality I do not perform... but my application is
> still slower.
>
> Is there any trick which makes expat so fast?
>
> I've searched the internet for an answer, no one could help me. Could you?
>
> Best regards and many thanks for your replies!
>

I'm not an Expat expert, but maybe your problem is with "...a callback function requests more buffer". Does this mean you are allocating memory while parsing? If so, that might be your problem. Memory allocation is slow.

Cheers,

Aleix

From david.gregorczyk at googlemail.com Wed Mar 11 00:26:14 2009
From: david.gregorczyk at googlemail.com (David Gregorczyk)
Date: Wed, 11 Mar 2009 00:26:14 +0100
Subject: [Expat-discuss] Why is expat extremely fast?
In-Reply-To: <49B6680B.6000806@member.fsf.org>
References: <8b6fc2f00903100607g1f3b8c76r5e641d0888956a5f@mail.gmail.com> <49B6680B.6000806@member.fsf.org>
Message-ID: <49B6F716.3080304@googlemail.com>

Aleix Conchillo Flaqué wrote:
>
> David Gregorczyk wrote:
>> Hey guys,
>>
>> I'm a student from the University of Luebeck in Germany. For my diploma
>> thesis I've just created a small xml parser of ca. 1000 lines of code and
>> compared its speed to the expat parser. My results are poor: my parser's
>> speed is only a half of expat's speed. I worked with an endless loop
>> performing switch and if structures. Sometimes a callback function requests
>> more buffer. Once the loop detects a final flag which is set to 1, it will
>> parse the rest of the input stream and exit. The parser works fine but is so
>> lame. What's the reason expat reaches such a great performance?
>> I've tried to understand the source code and failed :-) It's just confusing
>> me. There is so much functionality I do not perform... but my application is
>> still slower.
>>
>> Is there any trick which makes expat so fast?
>>
>> I've searched the internet for an answer, no one could help me. Could you?
>>
>> Best regards and many thanks for your replies!
>>
>
> I'm not an Expat expert, but may be your problem is with "...a
> callback function requests more buffer". Does this mean you are
> allocating memory while parsing? If so, that might be your problem.
> Memory allocation is slow.
>
> Cheers,
>
> Aleix
>

Thanks for your fast answer. I've found the solution myself.

Because my application is deployed on sensor nodes, memory is allocated once at the beginning of my algorithm, and a custom-programmed memory suite of limited size is used to manage the data coming in from a stream. The memory suite uses a large char array (currently there is UTF-8 support only) and is constructed as a ring buffer. I compute the current position in that buffer with the modulo operator, which was the bottleneck of my algorithm: modulo needs too many CPU cycles. To speed up the position computation, I relied on the compiler optimization for (unsigned int)[...] % BUFFER_SIZE_CONSTANT with BUFFER_SIZE_CONSTANT as a power of 2, where the compiler can replace the division with a bit mask. Now my code runs faster than expat, I'm so lucky! :-)

Greets, David

From apoorvag at tataelxsi.co.in Fri Mar 13 11:43:13 2009
From: apoorvag at tataelxsi.co.in (apoorvag)
Date: Fri, 13 Mar 2009 16:13:13 +0530
Subject: [Expat-discuss] expat build errors on Mac
Message-ID:

Hi,

I am trying to build expat (2.0.1) on Mac, which is required for hal (0.5.11). Building expat itself produces no errors, but while building hal the following error occurs. Please help me build hal using expat.

expat_configure.txt ------ ./configure
expat_make.txt ------ make
expat_make_install.txt ------ make install
hal_configure.txt ------ ./configure

checking expat.h usability... yes
checking expat.h presence... yes
checking for expat.h... yes
checking for XML_ParserCreate in -lexpat... no
configure: error: Can't find expat library. Please install expat.

Thanks.

This message (including any attachment) is confidential and may be legally privileged. Access to this message by anyone other than the intended recipient(s) listed above is unauthorized.
If you are not the intended recipient you are hereby notified that any disclosure, copying, or distribution of the message, or any action taken or omission of action by you in reliance upon it, is prohibited and may be unlawful. Please immediately notify the sender by reply e-mail and permanently delete all copies of the message if you have received this message in error.
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: expat_configure.txt
URL:
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: expat_make.txt
URL:
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: expat_make_install.txt
URL:
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: hal_configure.txt
URL:

From boris.dusek at gmail.com Sat Mar 14 19:03:06 2009
From: boris.dusek at gmail.com (Boris Dušek)
Date: Sat, 14 Mar 2009 19:03:06 +0100
Subject: [Expat-discuss] XML_CharacterDataHandler: can it receive text cut half inside a multibyte character sequence?
Message-ID:

Hello,

I am using expat with libiconv to convert data to wchar_t, but this is a valid question for any other non-UTF-8 target encoding, not just wchar_t (e.g. for ISO-8859-2):

When expat calls the function set by XML_SetCharacterDataHandler, can the function receive a block of text (with parameters const XML_Char *s, int len) such that it ends in the middle of a multibyte character? (I.e. there is a Unicode character encoded as a sequence of 2-4 bytes, and the block's last byte, s[len-1], is a byte of a multibyte sequence that is not the last byte of that sequence.)

I can still think of a solution (e.g.
copy the last incomplete multibyte sequence, as indicated by iconv, to the beginning of a buffer and, on the next call to the data handler, copy the rest of the data to the buffer and call iconv again - but this involves copying and possibly dynamically allocating the buffer, and I want to avoid that), but it would be great if expat did not end in the middle of a multibyte sequence.

Thanks for any answer,
Boris Dušek

From karl at waclawek.net Sat Mar 14 22:05:13 2009
From: karl at waclawek.net (Karl Waclawek)
Date: Sat, 14 Mar 2009 17:05:13 -0400
Subject: [Expat-discuss] XML_CharacterDataHandler: can it receive text cut half inside a multibyte character sequence?
In-Reply-To:
References:
Message-ID: <49BC1C09.3050704@waclawek.net>

Boris Dušek wrote:
> When expat calls the function set by XML_SetCharacterDataHandler, can
> the function receive a block of text (with parameters const XML_Char
> *s, int len) such that it ends in the middle of a multibyte character?
> (i.e. there is a unicode character encoded as a sequence of 2-4 bytes,
> and the block's last character, s[len-1], is a character of a
> multibyte sequence that is not a last character of such multibyte
> sequence).
> but it would be great if expat did not end in the middle of a
> multibyte sequence.
>

Expat should not return partial characters, though it can handle partial characters on input (unless it is the last input buffer, of course).

Btw, there is also the UTF-16 version of Expat - libexpatw, returning UTF-16 encoded content.

Karl
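
[Editor's sketch, not part of the original thread.] Karl's answer means the carry-over workaround Boris describes should not be needed with Expat. For readers who still want a defensive check before handing a chunk to a converter, the following is a minimal sketch of how the "incomplete trailing sequence" could be detected without dynamic allocation. The function name utf8_complete_prefix is hypothetical (it is not part of Expat or libiconv), and the sketch assumes the chunk is otherwise well-formed UTF-8; it leaves real validation to the converter.

```c
#include <stddef.h>

/* Hypothetical helper, not an Expat or libiconv API.
 * Returns how many leading bytes of s[0..len) consist only of complete
 * UTF-8 sequences. Bytes from that offset up to len (at most 3) belong to
 * a sequence that is cut off at the end of the chunk.
 * Assumption: the input is otherwise well-formed UTF-8. */
size_t utf8_complete_prefix(const char *s, size_t len)
{
    size_t i = len;
    size_t start, have, need;
    unsigned char lead;

    /* Step back over at most 3 trailing continuation bytes (10xxxxxx). */
    while (i > 0 && len - i < 3 && ((unsigned char)s[i - 1] & 0xC0) == 0x80)
        i--;

    if (i == 0)
        return len;            /* no lead byte found; let the converter complain */

    start = i - 1;             /* index of the candidate lead byte */
    have = len - start;        /* bytes present in the last sequence */
    lead = (unsigned char)s[start];

    if (lead < 0x80)
        need = 1;              /* ASCII */
    else if ((lead & 0xE0) == 0xC0)
        need = 2;
    else if ((lead & 0xF0) == 0xE0)
        need = 3;
    else if ((lead & 0xF8) == 0xF0)
        need = 4;
    else
        return len;            /* not a lead byte; leave it to the converter */

    /* Cut before the last sequence only if it is incomplete. */
    return (have < need) ? start : len;
}
```

In a character data handler one could convert the first n = utf8_complete_prefix(s, len) bytes and keep the remaining len - n bytes (never more than 3) in a small fixed-size array for the next call, which avoids the dynamic allocation Boris wanted to avoid. iconv reports the same condition itself by returning (size_t)-1 with errno set to EINVAL.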