From SheelR@aol.com Wed Aug 1 08:06:12 2001
From: SheelR@aol.com (SheelR@aol.com)
Date: Wed, 1 Aug 2001 03:06:12 EDT
Subject: [Expat-discuss] is expat a reentrant parser
Message-ID: <66.1243c99b.28990464@aol.com>

Folks,

I am planning to use the expat parser in favor of xerces since expat has a
very small footprint when compared to xerces.
The questions that I have are:

Is expat reentrant?
Is expat thread safe?

Also, with xerces, to process 40000 elements using SAX
(size of xml buffer is roughly 2MB)
takes 8 secs. I want to get it down to a couple of secs. Please advise.

~RR

From fdrake@acm.org Thu Aug 2 18:36:11 2001
From: fdrake@acm.org (Fred L. Drake, Jr.)
Date: Thu, 2 Aug 2001 13:36:11 -0400 (EDT)
Subject: [Expat-discuss] is expat a reentrant parser
In-Reply-To: <66.1243c99b.28990464@aol.com>
References: <66.1243c99b.28990464@aol.com>
Message-ID: <15209.36747.450396.320026@cj42289-a.reston1.va.home.com>

SheelR@aol.com writes:
 > Is expat reentrant?

I'm not entirely sure what you mean here. If you create several
parsers, you should be able to use them at the same time (from
different threads or whatever). You probably don't want to muck
around with a parser while it is actively parsing, aside from calling
the defined APIs to add/change callbacks.

 > Is expat thread safe?

My answer above should be sufficient; if not, please explain what
you mean. Realize that while Expat is parsing, it controls the flow in
its thread.

 > Also, with xerces, to process 40000 elements using SAX
 > (size of xml buffer is roughly 2MB)
 > takes 8 secs. I want to get it down to a couple of secs. Please advise.

Just measure it -- you didn't say anything about your hardware, so
no one has any way to judge, even if they've already done relevant
measurements (I haven't). If Expat is fast enough, that's great. If
not, there are other parsers available. I don't really have the
resources to spend performance-tuning Expat unless you want to fund
the work through my employer. ;-)

-Fred

-- 
Fred L. Drake, Jr.
PythonLabs at Zope Corporation

From josh.martin@abq.sc.philips.com Fri Aug 3 17:49:28 2001
From: josh.martin@abq.sc.philips.com (Josh Martin)
Date: Fri, 3 Aug 2001 09:49:28 -0700
Subject: [Expat-discuss] is expat a reentrant parser
Message-ID: <200108031649.JAA17751@geocrawler.com>

This message was sent from Geocrawler.com by "Josh Martin"

>>SheelR@aol.com writes:
>> Is expat reentrant?
>
> I'm not entirely sure what you mean here. If you create several
>parsers, you should be able to use them at the same time (from
>different threads or whatever). You probably don't want to muck
>around with a parser while it is actively parsing, aside from calling
>the defined APIs to add/change callbacks.
>
> -Fred
>
>--
>Fred L. Drake, Jr.
>PythonLabs at Zope Corporation

Reentrant Functions:

Reentrant functions (and functions which are not interruptible by
signals) are defined as functions that may be invoked, without
restriction, from signal-catching functions. A function is reentrant
only if, when invoked inside a signal-catching function, it does not
adversely affect the normal flow of operations of the function or code
that the signal-catching function interrupted. In other words,
reentrant functions aren't going to unexpectedly change any critical
values, and thus the result of the operations, if they are invoked in
the middle of a function.

I'm not positive that this is the meaning that SheelR was going for,
but this is the "standard" meaning of a reentrant function.
Personally, I would like to know the answer to this question, given
this definition of reentrant.

- Josh Martin

_______________________________________________
Expat-discuss mailing list
Expat-discuss@lists.sourceforge.net
http://lists.sourceforge.net/lists/listinfo/expat-discuss

Geocrawler.com - The Knowledge Archive

From mballen@erols.com Fri Aug 3 19:16:38 2001
From: mballen@erols.com (Michael B. Allen)
Date: Fri, 3 Aug 2001 14:16:38 -0400
Subject: [Expat-discuss] is expat a reentrant parser
In-Reply-To: <200108031649.JAA17751@geocrawler.com>; from josh.martin@abq.sc.philips.com on Fri, Aug 03, 2001 at 09:49:28AM -0700
References: <200108031649.JAA17751@geocrawler.com>
Message-ID: <20010803141638.A842@nano.foo.net>

On Fri, Aug 03, 2001 at 09:49:28AM -0700, Josh Martin wrote:
> >> Is expat reentrant?
> >
> > I'm not entirely sure what you mean here. If

I recall seeing a message by James Clark that it is indeed reentrant
but unfortunately I don't remember where. This could have been changed
of course. The sourceforge code is considerably different from Clark's
original expat-1.2 source.

Incidentally, how do these distributions differ? Is there a list of
features that have been added?

Mike

From fdrake@acm.org Sat Aug 4 01:26:09 2001
From: fdrake@acm.org (Fred L. Drake, Jr.)
Date: Fri, 3 Aug 2001 20:26:09 -0400 (EDT)
Subject: [Expat-discuss] is expat a reentrant parser
In-Reply-To: <20010803141638.A842@nano.foo.net>
References: <200108031649.JAA17751@geocrawler.com> <20010803141638.A842@nano.foo.net>
Message-ID: <15211.16673.369393.251165@cj42289-a.reston1.va.home.com>

Michael B. Allen writes:
 > I recall seeing a message by James Clark that it is indeed reentrant
 > but unfortunately I don't remember where. This could have been changed
 > of course. The sourceforge code is considerably different from Clark's
 > original expat-1.2 source.

I would not expect that quality of the code to have changed
terribly, so I'll say that it *should* still hold. However, I've not
analyzed it for that, and don't expect that I'll have time to do so.
I suspect that it's also re-entrant using the thread-safe definition
of re-entrancy, but again, have not done a serious analysis of it.
It would be great for someone to do the analysis, but I don't think
I'll have the time.
If anyone can say that these forms of re-entrance are still
allowable in the current codebase, we can be especially careful to
preserve that.

 > Incidentally, how do these distributions differ? Is there a list of
 > features that have been added?

This has been asked many times, but I don't recall 1.2 well enough
off-hand. Please submit this as a documentation feature request in
the Expat bug manager on SourceForge:

    http://sourceforge.net/projects/expat/

and I'll try to get to it in a few weeks, when my current swampage has
a chance at subsiding.

-Fred

-- 
Fred L. Drake, Jr.
PythonLabs at Zope Corporation

From mballen@erols.com Sat Aug 4 04:11:36 2001
From: mballen@erols.com (Michael B. Allen)
Date: Fri, 3 Aug 2001 23:11:36 -0400
Subject: [Expat-discuss] is expat a reentrant parser
In-Reply-To: <15211.16673.369393.251165@cj42289-a.reston1.va.home.com>; from fdrake@acm.org on Fri, Aug 03, 2001 at 08:26:09PM -0400
References: <200108031649.JAA17751@geocrawler.com> <20010803141638.A842@nano.foo.net> <15211.16673.369393.251165@cj42289-a.reston1.va.home.com>
Message-ID: <20010803231136.A5568@nano.foo.net>

On Fri, Aug 03, 2001 at 08:26:09PM -0400, Fred L. Drake, Jr. wrote:
>
> Michael B. Allen writes:
>  > I recall seeing a message by James Clark that it is indeed reentrant
>  > but unfortunately I don't remember where. This could have been changed
>  > of course. The sourceforge code is considerably different from Clark's
>  > original expat-1.2 source.
>
> I would not expect that quality of the code to have changed
> terribly, so I'll say that it *should* still hold.

Maintaining reentrancy is not a matter of quality but rather of actively
being conscientious about not modifying data outside of the current stack
context. From glancing at the code it does look like everything is
carried with the XML_Parser object.
At first glance something like:

    void
    XML_UseParserAsHandlerArg(XML_Parser parser)
    {
        handlerArg = parser;
    }

looks like setting a global, but this, as well as many others like it,
is actually a macro that sets the associated member of the XML_Parser
object, so it's ok of course.

    #define handlerArg (((Parser *)parser)->m_handlerArg)

Mike

From garys@ihug.com.au Tue Aug 7 06:29:39 2001
From: garys@ihug.com.au (Gary Stephenson)
Date: Tue, 7 Aug 2001 15:29:39 +1000
Subject: [Expat-discuss] NSTriplets work?
Message-ID: <000401c11f01$fb780620$fad7fea9@gateway>

Hi all,

I've just started work on interfacing expat to my Xbase++ XML processor. I
am making excellent progress, and have learned a heap along the way - both
about XML and C coding! Many thanks to all concerned, particularly JC.

But.... (you just knew there was going to be a "but" somewhere didn't you?)

I can't get the function XML_SetReturnNSTriplet( ..) to work for element
names - it only works for attributes. Browsing the source code confirms the
observation - the only place where the parser's ns_triplets field is referred
to is within the storeAtts(..) function.

I am using expat (1.95.2) on Win2K.

I am not terribly experienced as a C programmer - but I would be happy to have
a go at fixing the problem if I could get confirmation that it really is a
bug - and not a feature, and that nobody else is likely to be fixing it
anytime soon.

Many thanks in advance,

gary stephenson

From fdrake@acm.org Tue Aug 7 21:36:12 2001
From: fdrake@acm.org (Fred L. Drake, Jr.)
Date: Tue, 7 Aug 2001 16:36:12 -0400 (EDT)
Subject: [Expat-discuss] NSTriplets work?
In-Reply-To: <000401c11f01$fb780620$fad7fea9@gateway>
References: <000401c11f01$fb780620$fad7fea9@gateway>
Message-ID: <15216.20796.802962.436108@cj42289-a.reston1.va.home.com>

Gary Stephenson writes:
 > I can't get the function XML_SetReturnNSTriplet( ..) to work for element
 > names - it only works for attributes.
 > Browsing the source code confirms the
 > observation - the only place where the parser's ns_triplets field is referred
 > to is within the storeAtts(..) function.

This is a known bug:

http://sourceforge.net/tracker/?func=detail&aid=231864&group_id=10127&atid=110127

but I have no idea when I'm going to have time to go code-diving after
this one.

 > I am using expat (1.95.2) on Win2K.

The bug also exists in at least 1.95.1, if not earlier.

 > I am not terribly experienced as a C programmer - but I would be happy to have
 > a go at fixing the problem if I could get confirmation that it really is a
 > bug - and not a feature, and that nobody else is likely to be fixing it
 > anytime soon.

Any pointers to a solution would be welcome! If you develop a
patch, please attach it to the existing bug report and I'll see about
getting it checked in.

Thanks for your interest!

-Fred

-- 
Fred L. Drake, Jr.
PythonLabs at Zope Corporation

From sam@uchicago.edu Wed Aug 8 00:01:15 2001
From: sam@uchicago.edu (Sam TH)
Date: Tue, 7 Aug 2001 18:01:15 -0500
Subject: [Expat-discuss] Anyone using BeOS?
In-Reply-To: <15205.45464.632584.320171@cj42289-a.reston1.va.home.com>; from fdrake@acm.org on Mon, Jul 30, 2001 at 03:12:24PM -0400
References: <15205.45464.632584.320171@cj42289-a.reston1.va.home.com>
Message-ID: <20010807180115.B15677@uchicago.edu>

On Mon, Jul 30, 2001 at 03:12:24PM -0400, Fred L. Drake, Jr. wrote:
>
> Does anyone here have a BeOS machine? I'd like to learn a little
> more about the apparent __declspec support on that platform, since
> past patches indicate that it differs from the MSVC support.
> If anyone has a pointer to relevant documentation, I'd really
> appreciate it.

Sorry about the late response.

I'm the author of those patches. I no longer have access to the BeOS
machines they were tested on. As I remember it, the fundamental
problem was that gcc, which we (AbiWord) use on x86 BeOS, doesn't
support __declspec.
The Metrowerks compiler does, I think.

sam th --- sam@uchicago.edu --- http://www.abisource.com/~sam/
OpenPGP Key: CABD33FC --- http://samth.dyndns.org/key
DeCSS: http://samth.dynds.org/decss

From gstein@lyra.org Wed Aug 8 00:21:48 2001
From: gstein@lyra.org (Greg Stein)
Date: Tue, 7 Aug 2001 16:21:48 -0700
Subject: [Expat-discuss] Anyone using BeOS?
In-Reply-To: <20010807180115.B15677@uchicago.edu>; from sam@uchicago.edu on Tue, Aug 07, 2001 at 06:01:15PM -0500
References: <15205.45464.632584.320171@cj42289-a.reston1.va.home.com> <20010807180115.B15677@uchicago.edu>
Message-ID: <20010807162148.P1414@lyra.org>

On Tue, Aug 07, 2001 at 06:01:15PM -0500, Sam TH wrote:
> On Mon, Jul 30, 2001 at 03:12:24PM -0400, Fred L. Drake, Jr. wrote:
> >
> > Does anyone here have a BeOS machine? I'd like to learn a little
> > more about the apparent __declspec support on that platform, since
> > past patches indicate that it differs from the MSVC support.
> > If anyone has a pointer to relevant documentation, I'd really
> > appreciate it.
>
> Sorry about the late response.
>
> I'm the author of those patches. I no longer have access to the BeOS
> machines they were tested on. As I remember it, the fundamental
> problem was that gcc, which we (AbiWord) use on x86 BeOS, doesn't
> support __declspec. The Metrowerks compiler does, I think.

Shouldn't be a problem. I had seen some -checkins related to that stuff. I
plan to just snarf what Apache is doing in that area. Apache's got lots of
portability stuff :-) We should be able to clear it right up.
Cheers,
-g

-- 
Greg Stein, http://www.lyra.org/

From asehgal@research.att.com Wed Aug 8 03:06:05 2001
From: asehgal@research.att.com (Amit Sehgal)
Date: Tue, 7 Aug 2001 22:06:05 -0400
Subject: [Expat-discuss] Parsing Attributes.
Message-ID:

What function in expat handles parsing of attributes ?

Ex.
...
Port='23456"
/>

The start and end element handlers return the store tag.
But the Manager and Port attributes are skipped by the parser.

From gstein@lyra.org Wed Aug 8 20:31:50 2001
From: gstein@lyra.org (Greg Stein)
Date: Wed, 8 Aug 2001 12:31:50 -0700
Subject: [Expat-discuss] is expat a reentrant parser
In-Reply-To: <200108081629.KAA25277@abqn42.abq.sc.philips.com>; from Josh.Martin@abq.sc.philips.com on Wed, Aug 08, 2001 at 10:29:56AM -0600
References: <200108081629.KAA25277@abqn42.abq.sc.philips.com>
Message-ID: <20010808123150.Z1414@lyra.org>

By that definition, Expat is not "reentrant". It allocates memory.

However, as Fred points out: it *is* thread-safe if you use different
parsers in each thread. (one parser across multiple threads will hork)

Cheers,
-g

On Wed, Aug 08, 2001 at 10:29:56AM -0600, Josh Martin wrote:
> Mike,
>
> The first sentence is paraphrased from the HP-UX Release 11.00 man page for
> sigaction(2). The rest is distilled from my learning about and use of reentrant
> functions. Anyone is of course free to quote or paraphrase "my" definition if it
> is useful to them. I also apologize for the odd formatting of the definition,
> web form email entry was never my forte.
>
> - Josh Martin
>
> > From: Mike Wilson
> > To: Josh Martin
> > Subject: RE: [Expat-discuss] is expat a reentrant parser
> > Date: Fri, 3 Aug 2001 13:27:09 -0400
> >
> > Josh,
> >
> > I just saw this post. Where did you get this definition? It's
> > wonderful! I'd love to be able to cite it from various points in my
> > own project documentation.
> >
> > TIA,
> >
> > - Mike

-- 
Greg Stein, http://www.lyra.org/

From fdrake@acm.org Wed Aug 8 20:59:55 2001
From: fdrake@acm.org (Fred L. Drake, Jr.)
Date: Wed, 8 Aug 2001 15:59:55 -0400 (EDT)
Subject: [Expat-discuss] Parsing Attributes.
In-Reply-To:
References:
Message-ID: <15217.39483.21856.169425@cj42289-a.reston1.va.home.com>

Amit Sehgal writes:
 > Port='23456"
 > />
 >
 > The start and end element handlers return the store tag.
 > But the Manager and Port attributes are skipped by the parser.

You should be getting the attributes along with the start tag. See
the examples in the article at:

http://www.xml.com/pub/a/1999/09/expat/

-Fred

-- 
Fred L. Drake, Jr.
PythonLabs at Zope Corporation

From garys@ihug.com.au Fri Aug 10 07:33:50 2001
From: garys@ihug.com.au (Gary Stephenson)
Date: Fri, 10 Aug 2001 16:33:50 +1000
Subject: [Expat-discuss] problems with dtddestroy()
Message-ID: <001f01c12166$6e5e9e40$fad7fea9@gateway>

Hi all ( "Expatriates", ?).

I am trying to get Expat to parse the OASIS conformance test suite
document(s). It barfs with a debug assertion failure (dbgheap.c : line 1017):

    _BLOCK_TYPE_IS_VALID(pHead->nBlockUse)

on either of the following lines from the dtdDestroy() function:

    if (p->scaffIndex)
        FREE(p->scaffIndex);
    if (p->scaffold)
        FREE(p->scaffold);
    traceLog( "dtd 2" ) ;

Note that the failure only occurs after successfully destroying the DTD
itself (testcases.dtd), and also successfully destroying the first external
entity ("xmltest\xmltest.xml"). Only then does it fail when trying to
destroy the second external entity ("japanese\japanese.xml").

I assume the error is that the "scaffold"ing has already been free()'ed, and
I notice that the following lines appear in the dtdCopy function:

    /* Don't want deep copying for scaffolding */
    ...
    newDtd->scaffLevel = oldDtd->scaffLevel;
    newDtd->scaffIndex = oldDtd->scaffIndex;

which may or may not be significant.

Has anybody else experienced similar problems? Can anybody tell me the right
way to fix it?

All processed files are in UTF-8 encoding. Using 1.95.2 on Win2K (SP2).

many tias,

gary

From syprat@yahoo.fr Fri Aug 10 13:37:08 2001
From: syprat@yahoo.fr (Sylvain PRAT)
Date: Fri, 10 Aug 2001 14:37:08 +0200 (CEST)
Subject: [Expat-discuss] empty tags
Message-ID: <20010810123708.97732.qmail@web14809.mail.yahoo.com>

Hi,

I would like to know if there's a way to know if the
read tag is empty : in fact, i would like to re-fit
the tags i don't want to recognize (both start and end
elements), so i expected to use the input context
associated to the byte count to do that, but i think
empty tags will cause problems because they will be
duplicated...
Another way to do that ?

Thanks...

Sylvain

From fdrake@acm.org Fri Aug 10 14:10:51 2001
From: fdrake@acm.org (Fred L. Drake, Jr.)
Date: Fri, 10 Aug 2001 09:10:51 -0400 (EDT)
Subject: [Expat-discuss] empty tags
In-Reply-To: <20010810123708.97732.qmail@web14809.mail.yahoo.com>
References: <20010810123708.97732.qmail@web14809.mail.yahoo.com>
Message-ID: <15219.56667.408959.764629@cj42289-a.reston1.va.home.com>

Sylvain PRAT writes:
 > I would like to know if there's a way to know if the
 > read tag is empty : in fact, i would like to re-fit
 > the tags i don't want to recognize (both start and end
 > elements), so i expected to use the input context
 > associated to the byte count to do that, but i think
 > empty tags will cause problems because they will be
 > duplicated... Another way to do that ?

I think the only way to tell the difference between an empty-element
tag and a start-tag/end-tag pair is to use the input context. The
simplest way would be to check it for the end tags; if it starts with
"</", the element did not use empty element notation. Checking this
should be easy & fast (the call to get the input context will be more
expensive than the check!), but you should be careful; if you have an
empty element tag with a *lot* of attributes, the beginning of the tag
may not be there any more. See the caveats in the documentation for
more information.

-Fred

-- 
Fred L. Drake, Jr.
PythonLabs at Zope Corporation

From vnemchin@hotmail.com Sun Aug 12 03:40:12 2001
From: vnemchin@hotmail.com (Vassilii Nemtchinov)
Date: Sun, 12 Aug 2001 02:40:12 +0000
Subject: [Expat-discuss] Character Data
Message-ID:

I know that the subject of handling character data has been already
discussed here. Still I would like somebody to provide some suggestions on
the subject. Since we can not assume that the character data arrive in one
chunk for various reasons, in my solution I am allocating a buffer in the
start tag handler and I am also setting a flag indicating that I've
encountered the beginning of the element. I keep adding character data to a
buffer in the character handler until I reset back the flag in the end
handler. I can see several problems in using this method.
First, it seems
that the whole purpose of event-driven parser has been defeated since I have
to set sentinels myself and not rely entirely on the parser. Secondly, in
the worst case I have to allocate as many sentinels as I have elements in
the document (same goes for separate buffers for character data). I am sure
that somebody found a better solution for getting character data.

From syprat@yahoo.fr Mon Aug 13 08:15:10 2001
From: syprat@yahoo.fr (Sylvain PRAT)
Date: Mon, 13 Aug 2001 09:15:10 +0200 (CEST)
Subject: [Expat-discuss] empty tags
In-Reply-To: <15219.56667.408959.764629@cj42289-a.reston1.va.home.com>
Message-ID: <20010813071510.24523.qmail@web14802.mail.yahoo.com>

--- "Fred L. Drake, Jr." wrote:
>
> Sylvain PRAT writes:
>  > I would like to know if there's a way to know if the
>  > read tag is empty : in fact, i would like to re-fit
>  > the tags i don't want to recognize (both start and end
>  > elements), so i expected to use the input context
>  > associated to the byte count to do that, but i think
>  > empty tags will cause problems because they will be
>  > duplicated... Another way to do that ?
>
> I think the only way to tell the difference between an empty-element
> tag and a start-tag/end-tag pair is to use the input context.

It's what i had guessed reading the docs

> The simplest way would be to check it
> for the end tags; if it starts with "</", the element did not use
> empty element notation. Checking this should be
> easy & fast (the call
> to get the input context will be more expensive than
> the check!),

yes, but we should be aware of the charset...

> but
> you should be careful; if you have an empty element
> tag with a *lot*
> of attributes, the beginning of the tag may not be
> there any more.
> See the caveats in the documentation for more
> information.
>

I think the parser is aware of the empty tag, so why
couldn't this be a feature (as reading the input
context is one too), especially because start tags are
possibly erased before reading the end tags...

From brc@fourlittlemice.com Mon Aug 13 08:20:07 2001
From: brc@fourlittlemice.com (Dirk Dierckx)
Date: Mon, 13 Aug 2001 09:20:07 +0200
Subject: [Expat-discuss] RE: Character Data
In-Reply-To:
Message-ID:

"Vassilii Nemtchinov" wrote:
>I know that the subject of handling character data has been already
>discussed here. Still I would like somebody to provide some suggestions on
>the subject. Since we can not assume that the character data arrive in one
>chunk for various reasons, in my solution I am allocating a buffer in the
>start tag handler and I am also setting a flag indicating that I've
>encountered the beginning of the element. I keep adding character data to a
>buffer in the character handler until I reset back the flag in the end
>handler. I can see several problems in using this method. First, it seems
>that the whole purpose of event-driven parser has been defeated since I have
>to set sentinels myself and not rely entirely on the parser. Secondly, in
>the worst case I have to allocate as many sentinels as I have elements in
>the document (same goes for separate buffers for character data). I am sure
>that somebody found a better solution for getting character data.

I'll explain (in short) how I do it.

I use something like the following:

    struct SParserContext
    {
        size_t m_szElementValueSize;      /* init: m_szElementValueSize = (size_t)0U; */
        size_t m_szElementValueLen;       /* init: m_szElementValueLen = (size_t)0U; */
        char *m_pchElementValue;          /* init: m_pchElementValue = NULL; */
        char **m_ppchCurrentElementValue; /* init: m_ppchCurrentElementValue = NULL; */
        ...
    };

and use this structure in the following way:

    void assignElementValueToCurrentElement(struct SParserContext *psCtx)
    {
        if(psCtx->m_ppchCurrentElementValue
           && NULL == *(psCtx->m_ppchCurrentElementValue)
           && psCtx->m_szElementValueLen > (size_t)0U)
        {
            *(psCtx->m_ppchCurrentElementValue) = (char*)malloc(
                psCtx->m_szElementValueLen + (size_t)1U);
            /* Only copy if the allocation succeeded. */
            if(*(psCtx->m_ppchCurrentElementValue))
                strcpy(*(psCtx->m_ppchCurrentElementValue),
                       psCtx->m_pchElementValue);
        }
        /* Make sure the element value is empty. */
        psCtx->m_szElementValueLen = (size_t)0U;
    }

    void processOpenTag(void *pvUserData,
                        const char *pcchElement,
                        const char **ppcchAttributes)
    {
        struct SParserContext *psCtx = (struct SParserContext*)pvUserData;

        /* If we have a value stored in m_pchElementValue at this point
           => it will be the full value of the *previously encountered*
           element. */
        assignElementValueToCurrentElement(psCtx);
        ...
        /* With your new element (pcchElement) you need a NULL char * to
           its value (~pchThisElementValueBuffer), so ... */
        pchThisElementValueBuffer = NULL;
        psCtx->m_ppchCurrentElementValue = &pchThisElementValueBuffer;
        ...
    }

    void processCloseTag(void *pvUserData,
                         const char *pcchElement)
    {
        struct SParserContext *psCtx = (struct SParserContext*)pvUserData;

        /* If we have a value stored in m_pchElementValue at this point
           => it will be the full value of the *current* (~ pcchElement)
           element. */
        assignElementValueToCurrentElement(psCtx);
    }

    /* Beware: Decoding of pcchData has been left out of this sample code !!! */
    void processTagData(void *pvUserData,
                        const char *pcchData,
                        int iDataLen)
    {
        const size_t cszIncrement = (size_t)512U;
        struct SParserContext *psCtx = (struct SParserContext*)pvUserData;
        size_t szDataLen = (size_t)iDataLen, szBufferSize;

        szBufferSize = psCtx->m_szElementValueLen + szDataLen + (size_t)1U;
        if(szBufferSize > psCtx->m_szElementValueSize)
        {
            char *pchNew = NULL;

            /* Not enough memory available in m_pchElementValue to append
               pcchData to it, create/enlarge m_pchElementValue. */
            szBufferSize = ((szBufferSize / cszIncrement) + (size_t)1U)
                           * cszIncrement;
            if(psCtx->m_szElementValueSize)
                pchNew = (char*)realloc(psCtx->m_pchElementValue, szBufferSize);
            else
                pchNew = (char*)malloc(szBufferSize);
            if(pchNew)
            {
                psCtx->m_szElementValueSize = szBufferSize;
                psCtx->m_pchElementValue = pchNew;
            }
        }
        /* Skipped if the allocation above failed. */
        if(szBufferSize <= psCtx->m_szElementValueSize)
        {
            char *pchString =
                &(psCtx->m_pchElementValue[psCtx->m_szElementValueLen]);

            memcpy(pchString, pcchData, szDataLen);
            pchString[szDataLen] = '\0'; /* m_pchElementValue is always terminated. */
            psCtx->m_szElementValueLen += szDataLen;
        }
    }

PS.: The use of psCtx->m_ppchCurrentElementValue is purely to make this
sample complete; in your particular impl. you should use a method that is
appropriate for your problem, of course.

Hope this code has done more good than bad ;-).

Regards,

Dirk.

From fdrake@acm.org Mon Aug 13 15:51:58 2001
From: fdrake@acm.org (Fred L. Drake, Jr.)
Date: Mon, 13 Aug 2001 10:51:58 -0400 (EDT)
Subject: [Expat-discuss] empty tags
In-Reply-To: <20010813071510.24523.qmail@web14802.mail.yahoo.com>
References: <15219.56667.408959.764629@cj42289-a.reston1.va.home.com> <20010813071510.24523.qmail@web14802.mail.yahoo.com>
Message-ID: <15223.59790.73199.257638@cj42289-a.reston1.va.home.com>

Sylvain PRAT writes:
 > yes, but we should be aware of the charset...

Yes, pretty flaky stuff.

Here's another approach: for every start tag, get the current source
index, then for end tags, you know it was an empty tag if the position
didn't change.
That can be optimized a little bit by maintaining a flag:

static int maybe_empty_element_tag;
static long byte_index;

void
start(void *data, const char *el, const char **attr)
{
  /* "parser" is assumed to be the XML_Parser in use, visible here. */
  maybe_empty_element_tag = 1;
  byte_index = XML_GetCurrentByteIndex(parser);

  ...
}

static void
end(void *data, const char *el)
{
  if (maybe_empty_element_tag) {
    maybe_empty_element_tag = 0;
    if (byte_index == XML_GetCurrentByteIndex(parser)) {
      /* empty-element tag */
      return;
    }
  }
  ...
}

All other handlers could optionally clear maybe_empty_element_tag to
avoid the call back into expat for elements like characters only.
Whether that would be a win depends on how isolated you want this
aspect of the processing, or if the performance improvement (very
small) is worth the maintenance cost.

 > I think the parser is aware of the empty tag, so why
 > couldn't this be a feature (as reading the input
 > context is one too), especially because start tags are
 > possibly erased before reading the end tags...

The parser has to be aware of it, but I'm not sure what would be the
best way to expose it. Perhaps something similar to the
XML_GetSpecifiedAttributeCount() function, which is only valid during
the start-element callback? That's certainly possible, but would
incur higher overhead than the approach outlined here (because it
would always result in a function call). Providing that would not
invalidate this approach, so they could coexist.

  -Fred

--
Fred L. Drake, Jr.
PythonLabs at Zope Corporation

From rsalz@zolera.com Mon Aug 13 16:14:05 2001
From: rsalz@zolera.com (Rich Salz)
Date: Mon, 13 Aug 2001 11:14:05 -0400
Subject: [Expat-discuss] Re-entrant
Message-ID: <3B77EEBD.AFDB440A@zolera.com>

Sorry, but that definition from HP is completely bogus. First,
according to ISO C, the only safe thing to do from a signal handler is
set a variable of type sig_atomic_t. Second, it's not what reentrant means.
:) Informally, re-entrant means that it is safe to call the function while an existing call is pending: can I *re-enter* the function? An excellent example is yacc. Yacc exports a single function yyparse(). Suppose a write a parser for some language, and that language has an import statement. Can my import handler call yyparse to parse the new file, while the containing file is being parsed. Is yyparse re-entrant? The answer is no, it's not. Yacc stores its state in internal global variables. For another example, look at the C function "strtok." I'm not sure what the answer is to expat, except that it usually doesn't matter. Rather than re-enter, the thing to do is create another parser and have that parser do the "recursive" or "internal" work. Hope this helps. /r$ -- Zolera Systems, Your Key to Online Integrity Securing Web services: XML, SOAP, Dig-sig, Encryption http://www.zolera.com From fdrake@acm.org Mon Aug 13 16:25:05 2001 From: fdrake@acm.org (Fred L. Drake, Jr.) Date: Mon, 13 Aug 2001 11:25:05 -0400 (EDT) Subject: [Expat-discuss] Re-entrant In-Reply-To: <3B77EEBD.AFDB440A@zolera.com> References: <3B77EEBD.AFDB440A@zolera.com> Message-ID: <15223.61777.423858.420431@cj42289-a.reston1.va.home.com> Rich Salz writes: > Informally, re-entrant means that it is safe to call the function while > an existing call is pending: can I *re-enter* the function? An This is certainly what I'm more familiar with... -Fred -- Fred L. Drake, Jr. PythonLabs at Zope Corporation From michael.isard@compaq.com Mon Aug 13 19:16:33 2001 From: michael.isard@compaq.com (Michael Isard) Date: Mon, 13 Aug 2001 11:16:33 -0700 (PDT) Subject: [Expat-discuss] empty tags Message-ID: <15224.6529.299234.691317@diamond.pa.dec.com> >On Mon, 13 Aug 2001 10:51:58 -0400 (EDT), "Fred L. Drake, Jr." said: > =?iso-8859-1?q?Sylvain=20PRAT?= writes: >> yes, but we should be aware of the charset... > Yes, pretty flaky stuff. 
Here's another approach: for every start > tag, get the current source index, then for end tags, you know it > was an empty tag if the position didn't change. That can be > optimized a little bit by maintaining a flag: My code relies on the fact that XML_GetCurrentByteCount(parser) == 0 in the end element callback iff the element is empty. I think this is equivalent to what you propose but avoids storing flags. Michael. From fdrake@acm.org Mon Aug 13 20:04:56 2001 From: fdrake@acm.org (Fred L. Drake, Jr.) Date: Mon, 13 Aug 2001 15:04:56 -0400 (EDT) Subject: [Expat-discuss] empty tags In-Reply-To: <15224.6529.299234.691317@diamond.pa.dec.com> References: <15224.6529.299234.691317@diamond.pa.dec.com> Message-ID: <15224.9432.61787.444160@cj42289-a.reston1.va.home.com> Michael Isard writes: > My code relies on the fact that > > XML_GetCurrentByteCount(parser) == 0 > > in the end element callback iff the element is empty. I think this is > equivalent to what you propose but avoids storing flags. Aha! This is good. Maintaining a flag may yield better performance, but this is probably better. I'll add a note to the documentation for XML_GetCurrentByteCount() to indicate this. I should probably write a cookbook-style document containing tips for using Expat to make these detailed distinctions and show good usage examples, but I'd really rather someone else write it and contribute it to the documentation. Besides, I should be writing a test suite for Expat. -Fred -- Fred L. Drake, Jr. PythonLabs at Zope Corporation From fdrake@acm.org Mon Aug 13 19:50:02 2001 From: fdrake@acm.org (Fred L. Drake, Jr.) Date: Mon, 13 Aug 2001 14:50:02 -0400 (EDT) Subject: [Expat-discuss] Character Data In-Reply-To: References: Message-ID: <15224.8538.631600.976693@cj42289-a.reston1.va.home.com> Vassilii Nemtchinov writes: > that the whole purpose of event-driven parser has been defied since I have > to set sentinels myself and not rely entirely on the parser. 
Secondly, in

No; you still avoid an enormous portion of the work involved with
something like building a DOM. Unless you're having to keep
everything indefinitely, you get a win there. Internal state tracking
and short-term data accumulation are common with event-based parsing,
XML or otherwise.

 > the worst case I have to allocate as many sentinels as I have elements in
 > the document (same goes for separate buffers for character data). I am sure
 > that somebody found a better solution for getting character data.

How many buffers you need to allocate and how many state variables you
need depends entirely on your application. If you're searching the
text for a match, for example, there are certainly incremental
algorithms that can be applied.

  -Fred

--
Fred L. Drake, Jr.
PythonLabs at Zope Corporation

From JWieman@daktronics.com Mon Aug 13 21:04:51 2001
From: JWieman@daktronics.com (Joe Wiemann)
Date: Mon, 13 Aug 2001 15:04:51 -0500
Subject: [Expat-discuss] Definition Of Re-entrant
Message-ID:

Reentrancy -

One of the real beauties of multi threading is that the same C function
can be called from multiple threads.... This provides great power and
reduces code space... however, it does require that C functions called
from multiple threads are reentrant.

What does reentrant mean? Basically a reentrant function stores the
caller's return address on the stack and does not rely on global or static
C variables that it previously set up. Most compilers place the return
address on the stack. hence application developers must only worry about
the use of globals and statics.

An example of a non-reentrant function is strtok found in the standard C
library. This function remembers the previous string pointer on subsequent
calls. it does this with a static string pointer. if this function is
called from multiple threads, it would most likely return an invalid
pointer.

- Taken from threadx user manual.

>>> "Fred L. Drake, Jr."
08/13/01 10:25AM >>>

Rich Salz writes:
 > Informally, re-entrant means that it is safe to call the function while
 > an existing call is pending: can I *re-enter* the function? An

This is certainly what I'm more familiar with...

  -Fred

--
Fred L. Drake, Jr.
PythonLabs at Zope Corporation

From dcrowley@scitegic.com Mon Aug 13 22:01:54 2001
From: dcrowley@scitegic.com (David Crowley)
Date: Mon, 13 Aug 2001 14:01:54 -0700
Subject: [Expat-discuss] Proposal for XML_ParserReset() function
Message-ID: <5.1.0.14.0.20010813135650.02d77fd8@pop.business.earthlink.net>

---------------------- multipart/mixed attachment
I just posted this into the Patches on sourceforge, but thought I would
present it here for further discussion and so others may test it/refine it.

This functionality is quite important for my application (SOAP server)
which can parse many thousands of XML documents. When doing some
profiling, about 12-15% of the time was being spent in memory allocation
routines when creating and destroying the Parser object. I know others
have asked for this functionality and so I spent some time the past
weekend looking into this.

David

This is my first cut at adding a XML_ParserReset function. My idea was to
reset the parser to a state that was almost identical to what it is after
XML_ParserCreate() except that any allocated memory is preserved.

As this patch is currently, I think it might still have some potential
problems with dtdInit() and possibly internalEncoding and setContext().
But for my documents/application it seems to work great.
It passes Purify without any memory leaks and when parsing 5000 documents,
I only get ~40 memory allocations instead of ~200,000 :)

The function declaration needed for expat.h:

/* Resets an existing parser to a state comparable to that after
   XML_ParserCreate but preserves any allocated memory. */
XMLPARSEAPI(void)
XML_ParserReset(XML_Parser parser, const XML_Char *encoding);

---------------------- multipart/mixed attachment
A non-text attachment was scrubbed...
Name: reset.diff
Type: application/octet-stream
Size: 5937 bytes
Desc: not available
Url : http://mail.libexpat.org/pipermail-21/expat-discuss/attachments/20010813/83be080c/reset.exe
---------------------- multipart/mixed attachment
---------------------- multipart/mixed attachment--

From mballen@erols.com Tue Aug 14 04:22:31 2001
From: mballen@erols.com (Michael B. Allen)
Date: Mon, 13 Aug 2001 23:22:31 -0400
Subject: [Expat-discuss] Re-entrant
In-Reply-To: <3B77EEBD.AFDB440A@zolera.com>; from rsalz@zolera.com on Mon, Aug 13, 2001 at 11:14:05AM -0400
References: <3B77EEBD.AFDB440A@zolera.com>
Message-ID: <20010813232231.C1095@nano.foo.net>

On Mon, Aug 13, 2001 at 11:14:05AM -0400, Rich Salz wrote:
> file, while the containing file is being parsed. Is yyparse
> re-entrant? The answer is no, it's not. Yacc stores its state in
> internal global variables.

Actually, bison does have a %pure_parser option that supposedly will
un-static-ify these state variables and therefore make the parser
reentrant but I think I tried it once and couldn't get it to work.

About reentrance and threading:

It looks like all of the functions of the Expat API are clearly
reentrant (and I distinctly recall seeing a message to this effect by
James Clark on a mailing list somewhere). The design is such that all
state is stored in the XML_Parser object which is passed to all (almost
all) functions of the Expat API. AFAIK there is no static storage
allocated within functions or global writable data.
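
The distinction drawn here (state kept in an object passed to every
call, rather than in static storage) can be illustrated without Expat
at all. What follows is only a minimal sketch, not Expat code: tok_ctx,
tok_init and tok_next are hypothetical names for a strtok-like
tokenizer that keeps its position in a caller-supplied context, so two
scans can be interleaved safely, much as two separate XML_Parser
objects can.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical tokenizer context: all scanning state lives here and
   not in a static variable, which is what makes tok_next reentrant. */
struct tok_ctx {
    char *next;   /* where the next scan starts, NULL when exhausted */
};

void tok_init(struct tok_ctx *ctx, char *text)
{
    ctx->next = text;
}

/* Return the next token delimited by any character in delims, or NULL
   when there are no more. Like strtok, this writes '\0' into the text. */
char *tok_next(struct tok_ctx *ctx, const char *delims)
{
    char *start;

    if (ctx->next == NULL)
        return NULL;
    start = ctx->next + strspn(ctx->next, delims);  /* skip delimiters */
    if (*start == '\0') {
        ctx->next = NULL;
        return NULL;
    }
    ctx->next = start + strcspn(start, delims);     /* end of token */
    if (*ctx->next != '\0')
        *ctx->next++ = '\0';   /* terminate token, resume after it */
    else
        ctx->next = NULL;      /* token ran to the end of the text */
    return start;
}
```

Interleaving calls on two contexts (say "x y" split on spaces and
"1,2" split on commas) yields x, 1, y, 2. Doing the same with strtok
would lose the first scan as soon as the second one started, because
strtok has only one hidden position.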

All this means is that it is ok to call the same function before a
previous call to that function has completed. This might occur in a
multi-threaded environment or if perhaps some function was called
recursively (which I'm sure happens quite naturally internally to the
parser). However, this does not mean you can use the same parser object
recursively or in a multi-threaded environment. This is a completely
different issue. That XML_Parser object exists so that _separate_ parser
objects may be used in these ways.

I could be wrong though. I don't know the Expat source very well at all
and it would be easy to introduce a static variable at the beginning
of a function or somesuch that inadvertently uses static storage.

I think this has been said in fewer words already, so sorry if I'm being
redundant.

Mike

--
Wow a memory-mapped fork bomb! Now what on earth did you expect? - lkml

From syprat@yahoo.fr Tue Aug 14 08:29:01 2001
From: syprat@yahoo.fr (Sylvain PRAT)
Date: Tue, 14 Aug 2001 09:29:01 +0200 (CEST)
Subject: [Expat-discuss] empty tags
In-Reply-To: <15223.59790.73199.257638@cj42289-a.reston1.va.home.com>
Message-ID: <20010814072901.33304.qmail@web14803.mail.yahoo.com>

--- "Fred L. Drake, Jr." wrote:
>
> Sylvain PRAT writes:
>  > yes, but we should be aware of the charset...
>
> Yes, pretty flaky stuff. Here's another approach:
> for every start
> tag, get the current source index, then for end
> tags, you know
> it was an empty tag if the position didn't change.
> That can be
> optimized a little bit by maintaining a flag:
>
> static int maybe_empty_element_tag;
> static long byte_index;
>
> void
> start(void *data, const char *el, const char **attr)
> {
>   maybe_empty_element_tag = 1;
>   byte_index = XML_GetCurrentByteIndex(parser);
>
>   ...
> } > > static void > end(void *data, const char *el) > if (maybe_empty_element_tag) { > maybe_empty_element_tag = 0; > if (byte_index == > XML_GetCurrentByteIndex(parser)) { > /* empty-element tag */ > return > } > } > ... > } > > All other handlers could optionally clear > maybe_empty_element_tag to > avoid the call back into expat for elements like > characters > only. Whether that would be a win depends how > isolated you want > this aspect of the processing, or if the performance > improvement (very > small) is worth the maintenance cost. > > > I think the parser is aware of the empty tag, so > why > > couldn't this be a feature (as reading the input > > context is one too), especially because start > tags are > > possibly erased before reading the end tags... > > The parser has to be aware of it, but I'm not sure > what would be the > best way to expose it. Perhaps something similar to > the > XML_GetSpecifiedAttributeCount() function, which is > only valid during > the start-element callback? That's certainly > possible, but would > incur higher overhead that the approach outlined > here (because it > would always result in a function call). Providing > that would not > invalidate this approach, so they could coexist. > Finally, there's a best solution to this, using the XML_GetCurrentByteCount which returns 0 for end elements when using an empty tag... but it results in a function call (i don't mind)... Sylvain ___________________________________________________________ Do You Yahoo!? -- Vos albums photos en ligne, Yahoo! Photos : http://fr.photos.yahoo.com From k.buckley@lancaster.ac.uk Wed Aug 15 15:55:18 2001 From: k.buckley@lancaster.ac.uk (Kev Buckley) Date: Wed, 15 Aug 2001 15:55:18 +0100 (BST) Subject: [Expat-discuss] Parser query Message-ID: Hello, I'm trying to understand what characters the expat parser is throwing when it reads the numerical entity code for the &#szlig; and &#oslah; characters. Bit of background. 
I'm dumping data off a Palm which produces

  octal 337 (dec 223) for the &szlig;
        370 (dec 248) for the &oslash;

But if I have &#223; and &#248; in my XML doc, when expat parses the
doc I seem to get back two bytes as follows:

  octal 303 + octal 237 for the &#223;
  octal 303 + octal 248 for the &#248;

Other "extended shift" characters from the Palm seem to translate OK,
in that their &#nnn; codes get parsed as

  octal 302 + octal NNN

where NNN equates to the decimal nnn used in the numeric entity code.

eg, the trademark symbol

  octal 231 (dec 153) -> &#153; which is parsed as

  octal 302 + octal 231

If the above makes any sense to anyone, then do you have any clues as
to what I have missed in trying to get these characters out of
(through) the expat parser ?

Kevin

--
Regards,
----------------------------------------------------------------------
* Kevin M. Buckley              e-mail: K.Buckley@lancaster.ac.uk    *
*                                                                    *
* Systems Administrator                                              *
* Computer Centre                                                    *
* Lancaster University          Voice: +44 (0) 1524 5 93718          *
* LANCASTER. LA1 4YW            Fax  : +44 (0) 1524 5 25113          *
* England.                                                           *
*                                                                    *
* My PC runs Linux/GNU, you still computing the Bill Gate$' way ?    *
----------------------------------------------------------------------

From fdrake@acm.org Wed Aug 15 15:59:23 2001
From: fdrake@acm.org (Fred L. Drake, Jr.)
Date: Wed, 15 Aug 2001 10:59:23 -0400 (EDT)
Subject: [Expat-discuss] Parser query
In-Reply-To:
References:
Message-ID: <15226.36427.253508.408143@cj42289-a.reston1.va.home.com>

Kev Buckley writes:
 > But if I have &#223; and &#248; in my XML doc, when expat parses the
 > doc I seem to get back two bytes as follows:
 >
 >   octal 303 + octal 237 for the &#223;
 >   octal 303 + octal 248 for the &#248;
...
 >   octal 231 (dec 153) -> &#153; which is parsed as
 >
 >   octal 302 + octal 231
...
 > If the above makes any sense to anyone, then do you have any clues as
 > to what I have missed in trying to get these characters out of
 > (through) the expat parser ?

The characters are coming through just fine if you're telling Expat that your input is Latin-1.
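
The two-byte sequences reported above are what UTF-8 produces for code
points between 128 and 2047. As a worked check, here is a small sketch;
utf8_encode_small is a hypothetical helper, not part of Expat, and it
deliberately handles only code points below 0x800.

```c
#include <stddef.h>

/* Encode a code point below 0x800 as UTF-8.
   Returns the number of bytes written to out (1 or 2),
   or 0 if the code point is outside the range handled here. */
size_t utf8_encode_small(unsigned int cp, unsigned char out[2])
{
    if (cp < 0x80) {
        out[0] = (unsigned char)cp;                   /* 0xxxxxxx */
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));   /* 110yyyyy */
        out[1] = (unsigned char)(0x80 | (cp & 0x3F)); /* 10xxxxxx */
        return 2;
    }
    return 0;
}
```

For 223 this yields octal 303 237 and for 153 octal 302 231, matching
the report. For 248 it yields octal 303 270; 248 is the decimal code
point, not the octal value of the second byte.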
Expat's output is encoded in UTF-8, though it can be compiled to return 16-bit characters directly.

  -Fred

--
Fred L. Drake, Jr.
PythonLabs at Zope Corporation

From brc@fourlittlemice.com Thu Aug 16 07:40:10 2001
From: brc@fourlittlemice.com (Dirk Dierckx)
Date: Thu, 16 Aug 2001 08:40:10 +0200
Subject: [Expat-discuss] RE: Parser query
In-Reply-To:
Message-ID:

As Fred L. Drake already pointed out, you get it UTF-8 encoded by default
and using Latin-1 requires Expat to be recompiled to work with UTF-16.

If you're like me and just want to retrieve your &#223; in a char with
value (dec 223), etc. without having to recompile expat or use 16-bit
wide chars (wchar_t), you can use the following function to convert from
UTF-8 to ANSI (0-255).

int ishUtilUTF8toANSI(char *pchString, int iStringLen)
{
  const size_t cszStringLen = iStringLen >= 0
    ? (size_t)iStringLen
    : (pchString ? strlen(pchString) : (size_t)0U);
  int bConverted = 1;
  size_t szInputIdx, szOutputIdx = (size_t)0U;

  for(szInputIdx = (size_t)0U;
      bConverted && szInputIdx < cszStringLen;
      ++szInputIdx)
  {
    /* If input_bin(0xxxxxxx) ~ ASCII_bin(0xxxxxxx)
       If input_bin(110000yy 10xxxxxx) ~ ANSI_bin(yyxxxxxx)
       All other UTF-8 encodings don't map to ANSI so we don't
       convert them and fail if we encounter them.
       See: http://www.unicode.org for more information about
       UTF-8 encoding. */
    if(0x00 == (pchString[szInputIdx] & 0x80))
      /* Plain ascii char */
      pchString[szOutputIdx++] = pchString[szInputIdx];
    else if(szInputIdx + (size_t)1U < cszStringLen
            && 0xC0 == (pchString[szInputIdx] & 0xFC)
            && 0x80 == (pchString[szInputIdx + (size_t)1U] & 0xC0))
    {
      /* UTF-8 encoded char that maps to ANSI. */
      pchString[szOutputIdx++] =
        (char)(((pchString[szInputIdx] & 0x03) << 6)
               + (pchString[szInputIdx + (size_t)1U] & 0x3F));
      ++szInputIdx; /* We must skip the second input char. */
    }
    else
      /* UTF-8 encoding that doesn't map to ANSI or illegal input. */
      bConverted = 0;
  }
  return bConverted ? (int)szOutputIdx : -1;
}

Note: As you can see from the code, this function converts the string
in place (modifying the data pointed to by pchString directly). This is
possible because the resulting string length will be <= iStringLen.
The function returns the new length and does not write a terminating
'\0' itself.

---
Regards,
Dirk

From willievu@hotmail.com Sat Aug 18 03:35:49 2001
From: willievu@hotmail.com (Willie Vu)
Date: Fri, 17 Aug 2001 22:35:49 -0400
Subject: [Expat-discuss] undefined reference when compile examples
Message-ID:

I'm a newbie here. I've run into a dumb problem. Please help.

I failed to compile the examples due to the following errors. I'm running
cygwin gcc version 2.95.3-5 (Reading specs from
/usr/lib/gcc-lib/i686-pc-cygwin/2.95.3-5/specs, gcc version 2.95.3-5
(cygwin special)) on Windows 2000 SP2.

----

d:/temp/expat/expat-1.95.2/examples> make
gcc -o elements elements.o -static -L../lib/.libs -lexpat
elements.o: In function `main':
/cygdrive/d/temp/expat/expat-1.95.2/examples/elements.c:31: undefined reference to `_imp__XML_ParserCreate'
/cygdrive/d/temp/expat/expat-1.95.2/examples/elements.c:34: undefined reference to `_imp__XML_SetUserData'
/cygdrive/d/temp/expat/expat-1.95.2/examples/elements.c:35: undefined reference to `_imp__XML_SetElementHandler'
/cygdrive/d/temp/expat/expat-1.95.2/examples/elements.c:39: undefined reference to `_imp__XML_Parse'
/cygdrive/d/temp/expat/expat-1.95.2/examples/elements.c:47: undefined reference to `_imp__XML_ParserFree'
/cygdrive/d/temp/expat/expat-1.95.2/examples/elements.c:40: undefined reference to `_imp__XML_GetCurrentLineNumber'
/cygdrive/d/temp/expat/expat-1.95.2/examples/elements.c:40: undefined reference to `_imp__XML_GetErrorCode'
/cygdrive/d/temp/expat/expat-1.95.2/examples/elements.c:40: undefined reference to `_imp__XML_ErrorString'
collect2: ld returned 1 exit status
make: *** [elements] Error 1

From mballen@erols.com Sat Aug 18 05:04:42
2001
From: mballen@erols.com (Michael B. Allen)
Date: Sat, 18 Aug 2001 00:04:42 -0400
Subject: [Expat-discuss] undefined reference when compile examples
In-Reply-To: ; from willievu@hotmail.com on Fri, Aug 17, 2001 at 10:35:49PM -0400
References:
Message-ID: <20010818000442.A4548@nano.foo.net>

On Fri, Aug 17, 2001 at 10:35:49PM -0400, Willie Vu wrote:
>
> d:/temp/expat/expat-1.95.2/examples> make
> gcc -o elements elements.o -static -L../lib/.libs -lexpat
> elements.o: In function `main':
> /cygdrive/d/temp/expat/expat-1.95.2/examples/elements.c:31: undefined
> reference to `_imp__XML_ParserCreate'

I believe you need to include the .h path. Like -I../include/? or
similar. Also, is that third dot just before libs in the library path
really supposed to be there? This is a simple gcc options problem. You
need to get your include and library paths right.

Mike

--
Wow a memory-mapped fork bomb! Now what on earth did you expect? - lkml

From JWieman@daktronics.com Mon Aug 20 21:16:23 2001
From: JWieman@daktronics.com (Joe Wiemann)
Date: Mon, 20 Aug 2001 15:16:23 -0500
Subject: [Expat-discuss] RE: Parser query
Message-ID:

Is there anyway I can pull expat out of the library implementation and
compile it statically into my C code...

I am sure there is a way to do this but I can't figure it out... I am
working in an embedded environment so a dll is not a viable option using
the os that we are using... I need to be able to compile this statically
into my binary file.

From fdrake@acm.org Mon Aug 20 21:22:56 2001
From: fdrake@acm.org (Fred L. Drake, Jr.)
Date: Mon, 20 Aug 2001 16:22:56 -0400 (EDT)
Subject: [Expat-discuss] RE: Parser query
In-Reply-To:
References:
Message-ID: <15233.29088.634733.835654@cj42289-a.reston1.va.home.com>

Joe Wiemann wrote:
 > Is there anyway I can pull expat out of the library implementation and
 > compile it statically into my C code...
 >
 > I am sure there is a way to do this but I can't figure it out...
I am > working in an embedded environment so a dll is not a viable option using > the os that we are using... I need to be able to compile this statically > into my binary file. You should be able to use the C files in the lib/ directory; there really aren't any special compilation issues. You should be able to determine what pre-processor flags are appropriate for your platform. -Fred -- Fred L. Drake, Jr. PythonLabs at Zope Corporation From gstein@lyra.org Mon Aug 20 21:40:55 2001 From: gstein@lyra.org (Greg Stein) Date: Mon, 20 Aug 2001 13:40:55 -0700 Subject: [Expat-discuss] RE: Parser query In-Reply-To: <15233.29088.634733.835654@cj42289-a.reston1.va.home.com>; from fdrake@acm.org on Mon, Aug 20, 2001 at 04:22:56PM -0400 References: <15233.29088.634733.835654@cj42289-a.reston1.va.home.com> Message-ID: <20010820134055.F11073@lyra.org> On Mon, Aug 20, 2001 at 04:22:56PM -0400, Fred L. Drake, Jr. wrote: > Joe Wiemann wrote: > > Is there anyway I can pull expat out of the library implementation and > > compile it statically into my C code... > > > > I am sure there is a way to do this but I can't figure it out... I am > > working in an embedded environment so a dll is not a viable option using > > the os that we are using... I need to be able to compile this statically > > into my binary file. > > You should be able to use the C files in the lib/ directory; there > really aren't any special compilation issues. You should be able to > determine what pre-processor flags are appropriate for your platform. You could also link the Expat library into your application using static linking. 
Cheers, -g -- Greg Stein, http://www.lyra.org/ From syprat@yahoo.fr Tue Aug 21 14:47:49 2001 From: syprat@yahoo.fr (=?iso-8859-1?q?Sylvain=20PRAT?=) Date: Tue, 21 Aug 2001 15:47:49 +0200 (CEST) Subject: [Expat-discuss] Stopping the parser Message-ID: <20010821134749.28831.qmail@web14810.mail.yahoo.com> Hi, I'm experiencing a new problem using expat : I have to do an incremental parsing of a file , i have a file describing a set of events and i need to read it event by event (with first, next functions), so on the event end tag, i would like to stop the parser. But i see no convenient way to do that. I first used a boolean in my loop (reading from the file, parsing the resulting buffer) to stop it (which is set in the event end element handler), but the parser still parsed the rest of the buffer, and i lost the next data. We can't go back parsing (i think), the only way is to create a new parser to parse the next event (starting after the event end tag file position), but obviously, the parser will state an error because at the last event, it will encounter the end tag of the set of events (dealing with errors, possible but very bad solution i think)... I have also thought of using a stack/set/list to remember the events, but it's not easy to make it (it's c++, i couldn't use the stl, and moreover the events are quite complicated to be pushed) So is there's a solution to this, is it difficult to add a stop parser feature, something else ??? Thanks ___________________________________________________________ Do You Yahoo!? -- Vos albums photos en ligne, Yahoo! Photos : http://fr.photos.yahoo.com From lshen@cisco.com Wed Aug 22 02:11:10 2001 From: lshen@cisco.com (Lin Shen) Date: Tue, 21 Aug 2001 18:11:10 -0700 Subject: [Expat-discuss] return tag contents as it is Message-ID: <006201c12aa7$548ac490$738b6b80@cisco.com> Hi, Is it possible to have the parser return the contents of certain tags as it is? 
For instance, for the following xml segment text is there is a way to let the parser to return whatever contained in tag w/out any parsing, including the escape char and the tag? Please send reply to my personal account lshen@cisco.com thanks lin From Michael_B_Allen@ml.com Wed Aug 22 02:16:18 2001 From: Michael_B_Allen@ml.com (Allen, Michael B (RSCH)) Date: Tue, 21 Aug 2001 21:16:18 -0400 Subject: [Expat-discuss] return tag contents as it is Message-ID: I don't think so. That would be an extra feature of an XML parser. Perhaps you can use the special CDATA tag into which you may put anything including xml. > -----Original Message----- > From: Lin Shen [SMTP:lshen@cisco.com] > Sent: Tuesday, August 21, 2001 9:11 PM > To: expat-discuss@lists.sourceforge.net > Subject: [Expat-discuss] return tag contents as it is > > Hi, > > Is it possible to have the parser return the contents of certain tags as it > is? For instance, for the following xml segment > > > > text > > > > is there is a way to let the parser to return whatever contained in > tag w/out any parsing, including the escape char and the > tag? > > Please send reply to my personal account lshen@cisco.com > > thanks > lin > > > _______________________________________________ > Expat-discuss mailing list > Expat-discuss@lists.sourceforge.net > http://lists.sourceforge.net/lists/listinfo/expat-discuss From lshen@cisco.com Wed Aug 22 02:57:52 2001 From: lshen@cisco.com (Lin Shen) Date: Tue, 21 Aug 2001 18:57:52 -0700 Subject: [Expat-discuss] return tag contents as it is References: Message-ID: <007001c12aad$daecdef0$738b6b80@cisco.com> What about getting the begin and end positions of the tag in the buffer (whatever that is)? So that the contents can be copied out. lin ----- Original Message ----- From: "Allen, Michael B (RSCH)" To: "'Lin Shen'" ; Sent: Tuesday, August 21, 2001 6:16 PM Subject: RE: [Expat-discuss] return tag contents as it is > I don't think so. That would be an extra feature of an XML parser. 
Perhaps you > can use the special CDATA tag into which you may put anything including xml. > > > -----Original Message----- > > From: Lin Shen [SMTP:lshen@cisco.com] > > Sent: Tuesday, August 21, 2001 9:11 PM > > To: expat-discuss@lists.sourceforge.net > > Subject: [Expat-discuss] return tag contents as it is > > > > Hi, > > > > Is it possible to have the parser return the contents of certain tags as it > > is? For instance, for the following xml segment > > > > > > > > text > > > > > > > > is there is a way to let the parser to return whatever contained in > > tag w/out any parsing, including the escape char and the > > tag? > > > > Please send reply to my personal account lshen@cisco.com > > > > thanks > > lin > > > > > > _______________________________________________ > > Expat-discuss mailing list > > Expat-discuss@lists.sourceforge.net > > http://lists.sourceforge.net/lists/listinfo/expat-discuss > From Michael_B_Allen@ml.com Wed Aug 22 03:34:15 2001 From: Michael_B_Allen@ml.com (Allen, Michael B (RSCH)) Date: Tue, 21 Aug 2001 22:34:15 -0400 Subject: [Expat-discuss] return tag contents as it is Message-ID: Well I don't think there is an existing facility for this however I suppose you could add a function to extract that information. If you can identify where exactly begin and end tags are discovered you might set an unsigned int in the XML_Parser structure. Of course you would need to call your function in the begin and end tag handlers before the value is lost and it might need some positional translation. IMHO I don't think you should do what you're doing in the first place though. Storing tagged stuff in XML is error prone to say the least. You could write a much less obfuscated grammer to be parsed with a yacc parser. 
Mike

> -----Original Message-----
> From: Lin Shen [SMTP:lshen@cisco.com]
> Sent: Tuesday, August 21, 2001 9:58 PM
> To: Allen, Michael B (RSCH); expat-discuss@lists.sourceforge.net
> Subject: Re: [Expat-discuss] return tag contents as it is
>
> What about getting the begin and end positions of the tag in the
> buffer (whatever that is), so that the contents can be copied out?
>
> lin
>
> ----- Original Message -----
> From: "Allen, Michael B (RSCH)"
> To: "'Lin Shen'" ;
> Sent: Tuesday, August 21, 2001 6:16 PM
> Subject: RE: [Expat-discuss] return tag contents as it is
>
> > I don't think so. That would be an extra feature of an XML parser.
> > Perhaps you can use a CDATA section, into which you may put anything,
> > including XML.
> >
> > > -----Original Message-----
> > > From: Lin Shen [SMTP:lshen@cisco.com]
> > > Sent: Tuesday, August 21, 2001 9:11 PM
> > > To: expat-discuss@lists.sourceforge.net
> > > Subject: [Expat-discuss] return tag contents as it is
> > >
> > > Hi,
> > >
> > > Is it possible to have the parser return the contents of certain tags
> > > as it is? For instance, for the following xml segment
> > >
> > > text
> > >
> > > is there a way to let the parser return whatever is contained in the
> > > tag without any parsing, including the escape char and the tag?
> > >
> > > Please send reply to my personal account lshen@cisco.com
> > >
> > > thanks
> > > lin

From syprat@yahoo.fr Wed Aug 22 08:10:13 2001
From: syprat@yahoo.fr (=?iso-8859-1?q?Sylvain=20PRAT?=)
Date: Wed, 22 Aug 2001 09:10:13 +0200 (CEST)
Subject: [Expat-discuss] return tag contents as it is
Message-ID: <20010822071013.90450.qmail@web14808.mail.yahoo.com>

Hi,

Yes, it is possible (I do it in my application). You must use the
XML_GetInputContext function together with XML_GetCurrentByteCount to
rebuild a parsed tag, but you can't easily get what is inside your grammar
tag, so you must rebuild the inside of the tag with appropriate callbacks
(I set cdata, start element and end element handlers, which concatenate all
the strings together; I had to ignore xhtml tags in some tags).

Bye,

From lshen@cisco.com Fri Aug 24 03:33:56 2001
From: lshen@cisco.com (Lin Shen)
Date: Thu, 23 Aug 2001 19:33:56 -0700
Subject: [Expat-discuss] XML_GetInputContext
Message-ID: <000f01c12c45$394763b0$738b6b80@cisco.com>

Hi,

I'm trying to add the XML_GetInputContext API to Expat 1.1.
Simply copying the 1.2 code doesn't work, and that's all I have tried so
far.
Any pointers?
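
The combination Sylvain mentions in this thread, XML_GetInputContext plus
XML_GetCurrentByteCount, can be sketched roughly as below. This is only an
illustration under stated assumptions, not his code: copy_raw_event,
raw_start and the WITH_EXPAT macro are made-up names, and the handler
assumes the parser itself is passed as the handler argument (via
XML_UseParserAsHandlerArg). XML_GetInputContext may return NULL, for
instance when Expat was built without context support, so the helper
treats that as "nothing to copy".

```c
#include <stdio.h>
#include <string.h>

/*
 * Copy the raw bytes of the current parse event out of the context buffer.
 * ctx/offset/size are the values XML_GetInputContext reports; count is the
 * value XML_GetCurrentByteCount reports. Returns the number of bytes copied;
 * the result is truncated to fit and always NUL-terminated.
 * Hypothetical helper, not part of Expat.
 */
size_t copy_raw_event(const char *ctx, int offset, int size, int count,
                      char *out, size_t out_cap)
{
    size_t n;

    if (out == NULL || out_cap == 0)
        return 0;
    out[0] = '\0';
    if (ctx == NULL || offset < 0 || count <= 0 ||
        offset > size || count > size - offset)
        return 0;

    n = (size_t)count;
    if (n > out_cap - 1)
        n = out_cap - 1;
    memcpy(out, ctx + offset, n);
    out[n] = '\0';
    return n;
}

#ifdef WITH_EXPAT /* hypothetical switch; define it when building against Expat */
#include <expat.h>

/* Start-element handler. Assumes XML_UseParserAsHandlerArg(parser) was
   called, so the first argument is the parser rather than the user data. */
void raw_start(void *arg, const XML_Char *name, const XML_Char **atts)
{
    XML_Parser parser = (XML_Parser)arg;
    int offset = 0, size = 0;
    const char *ctx = XML_GetInputContext(parser, &offset, &size);
    char raw[256];

    (void)name;
    (void)atts;
    if (copy_raw_event(ctx, offset, size, XML_GetCurrentByteCount(parser),
                       raw, sizeof raw) > 0)
        printf("raw start tag: %s\n", raw);
}
#endif
```

The bytes come back in the document's own encoding, and only the current
event (here, the start tag) is covered; the element's content still has to
be collected through the other callbacks, as Sylvain describes.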
thanks
lin

From gstein@lyra.org Fri Aug 24 04:24:45 2001
From: gstein@lyra.org (Greg Stein)
Date: Thu, 23 Aug 2001 20:24:45 -0700
Subject: [Expat-discuss] XML_GetInputContext
In-Reply-To: <000f01c12c45$394763b0$738b6b80@cisco.com>; from lshen@cisco.com on Thu, Aug 23, 2001 at 07:33:56PM -0700
References: <000f01c12c45$394763b0$738b6b80@cisco.com>
Message-ID: <20010823202445.O26054@lyra.org>

On Thu, Aug 23, 2001 at 07:33:56PM -0700, Lin Shen wrote:
> Hi,
>
> I'm trying to add XML_GetInputContext api to Expat 1.1.
> By just copying the 1.2 code doesn't work and that's all I experimented so
> far.
> Any pointers?

Upgrade your Expat. The APIs are compatible, so why not upgrade?

Cheers,
-g

--
Greg Stein, http://www.lyra.org/

From lshen@cisco.com Fri Aug 24 06:04:00 2001
From: lshen@cisco.com (Lin Shen)
Date: Thu, 23 Aug 2001 22:04:00 -0700
Subject: [Expat-discuss] XML_GetInputContext
References: <000f01c12c45$394763b0$738b6b80@cisco.com> <20010823202445.O26054@lyra.org>
Message-ID: <000e01c12c5a$300b72e0$7ccf150a@cisco.com>

We've ported the parser onto another platform, and it's a hell lot of work
to upgrade the parser in terms of technical and legal issues.

lin

----- Original Message -----
From: "Greg Stein"
To: "Lin Shen"
Cc:
Sent: Thursday, August 23, 2001 8:24 PM
Subject: Re: [Expat-discuss] XML_GetInputContext

> On Thu, Aug 23, 2001 at 07:33:56PM -0700, Lin Shen wrote:
> > Hi,
> >
> > I'm trying to add XML_GetInputContext api to Expat 1.1.
> > By just copying the 1.2 code doesn't work and that's all I experimented so
> > far.
> > Any pointers?
>
> Upgrade your Expat. The APIs are compatible, so why not upgrade?
>
> Cheers,
> -g
>
> --
> Greg Stein, http://www.lyra.org/

From gstein@lyra.org Fri Aug 24 09:31:35 2001
From: gstein@lyra.org (Greg Stein)
Date: Fri, 24 Aug 2001 01:31:35 -0700
Subject: [Expat-discuss] XML_GetInputContext
In-Reply-To: <000e01c12c5a$300b72e0$7ccf150a@cisco.com>; from lshen@cisco.com on Thu, Aug 23, 2001 at 10:04:00PM -0700
References: <000f01c12c45$394763b0$738b6b80@cisco.com> <20010823202445.O26054@lyra.org> <000e01c12c5a$300b72e0$7ccf150a@cisco.com>
Message-ID: <20010824013135.D27852@lyra.org>

Ah... yah, I can see how that could be a problem :-)

I think it would be helpful if you explain what "doesn't work" means.
Compile error? The code doesn't run right? What kind of error...

Cheers,
-g

On Thu, Aug 23, 2001 at 10:04:00PM -0700, Lin Shen wrote:
> We've ported the parser onto another platform, and it's a hell lot of work
> to upgrade the parser in terms of technical and legal issues.
>
> lin
>
> ----- Original Message -----
> From: "Greg Stein"
> To: "Lin Shen"
> Cc:
> Sent: Thursday, August 23, 2001 8:24 PM
> Subject: Re: [Expat-discuss] XML_GetInputContext
>
> > On Thu, Aug 23, 2001 at 07:33:56PM -0700, Lin Shen wrote:
> > > Hi,
> > >
> > > I'm trying to add XML_GetInputContext api to Expat 1.1.
> > > By just copying the 1.2 code doesn't work and that's all I experimented
> > > so far.
> > > Any pointers?
> >
> > Upgrade your Expat. The APIs are compatible, so why not upgrade?
> >
> > Cheers,
> > -g
> >
> > --
> > Greg Stein, http://www.lyra.org/

--
Greg Stein, http://www.lyra.org/

From kfox@vulpes.com Sun Aug 26 00:58:16 2001
From: kfox@vulpes.com (Ken Fox)
Date: Sat, 25 Aug 2001 19:58:16 -0400
Subject: [Expat-discuss] Documentation on Expat internals?
Message-ID: <3B883B98.CC3BA797@vulpes.com>

I've been using Expat for quite a while now under the guise of
the Perl XML::Parser module. Just recently, i.e. today, I started
getting into Expat internals. Are there any docs available? I've
started some bits on the input buffer and I'm wondering if anybody
else is doing this already. Google didn't turn anything up, which
seems remarkable considering how popular Expat is.

My docs are at .

The thing that started me looking deeper into Expat is a
performance problem with XML::Parser. I've always thought it
was "fast enough", but recently (yesterday!) I had to replace it
with a nest of regular expressions. Something's not right with
that.

The first part of the plan is to reduce buffer copying and
improve I/O performance because that looks pretty easy. The
second is to speed up the callbacks by eliminating some of the
Perl sub call overhead. (It will be faster, but less general.)

- Ken

From shashank@arkionsystems.com Mon Aug 27 02:24:39 2001
From: shashank@arkionsystems.com (Shashank Banerjea)
Date: Sun, 26 Aug 2001 21:24:39 -0400
Subject: [Expat-discuss] Basic Build Question on expat on VxWorks
Message-ID: <000101c12e97$0afe6c70$04d95b18@arkserver>

Hi,

I downloaded the latest source code for the expat XML parser from the
Source Forge website. I plan to use it as an XML parser on VxWorks to
parse incoming XML messages.

Does anyone have pointers on building expat on VxWorks 5.4 (Tornado 2.0)
with the GNU C compiler?

Thanks in advance

With best regards
Shashank Banerjea

An HTML attachment was scrubbed...
URL: http://mail.libexpat.org/pipermail-21/expat-discuss/attachments/20010826/c424a483/attachment.html

From mrityu@aplion.stpn.soft.net Mon Aug 27 05:00:28 2001
From: mrityu@aplion.stpn.soft.net (Mrityunjay Kumar)
Date: Mon, 27 Aug 2001 09:30:28 +0530
Subject: [Expat-discuss] Basic Build Question on expat on VxWorks
In-Reply-To: <000101c12e97$0afe6c70$04d95b18@arkserver>
Message-ID: <007401c12eac$cfb7b3e0$08080808@mrityunjay>

Well, I recently built the core part of the parser (expat/lib) without
any hitch. A good idea is to first build it on a Linux machine (using
the configure script was straightforward for me), try it to verify that it
works, and then take the generated config.h and the makefile to VxWorks.

Hope it helps.

-Mrityunjay

-----Original Message-----
From: expat-discuss-admin@lists.sourceforge.net
[mailto:expat-discuss-admin@lists.sourceforge.net]On Behalf Of Shashank Banerjea
Sent: Monday, August 27, 2001 6:55 AM
To: expat-discuss@lists.sourceforge.net
Subject: [Expat-discuss] Basic Build Question on expat on VxWorks

Hi,

I downloaded the latest source code for the expat XML parser from the
Source Forge website. I plan to use it as an XML parser on VxWorks to
parse incoming XML messages.

Does anyone have pointers on building expat on VxWorks 5.4 (Tornado 2.0)
with the GNU C compiler?

Thanks in advance

With best regards
Shashank Banerjea

An HTML attachment was scrubbed...
URL: http://mail.libexpat.org/pipermail-21/expat-discuss/attachments/20010827/f3e76755/attachment.html

From ghein@thisisa.com Tue Aug 28 00:03:29 2001
From: ghein@thisisa.com (Glen Hein)
Date: Mon, 27 Aug 2001 16:03:29 -0700 (MST)
Subject: [Expat-discuss] whitespace trouble
Message-ID: <998953409.3b8ad1c161583@thisisa.com>

Hello Everybody!

I'm new to expat and I recently adopted a project that was
using expat. Unfortunately, I do not have contact with the
original programmer.

I'm having a problem with whitespace in the xml structure that
is to be parsed. It appears that any whitespace between the
xml tags causes the parser to fail.

I have not yet determined if the problem is in expat or in the
overall application.

Is there a known problem with whitespace? Is there a particular
area of expat that I should be looking at? Is whitespace an
option in expat?

Thanks,
Glen Hein
ghein@thisisa.com

From michael@vivtek.com Tue Aug 28 00:11:20 2001
From: michael@vivtek.com (Michael Roberts)
Date: Mon, 27 Aug 2001 18:11:20 -0500
Subject: [Expat-discuss] whitespace trouble
References: <998953409.3b8ad1c161583@thisisa.com>
Message-ID: <3B8AD398.F89A652E@vivtek.com>

I've certainly never had any trouble with whitespace. Would it be possible
for you to include a sample of a *small* XML document which causes a
failure, and the error code you get? That would help narrow things down.

Glen Hein wrote:

> Hello Everybody!
>
> I'm new to expat and I recently adopted a project that was
> using expat. Unfortunately, I do not have contact with the
> original programmer.
>
> I'm having a problem with whitespace in the xml structure that
> is to be parsed. It appears that any whitespace between the
> xml tags causes the parser to fail.
>
> I have not yet determined if the problem is in expat or in the
> overall application.
>
> Is there a known problem with whitespace? Is there a particular
> area of expat that I should be looking at?
> Is whitespace an option in expat?
>
> Thanks,
> Glen Hein
> ghein@thisisa.com

From Michael_B_Allen@ml.com Tue Aug 28 00:41:58 2001
From: Michael_B_Allen@ml.com (Allen, Michael B (RSCH))
Date: Mon, 27 Aug 2001 19:41:58 -0400
Subject: [Expat-discuss] whitespace trouble
Message-ID:

Be very careful about the code in your CharacterDataHandler
function. The strings passed to it are NOT null terminated, and a
single call may not deliver all of the text (it may be broken into
adjacent pieces). This is a great place for an error. See how DOMC
handles it in the chardata_fn when building the DOM tree here:

http://auditorymodels.org/domc/src/lib/expatls.c

Mike

> -----Original Message-----
> From: Glen Hein [SMTP:ghein@thisisa.com]
> Sent: Monday, August 27, 2001 7:03 PM
> To: expat-discuss@lists.sourceforge.net
> Subject: [Expat-discuss] whitespace trouble
>
>
> Hello Everybody!
>
> I'm new to expat and I recently adopted a project that was
> using expat. Unfortunately, I do not have contact with the
> original programmer.
>
> I'm having a problem with whitespace in the xml structure that
> is to be parsed. It appears that any whitespace between the
> xml tags causes the parser to fail.
>
> I have not yet determined if the problem is in expat or in the
> overall application.
>
> Is there a known problem with whitespace? Is there a particular
> area of expat that I should be looking at? Is whitespace an
> option in expat?
>
> Thanks,
> Glen Hein
> ghein@thisisa.com

From ghein@thisisa.com Tue Aug 28 02:51:13 2001
From: ghein@thisisa.com (Glen Hein)
Date: Mon, 27 Aug 2001 18:51:13 -0700 (MST)
Subject: [Expat-discuss] whitespace trouble
In-Reply-To:
References:
Message-ID: <998963473.3b8af911f17c1@thisisa.com>

I'd like to thank you from the bottom of my keyboard :-)

I examined the fundamental differences between my
CharacterDataHandler and the version you cited. All I had
to do was to have my handler skip the whitespace:

    /* return early when the chunk holds no printable characters;
       the cast keeps isgraph() defined for bytes >= 0x80 */
    for (i = 0; i < len && !isgraph((unsigned char)s[i]); i++) {
        ;
    }
    if (i == len) {
        return;
    }

-Glen Hein

Quoting "Allen, Michael B (RSCH)" :

> Be very careful about the code in your CharacterDataHandler
> function. The strings passed to it are NOT null terminated, and a
> single call may not deliver all of the text (it may be broken into
> adjacent pieces). This is a great place for an error. See how DOMC
> handles it in the chardata_fn when building the DOM tree here:
>
> http://auditorymodels.org/domc/src/lib/expatls.c
>
> Mike
> > -----Original Message-----
> > From: Glen Hein [SMTP:ghein@thisisa.com]
> > Sent: Monday, August 27, 2001 7:03 PM
> > To: expat-discuss@lists.sourceforge.net
> > Subject: [Expat-discuss] whitespace trouble
> >
> >
> > Hello Everybody!
> >
> > I'm new to expat and I recently adopted a project that was
> > using expat. Unfortunately, I do not have contact with the
> > original programmer.
> >
> > I'm having a problem with whitespace in the xml structure that
> > is to be parsed. It appears that any whitespace between the
> > xml tags causes the parser to fail.
> >
> > I have not yet determined if the problem is in expat or in the
> > overall application.
> >
> > Is there a known problem with whitespace? Is there a particular
> > area of expat that I should be looking at?
> > Is whitespace an option in expat?
> >
> > Thanks,
> > Glen Hein
> > ghein@thisisa.com

From Michael_B_Allen@ml.com Tue Aug 28 02:53:16 2001
From: Michael_B_Allen@ml.com (Allen, Michael B (RSCH))
Date: Mon, 27 Aug 2001 21:53:16 -0400
Subject: [Expat-discuss] whitespace trouble
Message-ID:

You're welcome. Ironically, this is a piece of code that should be taken out
of DOMC because it assumes the user does not need it. But that's my problem.

Later,
Mike

> -----Original Message-----
> From: Glen Hein [SMTP:ghein@thisisa.com]
> Sent: Monday, August 27, 2001 9:51 PM
> To: Allen, Michael B (RSCH)
> Cc: 'Glen Hein'; expat-discuss@lists.sourceforge.net
> Subject: RE: [Expat-discuss] whitespace trouble
>
>
> I'd like to thank you from the bottom of my keyboard :-)
>
> I examined the fundamental differences between my
> CharacterDataHandler and the version you cited. All I had
> to do was to have my handler skip the whitespace:
>
>     for (i = 0; i < len && !isgraph((unsigned char)s[i]); i++) {
>         ;
>     }
>     if (i == len) {
>         return;
>     }
>
> -Glen Hein
>
> Quoting "Allen, Michael B (RSCH)" :
>
> > Be very careful about the code in your CharacterDataHandler
> > function. The strings passed to it are NOT null terminated, and a
> > single call may not deliver all of the text (it may be broken into
> > adjacent pieces). This is a great place for an error. See how DOMC
> > handles it in the chardata_fn when building the DOM tree here:
> >
> > http://auditorymodels.org/domc/src/lib/expatls.c
> >
> > Mike
> > > -----Original Message-----
> > > From: Glen Hein [SMTP:ghein@thisisa.com]
> > > Sent: Monday, August 27, 2001 7:03 PM
> > > To: expat-discuss@lists.sourceforge.net
> > > Subject: [Expat-discuss] whitespace trouble
> > >
> > >
> > > Hello Everybody!
> > >
> > > I'm new to expat and I recently adopted a project that was
> > > using expat. Unfortunately, I do not have contact with the
> > > original programmer.
> > >
> > > I'm having a problem with whitespace in the xml structure that
> > > is to be parsed. It appears that any whitespace between the
> > > xml tags causes the parser to fail.
> > >
> > > I have not yet determined if the problem is in expat or in the
> > > overall application.
> > >
> > > Is there a known problem with whitespace? Is there a particular
> > > area of expat that I should be looking at? Is whitespace an
> > > option in expat?
> > >
> > > Thanks,
> > > Glen Hein
> > > ghein@thisisa.com

From shashank@arkionsystems.com Tue Aug 28 05:14:54 2001
From: shashank@arkionsystems.com (Shashank Banerjea)
Date: Tue, 28 Aug 2001 00:14:54 -0400
Subject: [Expat-discuss] Basic Build Question on expat on VxWorks
References: <007401c12eac$cfb7b3e0$08080808@mrityunjay>
Message-ID: <006c01c12f77$fe155870$0adb5b18@SBVAIO>

Thanks, Mr. Mrityunjay Kumar.

Before I take this step, let me clarify one doubt: I am on an x86
architecture on Linux, while my target is VxWorks on ARM. Will it require
tweaking the makefile?

Shashank

----- Original Message -----
From: Mrityunjay Kumar
To: 'Shashank Banerjea' ; expat-discuss@lists.sourceforge.net
Sent: Monday, August 27, 2001 12:00 AM
Subject: RE: [Expat-discuss] Basic Build Question on expat on VxWorks

Well, I recently built the core part of the parser (expat/lib) without
any hitch. A good idea is to first build it on a Linux machine (using
the configure script was straightforward for me), try it to verify that it
works, and then take the generated config.h and the makefile to VxWorks.

Hope it helps.
-Mrityunjay

-----Original Message-----
From: expat-discuss-admin@lists.sourceforge.net
[mailto:expat-discuss-admin@lists.sourceforge.net]On Behalf Of Shashank Banerjea
Sent: Monday, August 27, 2001 6:55 AM
To: expat-discuss@lists.sourceforge.net
Subject: [Expat-discuss] Basic Build Question on expat on VxWorks

Hi,

I downloaded the latest source code for the expat XML parser from the
Source Forge website. I plan to use it as an XML parser on VxWorks to
parse incoming XML messages.

Does anyone have pointers on building expat on VxWorks 5.4 (Tornado 2.0)
with the GNU C compiler?

Thanks in advance

With best regards
Shashank Banerjea

From JWieman@daktronics.com Wed Aug 29 15:52:35 2001
From: JWieman@daktronics.com (Joe Wiemann)
Date: Wed, 29 Aug 2001 09:52:35 -0500
Subject: [Expat-discuss] How can I add a function to skip a set number of bytes
Message-ID:

How can I add a function to skip a set number of bytes in the expat parser
buffer?

For example, when I get a certain begin tag, it gives me the length of the
text that it contains.

I want to skip the text that is in the tag.

From mballen@erols.com Wed Aug 29 18:43:57 2001
From: mballen@erols.com (Michael B. Allen)
Date: Wed, 29 Aug 2001 13:43:57 -0400
Subject: [Expat-discuss] How can I add a function to skip a set number of bytes
In-Reply-To: ; from JWieman@daktronics.com on Wed, Aug 29, 2001 at 09:52:35AM -0500
References:
Message-ID: <20010829134357.C836@nano.foo.net>

On Wed, Aug 29, 2001 at 09:52:35AM -0500, Joe Wiemann wrote:
> How can I add a function to skip a set number of bytes in the expat parser
> buffer?
>
> For example, when I get a certain begin tag, it gives me the length of the
> text that it contains.
>
> I want to skip the text that is in the tag.

You don't want to "skip some bytes"; you want to skip that tag and all
its children? This is very simple to do with Expat. Look at the outline.c
or elements.c example programs that come with Expat.
Modify the example to pass a flag with, or as, the user pointer and use it to
indicate whether the input should be ignored. IOW, start with the flag
off. Test it at the beginning of your start tag handler. If it's off, process
as normal. When you see that flag on, don't do anything in the begin tag
handler, just return. When you see the end tag for that element, flip the
flag off. Get it?

I believe the Expat tutorial at XML.com describes exactly this problem.

Mike

--
Wow a memory-mapped fork bomb! Now what on earth did you expect? - lkml

From mballen@erols.com Thu Aug 30 19:24:12 2001
From: mballen@erols.com (Michael B. Allen)
Date: Thu, 30 Aug 2001 14:24:12 -0400
Subject: [Expat-discuss] How can I add a function to skip a set number of bytes
In-Reply-To: ; from JWieman@daktronics.com on Wed, Aug 29, 2001 at 01:40:27PM -0500
References:
Message-ID: <20010830142412.A932@nano.foo.net>

Sorry if I lost track of the thread, but did you participate in the recent
XML_GetInputContext discussion? If not, maybe that's something you can use?

Mike

On Wed, Aug 29, 2001 at 01:40:27PM -0500, Joe Wiemann wrote:
> But I cannot 100 percent guarantee that the closing tag won't appear within
> the text, so it would be better if I could just skip a set number of bytes
> in the parser buffer.
>
> You don't want to "skip some bytes", you want to skip that tag and all
> it's children? This is very simple to do with Expat. Look at the outline.c

--
Wow a memory-mapped fork bomb! Now what on earth did you expect? - lkml
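
The flag approach Michael describes earlier in this thread can be sketched
as a set of handlers like the ones below. This is an illustration under
stated assumptions, not code from outline.c or elements.c: struct
skip_state and the handler names are hypothetical, and a depth counter
stands in for the simple on/off flag so that nested elements with the same
name are skipped too. The functions have the same shape as Expat's start,
end and character data callbacks (XML_Char is plain char in a default
build); they would be registered with XML_SetElementHandler and
XML_SetCharacterDataHandler, with the state passed via XML_SetUserData.

```c
#include <string.h>

/* Hypothetical per-parse state, handed to Expat with XML_SetUserData(). */
struct skip_state {
    const char *skip_name; /* element whose whole subtree is ignored */
    int skip_depth;        /* 0 = normal processing, >0 = inside skipped subtree */
    int seen;              /* elements actually processed (demo only) */
    int text_bytes;        /* character data bytes actually processed (demo only) */
};

/* Start tag: begin skipping at skip_name, count nesting while skipping. */
void on_start(void *user_data, const char *name, const char **atts)
{
    struct skip_state *st = user_data;

    (void)atts;
    if (st->skip_depth > 0) {          /* already inside the skipped subtree */
        st->skip_depth++;
        return;
    }
    if (strcmp(name, st->skip_name) == 0) {
        st->skip_depth = 1;            /* "flip the flag on" */
        return;
    }
    st->seen++;                        /* normal processing goes here */
}

/* End tag: leave one nesting level; depth 0 means the flag is off again. */
void on_end(void *user_data, const char *name)
{
    struct skip_state *st = user_data;

    (void)name;
    if (st->skip_depth > 0) {
        st->skip_depth--;
        return;
    }
    /* normal end-of-element processing goes here */
}

/* Character data: s is NOT NUL-terminated, so only len bytes are valid. */
void on_text(void *user_data, const char *s, int len)
{
    struct skip_state *st = user_data;

    (void)s;
    if (st->skip_depth > 0)
        return;                        /* text inside the skipped element */
    st->text_bytes += len;
}
```

As long as the input is well-formed XML, a literal closing tag cannot
appear unescaped inside character data, so the end handler should only
fire on real element boundaries; that is why counting tags rather than
bytes is usually enough here.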