From software.au at gmail.com Wed Nov 1 12:27:31 2006
From: software.au at gmail.com (James Buchanan)
Date: Wed, 1 Nov 2006 22:27:31 +1100
Subject: [Expat-discuss] expat stops parsing at '&' then calls my character data handler
Message-ID:

Hello,

I'm parsing XML SOAP envelopes and I will typically have something like this:

http://www.example.com/news/?cat=12&amp;paged=2

The problem is that when my character data handler is called, I get it in pieces like this:

http://www.example.com/news/?cat=12
&
paged=2

I obviously want it returned to me in one piece, i.e.:

http://www.example.com/news/?cat=12&paged=2

I have thought about "pre-processing" the XML data beforehand to find all occurrences of &amp; and replace them with &. Is that a good way of handling this, or is there an Expat API I can use to ignore it and return me the &amp; (and others) in URLs and other parts, so that I get them in one piece and can replace them with their proper characters manually?

I was also thinking of setting the userData pointer to a bool so that when my start tag handler sees that the tag is URL, it sets the userData to true (as in "in url"). Then, when my character data handler runs, it will "accumulate" the data as it comes in if the next call of my char data handler returns the & character. The userData bool "in url" would be set back to false when the char handler sees the beginning of a new URL, by looking out for http://, for example. The previously accumulated pieces could then be concatenated and I'd have my URL intact, with the & where it was previously sent by itself.

What would be the best way to handle this? Any advice?

Thanks very much, greatly appreciated.
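The accumulation idea described above can be sketched in C as follows. This is a minimal sketch under stated assumptions, not code from the list: `TextBuf`, `on_chars`, and `buf_reset` are hypothetical names, the handler assumes the default build in which XML_Char is char, and the Expat calls appear only in a comment so that the block compiles without the library.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical accumulator; a pointer to it would be the parser's userData. */
typedef struct {
    char  *data;  /* NUL-terminated text collected so far (NULL when empty) */
    size_t len;   /* bytes used, not counting the NUL */
    size_t cap;   /* bytes allocated */
} TextBuf;

/* Same shape as Expat's character data handler when XML_Char is char.
   `s` is NOT NUL-terminated; only `len` bytes are valid. */
static void on_chars(void *user_data, const char *s, int len)
{
    TextBuf *b = user_data;
    size_t need;

    assert(len >= 0);
    need = b->len + (size_t)len + 1;      /* +1 for the terminating NUL */
    if (need > b->cap) {
        size_t cap = b->cap ? b->cap : 64;
        char *p;
        while (cap < need)
            cap *= 2;
        p = realloc(b->data, cap);
        if (p == NULL)
            return;                       /* out of memory: drop this chunk */
        b->data = p;
        b->cap = cap;
    }
    memcpy(b->data + b->len, s, (size_t)len);
    b->len += (size_t)len;
    b->data[b->len] = '\0';
}

/* Call from the start-element handler to forget text of earlier elements. */
static void buf_reset(TextBuf *b)
{
    b->len = 0;
    if (b->data != NULL)
        b->data[0] = '\0';
}

/* With Expat this would be wired up roughly as:
 *     XML_SetUserData(parser, &buf);
 *     XML_SetCharacterDataHandler(parser, on_chars);
 * with buf_reset() called in the start-element handler and buf.data
 * read in the end-element handler of the URL element.
 */
```

Reading the buffer in the end-element handler, rather than in the character data handler, means the URL is complete no matter how many pieces it arrived in.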
Spartacus

From karl at waclawek.net Fri Nov 3 02:16:09 2006
From: karl at waclawek.net (Karl Waclawek)
Date: Thu, 02 Nov 2006 20:16:09 -0500
Subject: [Expat-discuss] expat stops parsing at '&' then calls my character data handler
In-Reply-To:
References:
Message-ID: <454A9859.6050609@waclawek.net>

James Buchanan wrote:
> I was also thinking of setting the userData pointer to a bool so that
> when my start tag handler sees the tag is URL, set the userData to
> true (as in "in url") so that when my character data handler runs it
> will "accumulate" the data as it comes in if the next call of my char
> data handler returns & character. Then set the userData bool var "in
> url" to false when the char handler sees the beginning of a new URL by
> looking out for http://, for example. The previously accumulated
> pieces could then be concatenated and I'd have my URL intact with the
> & where it previously sent the & by itself.
>
> What would be the best way to handle this? Any advice?

Yes, accumulating character data in a buffer is the standard way of dealing with multiple character events in Expat.

Karl

From jonathan at claggett.org Fri Nov 3 05:22:45 2006
From: jonathan at claggett.org (Jonathan Claggett)
Date: Thu, 2 Nov 2006 23:22:45 -0500
Subject: [Expat-discuss] Pull parsing with Expat?
Message-ID:

Hello,

I'm looking to use Expat for parsing XML files (because it's fast and competent) but not via the standard callbacks. Instead, I want to loop through an input file by repeatedly calling an Expat function that returns the next XML token and its type from the input file. Ideally, there would be an XML_ParseNextToken() function (or maybe the XmlTok macros?) or something like that. I believe this kind of processing is known as 'pull' parsing, since the application explicitly requests the next token to be handled.

Anyway, is this kind of usage something Expat is suited for or can even handle?
Thanks,
Jonathan Claggett

From jonathan at claggett.org Sat Nov 4 01:24:12 2006
From: jonathan at claggett.org (Jonathan Claggett)
Date: Fri, 3 Nov 2006 19:24:12 -0500
Subject: [Expat-discuss] Fwd: Pull parsing with Expat?
In-Reply-To:
References:
Message-ID:

On 11/3/06, Nick MacDonald wrote:
> Jonathan:
>
> Can you explain a good reason for wanting to do this?

The reason is that I want to be able to make nested calls to parse the XML data. When I am parsing a start tag, I'd like to be able to explicitly restart the parsing of that tag's contents and any sub-tags it may have. I'll use your XML data below to show what I want to write.

> You're not just being a lazy designer, right? :-)

Well, of course I'm being lazy. Goes without saying. ;-)

> Let me give you an example of
> where I think this would be a bad idea, and you tell me what you
> think...
>
> If the data looked like this:
> First bit of Tag1 data
> Some tag2 data
> Additional Tag1 data
>
> Yet more Tag1 data
>
>
> What particular set of tokens would you expect to receive from this XML
> file?

Here is some pseudo code of how I would like to parse the above XML:

main() {
    parser = XML_Parser ("SAMPLE XML");
    parser.setElementHandler ("tag1", ParseTag1);
    parser.parse();
}

ParseTag1(parser, name, attrs) {
    printf ("Tag1 has started");
    parser.setDataHandler (ParseTag1Data);
    parser.setElementHandler ("tag2", ParseTag2);
    parser.setElementHandler ("tag3", ParseTag3);
    parser.parse(); // This will not return until
    printf ("Tag1 has ended.");
}

ParseTag1Data(parser, data) {
    printf ("Tag1 data: %s", data);
}

ParseTag2(parser, name, attrs) {
    // looks similar to ParseTag1. with the parser.parse()
    // call being made in it somewhere and returning only
    // once has been read.
}

ParseTag3(parser, name, attrs) {
    // more of the same.
}

Please note that I do not want the above sample API to be added to Expat. I'm not trying to rewrite Expat into something it isn't.
I'm merely interested in trying to implement the above API by using functionality which Expat already has: parsing XML tokens sequentially (in the doContent() function).

> My suggestion to you, if you *really* still feel you need a pull
> mechanism, is to write a small wedge between eXpat and your code that
> uses eXpat the way it is expected, and provides a "pull" interface for
> your code. I have it in my head, that if you are willing to accept
> certain limitations, such code shouldn't be too complex to code up.

I'm interested to know how the wedge code would work. Would it use the callbacks to call XML_StopParser? One thought I had was whether there is a way to make XML_Parse return each time a callback is called. That would be sufficient for my goals.

Thanks for your comments,
Jonathan

From karl at waclawek.net Sat Nov 4 02:57:56 2006
From: karl at waclawek.net (Karl Waclawek)
Date: Fri, 03 Nov 2006 20:57:56 -0500
Subject: [Expat-discuss] Fwd: Pull parsing with Expat?
In-Reply-To:
References:
Message-ID: <454BF3A4.9090502@waclawek.net>

Jonathan Claggett wrote:
>
> I'm interested to know how the wedge code would work. Would it use the
> callbacks to call XML_StopParser? One thought I had was if there was a way
> to make XML_Parse return each time a callback was called. That would be
> sufficient for my goals.
>

XML_StopParser() was added specifically to enable "pull" usage for Expat. It must be called from within a call-back handler.

Karl

From karl at waclawek.net Sat Nov 4 15:35:54 2006
From: karl at waclawek.net (Karl Waclawek)
Date: Sat, 04 Nov 2006 09:35:54 -0500
Subject: [Expat-discuss] Pull parsing with Expat?
In-Reply-To:
References:
Message-ID: <454CA54A.9060901@waclawek.net>

Jonathan Claggett wrote:
> Hello,
>
> I'm looking to use Expat for parsing XML files (because it's fast and
> competent) but not via the standard callbacks.
> Instead, I want to loop
> through an input file by repeatedly calling an Expat function that returns
> the next XML token and its type from the input file. Ideally, there would be
> an XML_ParseNextToken() (or maybe the XmlTok macros?) function or something
> like that. I believe this kind of processing is known as 'pull' parsing
> since the application explicitly requests the next token to be handled.
> Anyway, is this kind of usage something Expat is suited for or can even
> handle?
>

Not directly. You would have to write a pull layer with internal call-back handlers on top of Expat, calling XML_StopParser in each handler.

Karl

From tom at abc123.dowco.com Sun Nov 5 00:10:57 2006
From: tom at abc123.dowco.com (Tom Younger)
Date: Sat, 4 Nov 2006 15:10:57 -0800 (PST)
Subject: [Expat-discuss] Fwd: Pull parsing with Expat?
In-Reply-To:
Message-ID:

On Fri, 3 Nov 2006, Jonathan Claggett wrote:
> > Let me give you an example of
> > where I think this would be a bad idea, and you tell me what you
> > think...
> >
> > If the data looked like this:
> > First bit of Tag1 data
> > Some tag2 data
> > Additional Tag1 data
> >
> > Yet more Tag1 data
> >
> >
> > What particular set of tokens would you expect to receive from this XML
> > file?
>

I solved essentially the same problem by using a stack of callback handlers.

From jameswhetstone at comcast.net Sat Nov 11 19:10:00 2006
From: jameswhetstone at comcast.net (James Whetstone)
Date: Sat, 11 Nov 2006 10:10:00 -0800
Subject: [Expat-discuss] How to extract untranslated XML from a stream
References:
Message-ID: <004601c705bc$9a9b5490$6501a8c0@crankshaft>

Hi,

I'm working on a project where, in addition to parsing all the elements in an XML stream, I need to obtain all the unparsed XML content between 2 tags. Here's an example:

data more data

So with this example, I want to extract all the XML content between the tags and then parse the stream using standard handlers.
Can anyone make a suggestion here or point me in the right direction? Is this even possible with the API? From what I can tell, it isn't really designed to extract untranslated XML bytes like this.

The reason I want to do this is that I want to parse the stream and save the data to a database in two ways: I want to save the data as individual columns AND as an entire XML document for quick retrieval. Ideally, I'd like to store the XML document without having to re-copy the content to a separate buffer.

Thanks,
James

From jbnivoit at gmail.com Tue Nov 14 00:32:34 2006
From: jbnivoit at gmail.com (Jean-Baptiste Nivoit)
Date: Tue, 14 Nov 2006 00:32:34 +0100
Subject: [Expat-discuss] integrating XML_Memory_Handling_Suite with apr_pool_t
Message-ID: <45590092.4060601@gmail.com>

Hi,

I have a question: why do the functions in the XML_Memory_Handling_Suite not take an additional "userdata" argument that could be used to pass, for instance, an apr_pool_t pointer?

I cite the Apache allocator as an example because I know in advance that the scope of the allocations done by an XML_Parser I create will be within the scope of an HTTP request. Another example would be a multithreaded program where each thread uses its own parser and requires its own private allocator; I'd then pass the pointer to the allocator object around in the memory suite and at the alloc/free sites within Expat.

As is, the XML_Memory_Handling_Suite would make me maintain per-thread copies of the thread-local apr_pool_t* currently in use in each thread. I'd rather pass explicitly which pool is being used than have to store it in thread-local memory.

jb.
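The per-thread workaround described above can be sketched in C as follows. This is only an illustration under stated assumptions: `Arena`, `Header`, `current_arena`, and the `arena_*` functions are hypothetical names, a fixed-size bump arena stands in for apr_pool_t, C11 `_Thread_local` is assumed to be available, and the Expat calls appear only in a comment so that the block compiles without the library.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for apr_pool_t: a fixed-size bump arena.
   `base` must be aligned for max_align_t. */
typedef struct {
    unsigned char *base;
    size_t used;
    size_t size;
} Arena;

/* Per-thread "current pool": needed because the suite's callbacks
   receive no context argument. */
static _Thread_local Arena *current_arena;

/* Each block is preceded by its size so realloc knows how much to copy. */
typedef union {
    size_t      size;
    max_align_t align;   /* keeps payloads suitably aligned */
} Header;

static void *arena_malloc(size_t size)
{
    Arena *a = current_arena;
    size_t total;
    Header *h;

    if (a == NULL || size > a->size)
        return NULL;
    /* Round the payload up so the next header stays aligned. */
    size = (size + sizeof(Header) - 1) / sizeof(Header) * sizeof(Header);
    total = sizeof(Header) + size;
    if (total > a->size - a->used)
        return NULL;                     /* arena exhausted */
    h = (Header *)(a->base + a->used);
    h->size = size;
    a->used += total;
    return h + 1;
}

static void *arena_realloc(void *ptr, size_t size)
{
    Header *h;
    void *p;

    if (ptr == NULL)
        return arena_malloc(size);
    h = (Header *)ptr - 1;
    if (size <= h->size)
        return ptr;                      /* existing block is big enough */
    p = arena_malloc(size);
    if (p != NULL)
        memcpy(p, ptr, h->size);
    return p;
}

static void arena_free(void *ptr)
{
    (void)ptr;   /* pool semantics: memory is reclaimed with the arena */
}

/* With Expat this would be used roughly as:
 *     XML_Memory_Handling_Suite ms =
 *         { arena_malloc, arena_realloc, arena_free };
 *     current_arena = &request_arena;
 *     parser = XML_ParserCreate_MM(NULL, &ms, NULL);
 * current_arena has to point at the right arena, in the calling thread,
 * for every later call into that parser, including XML_ParserFree.
 */
```

The final comment shows the cost being objected to: the association between parser and pool lives outside the parser, so the caller has to restore it before each call.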
From rmukhesh at yahoo.com Wed Nov 15 14:01:49 2006
From: rmukhesh at yahoo.com (Mukhesh TVR)
Date: Wed, 15 Nov 2006 05:01:49 -0800 (PST)
Subject: [Expat-discuss] expat throws wrong error message
Message-ID: <20061115130149.35408.qmail@web31705.mail.mud.yahoo.com>

Hi All,

I am using Expat 2.0 and trying to parse the following XML string:

" testing soap "

I get the error "must not undeclare prefix". However, I would expect an error like "unbound prefix", as the prefix 'soap' is not bound to any URI.

Can somebody help in solving this?

regards,
Mukhesh.

From karl at waclawek.net Wed Nov 15 14:55:19 2006
From: karl at waclawek.net (Karl Waclawek)
Date: Wed, 15 Nov 2006 08:55:19 -0500
Subject: [Expat-discuss] expat throws wrong error message
In-Reply-To: <20061115130149.35408.qmail@web31705.mail.mud.yahoo.com>
References: <20061115130149.35408.qmail@web31705.mail.mud.yahoo.com>
Message-ID: <455B1C47.6010107@waclawek.net>

Mukhesh TVR wrote:
> Hi All,
>
> I am using expat 2.0, and trying to parse the following xml-string.
> " testing soap "
>
> And, I get an error "must not undeclare prefix".
> But, I expect error like "unbound prefix" as the prefix 'soap' is not bound to any URI.
>

I got an unbound prefix error.

Karl

From nidhiv22 at yahoo.com Thu Nov 16 23:56:01 2006
From: nidhiv22 at yahoo.com (Nidhi Vaidya)
Date: Thu, 16 Nov 2006 14:56:01 -0800 (PST)
Subject: [Expat-discuss] Stop/Resume Api Q
Message-ID: <20061116225601.4162.qmail@web34611.mail.mud.yahoo.com>

Hi all,

I have created a wrapper for Expat to manage the XML parsing from client code. For that I am using "XML_StopParser" and "XML_ResumeParser".
Can anyone please suggest which callback is best suited for calling "XML_StopParser", so that all the data of the current element (name and attributes, parsed in startElementCallback(...); value, parsed in CharHandlerCallback(...)) is parsed properly before the next "Resume"?

Thanks in advance.

From kumar_qnx at yahoo.com Fri Nov 17 18:58:04 2006
From: kumar_qnx at yahoo.com (kumar qnx)
Date: Fri, 17 Nov 2006 09:58:04 -0800 (PST)
Subject: [Expat-discuss] Fwd: Re: How to extract untranslated XML from a stream
Message-ID: <565753.47563.qm@web57707.mail.re3.yahoo.com>

Hi James,

Can you make your example clearer? You can obtain all the content within an XML document; what you do with the content is up to you. I don't understand what you mean by unparsed content. Is that the data contained between an element's start and end tags? If that is the case, you should be able to get it in your character data handler.

Pavan

James Whetstone wrote:

Hi,

I'm working on a project where, in addition to parsing all the elements in an XML stream, I need to obtain all the unparsed XML content between 2 tags. Here's an example:

data more data

So with this example, I want to extract all the XML content between the tags and then parse the stream using standard handlers. Can anyone make some suggestion here or point me in the right direction? Is this even possible with the API? From what I can tell, it isn't really designed to extract untranslated XML bytes like this.

The reason I want to do this is because I want to parse the stream and save the data to a database in two ways: I want to save the data as individual columns AND as an entire XML document for quick retrieval. Ideally, I'd like to store the XML document without having to re-copy the content to a separate buffer.
Thanks,
James

_______________________________________________
Expat-discuss mailing list
Expat-discuss at libexpat.org
http://mail.libexpat.org/mailman/listinfo/expat-discuss

From jzhang at ximpleware.com Sat Nov 18 20:28:05 2006
From: jzhang at ximpleware.com (Jimmy Zhang)
Date: Sat, 18 Nov 2006 11:28:05 -0800
Subject: [Expat-discuss] code for parsing content
References:
Message-ID: <012201c70b47$ae27dca0$0d02a8c0@ximpleware>

Can someone point me to the Expat code that parses character data? I would like to know the line numbers in the .c and .h files and how it checks the validity of a character as defined in the XML spec...

Thanks,
jz

From karl at waclawek.net Sat Nov 18 22:25:30 2006
From: karl at waclawek.net (Karl Waclawek)
Date: Sat, 18 Nov 2006 16:25:30 -0500
Subject: [Expat-discuss] code for parsing content
In-Reply-To: <012201c70b47$ae27dca0$0d02a8c0@ximpleware>
References: <012201c70b47$ae27dca0$0d02a8c0@ximpleware>
Message-ID: <455F7A4A.200@waclawek.net>

Jimmy Zhang wrote:
> Can someone point me to the expat code that
> parses character data? I would like to know
> the line number in the .c and .h files and how it
> checks the validity of the character as defined
> in XML spec...
>

It's mostly done through macros in xmltok.c. Example: UTF8_INVALID2

There are also lookup tables defined in asciitab.h, latin1tab.h, utf8tab.h, nametab.h, etc.

Karl

From ramamurthy.suresh at wipro.com Tue Nov 21 11:46:57 2006
From: ramamurthy.suresh at wipro.com (ramamurthy.suresh at wipro.com)
Date: Tue, 21 Nov 2006 16:16:57 +0530
Subject: [Expat-discuss] Reg: Using Expat in NetBSD
Message-ID: <438662DA48DCAA41B1DF648BD4BD76C005A774A4@CHN-SNR-MBX01.wipro.com>

Hi,

Can Expat be used on NetBSD? I was successful in installing it on Linux.
I'm trying to use only the library, linking it with a C file in which I have used the Expat APIs. I used gcc with the LD library linkage. Is it possible to do the same on NetBSD? If not, how do I do it?

Thanks,
Suresh R.

The information contained in this electronic message and any attachments to this message are intended for the exclusive use of the addressee(s) and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you should not disseminate, distribute or copy this e-mail. Please notify the sender immediately and destroy all copies of this message and any attachments. WARNING: Computer viruses can be transmitted via email. The recipient should check this email and any attachments for the presence of viruses. The company accepts no liability for any damage caused by any virus transmitted by this email. www.wipro.com

From howachen at gmail.com Fri Nov 24 07:42:46 2006
From: howachen at gmail.com (howard chen)
Date: Fri, 24 Nov 2006 14:42:46 +0800
Subject: [Expat-discuss] Expat VS XML::Twig (Perl)
Message-ID:

Currently I have a raw XML file of around 1.2GB to be processed using XML::Twig (a Perl module, which also uses Expat). It needs around 2 hours to parse the file and write it to a CSV file. (My code is quite optimized.)

If I use Expat directly, what percentage of the time can I save?

From karl at waclawek.net Fri Nov 24 14:52:45 2006
From: karl at waclawek.net (Karl Waclawek)
Date: Fri, 24 Nov 2006 08:52:45 -0500
Subject: [Expat-discuss] Expat VS XML::Twig (Perl)
In-Reply-To:
References:
Message-ID: <4566F92D.3000207@waclawek.net>

howard chen wrote:
> Currently I have around 1.2GB raw XML file to be processed using
> XML::Twig (a perl module, which is also using expat), need around 2
> hours to parse and write to a CSV file. (My code is quite optimized)
>
> If i use the expat directly, how many % in time I can gain?
>

My guess is it will take less than half an hour.

Karl

From swalker at bynari.net Tue Nov 28 00:48:47 2006
From: swalker at bynari.net (Shawn Walker)
Date: Mon, 27 Nov 2006 17:48:47 -0600
Subject: [Expat-discuss] How to have expat handle and
Message-ID: <456B795F.1090207@bynari.net>

I have some data in the XML file that has and , expat bails out of the parsing and returns "not well-formed (invalid token)". What do I need to do to get Expat to allow those to be parsed?

Thanks,
Shawn