From dmb at mrc-dunn.cam.ac.uk Sun May 16 18:50:49 2004
From: dmb at mrc-dunn.cam.ac.uk (Dan Bolser)
Date: Sun May 16 18:37:15 2004
Subject: [Expat-discuss] junk after document element at line 2053
Message-ID:

junk after document element at line 2053, column 0, byte 107114 at /usr/lib/perl5/vendor_perl/5.8.0/i386-linux-thread-multi/XML/Parser.pm line 185

I get the above where the first XML document ends and the next begins.

I am trying to parse the file with Perl XML::Parser.

I want the parser to simply keep going past the first document and onto the second...

Could I just wrap the whole file in XML document tags?

Sorry for my ignorance, but how can I do this?

Suppose file1, file2 and file3 all contain multiple concatenated XML documents, how do I create a fourth file (file4) to 'pull in' file[1-3]?

This sounds familiar, but I have ~ zero XML experience.

Thanks for any suggestions,
Dan.

From Greg.Martin at TELUS.COM Mon May 17 17:43:42 2004
From: Greg.Martin at TELUS.COM (Greg Martin)
Date: Mon May 17 17:43:56 2004
Subject: [Expat-discuss] junk after document element at line 2053
Message-ID:

A well-formed XML document has only one top-level tag (as you've discovered). I think that you can only have a prolog at the beginning of a document (which would probably justify the name prolog), which would mean that if you wrapped three files in a top-level tag and any had prologs, it probably wouldn't be well-formed either. If there is a possibility of any of the files having a prolog, you might be better off instantiating a new parser for each file.
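A minimal sketch of the "new parser for each document" idea. It is written against Python's standard-library Expat binding (xml.parsers.expat), not Perl's XML::Parser or the C API discussed in this thread, so it only illustrates the approach. The helper names split_documents and root_names are hypothetical, and the splitter assumes every document starts with an XML declaration and that the text "<?xml " never occurs inside character data, comments or CDATA sections.

```python
import xml.parsers.expat


def split_documents(stream_text, marker="<?xml "):
    """Split concatenated XML text into one string per document.

    Assumption: each document begins with an XML declaration, so the
    marker is a reliable document boundary. Text before the first
    marker is dropped.
    """
    starts = []
    pos = stream_text.find(marker)
    while pos != -1:
        starts.append(pos)
        pos = stream_text.find(marker, pos + len(marker))
    if not starts:
        return [stream_text] if stream_text.strip() else []
    ends = starts[1:] + [len(stream_text)]
    return [stream_text[s:e] for s, e in zip(starts, ends)]


def root_names(stream_text):
    """Parse every document with a fresh parser; return the root element names."""
    roots = []
    for doc in split_documents(stream_text):
        depth = 0

        def start(name, attrs):
            nonlocal depth
            if depth == 0:
                roots.append(name)
            depth += 1

        def end(name):
            nonlocal depth
            depth -= 1

        parser = xml.parsers.expat.ParserCreate()  # one parser per document
        parser.StartElementHandler = start
        parser.EndElementHandler = end
        parser.Parse(doc, True)
    return roots


sample = (
    '<?xml version="1.0"?>\n<a><x/></a>\n'
    '<?xml version="1.0"?>\n<b>text</b>\n'
)
print(root_names(sample))  # ['a', 'b']
```

Feeding the whole of sample to a single parser raises an ExpatError at the second declaration; splitting first keeps each parse well-formed without loosening any checks.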
_______________________________________________
Expat-discuss mailing list
Expat-discuss@libexpat.org
http://mail.libexpat.org/mailman/listinfo/expat-discuss

From dmb at mrc-dunn.cam.ac.uk Tue May 18 04:55:47 2004
From: dmb at mrc-dunn.cam.ac.uk (Dan Bolser)
Date: Tue May 18 04:42:43 2004
Subject: [Expat-discuss] junk after document element at line 2053
In-Reply-To:
Message-ID:

On Mon, 17 May 2004, Greg Martin wrote:

>A well-formed XML document has only one top level tag (as you've
>discovered). I think that you can only have a prolog at the beginning of
>a document (which would probably justify the name prolog) which would
>mean that if you wrapped three files in a top-level tag and any had
>prologs it probably wouldn't be well-formed either. If there was the
>possibility of any of the files having a prolog you might be better off
>to instantiate a new parser for each file.

Yup, I found this out too... (I guess by prolog you mean something like:-

Sadly this occurs for every XML document in the file, and makes the parser unhappy even when I wrap the whole file in a top-level tag.

In the end I stripped out all the lines like the above from the file (from 1000's of individual XML documents), then I did something like

cat "" multi_xml_document_files_(with_prologs_removed) "" | my_xml_parser.plx

Except that exact syntax won't work, but you get the idea.

How could I request some XML::Parser options to make its checking less strict? Is this a bad road to go down?

Thanks very much,
Dan.

From Greg.Martin at TELUS.COM Tue May 18 10:02:09 2004
From: Greg.Martin at TELUS.COM (Greg Martin)
Date: Tue May 18 10:10:41 2004
Subject: [Expat-discuss] junk after document element at line 2053
Message-ID:

I haven't seen anything in the C API that would allow for ignoring well-formedness. It would seem unlikely that a parser would allow something like that, since the spec says: "Validating and non-validating processors alike MUST report violations of this specification's well-formedness constraints in the content of the document entity and any other parsed entities that they read." (see: http://www.w3.org/TR/REC-xml/)

I suppose it could be argued that all it says is that violations must be reported - it doesn't say parsing has to fail...

In the C API there is a newer function, XML_ParserReset, which allows a parser to be reused. The header says that "All handlers are cleared from the parser, except for the unknownEncodingHandler" (see: expat.h), so you would need to re-register your handlers, but you wouldn't have the overhead of creating a new parser for each file.

From dmb at mrc-dunn.cam.ac.uk Tue May 18 10:33:57 2004
From: dmb at mrc-dunn.cam.ac.uk (Dan Bolser)
Date: Tue May 18 10:20:24 2004
Subject: [Expat-discuss] junk after document element at line 2053
In-Reply-To:
Message-ID:

On Tue, 18 May 2004, Greg Martin wrote:

>I haven't seen anything in the C API that would allow for ignoring
>well-formedness. It would seem unlikely that a parser would allow
>something like that since the spec says : "Validating and non-validating
>processors alike MUST report violations of this specification's
>well-formedness constraints in the content of the document entity and
>any other parsed entities that they read." (see :
>http://www.w3.org/TR/REC-xml/ )
>
>I suppose it could be argued that all it says is that violations must be
>reported - it doesn't say parsing has to fail ... In the C API there is
>a newer function call XML_ParserReset which will allow the reuse of a
>parser. The header says that "All handlers are cleared from the parser,
>except for the unknownEncodingHandler" (see : expat.h) so you would need
>to re-register your handlers but you wouldn't have the overhead of
>creating a new parser for each file.

The real snag is the multiple XML documents in each file (or is that what you mean?). It would be nice to be able to set a 'severity' switch, so the parser keeps on going regardless.

One other thing: I often have to deal with character lines being arbitrarily broken over multiple character event calls (even when each string is very short). Is there any way to reset the internal character thingie to ensure this doesn't happen? Else I just use the reworked code I have, building up character data as it comes and processing on close tag events.

Cheers,
Dan.

From karl at waclawek.net Tue May 18 10:32:13 2004
From: karl at waclawek.net (Karl Waclawek)
Date: Tue May 18 10:32:20 2004
Subject: [Expat-discuss] junk after document element at line 2053
References:
Message-ID: <002301c43ce4$e8ed4390$9e539696@citkwaclaww2k>

----- Original Message -----
From: "Dan Bolser"
To: "Greg Martin"
Cc:
Sent: Tuesday, May 18, 2004 10:33 AM

> The real snag is the multiple xml documents in each file (or is that what
> you mean). It would be nice to be able to set a 'severity' switch, so the
> parser keeps on going regardless.
As Greg already stated, a conforming XML parser *must* report well-formedness errors, and in general it is not possible to continue, since a reasonable behaviour cannot always be defined. Examples:
- How should the parser continue if it encounters a start tag without an end tag? Should it ignore it? Should it read on past the parent element's end tag to see if it has been misplaced?
- How should the parser deal with an extra '<' character in the character data stream? Is it the start of an element, or just a character?

> One other thing, I often have to deal with character lines being
> arbitrarily broken over multiple character event calls (even when each
> string is very short). Is there any way to reset the internal character
> thingie to ensure this doesn't happen? Else I just use the reworked code I
> have, building up character data as it comes and processing on close tag
> events.

This is by design. Your current approach is the correct one: accumulate character data until the next end tag is encountered.

Karl

From dmb at mrc-dunn.cam.ac.uk Tue May 18 11:01:29 2004
From: dmb at mrc-dunn.cam.ac.uk (Dan Bolser)
Date: Tue May 18 10:47:57 2004
Subject: [Expat-discuss] junk after document element at line 2053
In-Reply-To: <002301c43ce4$e8ed4390$9e539696@citkwaclaww2k>
Message-ID:

On Tue, 18 May 2004, Karl Waclawek wrote:

>As Greg already stated, a conforming XML parser *must* report
>well-formedness errors, and in general, it is not possible
>to continue since a reasonable behaviour cannot always be defined.
>Examples:
>- How should the parser continue if it encounters a start tag,
>  without an end tag? Should it ignore it? Should it read on past the
>  parent element's end tag to see if it has been misplaced?
>- How should the parser deal with an extra '<' character in the character
>  data stream? Is it the start of an element, or just a character?

I agree both these cases should not be ignored, but finding another start tag after the 'final' end tag... why not have an option to just open it and continue parsing? It would save me wedging my data between two rather artificial and clumsy dummy 'outer' start and end tags. An option could be called 'assume outer tags' or something.

And discovering more 'prolog' data (I think it is called that), why not have an option to reinitialize the parser with this new information, or just ignore it...

These could be reported as 'found more prolog, assuming new document' if the appropriate option were set.

I know the above sounds a bit strange, but I think it is quite perlish (and therefore normal to a Perl programmer) - perhaps none of your mind though.

>This is by design. Your current approach is the correct one: accumulate
>character data until the next end tag is encountered.

Thanks for the clarification. This one always trips me up!

Cheers,
Dan.
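The accumulate-until-end-tag pattern Karl describes can be sketched as follows. This uses Python's standard-library Expat binding (xml.parsers.expat) as a stand-in for the Perl and C handlers in this thread; collect_text and its chunk_size parameter are hypothetical names chosen for illustration. The input is deliberately fed in tiny chunks so that the text of one element may arrive over several character data callbacks.

```python
import xml.parsers.expat


def collect_text(xml_text, chunk_size=4):
    """Return (element name, complete text) pairs, in the order elements close.

    Character data may be delivered in several pieces, so each piece is
    buffered and only joined when the matching end tag arrives.
    """
    results = []
    stack = []  # one list of text pieces per currently open element

    def start(name, attrs):
        stack.append([])

    def chars(data):
        if stack:
            stack[-1].append(data)  # just buffer; do not process yet

    def end(name):
        results.append((name, "".join(stack.pop())))

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = start
    parser.CharacterDataHandler = chars
    parser.EndElementHandler = end

    # Feed small chunks to provoke split character data callbacks.
    for i in range(0, len(xml_text), chunk_size):
        parser.Parse(xml_text[i:i + chunk_size], False)
    parser.Parse("", True)  # signal end of input
    return results


print(collect_text("<doc><line>some text here</line></doc>"))
# [('line', 'some text here'), ('doc', '')]
```

Keeping one buffer per open element means the text of a parent is not mixed with the text of its children. In the Python binding, setting parser.buffer_text = True asks the binding to coalesce adjacent pieces itself; whether an equivalent switch exists in other bindings is not something this thread establishes.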
From karl at waclawek.net Tue May 18 10:58:26 2004
From: karl at waclawek.net (Karl Waclawek)
Date: Tue May 18 10:58:30 2004
Subject: [Expat-discuss] junk after document element at line 2053
References:
Message-ID: <004a01c43ce8$928e8730$9e539696@citkwaclaww2k>

----- Original Message -----
From: "Dan Bolser"
To: "Karl Waclawek"
Cc:
Sent: Tuesday, May 18, 2004 11:01 AM

> I agree both these cases should not be ignored, but finding another start
> tag after the 'final' end tag... why not have an option to just open it
> and continue parsing? It would save me wedging my data between two rather
> artificial and clumsy dummy 'outer' start and end tags. An option could be
> called, 'assume outer tags' or something.

In general, the parser won't know how to continue. To my knowledge there is not a single XML parser implementation that offers the ability to continue after a well-formedness error.

> And discovering more 'prolog' data (I think it is called that), why not
> have an option to reinitialize the parser with this new information, or
> just ignore it ...

The parser can't do that, but you can - at least on the C-API level. All you need to know is where exactly in the input stream one document ends and a new one begins. Then simply feed the parser all input up to that point, call XML_ParserReset(), and continue feeding it more input until the document ends again, and so on.

> These could be reported as 'found more prolog, assuming new document' if
> the appropriate option were set.

The problem is - *you* have to know where a document ends; the parser doesn't. Other than that it should be possible - see above.

> I know the above sounds a bit strange, but I think it is quite perlish
> (and therefore normal to a perl programmer) - Perhaps none of your mind
> though.

I know nothing about the Perl API for Expat; my suggestions are based on Expat itself (C-API).

Karl

From mirod at xmltwig.com Tue May 18 11:16:35 2004
From: mirod at xmltwig.com (Michel Rodriguez)
Date: Tue May 18 11:18:19 2004
Subject: [Expat-discuss] junk after document element at line 2053
In-Reply-To: <004a01c43ce8$928e8730$9e539696@citkwaclaww2k>
References: <004a01c43ce8$928e8730$9e539696@citkwaclaww2k>
Message-ID:

Hi,

Coming late in the discussion... Is there any chance you could use the Stream_Delimiter option when you create the XML::Parser?

From the docs:

  Stream_Delimiter

    This is an Expat option. It takes a string value. When this string is found alone on a line while parsing from a stream, then the parse is ended as if it saw an end of file. The intended use is with a stream of xml documents in a MIME multi-part format. The string should not contain a trailing newline.
This means you would have to insert the delimiter in the file, but this could be handled by piping the initial file through a simple Perl script that would substitute the prolog by the delimiter + prolog, as in (untested):

my $MARK= "<>"; open( XML_IN, "perl -p -e's{(new( Stream_Delimiter => $MARK)->parse( \*XML_IN);

Of course this has all the usual disadvantages of non-XML solutions: you assume that both $MARK and '

> -----Original Message-----
> From: expat-discuss-bounces@libexpat.org
> [mailto:expat-discuss-bounces@libexpat.org]On Behalf Of Dan Bolser
> Sent: Tuesday, May 18, 2004 9:01 AM
> To: Karl Waclawek
> Cc: expat-discuss@libexpat.org
> Subject: Re: [Expat-discuss] junk after document element at line 2053
>
> I agree both these cases should not be ignored, but finding
> another start
> tag after the 'final' end tag... why not have an option to
> just open it
> and continue parsing? It would save me wedging my data
> between two rather
> artificial and clumsy dummy 'outer' start and end tags. An
> option could be
> called, 'assume outer tags' or something.
>
> And discovering more 'prolog' data (I think it is called
> that), why not
> have an option to reinitialize the parser with this new
> information, or
> just ignore it ...
>
> These could be reported as 'found more prolog, assuming new
> document' if
> the appropriate option were set.
>

You're now talking about a multiple-document parser instead of a document parser :-) Really, if that is what you are looking for, you can wrap the expat functions in your own and provide that functionality. I personally believe the reason for expat's longevity and popularity is that it is small, fast and reliable. It's certainly why I use it. It is easy to wrap its functionality - I use C++ and have several wrappers I use. I don't use Perl, but it seems to me it "should" be simple to break your document as you read a top-level closure and restart or reinstantiate the parser. Adding this functionality to expat might make it simple for those special cases, but it would also impact the larger body of cases where it isn't desirable.

From karl at waclawek.net Tue May 18 11:39:26 2004
From: karl at waclawek.net (Karl Waclawek)
Date: Tue May 18 11:39:38 2004
Subject: [Expat-discuss] junk after document element at line 2053
References:
Message-ID: <00a501c43cee$4d486690$9e539696@citkwaclaww2k>

> functionality. I personally believe the reason for expat's
> longevity and popularity is that it is small, fast and reliable.

And quite conformant!

Karl

From dmb at mrc-dunn.cam.ac.uk Tue May 18 12:05:29 2004
From: dmb at mrc-dunn.cam.ac.uk (Dan Bolser)
Date: Tue May 18 11:52:07 2004
Subject: [Expat-discuss] junk after document element at line 2053
In-Reply-To: <00a501c43cee$4d486690$9e539696@citkwaclaww2k>
Message-ID:

On Tue, 18 May 2004, Karl Waclawek wrote:

>And quite conformant!
:) OK guys, point taken!

Thanks for the suggestions and discussions. In the end I preprocessed my 'multiple-document' to remove the prolog and bunged a <Start>...</Start> around the whole stream.

Thanks again,
Dan.

From alokk at cybage.com Wed May 19 00:35:43 2004
From: alokk at cybage.com (alok kumar)
Date: Wed May 19 00:40:04 2004
Subject: [Expat-discuss] my parser code
Message-ID: <002c01c43d5a$bfc75840$d808a8c0@cybage>

Hi,

I am using the Expat parser to parse an XML file, but when I display the output on the console it shows some boxes. I am working on Symbian 6.1. Here is the snippet of code. This is an example that is available:

// HelloWorld.cpp
//
// Copyright (c) 2000 Symbian Ltd. All rights reserved.

#include "CommonFramework.h"
#include "expat.h"
#include "f32file.h"
#include "utf.h"
#include "ExpatImpl.h"
#include "myprinter.cpp"

LOCAL_C void doExampleL()
{
    CMyXML p(console);
    if (! p.Create())
    {
        _LIT(ErrorText, "Couldn't allocate memory for parser\n");
        console->Printf(ErrorText);
        User::Exit(-1);
    }
    RFs fs;
    User::LeaveIfError(fs.Connect());
    _LIT(KFileName, "c:\\hello.xml");
    RFile f;
    f.Open(fs, KFileName, EFileStreamText|EFileWrite|EFileShareAny);
    //TFileText ft;
    //ft.Set(f);
    TBuf8<256> fileBuf;
    CnvUtfConverter uC;
    p.EnableElementHandler();
    int i=0;
    for (;;)
    {
        int done=0;
        int sif;
        int err;
        i++;
        err = f.Read(fileBuf);
        if(fileBuf.Length()==0)
        {
            done=1;
        }
        if (p.Parse((const char *)fileBuf.Ptr(), fileBuf.Length(), done) == XML_STATUS_ERROR)
        {
            _LIT(parseErr, "Error at line %d,\ncolumn %d");
            console->Printf(parseErr, i-1,p.GetCurrentColumnNumber(), p.GetErrorCode());
        }
        if (err!=KErrNone || done==1) break;
    }
}

//Myprinter.cpp
//#include "CommonFramework.h"
#include "expat.h"
#include "f32file.h"
#include "utf.h"
#include "ExpatImpl.h"
#include
#include

class CMyXML : public CExpatImpl
{
private:
    int depth;
public:
    CConsoleBase* console; // write all your messages to this

    // Constructor
    CMyXML (CConsoleBase* c) { depth=0; console=c; }

    // Invoked by CExpatImpl after the parser is created
    void OnPostCreate ()
    {
        // Enable all the event routines we want
        EnableStartElementHandler ();
        EnableEndElementHandler ();
        // Note: EnableElementHandler will do both start and end
        EnableCharacterDataHandler (false);
    }

    // Start element handler
    void OnStartElement (const XML_Char *pszName, const XML_Char **papszAttrs)
    {
        TBuf<256> elementName;
        TBuf<256> attrName;
        TBuf<256> attrData;
        TBuf<256> elementData;
        int i,j;
        for (i=0; iPrintf(indent); }
        elementName.Copy((TUint16 *)pszName);
        console->Printf(elementName);
        for (i = 0; papszAttrs[i]; i += 2)
        {
            attrName.Copy((TUint16 *)papszAttrs[i]);
            attrData.Copy((TUint16 *)papszAttrs[i+1]);
            _LIT(eol, "\n");
            console->Printf(eol);
            for (j=0; jPrintf(indent); }
            _LIT(indent, " ");
            console->Printf(indent);
            console->Printf(attrName);
            _LIT(sep, ": ");
            console->Printf(sep);
            console->Printf(attrData);
        }
        _LIT(eol, "\n");
        console->Printf(eol);
        depth++;
    }

    // End element handler
    void OnEndElement (const XML_Char *pszName)
    {
        depth--;
    }
};

I had included the expat dll and lib in include directly. If anyone can guide me through this, it would be a great help.

Best Regards
Alok Kumar

From prashants at ness-gsg.com Wed May 19 06:57:00 2004
From: prashants at ness-gsg.com (Prashant Sharma)
Date: Wed May 19 06:58:19 2004
Subject: [Expat-discuss] 64bit Expat
Message-ID: <013a01c43d90$09257050$8d06010a@portal.com>

Hi,

I want to compile Expat on Windows 2003 using the Microsoft SDK for 64 bit. Does anyone know how it can be done? Even getting the binaries would be good.

Thanks,
Prashant Sharma

From Swapna at tatatel.co.in Wed May 19 07:18:39 2004
From: Swapna at tatatel.co.in (Swapna@tatatel.co.in)
Date: Wed May 19 07:19:54 2004
Subject: [Expat-discuss] Installation of Simple C Expat Wrapper
Message-ID:

Hi,

I have installed Expat on Tru64 Unix and tested it with the example program given. That works fine. I wanted to install and use the Simple C Expat Wrapper on Tru64 Unix, but I am having a problem installing it: it says that it is unable to find the Expat library. I have checked all the environment variable settings like PATH, LD_LIBRARY_PATH etc. They are fine. Is there a way out, or am I missing something while installing? Can anyone who has tried it before please help me?

Thanks & Regards
Swapna Dasari

From karl at waclawek.net Wed May 19 08:58:58 2004
From: karl at waclawek.net (Karl Waclawek)
Date: Wed May 19 08:59:05 2004
Subject: [Expat-discuss] Re: my parser code
References: <002c01c43d5a$bfc75840$d808a8c0@cybage>
Message-ID: <001b01c43da1$0cdf0d30$9e539696@citkwaclaww2k>

----- Original Message -----
From: "alok kumar"
To:
Cc: "Karl Waclawek" ; "Greg Martin"
Sent: Wednesday, May 19, 2004 12:35 AM

> hi
> i m using expat parser to parse xml file, but when i display the output on the console its showing some boxes.
> I am working on Symbian 6.1. Here is the snippet of code...
> This is an example available......

I know nothing about Symbian, unfortunately. Could it be that the console cannot show Unicode characters? Could it be that you are using the wrong type of DLL (libexpat.dll for UTF-8, libexpatw.dll for UTF-16)?

Karl

From karl at waclawek.net Wed May 19 09:00:46 2004
From: karl at waclawek.net (Karl Waclawek)
Date: Wed May 19 09:00:54 2004
Subject: [Expat-discuss] 64bit Expat
References: <013a01c43d90$09257050$8d06010a@portal.com>
Message-ID: <002301c43da1$4d06d960$9e539696@citkwaclaww2k>

----- Original Message -----
From: "Prashant Sharma"
To:
Sent: Wednesday, May 19, 2004 6:57 AM

> Hi,
> i wanted to compile expat on windows 2003 using microsoft sdk for 64 bit,
> does any one know how it can be done,
> or even if i can get the binaries it would be good.

What are your problems?

Karl

From prashants at ness-gsg.com Wed May 19 09:08:09 2004
From: prashants at ness-gsg.com (Prashant Sharma)
Date: Wed May 19 09:09:24 2004
Subject: [Expat-discuss] 64bit Expat
References: <013a01c43d90$09257050$8d06010a@portal.com> <002301c43da1$4d06d960$9e539696@citkwaclaww2k>
Message-ID: <014901c43da2$588c6a10$8d06010a@portal.com>

Hi,

I was not able to build it. I'm using the Platform SDK; there is no makefile, so I used VC6 to generate the makefile, but when I compiled with nmake it gave errors:

Microsoft (R) Program Maintenance Utility Version 7.10.2240.8
Copyright (C) Microsoft Corporation. All rights reserved.

bscmake.exe @C:\DOCUME~1\Psharma\LOCALS~1\Temp\nmB2D.tmp
'bscmake.exe' is not recognized as an internal or external command,
operable program or batch file.
NMAKE : fatal error U1077: 'bscmake.exe' : return code '0x1'
Stop.

Thanks,
Prashant

From karl at waclawek.net Wed May 19 09:26:26 2004
From: karl at waclawek.net (Karl Waclawek)
Date: Wed May 19 09:26:38 2004
Subject: [Expat-discuss] 64bit Expat
References: <013a01c43d90$09257050$8d06010a@portal.com> <002301c43da1$4d06d960$9e539696@citkwaclaww2k> <014901c43da2$588c6a10$8d06010a@portal.com>
Message-ID: <003701c43da4$e2eab520$9e539696@citkwaclaww2k>

----- Original Message -----
From: "Prashant Sharma"
To: "Karl Waclawek" ;
Sent: Wednesday, May 19, 2004 9:08 AM

> Hi,
> I was not able to build it,
> i'm using platform SDK, there is no makefile, i used vc6 to generate the
> make file, but when compiled with nmake it was giving compilation errors.
> Microsoft (R) Program Maintenance Utility Version 7.10.2240.8
> Copyright (C) Microsoft Corporation. All rights reserved.
>
> bscmake.exe @C:\DOCUME~1\Psharma\LOCALS~1\Temp\nmB2D.tmp
> 'bscmake.exe' is not recognized as an internal or external command,
> operable program or batch file.
> NMAKE : fatal error U1077: 'bscmake.exe' : return code '0x1'
> Stop.

I know nothing about the 64bit SDK, but isn't there a batch file to run (like VCVARS32.bat for VC++ 6.0) that sets up the environment properly?
Karl From tinskip at widevine.com Fri May 21 15:16:37 2004 From: tinskip at widevine.com (=?ISO-8859-1?Q?Thom=E1s_Inskip?=) Date: Fri May 21 15:16:57 2004 Subject: [Expat-discuss] Text data handler Message-ID: <61925D5C-AB5B-11D8-AD56-0003937578E2@widevine.com> I am new to Expat (been using it for a few hours), and have run into a snag. I am hoping someone can tell me what I am doing wrong. I am trying to parse some pretty simple ascii-encoded XML. I have set my element start and end handlers via XML_SetElementHandler. Those get called just fine. I have also set the character data handler via XML_SetCharacterDataHandler, which is not getting called. A sample xml string which would cause the failure would be : some text I would have expected my character data handler to be called with "some text" when parsing the xml stream in question, but it is not. Am I using the wrong handler? Is there another way to accomplish this. Any help would be appreciated. Oh, I am using Expat 1.95.7 compiled with gcc 2.96 From karl at waclawek.net Fri May 21 15:23:36 2004 From: karl at waclawek.net (Karl Waclawek) Date: Fri May 21 15:23:49 2004 Subject: [Expat-discuss] Text data handler References: <61925D5C-AB5B-11D8-AD56-0003937578E2@widevine.com> Message-ID: <007401c43f69$1d2b9d10$9e539696@citkwaclaww2k> ----- Original Message ----- From: "Thom?s Inskip" To: Sent: Friday, May 21, 2004 3:16 PM > I am new to Expat (been using it for a few hours), and have run into a > snag. I am hoping someone can tell me what I am doing wrong. > > I am trying to parse some pretty simple ascii-encoded XML. I have set > my element start and end handlers via XML_SetElementHandler. Those get > called just fine. I have also set the character data handler via > XML_SetCharacterDataHandler, which is not getting called. 
A sample xml > string which would cause the failure would be : > some text > I would have expected my character data handler to be called with "some > text" when parsing the xml stream in question, but it is not. Am I > using the wrong handler? Is there another way to accomplish this. Any > help would be appreciated. The way you describe it it should work. Why don't you post (a short relevant piece of) your code? Karl From tinskip at widevine.com Fri May 21 15:44:57 2004 From: tinskip at widevine.com (=?ISO-8859-1?Q?Thom=E1s_Inskip?=) Date: Fri May 21 15:45:44 2004 Subject: [Expat-discuss] Text data handler In-Reply-To: <007401c43f69$1d2b9d10$9e539696@citkwaclaww2k> References: <61925D5C-AB5B-11D8-AD56-0003937578E2@widevine.com> <007401c43f69$1d2b9d10$9e539696@citkwaclaww2k> Message-ID: <56D6C3E8-AB5F-11D8-AD56-0003937578E2@widevine.com> Sure thing. // creates parser, sets handler XMLConnectionEncoder::XMLConnectionEncoder(EncodedConnection &connection) : ConnectionEncoder(connection) { mParser = XML_ParserCreate(NULL); XML_SetUserData(mParser, this); XML_SetElementHandler(mParser, XMLConnectionEncoder::Element_Start, XMLConnectionEncoder::Element_End); XML_SetCharacterDataHandler(mParser, XMLConnectionEncoder::Element_Text); } void XMLConnectionEncoder::Decode(EncodedConnectionBuffer &source) { XML_Parse(mParser, &source[0], source.size(), false); } void XMLCALL XMLConnectionEncoder::Element_Start(void *userData, const XML_Char *name, const XML_Char **attributes) { XMLConnectionEncoder *encoder = (XMLConnectionEncoder *)userData; const XML_Char **attrPtr = attributes; EM_Element *theElement = new EM_Element; theElement->SetName(name); while (*attrPtr != NULL) { theElement->AddAttribute(attrPtr[0], attrPtr[1]); attrPtr += 2; } encoder->mStack.push_back(theElement); encoder->mText.clear(); } void XMLCALL XMLConnectionEncoder::Element_End(void *userData, const XML_Char *name) { XMLConnectionEncoder *encoder = (XMLConnectionEncoder *)userData; if 
(encoder->mStack.size() > 0) {
        EM_Element *theElement = encoder->mStack.back();
        encoder->mStack.pop_back();
        if (encoder->mText.size() > 0)
            theElement->SetValue(encoder->mText);
        if (encoder->mStack.size() > 0) {
            encoder->mStack.back()->AddElement(theElement);
            delete theElement;
        }
        else
            encoder->Receive(theElement);
    }
    else {
        // Do something else here
        Dprintf("Mismatched calls to XMLConnectionEncoder::Element_Start and XMLConnectionEncoder::Element_End");
    }
}

void XMLCALL XMLConnectionEncoder::Element_Text(void *userData, const XML_Char *text, int len)
{
    XMLConnectionEncoder *encoder = (XMLConnectionEncoder *)userData;
    encoder->mText.append(text, len);
}

That's all of it, actually. The last three member functions of XMLConnectionEncoder (Element_*) are static.

An example of the xml being parsed:

3c 45 6e 63 72 79 70 74 41 73 73 65 74 20 53 65   <EncryptAsset Se
[...]
49 6e 70 75 74 46 69 6c 65 3e 2f 68 6f 6d 65 2f   InputFile>/home/
74 6f 6d 61 73 2f 6d 70 65 67 2f 63 6e 6e 2e 6d   tomas/mpeg/cnn.m
70 67 3c 2f 49 6e 70 75 74 46 69 6c 65 3e 3c 4f   pg</InputFile><O
[...]                                             [...]/var/t
6d 70 2f 30 30 30 30 30 30 30 30 32 30 2e 6d 70   mp/0000000020.mp
67 3c 2f 4f 75 74 70 75 74 46 69 6c 65 3e 3c 41   g</OutputFile><A
[...]

> ----- Original Message -----
> From: "Thomás Inskip"
> To:
> Sent: Friday, May 21, 2004 3:16 PM
>
>> I am new to Expat (been using it for a few hours), and have run into a
>> snag. I am hoping someone can tell me what I am doing wrong.
>>
>> I am trying to parse some pretty simple ascii-encoded XML. I have set
>> my element start and end handlers via XML_SetElementHandler. Those get
>> called just fine. I have also set the character data handler via
>> XML_SetCharacterDataHandler, which is not getting called. A sample xml
>> string which would cause the failure would be :
>> some text
>> I would have expected my character data handler to be called with "some
>> text" when parsing the xml stream in question, but it is not. Am I
>> using the wrong handler? Is there another way to accomplish this.
>> Any >> help would be appreciated. > > The way you describe it it should work. > Why don't you post (a short relevant piece of) your code? > > Karl > > > _______________________________________________ > Expat-discuss mailing list > Expat-discuss@libexpat.org > http://mail.libexpat.org/mailman/listinfo/expat-discuss From karl at waclawek.net Fri May 21 15:58:39 2004 From: karl at waclawek.net (Karl Waclawek) Date: Fri May 21 15:58:48 2004 Subject: [Expat-discuss] Text data handler References: <61925D5C-AB5B-11D8-AD56-0003937578E2@widevine.com> <007401c43f69$1d2b9d10$9e539696@citkwaclaww2k> <56D6C3E8-AB5F-11D8-AD56-0003937578E2@widevine.com> Message-ID: <008d01c43f6e$02f687c0$9e539696@citkwaclaww2k> ----- Original Message ----- From: "Thom?s Inskip" To: "Karl Waclawek" Cc: Sent: Friday, May 21, 2004 3:44 PM > Sure thing. I don't see anything wrong with your code. Are you sure that XMLConnectionEncoder::Element_Text doesn't get called? Or is there just no text appended? Karl From tinskip at widevine.com Fri May 21 16:29:06 2004 From: tinskip at widevine.com (=?ISO-8859-1?Q?Thom=E1s_Inskip?=) Date: Fri May 21 16:29:21 2004 Subject: [Expat-discuss] Text data handler In-Reply-To: <008d01c43f6e$02f687c0$9e539696@citkwaclaww2k> References: <61925D5C-AB5B-11D8-AD56-0003937578E2@widevine.com> <007401c43f69$1d2b9d10$9e539696@citkwaclaww2k> <56D6C3E8-AB5F-11D8-AD56-0003937578E2@widevine.com> <008d01c43f6e$02f687c0$9e539696@citkwaclaww2k> Message-ID: <81AABA80-AB65-11D8-AD56-0003937578E2@widevine.com> It turns out the problem is very different, and I had made an invalid assumption. There are two "outermost" xml elements being parsed, one after the other. The first one contains no element data (only attributes), but the second one does. So I am dealing with something that looks like: some text But without the linefeeds. 
I had assumed that the problem was that my text callback handler was the problem because I already knew that my element start and end handlers were being called (for "a" and its sub-elements). The manual says something about needing a separate parser for each "document". What is the exact definition of a document? Is a document a data stream, or is the above two separate documents because there are two top-level elements (one defined by element a, and the other one by element d)? Perhaps Expat was not meant to do what I am asking it to do. On May 21, 2004, at 3:58 PM, Karl Waclawek wrote: > > ----- Original Message ----- > From: "Thom?s Inskip" > To: "Karl Waclawek" > Cc: > Sent: Friday, May 21, 2004 3:44 PM > > >> Sure thing. > > > > I don't see anything wrong with your code. > Are you sure that XMLConnectionEncoder::Element_Text > doesn't get called? Or is there just no text appended? > > Karl > > > _______________________________________________ > Expat-discuss mailing list > Expat-discuss@libexpat.org > http://mail.libexpat.org/mailman/listinfo/expat-discuss From tinskip at widevine.com Fri May 21 16:50:06 2004 From: tinskip at widevine.com (=?ISO-8859-1?Q?Thom=E1s_Inskip?=) Date: Fri May 21 16:50:56 2004 Subject: [Expat-discuss] Text data handler In-Reply-To: <81AABA80-AB65-11D8-AD56-0003937578E2@widevine.com> References: <61925D5C-AB5B-11D8-AD56-0003937578E2@widevine.com> <007401c43f69$1d2b9d10$9e539696@citkwaclaww2k> <56D6C3E8-AB5F-11D8-AD56-0003937578E2@widevine.com> <008d01c43f6e$02f687c0$9e539696@citkwaclaww2k> <81AABA80-AB65-11D8-AD56-0003937578E2@widevine.com> Message-ID: <70DC7A24-AB68-11D8-AD56-0003937578E2@widevine.com> Yep. That was it. A misunderstanding of what constitutes a "document". I got around the problem by priming the parser with a fake document start tag. Not a pretty fix, but after that it did what I expected it to. Thanks for your help. 
On May 21, 2004, at 4:29 PM, Thom?s Inskip wrote: > It turns out the problem is very different, and I had made an invalid > assumption. There are two "outermost" xml elements being parsed, one > after the other. The first one contains no element data (only > attributes), but the second one does. So I am dealing with something > that looks like: > > > > > > some text > > > But without the linefeeds. I had assumed that the problem was that my > text callback handler was the problem because I already knew that my > element start and end handlers were being called (for "a" and its > sub-elements). > > The manual says something about needing a separate parser for each > "document". What is the exact definition of a document? Is a > document a data stream, or is the above two separate documents because > there are two top-level elements (one defined by element a, and the > other one by element d)? Perhaps Expat was not meant to do what I am > asking it to do. From Greg.Martin at TELUS.COM Fri May 21 16:45:14 2004 From: Greg.Martin at TELUS.COM (Greg Martin) Date: Fri May 21 17:21:29 2004 Subject: [Expat-discuss] Text data handler Message-ID: > -----Original Message----- > From: expat-discuss-bounces@libexpat.org > [mailto:expat-discuss-bounces@libexpat.org]On Behalf Of Thom?s Inskip > Sent: Friday, May 21, 2004 2:29 PM > To: Karl Waclawek > Cc: expat-discuss@libexpat.org > Subject: Re: [Expat-discuss] Text data handler > > > It turns out the problem is very different, and I had made an invalid > assumption. There are two "outermost" xml elements being parsed, one > after the other. The first one contains no element data (only > attributes), but the second one does. So I am dealing with something > that looks like: > > > > > > some text > > > But without the linefeeds. I had assumed that the problem > was that my > text callback handler was the problem because I already knew that my > element start and end handlers were being called (for "a" and its > sub-elements). 
>
> The manual says something about needing a separate parser for each
> "document". What is the exact definition of a document? Is a document
> a data stream, or is the above two separate documents because there are
> two top-level elements (one defined by element a, and the other one by
> element d)? Perhaps Expat was not meant to do what I am asking it to
> do.
>

A couple of comments. First, to your question: a document can have only one top-level element. You should check the return value of XML_Parse for failure (I suspect it would have returned an error in your case). If you are passing all your data in one pass, the final parameter of XML_Parse should be true to indicate that it is your final call (alternatively, make a final call to XML_Parse with length 0 and the fourth parameter set to true).

From lshen at cisco.com Fri May 21 18:18:11 2004
From: lshen at cisco.com (Shen, Lin)
Date: Fri May 21 18:18:15 2004
Subject: [Expat-discuss] Text data handler
Message-ID: <6677B3346233B94EBB11C060935101202E34F9@vtg-um-e2k1.sj21ad.cisco.com>

I want to parse multiple documents one by one. What I'm doing right now is to call XML_ParserCreate() once at the very beginning, and then call XML_Parse() for each document. I've tried setting the 4th parameter to both FALSE and TRUE, but I always get the "junk after document element at line" error when the second document gets fed into the parser.

Thanks Lin shen Cisco Systems > -----Original Message----- > From: expat-discuss-bounces@libexpat.org > [mailto:expat-discuss-bounces@libexpat.org] On Behalf Of Greg Martin > Sent: Friday, May 21, 2004 1:45 PM > To: expat-discuss@libexpat.org > Subject: RE: [Expat-discuss] Text data handler > > > > > > -----Original Message----- > > From: expat-discuss-bounces@libexpat.org > > [mailto:expat-discuss-bounces@libexpat.org]On Behalf Of > Thom?s Inskip > > Sent: Friday, May 21, 2004 2:29 PM > > To: Karl Waclawek > > Cc: expat-discuss@libexpat.org > > Subject: Re: [Expat-discuss] Text data handler > > > > > > It turns out the problem is very different, and I had made > an invalid > > assumption. There are two "outermost" xml elements being > parsed, one > > after the other. The first one contains no element data (only > > attributes), but the second one does. So I am dealing with > something > > that looks like: > > > > > > > > > > > > some text > > > > > > But without the linefeeds. I had assumed that the problem > > was that my > > text callback handler was the problem because I already > knew that my > > element start and end handlers were being called (for "a" and its > > sub-elements). > > > > The manual says something about needing a separate parser for each > > "document". What is the exact definition of a document? Is > > a document > > a data stream, or is the above two separate documents because > > there are > > two top-level elements (one defined by element a, and the > > other one by > > element d)? Perhaps Expat was not meant to do what I am > asking it to > > do. > > > > A couple of comments. First to your question ... a document > can only have > one top-level element. 
You should check the return value of > XML_Parse for > failure (I suspect it would have returned an error in your > case) and if > you are passing all your data in one pass the final parameter > of XML_Parse > should be true to indicate that it is your final call (or you > should make a > final call to XML_Parse of length 0 and a fourth parameter == true). > > > > _______________________________________________ > Expat-discuss mailing list > Expat-discuss@libexpat.org > http://mail.libexpat.org/mailman/listinfo/expat-discuss > From Greg.Martin at TELUS.COM Fri May 21 18:24:12 2004 From: Greg.Martin at TELUS.COM (Greg Martin) Date: Fri May 21 18:24:41 2004 Subject: [Expat-discuss] Text data handler Message-ID: > -----Original Message----- > From: Shen, Lin [mailto:lshen@cisco.com] > Sent: Friday, May 21, 2004 4:18 PM > To: Greg Martin; expat-discuss@libexpat.org > Subject: RE: [Expat-discuss] Text data handler > > > I want to parse multiple documents one by one. > What I'm doing right now is to call XML_ParseCreate() once at > the very beginning, and then call XML_Parse() for each > document. I've tried setting the 4th parameter to both FALSE > and TRUE, but always get "junk after document element at > line" error when the second document gets fed into the parser. > Right, you need to call XML_ParserReset and then re-register your handlers before calling XML_Parse again. You can call XML_Parse as many times as you want on a single document for the same parser but must re-initialise the parser before starting a new document. From lshen at cisco.com Fri May 21 18:33:34 2004 From: lshen at cisco.com (Shen, Lin) Date: Fri May 21 18:33:37 2004 Subject: [Expat-discuss] Text data handler Message-ID: <6677B3346233B94EBB11C0609351012001FC8B7C@vtg-um-e2k1.sj21ad.cisco.com> What's the difference between resetting a parser and destroying and re-creating a parser for parsing a different document? I guess it's mainly performance. Will a parser context be lost when it's reset? 
Lin shen Cisco Systems > -----Original Message----- > From: expat-discuss-bounces@libexpat.org > [mailto:expat-discuss-bounces@libexpat.org] On Behalf Of Greg Martin > Sent: Friday, May 21, 2004 3:24 PM > To: expat-discuss@libexpat.org > Subject: RE: [Expat-discuss] Text data handler > > > > > > -----Original Message----- > > From: Shen, Lin [mailto:lshen@cisco.com] > > Sent: Friday, May 21, 2004 4:18 PM > > To: Greg Martin; expat-discuss@libexpat.org > > Subject: RE: [Expat-discuss] Text data handler > > > > > > I want to parse multiple documents one by one. > > What I'm doing right now is to call XML_ParseCreate() once at > > the very beginning, and then call XML_Parse() for each > > document. I've tried setting the 4th parameter to both FALSE > > and TRUE, but always get "junk after document element at > > line" error when the second document gets fed into the parser. > > > > Right, you need to call XML_ParserReset and then re-register your > handlers before calling XML_Parse again. You can call XML_Parse as > many times as you want on a single document for the same parser but > must re-initialise the parser before starting a new document. > > > > _______________________________________________ > Expat-discuss mailing list > Expat-discuss@libexpat.org > http://mail.libexpat.org/mailman/listinfo/expat-discuss > From Greg.Martin at TELUS.COM Fri May 21 18:47:17 2004 From: Greg.Martin at TELUS.COM (Greg Martin) Date: Fri May 21 19:22:49 2004 Subject: [Expat-discuss] Text data handler Message-ID: > -----Original Message----- > From: Shen, Lin [mailto:lshen@cisco.com] > Sent: Friday, May 21, 2004 4:34 PM > To: Greg Martin; expat-discuss@libexpat.org > Subject: RE: [Expat-discuss] Text data handler > > > What's the difference between resetting a parser and destroying and > re-creating a parser for parsing a different document? I guess it's > mainly performance. Will a parser context be lost when it's reset? > It reduces memory allocation. 
According to the header file: all handlers are cleared from the parser, except for the unknownEncodingHandler. The parser's external state is re-initialized except for the values of ns and ns_triplets.

From tinskip at widevine.com Fri May 21 21:27:44 2004
From: tinskip at widevine.com (=?ISO-8859-1?Q?Thom=E1s_Inskip?=)
Date: Fri May 21 21:27:58 2004
Subject: [Expat-discuss] Document boundaries (was Re: Text data handler)
In-Reply-To:
References:
Message-ID: <39AADF6E-AB8F-11D8-AD56-0003937578E2@widevine.com>

> Right, you need to call XML_ParserReset and then re-register your
> handlers before calling XML_Parse again. You can call XML_Parse as
> many times as you want on a single document for the same parser but
> must re-initialise the parser before starting a new document.

The thing is that I am implementing a pretty generic transaction-oriented communications protocol; requests go in one direction, and responses are sent back. Those transactions are encoded as XML. The transactions go in each direction in blocks of data, which may contain multiple transactions, or portions of a transaction. I'd rather not have to pre-parse the stream to figure out where each transaction (document) starts and ends before I pass it on to the parser.

Is it possible to call XML_ParserReset from within a handler (such as an end element handler)? Probably not a good idea, huh? If I could, then I would just call it when I reach the end of the top-level element (document).

What I've done for now is just prime the parser with "" so that all of the transactions are considered to be subelements of "Document". What I worry about is this: if there is some screwy XML in the stream, the parser may never recover and I won't be able to parse past the error point, rendering any further transactions binary waste. How good is Expat at recovering from errors? I couldn't find any info in that regard.

From karl at waclawek.net Fri May 21 23:59:08 2004 From: karl at waclawek.net (Karl Waclawek) Date: Fri May 21 23:59:06 2004 Subject: [Expat-discuss] Document boundaries (was Re: Text data handler) References: <39AADF6E-AB8F-11D8-AD56-0003937578E2@widevine.com> Message-ID: <001f01c43fb1$227cebf0$0200a8c0@karlglen188> ----- Original Message ----- From: "Thom?s Inskip" To: "Greg Martin" Cc: Sent: Friday, May 21, 2004 9:27 PM > >> > > > > Right, you need to call XML_ParserReset and then re-register your > > handlers before calling XML_Parse again. You can call XML_Parse as > > many times as you want on a single document for the same parser but > > must re-initialise the parser before starting a new document. > > > > > The thing is that I am implementing a pretty generic > transaction-oriented communications protocol; requests go in one > direction, and responses are sent back. Those transactions are encoded > as XML. The transactions go in each direction in blocks of data, which > may contain multiple transactions, or portions of a transaction. I'd > rather not have to pre-parse the stream to figure out where each > transaction (document) starts and ends before I pass it on to the > parser. I don't know of a parser that can handle that. You pretty much have to tell the parser where the document ends, as they are all geared towards processing one document only. Also, consider this: as the parser is reading past the end tag it is still legal to have comments, processing instructions and whitespace. So, unless the parser encounters anything illegal it will consider everything part of the document until it sees - let's say - the XML declaration of the next document. However, nothing told the parser that this is where the next document starts - so it will evaluate it from the point of view of having another XML declaration after the end tag, which is illegal. This means, start and end of a document has to be determined outside of the data stream. 
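
One way to mark document boundaries outside the XML itself is a separator byte that cannot occur inside a well-formed document, such as NUL. The following is only a minimal sketch in C under that assumption: the sender is assumed to write one NUL after every document, and the names split_documents, doc_handler, doc_stats and count_doc are made up for illustration (they are not part of Expat). The callback is the place where a real program would reset the parser and hand the chunk to XML_Parse with the final flag set.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical callback type: receives one complete document
   (length-delimited, not NUL-terminated). */
typedef void (*doc_handler)(const char *doc, size_t len, void *ctx);

/* Splits buf[0..len) on NUL separators and calls on_doc for every complete,
   non-empty document. Returns the number of bytes consumed; bytes after the
   last separator belong to an incomplete document that the caller should
   keep and prepend to the next block read from the stream. */
size_t split_documents(const char *buf, size_t len,
                       doc_handler on_doc, void *ctx)
{
    size_t start = 0;

    assert(buf != NULL || len == 0);
    while (start < len) {
        const char *sep = memchr(buf + start, '\0', len - start);
        size_t doc_len;

        if (sep == NULL)
            break;                    /* no separator yet: partial document */
        doc_len = (size_t)(sep - (buf + start));
        if (doc_len > 0)
            on_doc(buf + start, doc_len, ctx);
        start += doc_len + 1;         /* skip the document and its separator */
    }
    return start;
}

/* Example handler for illustration: counts documents and their bytes. */
struct doc_stats { size_t count; size_t bytes; };

void count_doc(const char *doc, size_t len, void *ctx)
{
    struct doc_stats *st = ctx;
    (void)doc;
    st->count += 1;
    st->bytes += len;
}
```

Because the function reports how many bytes it consumed, a block that ends in the middle of a transaction is not a problem: the unconsumed tail is simply carried over to the next read.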

> Is it possible to call XML_ParserReset from within a handler (such as
> an end element handler)? Probably not a good idea, huh? If I could
> then I would just call it when I reach the end of the top-level element
> (document).

I think that might give you an access violation.

> What I've done for now is just prime the parser with "" so
> that all of the transactions are considered to be subelements of
> "Document". What I worry about is this: if there is some screwy XML in
> the stream, the parser may never recover and I won't be able to parse
> past the error point, rendering any further transactions binary waste.
> How good is Expat at recovering from errors? I couldn't find any info
> in that regard.

Expat does not recover from well-formedness violations. These are fatal errors. IMO, your best bet is to have separators in the data stream (like null characters), and scan for them to detect the end of document. Then submit each chunk between separators as a separate document, resetting the parser in between.

Karl

From yingkw at 163.com Sun May 23 03:52:42 2004
From: yingkw at 163.com (=?gb2312?B?zfXTqr+1?=)
Date: Sun May 23 12:06:11 2004
Subject: [Expat-discuss] about xml parsing!
Message-ID: <002901c4409a$fa15a400$bfc71cde@WYK>

hi!

I'm developing some application on NOKIA's series 60 platform with Visual C++. I choose expat as the XML paeser in my application.
But now i meet many difficulty. Can u tell me how can i integrate the expat to the Symbian 60 and Series 60's SDK for Visual C++.
Can u give any examples about XML files' parsing and display the data i need in the screen?
Thank u!
Waiting ur respond!:D

From karl at waclawek.net Sun May 23 13:34:38 2004
From: karl at waclawek.net (Karl Waclawek)
Date: Sun May 23 13:34:40 2004
Subject: [Expat-discuss] about xml parsing!
References: <002901c4409a$fa15a400$bfc71cde@WYK>
Message-ID: <001201c440ec$39363590$0300a8c0@karlglen188>

> I'm developing some application on NOKIA's series 60 platform with Visual C++. I choose expat as the XML paeser in my application.
> But now i meet many difficulty. Can u tell me how can i integrate the expat to the Symbian 60 and Series 60's SDK for Visual C++.
> Can u give any examples about XML files' parsing and display the data i need in the screen?
> Thank u!
> Waiting ur respond!:D

The expat distribution contains a few examples that show you how to parse an XML file. These examples are called elements, outline and xmlwf. They are easy to build under MS VC++ on Windows.

Unfortunately I can't help you with Symbian, as I know nothing about it.

Karl

From Greg.Martin at TELUS.COM Tue May 25 10:00:01 2004
From: Greg.Martin at TELUS.COM (Greg Martin)
Date: Tue May 25 10:00:30 2004
Subject: [Expat-discuss] RE: Document boundaries (was Re: Text data handler)
Message-ID:

> -----Original Message-----
> From: Thomás Inskip [mailto:tinskip@widevine.com]
> Sent: Friday, May 21, 2004 7:28 PM
> To: Greg Martin
> Cc: expat-discuss@libexpat.org
> Subject: Document boundaries (was Re: Text data handler)
>
> > Right, you need to call XML_ParserReset and then re-register your
> > handlers before calling XML_Parse again. You can call XML_Parse as
> > many times as you want on a single document for the same parser but
> > must re-initialise the parser before starting a new document.
>
> The thing is that I am implementing a pretty generic
> transaction-oriented communications protocol; requests go in one
> direction, and responses are sent back. Those transactions are encoded
> as XML. The transactions go in each direction in blocks of data, which
> may contain multiple transactions, or portions of a transaction. I'd
> rather not have to pre-parse the stream to figure out where each
> transaction (document) starts and ends before I pass it on to the
> parser.
>

Remember that you don't have to pass the entire document in at once, so you can parse on the fly without knowing in advance where the end of the document is. When you do find the end token (tag), indicate that you are at the final call and then reset the parser upon return. It does mean you have to scan the stream, but you don't have to wait for a terminating tag before parsing. If expat returns a parsing error before you reach the last tag, you will have to read until the tag in order to clean up the stream.

Pseudocode, something like:

    /* scan_buffer_for_tag() is assumed to return the offset just past
       end_tag (or -1 if it is not in the buffer) and to set done;
       reset_parser() must also re-register the handlers. */
    while ((sz = read(my_sock, buf, BUF_SIZE)) > 0) {
        int off = 0;
        while (off < sz) {
            /* look for the end of the current document in what is left */
            int pos = scan_buffer_for_tag(buf + off, sz - off, end_tag, done);
            int parse_sz = pos == -1 ? sz - off : pos;
            int retval = XML_Parse(the_parser, buf + off, parse_sz, done);
            if (!no_errors(retval)) {
                handle_errors();   /* e.g. skip ahead to the end tag */
                break;
            }
            off += parse_sz;
            if (done) {
                /* document complete: start fresh for the next one */
                reset_parser();
                done = false;
            }
        }
    }

From m_biswas at mailinator.com Wed May 26 12:28:27 2004
From: m_biswas at mailinator.com (Mohun Biswas)
Date: Wed May 26 12:53:02 2004
Subject: [Expat-discuss] how to build for Win32/MSVC?
Message-ID: <40B4C5AB.9070305@mailinator.com>

I'm porting a Unix application which uses Expat to Windows. To date I've followed the advice in the README "Windows users should use the expat_win32bin package" and thus am linking with the precompiled binary "libexpatMT.lib" from Expat 1.95.7. But I'm running into linker warning LNK4098, which refers to mismatched library configurations, and experiments reveal that it's the expat link that triggers it. Therefore I'd like to know if there's a published way of building expat myself for Windows using MSVC 6 or 7?

The warning doesn't stop the program from working, and I gather I could suppress it with the right combination of /NODEFAULTLIB flags, but I'd prefer to have a clean, warning- and hack-free build.
Does anyone have a HOWTO or Makefile or similar? Thanks, Mohun Biswas From karl at waclawek.net Wed May 26 13:33:35 2004 From: karl at waclawek.net (Karl Waclawek) Date: Wed May 26 13:33:45 2004 Subject: [Expat-discuss] how to build for Win32/MSVC? References: <40B4C5AB.9070305@mailinator.com> Message-ID: <007601c44347$92a25e30$9e539696@citkwaclaww2k> ----- Original Message ----- From: "Mohun Biswas" To: Sent: Wednesday, May 26, 2004 12:28 PM > I'm porting a Unix application which uses Expat to Windows. To date I've > followed the advice in the README "Windows users should use the > expat_win32bin package" and thus am linking with the precompiled binary > "libexpatMT.lib" from Expat 1.95.7. Would you rather link against the Dll than the static library? > But I'm running into linker warning > LNK4098 which refers to mismatched library configurations, and > experiments reveal that it's the expat link that triggers it. Therefore > I'd like to know if there's a published way of building expat myself for > Windows using MSVC 6 or 7? Very simple. Just open the Workspace file expat.dsw in MS VC++ 6 and build the projects you want. > The warning doesn't stop the program from working, and I gather I could > suppress it with the right combination of /NODEFAULTLIBS flags, but I'd > prefer to have a clean, warning- and hack-free build. Does anyone have a > HOWTO or Makefile or similar? Not necessary. Building is GUI oriented and simple under MS VC++ 6/7. Karl