[Expat-discuss] XML_ERROR_JUNK_AFTER_DOC_ELEMENT - How to resolve

Nick MacDonald nickmacd at gmail.com
Tue Sep 19 02:50:56 CEST 2006


Terry:

The rules for XML are very clear: a well-formed document has exactly
one root element, so a file can hold only one document.  Expat is
strict about enforcing well-formedness.  You are trying to process
multiple documents in one stream (stdin).

You'd need to write some sort of filter that creates a new parser at
the start of each document.  If you could guarantee that every XML
file starts with the optional <?xml version ...?> declaration, then
you would have a basis for your filter.
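
A minimal sketch of such a filter, assuming the whole input is already
in memory, that every document begins with the declaration, and that
the text "<?xml " never shows up inside a comment or CDATA section.
The names next_xml_decl, split_documents and doc_fn are hypothetical
helpers for illustration, not part of Expat:

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical callback type: receives one document slice at a time. */
typedef void (*doc_fn)(const char *doc, size_t doc_len, void *user);

/* Return the offset of the next "<?xml" followed by whitespace at or
 * after `from`, or `len` if there is none.  The whitespace check keeps
 * targets such as "<?xml-stylesheet" from matching. */
static size_t next_xml_decl(const char *buf, size_t len, size_t from)
{
    static const char marker[] = "<?xml";
    const size_t mlen = sizeof marker - 1;

    for (size_t i = from; i + mlen < len; i++) {
        if (memcmp(buf + i, marker, mlen) == 0 &&
            isspace((unsigned char)buf[i + mlen]))
            return i;
    }
    return len;
}

/* Split `buf` at each declaration and hand every slice to `fn`
 * (which may be NULL).  Returns the number of slices found.
 * Anything before the first declaration is skipped. */
static int split_documents(const char *buf, size_t len, doc_fn fn, void *user)
{
    int count = 0;
    size_t start = next_xml_decl(buf, len, 0);

    while (start < len) {
        size_t end = next_xml_decl(buf, len, start + 1);
        if (fn)
            fn(buf + start, end - start, user);
        count++;
        start = end;
    }
    return count;
}
```

Each slice could then be handed to its own parser with the final-chunk
flag set.  When reading stdin in fixed-size chunks, a declaration can
straddle two reads, so a streaming version would also have to carry a
few bytes over from one chunk to the next.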

Otherwise it might be much easier to extract your zip file into
multiple files in a temporary directory and clean up afterward.  That
approach has trouble keeping the files in the same order as the zip
file, unless you name them so that the file system returns them in
the order they should be processed.

Of course, if you're clever, you might be able to look for the error,
recognize that it's not a "real" error, ignore it, and start a new
parser at the correct place inside your buffer.

Good luck on your project...

Nick


On 9/18/06, Terry Ebaugh <tebaugh at gmail.com> wrote:
> I've just started working with expat.  I have xml files that are gzipped.  I
> gzcat them and pipe them to my parser.
> I am getting the XML_ERROR_JUNK_AFTER_DOC_ELEMENT error message and I'm
> unsure how to resolve it.  I was under the impression that it
> is caused by extra characters after a document root close tag.  I tried
> stripping the chars after that close tag but that doesn't seem to work.  Is
> this caused by a new document starting immediately after the first one has
> finished?
>
> Does anyone have any suggestions?
>
> Here is the error message and what was in the buffer:
>
> Parse error:file:1:row:4:column:0:reason:junk after document element
> BUFFER = nter></usage></dataSet></metrics>
>
> <?xml version='1.0' encoding='UTF-8'?>
> <metrics version="3.0" cr
>
>
>
> My main loop where I read stdin and call the parser is below:
>
> /***********************************************************************/
>   /* Read stdin                                                          */
>   /***********************************************************************/
>   for (;;) {
>     len = (int)fread(buff, 1, BUFFSIZE-1, stdin);
>     if (ferror(stdin)) {
>       fprintf(stderr, "Error reading stdin\n");
>       exit(-2);
>     }
>     done = feof(stdin);
>     /* if nothing was read then exit so AI doesn't blow up */
>     if ((len == 0) && (done) && (cur_file_num == 0))
>       break;
>
>     /* pass the byte count from fread(): buff is not NUL-terminated,
>        so strlen(buff) could run past the data just read */
>     if (XML_Parse(p, buff, len, done) == XML_STATUS_ERROR) {
>       fprintf(stderr,
>               "\nParse error at host:%s:file:%d:row:%lu:column:%lu:reason:%s\n",
>               host, cur_file_num,
>               (unsigned long)XML_GetCurrentLineNumber(p),
>               (unsigned long)XML_GetCurrentColumnNumber(p),
>               XML_ErrorString(XML_GetErrorCode(p)));
>       /* print only the len bytes that were actually read */
>       fprintf(stderr, "BUFFER = %.*s\n", len, buff);
>       exit(-3);
>     }
>     if (done)
>       break;
>   }
>
>   /* Free memory used by the parser */
>   if (p) {
>     XML_ParserFree(p);
>   }
>   return 0;
> }

