[Expat-discuss] Bug? Illegal parameter reference error for valid document

Karl Waclawek karl@waclawek.net
Tue Nov 6 17:50:03 2001


> Karl,
> 
> I'm sorry if I was vague, but you misunderstood me.  It's not the size of the 
> buffer per se that matters, it's how much of the file you read in at a time and 
> send to the parser.  Let me see if I can make this clearer.

I think I understand you, but when I set the buffer size to 1, I automatically
read one byte at a time and send one byte at a time to the parser.
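
For illustration only, here is a minimal sketch of a read loop in which the buffer size alone decides how much is read and how much is handed to the parser per call. It is not the code from either test program. The names feed_in_chunks, chunk_sink, chunk_stats and count_chunks are hypothetical, and the sink callback merely stands in for a call to XML_Parse so that the sketch compiles without Expat.

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical callback: receives one chunk plus an "is final" flag and
   returns nonzero on success. With Expat, an implementation would
   typically forward to XML_Parse(parser, buf, len, is_final). */
typedef int (*chunk_sink)(void *ctx, const char *buf, int len, int is_final);

/* Read `in` in pieces of at most `chunk_size` bytes and pass each piece
   to `sink`. The last call always has len == 0 and is_final == 1.
   Returns 0 on success, -1 on allocation, read, or sink failure. */
static int feed_in_chunks(FILE *in, size_t chunk_size, chunk_sink sink, void *ctx)
{
    char *buf;
    int rc = -1;

    assert(in != NULL && sink != NULL);
    if (chunk_size == 0 || chunk_size > INT_MAX)
        return -1;
    buf = malloc(chunk_size);
    if (buf == NULL)
        return -1;

    for (;;) {
        size_t n = fread(buf, 1, chunk_size, in);
        if (n < chunk_size && ferror(in))
            break;                      /* read error */
        if (!sink(ctx, buf, (int)n, n == 0))
            break;                      /* sink (parser) reported failure */
        if (n == 0) {                   /* final, empty chunk was delivered */
            rc = 0;
            break;
        }
    }
    free(buf);
    return rc;
}

/* Hypothetical sink for experiments: only counts what it is given. */
struct chunk_stats { long calls; long bytes; int final_calls; };

static int count_chunks(void *ctx, const char *buf, int len, int is_final)
{
    struct chunk_stats *s = ctx;
    (void)buf;
    s->calls++;
    s->bytes += len;
    s->final_calls += is_final ? 1 : 0;
    return 1;
}
```

With a chunk_size of 1, the sink sees one byte per call, which is the byte-at-a-time case described above. With 16384, a small file arrives in a single call followed by the empty final call.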

> 
> Let's say we're parsing "test.xml":
> 
>    <?xml version="1.0" standalone="no"?>
>    <!DOCTYPE test SYSTEM "test.dtd">
>    <thing>My name is &bob.</thing>
> 
> , "test.dtd":
>    <!ENTITY % TESTent SYSTEM "test.ent">
>    %TESTent;
>    <!ENTITY bob %myname;>
>    <!ELEMENT thing (#PCDATA)>
> 
> and "test.ent":
>    <!ENTITY % myname "&#x22;Bob&#x22;">
>    
<snip> some code </snip>
 
> You would think this should work.  When the external parser is told to parse the 
> file "test.dtd" it reads the entire file into the buffer and parses it.  It then 
> sees an external reference to the file "test.ent" and spawns off another 
> external parser to parse it.  However, when that parser returns and parsing of 
> "test.dtd" continues it discards the rest of the XML stored in the buffer and 
> reads in a new buffer (which is empty because the end of the file has been 
> reached).  This causes the entity declaration for 'bob' to go unparsed, and when 
> the main parser encounters the XML containing '&bob' it will generate an error.

Actually, your example above works for me. I replaced the dot after "&bob" with
a semicolon, changed the DOCTYPE name to "thing" to match the root element, and
got an error-free parse with a buffer size of 16 KB.

> 
> To get around this problem, instead of reading the file one buffer-sized chunk 
> at a time, read it one line at a time. That way, when the external parser for 
> "test.ent" is done, the next buffer that will be parsed by the external parser 
> for "test.dtd" will be the line containing the entity declaration for 'bob'.
> 
> If we do this the external parsing loop will look something like this (again, 
> ignoring errors for clarity):
> 
>    // Parse until the file is done
>    for (;;)
>    {
>       // Read at most sizeof(buff) - 1 characters from the file into BUFF,
>       // stopping after a new-line character or at EOF, and store the
>       // number of characters successfully read into LENGTH.
>       // fgets returns NULL at EOF, so LENGTH is then 0.
>       length = fgets(buff, sizeof(buff), ext) ? (int)strlen(buff) : 0;
>       // Parse the contents of BUFF; a LENGTH of 0 marks the final call.
>       XML_Parse(e, buff, length, length == 0);
>       if (length == 0)
>          break;
>    }
>    
> One could also parse one character of the file at a time (possibly using 
> fgetc(3S)) and be able to catch all of the external references, but this would 
> probably be less efficient.

It was clearer, but I think I don't have the same problem as you.
Actually, I think your problem may not exist in my version of Expat (1.95.2).

Regards,

Karl