[Expat-bugs] [ expat-Bugs-1064174 ] XML_GetBuffer / ParseBuffer bug

SourceForge.net noreply at sourceforge.net
Thu Nov 11 16:51:42 CET 2004


Bugs item #1064174, was opened at 2004-11-10 17:20
Message generated for change (Comment added) made by kwaclaw
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=110127&aid=1064174&group_id=10127

Category: None
Group: None
Status: Open
Resolution: Rejected
Priority: 5
Submitted By: Werner BEROUX (wernight)
Assigned to: Nobody/Anonymous (nobody)
Summary: XML_GetBuffer / ParseBuffer bug

Initial Comment:
I first saw the bug with slightly different code than the code
below, but this version also shows the bug and can be checked quickly.

I do:
Assign EnableStartElementHandler,
EnableEndElementHandler, EnableCharacterDataHandler().

XML_ParserCreate_MM(NULL, NULL, NULL);
XML_GetBuffer(257);
int nLength = file.Read(pszBuffer, 256);
pszBuffer[nLength] = '\0'; // Don't forget this.
XML_ParseBuffer(nLength, nLength < 257); // Returns false.

Note that when I set the buffer size to 512 it works. I'm
calling XML_GetBuffer() each time in the loop, and
normally I don't append a closing '\0'.

Configuration:
- I'm using CExpatImpl from CodeProject. Which is only
a wrapper.
- eXpat 1.95.8 with XML_UNICODE_WCHAR_T and XML_STATIC
defined.
- PC AMD64 - WinXP

----------------------------------------------------------------------

>Comment By: Karl Waclawek (kwaclaw)
Date: 2004-11-11 10:51

Message:
Logged In: YES 
user_id=290026

I haven't looked at your character data handler in detail, but
I get the feeling that you expect character data to be 
reported as one chunk for everything between start and end 
tag.

However, Expat may break up character data into multiple 
call-backs, depending on whether there are line breaks, or 
buffer boundaries, or character references, etc.
The proper way is to accumulate character data in a buffer, 
until the end tag for an element occurs, and then process 
them.
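
The accumulation approach described above can be sketched as follows.
This is a minimal illustration, not code from the report or from Expat:
the Accumulator type and the on_* function names are hypothetical. The
handler signatures have the same shape as Expat's start-element,
end-element, and character-data handlers when XML_Char is char (with
XML_UNICODE_WCHAR_T it would be wchar_t); registering them on a parser
is left out so the sketch compiles without Expat.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical per-parse state; with Expat it would be the userData pointer. */
typedef struct {
    char  *buf;       /* text collected since the last start tag */
    size_t len;       /* bytes used, excluding the terminating NUL */
    size_t cap;       /* bytes allocated */
    int    elements;  /* number of elements completed so far */
    char   last[128]; /* copy of the most recently completed element's text */
} Accumulator;

/* Start tag: discard anything collected before this element began. */
static void on_start_element(void *userData, const char *name, const char **atts)
{
    Accumulator *acc = (Accumulator *)userData;
    (void)name;
    (void)atts;
    acc->len = 0;
}

/* Character data: may be called several times per element.
   s is not NUL-terminated; only len bytes are valid. */
static void on_char_data(void *userData, const char *s, int len)
{
    Accumulator *acc = (Accumulator *)userData;
    size_t need = acc->len + (size_t)len + 1;
    if (need > acc->cap) {
        size_t cap = acc->cap ? acc->cap : 64;
        char *p;
        while (cap < need)
            cap *= 2;
        p = realloc(acc->buf, cap);
        if (p == NULL)
            return; /* out of memory: this piece is dropped */
        acc->buf = p;
        acc->cap = cap;
    }
    memcpy(acc->buf + acc->len, s, (size_t)len);
    acc->len += (size_t)len;
    acc->buf[acc->len] = '\0';
}

/* End tag: the element's text is complete only now, so process it here. */
static void on_end_element(void *userData, const char *name)
{
    Accumulator *acc = (Accumulator *)userData;
    size_t n = acc->len < sizeof acc->last - 1 ? acc->len : sizeof acc->last - 1;
    (void)name;
    if (n > 0)
        memcpy(acc->last, acc->buf, n);
    acc->last[n] = '\0';
    acc->elements++;
    acc->len = 0;
}
```

With this pattern, a file name delivered as "file2" followed by ".png"
in two call-backs is seen as the single string "file2.png" when the end
tag arrives.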

----------------------------------------------------------------------

Comment By: Werner BEROUX (wernight)
Date: 2004-11-11 10:35

Message:
Logged In: YES 
user_id=298482

I don't get any error code, but it sends me data twice.
I get "file1.png", "file2.png" and ".png", which is obviously
part of the two previous ones. This happens only when the buffer size is 256.

----------------------------------------------------------------------

Comment By: Karl Waclawek (kwaclaw)
Date: 2004-11-11 10:26

Message:
Logged In: YES 
user_id=290026

In my code I have tested buffer sizes down to 1 without 
problems.

What is the error code you get?

If you attach a small app that I can simply compile
in Visual Studio, then I will try it out.

----------------------------------------------------------------------

Comment By: Werner BEROUX (wernight)
Date: 2004-11-11 10:04

Message:
Logged In: YES 
user_id=298482

Sorry, forgot to check the button.
Make sure you've set the buffer size to 256, so that you read 256
chars each time.

It may be my fault, but then why would it work with a
greater buffer?

----------------------------------------------------------------------

Comment By: Karl Waclawek (kwaclaw)
Date: 2004-11-11 09:57

Message:
Logged In: YES 
user_id=290026

Where can I find what you are talking about?
There is no file attached with your code.

Btw, I get no error parsing the attached XML file with my 
code.

----------------------------------------------------------------------

Comment By: Werner BEROUX (wernight)
Date: 2004-11-11 09:42

Message:
Logged In: YES 
user_id=298482

Oh yeah, right. That's why I get an immediate error with that. OK,
I've attached the real one.

I've added some extra comments at lines 27, 39, and 181 to 185.

To get the XML function names without the wrapper, just add the XML_
prefix. I also checked with the simple XML_Parse function and got the
same bug. I checked the XML file in IE and Firefox and got no
error.

----------------------------------------------------------------------

Comment By: Karl Waclawek (kwaclaw)
Date: 2004-11-11 09:27

Message:
Logged In: YES 
user_id=290026

Yes, I can see what you mean.
However, something else seems wrong:
Have you tried calling XML_ParseBuffer with 256 instead of 
257? nLength < 257 will always be true, even when the read 
fills the whole buffer, so the first buffer you pass is 
already flagged as the last one. The rest of the file 
(if it is larger than 256 bytes) remains unprocessed, causing 
the parser to report an error.

You could also call it with nLength = 0; that should work as 
well.
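
One way to read the advice above is the following chunked-feed loop. It
is a sketch under stated assumptions: MemFile, mem_read, FeedFn,
feed_all, FeedLog, and log_feed are hypothetical names, and the feed
call-back stands in for the XML_GetBuffer / XML_ParseBuffer pair so the
code compiles without Expat. The final flag is set only on a closing
call with length 0, and no '\0' is appended to the data.

```c
#include <stddef.h>
#include <string.h>

enum { CHUNK = 256 }; /* bytes requested per read */

/* Hypothetical in-memory stand-in for the file being parsed. */
typedef struct {
    const char *data;
    size_t      size;
    size_t      pos;
} MemFile;

/* Copies up to max bytes into dst and returns how many were copied. */
static size_t mem_read(MemFile *f, char *dst, size_t max)
{
    size_t left = f->size - f->pos;
    size_t n = left < max ? left : max;
    memcpy(dst, f->data + f->pos, n);
    f->pos += n;
    return n;
}

/* Stand-in for the parser: receives what would be handed to
   XML_ParseBuffer(parser, len, isFinal). Returns 0 on error. */
typedef int (*FeedFn)(void *ctx, const char *buf, int len, int isFinal);

/* Feeds the whole file in CHUNK-sized pieces. Returns 1 on success. */
static int feed_all(MemFile *f, FeedFn feed, void *ctx)
{
    char buf[CHUNK]; /* with Expat: buf = XML_GetBuffer(parser, CHUNK) on each pass */
    for (;;) {
        size_t n = mem_read(f, buf, CHUNK);
        int isFinal = (n == 0); /* not "n < CHUNK + 1", which is always true */
        if (!feed(ctx, buf, (int)n, isFinal))
            return 0;
        if (isFinal)
            return 1;
    }
}

/* Example call-back that only records what it was given. */
typedef struct {
    int    calls;
    int    finals;
    size_t bytes;
    int    lastWasFinal;
} FeedLog;

static int log_feed(void *ctx, const char *buf, int len, int isFinal)
{
    FeedLog *log = (FeedLog *)ctx;
    (void)buf;
    log->calls++;
    log->finals += isFinal ? 1 : 0;
    log->bytes += (size_t)len;
    log->lastWasFinal = isFinal;
    return 1;
}
```

For a 600-byte input this makes calls of 256, 256, 88 and 0 bytes, and
only the last one is flagged as final.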



----------------------------------------------------------------------

Comment By: Werner BEROUX (wernight)
Date: 2004-11-11 07:08

Message:
Logged In: YES 
user_id=298482

Look back and you'll see that I've inserted the '\0' at
character 256 and I'm asking XML_ParseBuffer() to parse only
up to 255.

I discovered the bug when not using the terminating '\0'; the problem
was then only visible in the interpretation of the data,
where I got too many calls to OnCharsData (or something like that).

----------------------------------------------------------------------

Comment By: Karl Waclawek (kwaclaw)
Date: 2004-11-10 17:40

Message:
Logged In: YES 
user_id=290026

It is wrong to append a null to the end of the buffer.
This inserts an invalid character into the document.
Leave it out and try again. It should work.

----------------------------------------------------------------------
