[Expat-bugs] [ expat-Bugs-1162302 ] Fault CharacterDataHandler if LF starts data

SourceForge.net noreply at sourceforge.net
Tue Apr 19 19:52:00 CEST 2005


Bugs item #1162302, was opened at 2005-03-13 00:26
Message generated for change (Comment added) made by kwaclaw
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=110127&aid=1162302&group_id=10127

Category: None
Group: Not a Bug
>Status: Closed
>Resolution: Rejected
Priority: 5
Submitted By: Nobody/Anonymous (nobody)
Assigned to: Nobody/Anonymous (nobody)
Summary: Fault CharacterDataHandler if LF starts data

Initial Comment:
When a tag's data starts with LF (0x0A), I get 
XML_CharacterDataHandler with len=1 and s=0x0A 
instead of the actual data that comes after the LF.

With the attached example I get the following calls to 
XML_CharacterDataHandler:
- len=1, s=0x0A
- len=6, s="closed"
- len=1, s=0x0A
- len=1, s=0x0A
The last two calls are the problematic ones: I 
don't get the actual data that comes after the LF.

The problem happens when the tag's data starts with LF 
but has more characters after the LF.

Using Expat-1.95.8 from 
expat_win32bin_1_95_8.exe


----------------------------------------------------------------------

>Comment By: Karl Waclawek (kwaclaw)
Date: 2005-04-19 13:52

Message:
Logged In: YES 
user_id=290026

Closing this issue - no follow-up from poster.

----------------------------------------------------------------------

Comment By: Karl Waclawek (kwaclaw)
Date: 2005-03-14 18:35

Message:
Logged In: YES 
user_id=290026

Just to clarify - in Expat, a contiguous string of characters
does not necessarily have to be reported through exactly one
characterData() call-back. Often, line-breaks determine the
boundary between call-backs. In the attached example, the
character data for the <note> element will likely be reported
through three call-backs, as there are two line-breaks.
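
The behaviour described above can be observed with a short script.
This is only a sketch: it uses Python's xml.parsers.expat module (a
binding to Expat) rather than the C API, and DOC is a hypothetical
stand-in for the attached example, which is not reproduced here. The
helper name character_data_chunks is likewise made up for
illustration.

```python
import xml.parsers.expat

# Hypothetical stand-in for the attached document (assumption).
DOC = "<note>\nFull state presence document\n</note>"


def character_data_chunks(xml_text):
    """Return each piece of character data as the parser delivers it."""
    chunks = []
    parser = xml.parsers.expat.ParserCreate()
    # Record every call-back separately instead of merging them.
    parser.CharacterDataHandler = chunks.append
    parser.Parse(xml_text, True)
    return chunks


if __name__ == "__main__":
    for chunk in character_data_chunks(DOC):
        # repr() makes a lone line-break visible as '\n'.
        print("[%r]" % chunk)
```

Each bracketed line corresponds to one call-back. Joining the pieces
gives back the complete character data of the element.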

----------------------------------------------------------------------

Comment By: Mike Rosky (mike-rosky)
Date: 2005-03-14 08:20

Message:
Logged In: YES 
user_id=1238831

I think that you (original sender) should examine how your code 
handles the calls to the chardata handler. 
Please note that, put simply, you have to reset the
chardata buffer at the start element and append to it
every time the character data handler is invoked,
until the parser calls the end element handler for that element.
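
The reset-and-accumulate approach described above could look roughly
like the following. It is a minimal sketch using Python's
xml.parsers.expat binding instead of the C API; the names
TextCollector and collect_text are hypothetical, and, like the
simplified description, it only gives meaningful text for elements
that have no child elements.

```python
import xml.parsers.expat


class TextCollector:
    """Accumulate character data between a start tag and its end tag."""

    def __init__(self):
        self.parts = []   # pieces received since the last start tag
        self.texts = []   # (element name, complete text) pairs

    def start_element(self, name, attrs):
        # Reset the buffer at the start-element point.
        self.parts = []

    def character_data(self, data):
        # May be invoked several times for one element.
        self.parts.append(data)

    def end_element(self, name):
        # Only now is the element's text known to be complete.
        self.texts.append((name, "".join(self.parts)))
        self.parts = []


def collect_text(xml_text):
    collector = TextCollector()
    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = collector.start_element
    parser.CharacterDataHandler = collector.character_data
    parser.EndElementHandler = collector.end_element
    parser.Parse(xml_text, True)
    return collector.texts
```

With this in place the text is read in the end element handler, so
it does not matter how many character data calls were made.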

You can't assume that the chardata handler gets the whole value
in one call. In your case it is called twice because of the 
CRLF: the first call returns the CRLF (and possibly any preceding 
chars), the second returns the rest of the character data. Character 
data can even span two reads of the source, which has the same effect.
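
To illustrate character data spanning two reads of the source, the
sketch below feeds the same document to the parser once as a whole
and once cut in the middle of the text. It again uses Python's
xml.parsers.expat binding; the function name chunks_for_pieces and
the sample document are assumptions made for this example.

```python
import xml.parsers.expat


def chunks_for_pieces(pieces):
    """Feed the document piece by piece; return the chardata chunks."""
    chunks = []
    parser = xml.parsers.expat.ParserCreate()
    parser.CharacterDataHandler = chunks.append
    for piece in pieces:
        # isfinal=False: more input will follow.
        parser.Parse(piece, False)
    # An empty final call tells the parser the input has ended.
    parser.Parse("", True)
    return chunks


if __name__ == "__main__":
    whole = chunks_for_pieces(
        ["<note>\nFull state presence document\n</note>"])
    split = chunks_for_pieces(
        ["<note>\nFull state pres", "ence document\n</note>"])
    print(whole)
    print(split)
    # The chunk boundaries may differ, the joined text does not.
    print("".join(whole) == "".join(split))
```

However the input is cut, only the concatenation of the chunks is
reliable, which is why the handler has to accumulate.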

You can try this by adding a character data handler to
the outline.c example that just prints each part it
receives, enclosed in brackets or something similar.

Mike


----------------------------------------------------------------------

Comment By: Nobody/Anonymous (nobody)
Date: 2005-03-13 23:36

Message:
Logged In: NO 

The attached example has LF (0x0A) in the following cases:
- before the text "sip:pep at example.com"
- before the text "Full state presence document"
In these 2 cases I receive XML_CharacterDataHandler with 
len=1 and s=0x0A only. The function is not called for the 
data AFTER the LF (in the example, for "Full state presence 
document").
Is there a way to overcome this problem?

----------------------------------------------------------------------

Comment By: Karl Waclawek (kwaclaw)
Date: 2005-03-13 08:54

Message:
Logged In: YES 
user_id=290026

Your attached example does not have any LF directly
before or after the string "closed". Please clarify your problem.

----------------------------------------------------------------------
