[Expat-discuss] expat parsing destructively?

Mohun Biswas m_biswas at mailinator.com
Wed Nov 14 16:13:12 CET 2007


Boris Kolpackov wrote:
> Mohun Biswas <m_biswas at mailinator.com> writes:
> 
>> I still say it would be really great if expat had an optional
>> destructive-parsing mode. For one thing it seems in keeping with the
>> spirit of SAX where you chew through the data once linearly and never
>> look back (except with your own state variables of course).
> 
> One obvious problem with this approach is that the attribute values
> can contain entity references (e.g., &amp;, etc) that need to be
> expanded. I suppose the parser could replace them in the original
> buffer and zap the gap by moving the rest of the value forward but,
> as you can see, it gets really messy.

Boris,

At first I didn't see the problem, because even today the task of
replacing standard entity references presumably involves copying the
attribute string into a buffer, walking through the buffer with a
state machine, and using an overlapping copy (e.g.
'memmove(foo, foo+3, strlen(foo+3)+1)', since strcpy is undefined for
overlapping buffers) to "zap the gap". I say this without looking at
the code, but it's hard to see how else it could work. This is easy
because the standard entity references are all "shrinking": '&amp;',
for example, shrinks to '&'. In that case the question of whether the
buffer being operated on lives on the stack, on the heap, or within
the document seems unimportant.
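
To make the "shrinking" argument concrete, here is a minimal sketch of
in-place decoding with a read cursor and a write cursor. It is not taken
from expat's source; the function name decode_in_place and the table name
predefined are made up for illustration, and only the five predefined XML
entities are handled.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* The five predefined XML entities. Each reference is longer than its
 * one-character replacement, so decoding can only shrink the string. */
static const struct { const char *ref; char ch; } predefined[] = {
    { "&amp;", '&' }, { "&lt;", '<' }, { "&gt;", '>' },
    { "&apos;", '\'' }, { "&quot;", '"' },
};

/* Decode predefined entity references in s in place and return the new
 * length. Unrecognized references are copied through unchanged; a real
 * parser would look them up or report an error instead. */
size_t decode_in_place(char *s)
{
    const size_t n = sizeof predefined / sizeof predefined[0];
    char *r = s;  /* read cursor */
    char *w = s;  /* write cursor, never ahead of the read cursor */

    while (*r != '\0') {
        int matched = 0;
        if (*r == '&') {
            size_t i;
            for (i = 0; i < n; i++) {
                size_t len = strlen(predefined[i].ref);
                if (strncmp(r, predefined[i].ref, len) == 0) {
                    *w++ = predefined[i].ch;  /* write one character */
                    r += len;                 /* skip the whole reference */
                    matched = 1;
                    break;
                }
            }
        }
        if (!matched)
            *w++ = *r++;
        assert(w <= r);  /* holds because every replacement shrinks */
    }
    *w = '\0';
    return (size_t)(w - s);
}
```

Calling decode_in_place on a writable buffer holding "a &amp; b" leaves
"a & b" in the same buffer. Because the write cursor never overtakes the
read cursor, no separate memmove is needed and it makes no difference
where the buffer lives.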

However, once I remember that XML allows custom entity references,
such as a &copyright; that expands to a potentially longer string, I
see that things do get messy. Not that this couldn't be solved along
the lines you describe, but there may well be other similar issues.
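
A small sketch of why a growing entity breaks the in-place approach: the
check below computes how long a string would become after expansion and
compares that with the space it currently occupies. The struct entity
type and both function names are hypothetical, and the sketch ignores the
fact that real replacement text may itself contain markup or further
references.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical custom entity: a reference and its replacement text. */
struct entity { const char *ref; const char *text; };

/* Length s would have after expanding every occurrence of e->ref. */
size_t expanded_length(const char *s, const struct entity *e)
{
    size_t ref_len, text_len, total = 0;

    assert(s != NULL && e != NULL);
    ref_len = strlen(e->ref);
    text_len = strlen(e->text);

    while (*s != '\0') {
        /* An empty reference never matches, which avoids looping forever. */
        if (ref_len > 0 && strncmp(s, e->ref, ref_len) == 0) {
            total += text_len;
            s += ref_len;
        } else {
            total++;
            s++;
        }
    }
    return total;
}

/* In-place expansion is only safe when the result is no longer than
 * the original string. Returns 1 if it fits, 0 otherwise. */
int fits_in_place(const char *s, const struct entity *e)
{
    return expanded_length(s, e) <= strlen(s);
}
```

With an entity such as { "&amp;", "&" } the check always succeeds, while
a &copyright; whose replacement text is longer than the reference makes
it fail, and the expansion then has to overwrite whatever follows the
value in the buffer or go somewhere else.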

In any case, I have neither the cycles nor the expertise to contribute
work to the idea, and it would be poor form to complain without
offering to help out, so I'll let it go. I've gotten my parsing code
working fine; it's just not quite as elegant in terms of memory
management as one might wish.

Thanks,
MB


