[XML-SIG] Content is split into two

"Martin v. Löwis" martin at v.loewis.de
Sat Apr 5 22:51:30 CEST 2008


> Wow, totally unexpected. Wonder why it's designed as it is? This is
> especially weird to me since the string size isn't big (small buffer)
> and this adds a bit of complexity to the text processing.

There are two reasons:
1. Efficiency. The parser reads a block of input into a buffer, and then
   parses out of this buffer. If the buffer is exhausted, it first
   passes the data to the application, rather than having to grow the
   buffer if the text content is not complete (which would involve
   copying the data, potentially several times).
2. Correctness. If you have an entity reference (such as &copy; in HTML)
   in your input, the parser needs to tell the application what the
   source entity is (i.e. what system and public identifier it has). If
   it returned all data in a single buffer, the data in that buffer
   could come from several different entities, making it impossible to
   refer to its source with a single URL.
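To see the effect from Python, here is a minimal sketch. It assumes you are using the standard xml.parsers.expat module (pyexpat), and the function name character_data_calls is made up for illustration. It records every string handed to CharacterDataHandler. With a predefined entity reference such as &amp;amp; in the text, the unbuffered parser reports the text in several pieces. Setting the parser's buffer_text attribute asks pyexpat to merge adjacent pieces where it can. Either way, the portable approach is to append the pieces and join them once the next non-text event arrives.

```python
import xml.parsers.expat


def character_data_calls(chunks, buffer_text=False):
    """Feed an XML document to expat in the given pieces (hypothetical helper).

    Returns the list of strings passed to CharacterDataHandler,
    one entry per callback invocation.
    """
    parser = xml.parsers.expat.ParserCreate()
    # When True, pyexpat merges adjacent character data before calling
    # the handler, where possible.
    parser.buffer_text = buffer_text
    calls = []
    # Record each piece instead of assuming one call per text node.
    parser.CharacterDataHandler = calls.append
    for i, chunk in enumerate(chunks):
        # isfinal is true only for the last piece of input.
        parser.Parse(chunk, i == len(chunks) - 1)
    return calls


if __name__ == "__main__":
    doc = "<doc>a&amp;b</doc>"
    unbuffered = character_data_calls([doc])
    buffered = character_data_calls([doc], buffer_text=True)
    print(unbuffered, "".join(unbuffered))
    print(buffered, "".join(buffered))
```

In both cases the joined text is the same. Only the number of callbacks differs, so code that joins the pieces does not depend on how the parser happened to split them.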

HTH,
Martin

