[Expat-discuss] Discussion: InternalEntityRefHandler

Karl Waclawek karl@waclawek.net
Sun Jun 16 11:50:02 2002


In the meantime I had a private discussion with Rolf Ade,
and he helped me clarify (and also correct) my position.

Here is what I believe the proposed InternalEntityRefHandler
should do (I leave Rolf's objections for him to explain):

- For SAX compliance I need a StartInternalEntityHandler
  and an EndInternalEntityHandler instead of just an
  InternalEntityRefHandler.

- The StartInternalEntityHandler returns a boolean-type
  value that determines if the entity will be expanded or not;
  if there is no expansion, then the EndInternalEntityHandler
  will not be called.

- It may not make much sense to have this "optional expansion"
  available for parameter entity references too, but I would
  leave it in for the following reasons:
  a) I haven't seen a good argument why it will never make sense
     (anyone please feel free to come up with one)
  b) It is extra work to disable it, and because of a)
     I see no reason to invest more effort to not have it
  c) This capability is analogous to an existing one for the
     ExternalEntityRefHandler, since there, expansion
     can be suppressed within the handler - and I like symmetry <g>
  d) I don't like function arguments that depend on other
     function arguments - i.e. "ignore return value if PE".
     However, I recognize that this can often not be avoided 
     in realistic implementations.

- An entity reference whose expansion is suppressed will
  not be reported through another handler, since that
  is essentially double reporting. This applies mostly to
  reporting an internal GE "literally" through the character handler
  if expansion was suppressed. For PEs there would not be a "character
  handler" anyway, and for external GEs there is no way to tell
  the parser to report it again, so re-reporting internal GEs would
  also be inconsistent with those two cases.

- The literal entity value (with character references expanded)
  will be passed to the StartInternalEntityHandler.
  Main reason: the information is available, and is easy to pass.
  However, I do not know of a really good use case for it;
  I left it in just in case there is one. Again, anyone please
  feel free to come up with pros and cons.

- The ability to modify this entity value was discussed, because
  suppression of PE references leads to WF errors in many cases,
  so it would make sense to add a "replace entity value" capability.
  But at this moment I don't think it would be worth the effort.

- There is a possible conflict with SetDefaultHandler and
  SetDefaultHandlerExpand, since these *also* try to determine
  if internal entities will be expanded.
  For backwards compatibility, it should work like this:
  a) StartInternalEntityHandler is set:
     SetDefaultHandler and SetDefaultHandlerExpand have no effect
     on internal entity expansion, they just set the default handler.
  b) StartInternalEntityHandler is NOT set:
     SetDefaultHandler and SetDefaultHandlerExpand behave as
     they used to - no change.
  In the future, SetDefaultHandlerExpand should be deprecated,
  and SetDefaultHandler should lose its function to suppress
  internal entity expansion.
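
To make the points above a bit more concrete, here is a minimal sketch
in C of how the dispatch could look. It is not Expat code and does not
link against Expat: every name in it (SketchParser, the two handler
typedefs, sketch_process_entity_ref, the sample handlers) is
hypothetical and only illustrates the proposal. The legacy branch is a
simplification that assumes the default-handler rule matters only for
general entities.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical handler types for the proposal; not existing Expat API. */
typedef int (*StartInternalEntityHandler)(void *userData,
                                          const char *entityName,
                                          const char *value, /* literal value */
                                          int isParameterEntity);
typedef void (*EndInternalEntityHandler)(void *userData,
                                         const char *entityName);

/* Hypothetical stand-in for the relevant parser state. */
typedef struct {
    void *userData;
    StartInternalEntityHandler startHandler;
    EndInternalEntityHandler endHandler;
    int hasDefaultHandler;      /* some default handler is installed */
    int defaultHandlerExpands;  /* installed via SetDefaultHandlerExpand */
} SketchParser;

/* Returns 1 if the internal entity reference is expanded, 0 if not. */
int sketch_process_entity_ref(SketchParser *p, const char *name,
                              const char *value, int isPE)
{
    assert(p != NULL && name != NULL);

    if (p->startHandler != NULL) {
        /* Case a): the start handler alone decides about expansion;
           the default handler settings are ignored for this purpose. */
        if (!p->startHandler(p->userData, name, value, isPE))
            return 0;  /* suppressed: no end call, no other reporting */
        /* ... the replacement text would be parsed here ... */
        if (p->endHandler != NULL)
            p->endHandler(p->userData, name);
        return 1;
    }

    /* Case b): no start handler, so the old rules apply unchanged.
       Simplifying assumption: only general entities are affected. */
    if (!isPE && p->hasDefaultHandler && !p->defaultHandlerExpands)
        return 0;
    return 1;
}

/* Sample application code: counts calls, suppresses entity "draft". */
typedef struct { int starts; int ends; } SketchCounts;

static int sketch_skip_draft(void *userData, const char *name,
                             const char *value, int isPE)
{
    SketchCounts *c = (SketchCounts *)userData;
    (void)value;
    (void)isPE;
    c->starts++;
    return strcmp(name, "draft") != 0;  /* 0 means "do not expand" */
}

static void sketch_count_end(void *userData, const char *name)
{
    (void)name;
    ((SketchCounts *)userData)->ends++;
}
```

With sketch_skip_draft installed, a reference to "draft" triggers one
start call and no end call, while any other reference triggers both.
With no start handler installed, the outcome depends only on how the
default handler was set, as before.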

- *Unclear Issue*:

There is a statement in the SAX specs that pertains
to the topic of this thread, but I don't quite understand it:

<SAX-Quote>
Because of the streaming event model that SAX uses, some entity boundaries
cannot be reported under any circumstances:
  a.. general entities within attribute values
  b.. parameter entities within declarations
These will be silently expanded, with no indication of where the original entity boundaries were.
</SAX-Quote>

This would mean that reporting these entity boundaries should
not be possible in Expat either.
I agree that this is true when parsing entity declarations
or attribute values in declarations, since at this point
the replacement text is not constructed. However, whenever
the replacement text is built, this should be possible, or
is there something obvious I am not understanding?

Karl