[Expat-discuss] Discussion: InternalEntityRefHandler

Karl Waclawek karl@waclawek.net
Thu Jun 13 11:08:04 2002


> >> > 2) allow the application to return a value that
> >> >    indicates if the entity should be expanded or not
> >> I've mixed emotions about that. What exactly should this return value
> >> control? For a GE, if the return value is true, the replacement text
> >> is reported again through the characterDataHandler? There may be sense
> >> in this, but what's the sense in returning false for a PE? Can't
> >> imagine a sensible reason, does anybody else?
> >
> > Good points, but it isn't the character handler, it's potentially all content
> > handlers. For GEs there have been requests to suppress expansion of certain
> > types of entities, e.g. predefined ones. For PEs - I don't know.
> > However, I am also missing a good argument for *not* wanting
> > to ever ignore/suppress them.
>
> I don't see how any handler besides the characterDataHandler could
> handle unexpanded entities in a sensible way.

Well, you said "... the replacement text is reported again through
the character data handler". I understood this as meaning: when expanded,
since otherwise - IMO - just the reference itself should be reported as
character data, but not the replacement text - that one is ignored.
This applies to a GE. For a PE nothing will be reported, unless
the default handler is set.
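
A minimal sketch of the GE behaviour described above, written in plain C
and independent of Expat. The InternalEntityRefHandler is only a proposal
in this thread, so every name here (entity_ref_decision,
report_general_entity, keep_predefined_unexpanded) is hypothetical and
only illustrates the idea of a return value that controls expansion:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical callback type (not part of Expat): returns nonzero to
   expand the entity, 0 to leave the reference unexpanded. */
typedef int (*entity_ref_decision)(const char *name, const char *value);

/* Writes into out what would be reported as character data for the
   general entity reference &name; */
static void report_general_entity(const char *name, const char *value,
                                  entity_ref_decision decide,
                                  char *out, size_t out_size)
{
    assert(name != NULL && value != NULL && out != NULL && out_size > 0);
    if (decide == NULL || decide(name, value)) {
        /* Expanded: the replacement text is reported. */
        snprintf(out, out_size, "%s", value);
    } else {
        /* Not expanded: only the reference itself is reported,
           the replacement text is ignored. */
        snprintf(out, out_size, "&%s;", name);
    }
}

/* Example policy: suppress expansion of the predefined entities only. */
static int keep_predefined_unexpanded(const char *name, const char *value)
{
    static const char *const predefined[] = {
        "lt", "gt", "amp", "apos", "quot"
    };
    size_t i;
    (void)value; /* this policy looks at the name only */
    for (i = 0; i < sizeof predefined / sizeof predefined[0]; i++) {
        if (strcmp(name, predefined[i]) == 0)
            return 0;
    }
    return 1;
}
```

With this policy, a reference to "amp" would come out as the text "&amp;",
while a reference to "rights" would come out as its replacement text.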

> Maybe a few examples could make things a bit clearer. Consider
> a document like this:
>
> <?xml version="1.0" encoding="UTF-8" ?>
> <!DOCTYPE foo SYSTEM "1.ent">
> <foo/>
>
>
> with 1.ent:
>
> <!-- 1 -->
> <!ENTITY % draft 'INCLUDE' >
> <![%draft;[
> <!ELEMENT book (comments*, title, body, supplements?)>
> ]]>
>
> <!-- 2 -->
> <!ENTITY % someElement  "<!ELEMENT element ANY>">
> %someElement;
>
> <!-- 3 -->
> <!ENTITY % fooContent "EMPTY" >
> <!ELEMENT foo %fooContent;>
>
> <!-- 4 -->
> <!ENTITY % someBarChilds "boo,baz">
> <!ELEMENT bar (foe,%someBarChilds;,bom)>
>
> <!-- 5 -->
> <!ENTITY % pub    "&#xc9;ditions Gallimard" >
> <!ENTITY   rights "All rights reserved" >
> <!ENTITY   book   "La Peste: Albert Camus, &#xA9; 1947 %pub; &rights;" >
>
>
> What should happen in example 1 if the
> InternalEntityRefHandler returns 0? Skip the whole conditional section??
> Completely skipping the PE, as well as leaving the PE in unexpanded
> form, results in XML that is not well-formed in this case.

I would say that this is simply a risk one has to take.
I am not sure that there is a skipping algorithm that will always
preserve wellformedness. I would say: returning 0 means
"skip the PE" no matter what the consequences.

> Example 2 is probably the least critical. If InternalEntityRefHandler
> returns 0, the unexpanded PE goes through the defaultHandler?

Yes.

> In examples 3 and 4 I guess the InternalEntityRefHandler must be called
> before the elementDeclHandler.

Yes.

> If the InternalEntityRefHandler returns
> 0, what should the elementDeclHandler report: how does the
> %fooContent; fit into an XML_Content? Same question for the bar
> content.

This will generate a WF error, I would assume.
As above - IMO a skipped PE amounts to "nothing".

> Even example 5 doesn't look like a sensible case. Sure,
> InternalEntityRefHandler should be called before the
> EntityDeclHandler. If it returns 0, the replacement text reported
> through the EntityDeclHandler would be "La Peste: Albert Camus, © 1947
> %pub; &rights;"? But the XML rec says clearly in 4.5: "The replacement
> text is the content of the entity, after replacement of character
> references and parameter-entity references."

I don't think the InternalEntityRefHandler should concern itself with
character references at all, since they are not entities.
About the other two entities: &rights; is a GE and is not recognized here,
so it won't be reported as a GE; the reference itself becomes part of the
stored entity value. %pub; is a PE, and if skipped, will simply be replaced
with "nothing". Therefore, we can only skip %pub;, and the literal entity value
(not the replacement text!) will be "La Peste: Albert Camus, © 1947 &rights;".
Note: the spec says in Appendix D:
...
the XML processor will recognize the character references when it parses
the entity declaration, and resolve them before storing the following
string as the value of the entity
...
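
The rule for example 5 can be sketched in plain C without Expat. This
is only an illustration under simplifying assumptions: all names
(pe_entry, entity_literal_value, ...) are hypothetical, there is no
error checking, output is UTF-8, and a PE that is unknown or marked as
skipped is replaced with nothing. Character references are resolved,
PE references are substituted, and GE references are copied verbatim:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical table entry: a parameter entity and its stored value.
   value == NULL stands for "the application asked to skip this PE". */
struct pe_entry {
    const char *name;
    const char *value;
};

/* Append len bytes of s at offset o, truncating if out is too small.
   Returns the new offset. */
static size_t append(char *out, size_t o, size_t out_size,
                     const char *s, size_t len)
{
    if (o + len >= out_size)
        len = out_size - 1 - o;
    memcpy(out + o, s, len);
    return o + len;
}

/* Encode code point cp as UTF-8 into buf (at least 4 bytes). */
static size_t utf8_encode(unsigned long cp, char *buf)
{
    if (cp < 0x80) {
        buf[0] = (char)cp;
        return 1;
    }
    if (cp < 0x800) {
        buf[0] = (char)(0xC0 | (cp >> 6));
        buf[1] = (char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        buf[0] = (char)(0xE0 | (cp >> 12));
        buf[1] = (char)(0x80 | ((cp >> 6) & 0x3F));
        buf[2] = (char)(0x80 | (cp & 0x3F));
        return 3;
    }
    buf[0] = (char)(0xF0 | (cp >> 18));
    buf[1] = (char)(0x80 | ((cp >> 12) & 0x3F));
    buf[2] = (char)(0x80 | ((cp >> 6) & 0x3F));
    buf[3] = (char)(0x80 | (cp & 0x3F));
    return 4;
}

/* Build the literal entity value of src into out (NUL-terminated). */
static void entity_literal_value(const char *src,
                                 const struct pe_entry *pes, size_t n_pes,
                                 char *out, size_t out_size)
{
    size_t o = 0;
    assert(src != NULL && out != NULL && out_size > 0);
    while (*src != '\0') {
        const char *semi = strchr(src, ';');
        if (src[0] == '&' && src[1] == '#' && semi != NULL) {
            /* Character reference: always resolved, never "skipped". */
            char buf[4];
            unsigned long cp = (src[2] == 'x')
                ? strtoul(src + 3, NULL, 16)
                : strtoul(src + 2, NULL, 10);
            size_t n = utf8_encode(cp, buf);
            o = append(out, o, out_size, buf, n);
            src = semi + 1;
        } else if (src[0] == '%' && semi != NULL) {
            /* PE reference: insert its value, or nothing if skipped. */
            size_t len = (size_t)(semi - src) - 1;
            size_t i;
            for (i = 0; i < n_pes; i++) {
                if (strlen(pes[i].name) == len &&
                    strncmp(pes[i].name, src + 1, len) == 0 &&
                    pes[i].value != NULL) {
                    o = append(out, o, out_size, pes[i].value,
                               strlen(pes[i].value));
                    break;
                }
            }
            src = semi + 1;
        } else {
            /* Anything else, including a GE reference such as &rights;,
               is copied unchanged. */
            o = append(out, o, out_size, src, 1);
            src++;
        }
    }
    out[o] = '\0';
}
```

One detail this makes visible: when %pub; is replaced with nothing, the
spaces on both sides of it remain, so the stored value has two adjacent
spaces between "1947" and "&rights;".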

> Still, optionally not expanding PEs feels to me like a can of
> worms. But OK, do it, I have no problem letting the
> InternalEntityRefHandler always return 1 for PEs. There may be
> valuable use cases for this that I just don't see at the moment.

If one could alter the replacement text...

> >> > 1) There is possible interference with the SetDefaultHandler
> >> > and SetDefaultHandlerExpand functions. The former will
> >> > turn off expansion of internal general entities.
> >> > However, this could now also be done with the
> >> > InternalEntityRefHandler.
> >> > So, what happens when expansion is turned off?
> > [..]
> > Even with the InternalEntityRefHandler we might leave both in,
>
> Yes. And without InternalEntityRefHandler set, let them behave as
> now. Easiest way for backward compatibility.

Yes, I just wanted to get rid of the extra data member, but I see
that could break some apps.

> > but declare one of them as deprecated (and possibly make them
> > behave the same - but that might break some apps).
>
> Yes, get rid of one of them, in the long run. But make them
> behave the same way only if InternalEntityRefHandler is set.

Sounds reasonable.

> >> Call InternalEntityRefHandler
> >> with entity value NULL, if skippedEntityHandler isn't
> >> set. (Well, this way an InternalEntityRefHandler could supersede the
> >> skippedEntityHandler..).
> >
> > That's an interesting idea - only have an InternalEntityRefHandler,
> > and set entityValue = NULL if skipped (when not an error, of course).
>
> Suddenly, I feel happy that the skippedEntityHandler hasn't made
> it into 1.95.3. But I'm not perfectly sure. Let's hear other opinions.

Check my other message - maybe not such a good idea.

> > Anyway, you have raised doubts in my mind if it makes sense to
> > report the entity value at all. What does it mean examining it?
> > Isn't that what the parser does?
>
> Depends. For sure, the parser will not be able to examine the
> semantics of the replacement text; this could only be done by the
> application. But the application could always manage its own entity
> value lookup table (using entityDeclHandler etc.), if needed. There
> may be a minimal speed penalty if the InternalEntityRefHandler
> returns 0 (do not expand), because the entity value lookup is
> 'unnecessary' in that case, but this may only be measurable in extreme
> cases. On the other hand, it would be a nice service to get the entity
> value without managing one's own lookup table. I could live with both.

In general, my thinking is: if the info is there, and it is no effort
to provide it, then why not? I am waiting for a good argument against it.
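
For comparison, the alternative mentioned in the quoted text (the
application keeping its own entity value lookup table) might look
roughly like the sketch below. It is plain C with no Expat dependency,
and the names entity_table, entity_table_add and entity_table_lookup
are hypothetical. The assumption is that entity_table_add would be
called from the application's entity declaration handler (registered
via XML_SetEntityDeclHandler), which, as far as I know, passes the
value with an explicit length rather than NUL-terminated:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MAX_ENTITIES 64

struct entity {
    char *name;
    char *value;
    int is_parameter_entity;
};

struct entity_table {
    struct entity items[MAX_ENTITIES];
    size_t count;
};

/* Copy len bytes of s into a newly allocated NUL-terminated string. */
static char *dup_n(const char *s, size_t len)
{
    char *copy = malloc(len + 1);
    if (copy != NULL) {
        memcpy(copy, s, len);
        copy[len] = '\0';
    }
    return copy;
}

/* Record one internal entity. Returns 1 on success, 0 if the entity
   has no internal value, the table is full, or memory ran out. */
static int entity_table_add(struct entity_table *t, const char *name,
                            int is_parameter_entity,
                            const char *value, int value_length)
{
    struct entity *e;
    assert(t != NULL && name != NULL);
    if (value == NULL || value_length < 0 || t->count == MAX_ENTITIES)
        return 0;
    e = &t->items[t->count];
    e->name = dup_n(name, strlen(name));
    e->value = dup_n(value, (size_t)value_length);
    if (e->name == NULL || e->value == NULL) {
        free(e->name);
        free(e->value);
        return 0;
    }
    e->is_parameter_entity = is_parameter_entity;
    t->count++;
    return 1;
}

/* Return the stored value, or NULL if no such entity was recorded.
   GEs and PEs live in separate namespaces, hence the flag. */
static const char *entity_table_lookup(const struct entity_table *t,
                                       const char *name,
                                       int is_parameter_entity)
{
    size_t i;
    assert(t != NULL && name != NULL);
    for (i = 0; i < t->count; i++) {
        if (t->items[i].is_parameter_entity == is_parameter_entity &&
            strcmp(t->items[i].name, name) == 0)
            return t->items[i].value;
    }
    return NULL;
}

/* Release all stored strings and reset the table. */
static void entity_table_free(struct entity_table *t)
{
    size_t i;
    for (i = 0; i < t->count; i++) {
        free(t->items[i].name);
        free(t->items[i].value);
    }
    t->count = 0;
}
```

This is the bookkeeping an application would need if the entity value
were not passed to the handler: a copy of every declared value plus a
second lookup on each reference.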

Karl