
Karl Waclawek karl at waclawek.net
Fri Mar 14 09:39:20 EST 2003

> Hello,
> Would it be possible to include a simple patch in Expat to report an 
> unresolved external general entity in an attribute value at its original 
> position inside the attribute value rather than through the default handler?

AFAIK, *external* general entity references are not allowed in attributes.
(section 4.4.4 in the XML 1.0 specs http://www.w3.org/TR/REC-xml.html#forbidden)

But assuming you mean internal general entities, then it seems you are
running into a limitation that exists for all SAX-like parsers: they do not
report entity boundaries for general entities in attribute values and
parameter entities in declarations.

> In Function 'appendAttributeValue'
>  in case XML_TOK_ENTITY_REF:
> I replace
>     if ((pool == &tempPool) && defaultHandler)
>       reportDefault(parser, enc, ptr, next);
> by
>     if ((pool == &tempPool) && defaultHandler) {
>       const char *ent;
>       for (ent = ptr; ent < next; ent++) {
>         if (!poolAppendChar(pool, ent[0]))
>           return XML_ERROR_NO_MEMORY;
>       }
>     }

If I understand correctly, you want "&myEntity;" to show up verbatim
in the reported attribute value?

That would be a problem, because the string "&myEntity;" could be
generated without an actual entity reference, like here:
<elm att="ABC&amp;myEntity;DEF"> where the attribute value would
be "ABC&myEntity;DEF".
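
To make the ambiguity concrete, here is a toy model in C. It is not
Expat code: toy_attr_value is a hypothetical helper that expands only the
five predefined references and copies any other "&name;" verbatim, which
is roughly what the proposed patch would do. Both source literals end up
as the same reported string, so the application cannot tell them apart.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy model of attribute-value processing, NOT Expat's implementation.
   Predefined references are expanded; any other "&name;" is copied
   verbatim, as the proposed patch would do.
   Returns 0 on success, -1 if dst is too small. */
int toy_attr_value(const char *src, char *dst, size_t cap)
{
    static const struct { const char *ref; char ch; } predefined[] = {
        { "&amp;", '&' }, { "&lt;", '<' }, { "&gt;", '>' },
        { "&quot;", '"' }, { "&apos;", '\'' }
    };
    size_t out = 0;

    assert(src != NULL && dst != NULL);
    if (cap == 0)
        return -1;

    while (*src) {
        char ch = *src;
        size_t consumed = 1;
        if (ch == '&') {
            size_t i;
            for (i = 0; i < sizeof predefined / sizeof predefined[0]; i++) {
                size_t len = strlen(predefined[i].ref);
                if (strncmp(src, predefined[i].ref, len) == 0) {
                    ch = predefined[i].ch;   /* expand predefined reference */
                    consumed = len;
                    break;
                }
            }
        }
        if (out + 1 >= cap)
            return -1;                       /* keep room for the NUL */
        dst[out++] = ch;
        src += consumed;
    }
    dst[out] = '\0';
    return 0;
}
```

Feeding it "ABC&amp;myEntity;DEF" and "ABC&myEntity;DEF" gives the same
output string in both cases.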

> We use Expat for our open-source browser/editor Amaya and in this case,
> we need to receive the whole entity value in the elementStartHandler
> and let the application decide how to manage external entities.

I hate to say it, but as things currently stand, a fully DOM-compliant
parser would suit you better.

However, we have been discussing how to improve Expat in this regard,
but any such change would be implemented as part of a new Expat API,
after version 2.0 has been released.

There are several options, the most efficient being the insertion
of an illegal XML character (like Unicode 0xFFFF) followed by
the entity name or even a pointer. An extra API call could be used
to pass that name or pointer and retrieve the entity value.
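
A rough sketch of how an application might consume such a marker,
assuming UTF-8 output and a layout of marker, entity name, then ';'.
That layout, and the names ENTITY_MARKER and find_marked_entity, are
made up for illustration; nothing like this exists in Expat today.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* U+FFFF encoded as UTF-8. It is not a legal XML character, so it
   cannot occur in attribute text that came from the document. */
#define ENTITY_MARKER     "\xEF\xBF\xBF"
#define ENTITY_MARKER_LEN 3

/* Finds the first marked entity reference in a reported value.
   Assumed (hypothetical) layout: ENTITY_MARKER name ';'
   Returns a pointer to the name and stores its length in *name_len,
   or returns NULL if the value contains no complete marked reference. */
const char *find_marked_entity(const char *value, size_t *name_len)
{
    const char *mark, *name, *end;

    assert(value != NULL && name_len != NULL);

    mark = strstr(value, ENTITY_MARKER);
    if (mark == NULL)
        return NULL;

    name = mark + ENTITY_MARKER_LEN;
    end = strchr(name, ';');
    if (end == NULL)
        return NULL;

    *name_len = (size_t)(end - name);
    return name;
}
```

Unlike a verbatim "&myEntity;", the marker cannot be produced by
"&amp;myEntity;", so the ambiguity described earlier goes away. The
application would then hand the name to the extra API call to retrieve
the entity value.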

Another one would be to turn attribute reporting into a streaming style,
with startAttribute and endAttribute callbacks, and between them we would
have character and start/endEntity callbacks. However, that looks like
a lot of calls just for reporting an attribute.
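
To give an idea of the call volume, here is a sketch of what the
streaming style might look like. The struct, its members and the helper
functions are all hypothetical, and the expansion "xyz" for myEntity is
made up for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callback set; none of these exist in Expat. */
typedef struct {
    void *userData;
    void (*startAttribute)(void *userData, const char *name);
    void (*characters)(void *userData, const char *s, size_t len);
    void (*startEntity)(void *userData, const char *entityName);
    void (*endEntity)(void *userData, const char *entityName);
    void (*endAttribute)(void *userData, const char *name);
} AttrStreamHandlers;

/* Reports att="ABC&myEntity;DEF", assuming myEntity expands to "xyz". */
void report_sample_attribute(const AttrStreamHandlers *h)
{
    assert(h != NULL);
    h->startAttribute(h->userData, "att");
    h->characters(h->userData, "ABC", 3);
    h->startEntity(h->userData, "myEntity");
    h->characters(h->userData, "xyz", 3);
    h->endEntity(h->userData, "myEntity");
    h->characters(h->userData, "DEF", 3);
    h->endAttribute(h->userData, "att");
}

/* Example handlers that only count how often they are called;
   userData is expected to point to an int counter. */
void count_name_event(void *userData, const char *name)
{
    (void)name;
    ++*(int *)userData;
}

void count_text_event(void *userData, const char *s, size_t len)
{
    (void)s;
    (void)len;
    ++*(int *)userData;
}
```

With the counting handlers installed, this single attribute with one
entity reference costs seven callbacks, where today it is one entry in
the attribute array passed to the start element handler.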

We haven't discussed this thoroughly, though, and anybody who comes
up with a better idea is welcome to share it with us.

