[Expat-discuss]

Laurent Carcone Laurent.Carcone at inrialpes.fr
Fri Mar 14 16:33:07 EST 2003


> 
> > Hello,
> > 
> > Would it be possible to include a simple patch in Expat to report an
> > unresolved external general entity in an attribute value at its original
> > position inside the attribute value, rather than through the default handler?
> 
> AFAIK, *external* general entity references are not allowed in attributes.
> (section 4.4.4 in the XML 1.0 specs http://www.w3.org/TR/REC-xml.html#forbidden)

You made the right assumption :-)

> But assuming you mean internal general entities, then it seems you are
> running into a limitation that exists for all SAX-like parsers: they do not
> report entity boundaries for general entities in attribute values and
> parameter entities in declarations.

I read some messages in the archives, and that is what I was afraid of.

> 
> > In function appendAttributeValue, in the case XML_TOK_ENTITY_REF,
> > I replaced
> >
> >     if ((pool == &tempPool) && defaultHandler)
> >       reportDefault(parser, enc, ptr, next);
> >
> > with
> >
> >     if ((pool == &tempPool) && defaultHandler) {
> >       const char *ent;
> >       for (ent = ptr; ent < next; ent++) {
> >         if (!poolAppendChar(pool, ent[0]))
> >           return XML_ERROR_NO_MEMORY;
> >       }
> >     }
> 
> If I understand correctly, you want "&myEntity;" to show up verbatim
> in the reported attribute value?
> 
> That would be a problem, because the string "&myEntity;" could be
> generated without an actual entity reference, like here:
> <elm att="ABC&amp;myEntity;DEF"> where the attribute value would
> be "ABC&myEntity;DEF".

That is precisely the problem we ran into. To fix it, we use a special
character followed by the entity name (as in your first option).
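In fact, my actual patch is:

    if ((pool == &tempPool) && defaultHandler) {
      const char *ent;
      /* Mark where the unresolved entity reference starts. */
      if (!poolAppendChar(pool, START_ENTITY))
        return XML_ERROR_NO_MEMORY;
      /* Copy "name;" verbatim, skipping the leading '&'
         (ptr + 1 assumes a single-byte input encoding such as UTF-8). */
      for (ent = ptr + 1; ent < next; ent++) {
        if (!poolAppendChar(pool, *ent))
          return XML_ERROR_NO_MEMORY;
      }
    }
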
where START_ENTITY is a special character shared by Expat and the application.
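
To illustrate the application side of this scheme, here is a minimal C sketch that splits a reported attribute value into plain text runs and entity names. It assumes a UTF-8 build of Expat (XML_Char is char) and a hypothetical START_ENTITY value of 0x01, a control character that XML 1.0 does not allow in a document at all, not even as a character reference, so it cannot come from the document text. The names segment and next_segment are made up for this example and are not part of Expat. With the marker, an unresolved reference in "ABC&myEntity;DEF" arrives as "ABC", marker, "myEntity;", "DEF", while the escaped form "ABC&amp;myEntity;DEF" arrives as a single plain text run.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical marker value; it must match whatever the patched Expat
   appends before an unresolved entity name. 0x01 is not a legal XML 1.0
   character, so it cannot come from the document text. */
#define START_ENTITY '\x01'

typedef enum { SEG_END, SEG_TEXT, SEG_ENTITY, SEG_ERROR } seg_kind;

/* A slice of the attribute value: start points into the original string,
   and the slice is not NUL-terminated. */
typedef struct {
    seg_kind kind;
    const char *start;
    size_t len;
} segment;

/* Returns the next segment of the attribute value at *cursor and advances
   the cursor past it. For SEG_ENTITY, start/len cover the entity name
   without the marker and without the trailing ';'. SEG_ERROR means a
   marker had no terminating ';'; the cursor is then left unchanged. */
static segment next_segment(const char **cursor)
{
    assert(cursor != NULL && *cursor != NULL);
    const char *p = *cursor;
    segment s = { SEG_END, p, 0 };

    if (*p == '\0')
        return s;

    if (*p == START_ENTITY) {
        const char *semi = strchr(p + 1, ';');
        if (semi == NULL) {
            s.kind = SEG_ERROR;
            return s;
        }
        s.kind = SEG_ENTITY;
        s.start = p + 1;
        s.len = (size_t)(semi - s.start);
        *cursor = semi + 1;
        return s;
    }

    /* Plain text runs up to the next marker or the end of the string. */
    const char *mark = strchr(p, START_ENTITY);
    s.kind = SEG_TEXT;
    s.len = mark ? (size_t)(mark - p) : strlen(p);
    *cursor = p + s.len;
    return s;
}
```

An application would call next_segment in a loop until it returns SEG_END or SEG_ERROR, handing each SEG_ENTITY name to its own entity-handling logic.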

> 
> > We use Expat for our open-source browser/editor Amaya, and in this case
> > we need to receive the whole entity value in the elementStartHandler
> > and let the application decide how to manage external entities.
> 
> I hate to say it, but as things currently stand, a fully DOM-compliant
> parser would be better for you.
> 
> However, we have been discussing how to improve Expat in this regard,
> but any such change would be implemented as part of a new Expat API,
> after version 2.0 has been released.
> 
> There are several options, the most efficient being the insertion
> of an illegal XML character (like Unicode 0xFFFF) followed by
> the entity name or even a pointer. An extra API call could be used
> to pass that name or pointer and retrieve the entity value.
> 
> Another one would be to turn attribute reporting into a streaming style,
> with a startAttribute, endAttribute callback, and between them we would
> have  character and start/endEntity callbacks. However that looks like
> a lot of calls just for reporting an attribute.
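
The streaming alternative can be pictured as a set of callbacks. The C sketch below is purely hypothetical: Expat has no startAttribute, endAttribute, or per-attribute entity callbacks, and the struct and function names are invented for illustration. It replays one attribute, given as pieces that are either literal text or the already-expanded text of an entity reference, as a sequence of events, and returns how many events it produced. An attribute such as att="ABC&myEntity;DEF" already needs seven callbacks, which is the overhead mentioned above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical streamed-attribute callbacks; none of these exist in Expat. */
typedef struct {
    void (*startAttribute)(void *userData, const char *name);
    void (*characters)(void *userData, const char *s, size_t len);
    void (*startEntity)(void *userData, const char *entityName);
    void (*endEntity)(void *userData, const char *entityName);
    void (*endAttribute)(void *userData, const char *name);
} attr_handlers;

/* One piece of an attribute value: literal text (entity == NULL) or the
   replacement text of a general entity reference (entity == its name). */
typedef struct {
    const char *entity;
    const char *text;
} attr_piece;

/* Replays one attribute as events, calling each handler that is set
   (NULL handlers are skipped but still counted). Returns the number of
   events emitted. Nested entities are not modelled. */
static size_t stream_attribute(const attr_handlers *h, void *userData,
                               const char *name,
                               const attr_piece *pieces, size_t n)
{
    size_t events = 0;
    assert(h != NULL && name != NULL);

    if (h->startAttribute) h->startAttribute(userData, name);
    events++;
    for (size_t i = 0; i < n; i++) {
        if (pieces[i].entity) {
            if (h->startEntity) h->startEntity(userData, pieces[i].entity);
            events++;
        }
        if (h->characters)
            h->characters(userData, pieces[i].text, strlen(pieces[i].text));
        events++;
        if (pieces[i].entity) {
            if (h->endEntity) h->endEntity(userData, pieces[i].entity);
            events++;
        }
    }
    if (h->endAttribute) h->endAttribute(userData, name);
    events++;
    return events;
}
```

Nested entity references would add further start/end pairs; the flat piece list here ignores nesting for brevity.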
> 
> We haven't discussed this thoroughly though, and anybody who comes
> up with a better idea is welcome to share it with us.
> 
> Karl
> 

I'm looking forward to this discussion.

Thank you for your response.

Laurent
