[Expat-discuss] Expansion of entity references

Oberholtzer,Stephen stephen.oberholtzer at freedompay.com
Fri Apr 2 16:49:28 EST 2004


I'm sorry if this has been discussed recently; the mail archive seems to lack a search function :(

Additionally, the URL in the expat 1.95.7 README file, as well as on SF, appears to be out of date: http://mail.libexpat.org/mailman-21/listinfo/

I'm trying to work on an application that transforms an XML file a certain way (and when I say
'transforms' I am not referring to the sort of thing XSLT is meant to accomplish).  The end result is that
I need the ability to stream in XML *without* expanding &entityrefs;.

I'm working with XML::Parser::Expat v2.34, and I unfortunately don't know which version of expat itself
it was built against (it's from a prebuilt PPM package, and I don't seem to have any way of asking expat its version).
Anyway, Expat gives me almost exactly what I need when I specify NoExpand => 1.
(For those unfamiliar, it has this effect:

  if (cbv->no_expand)
    XML_SetDefaultHandler(parser, dflthndl);
  else
    XML_SetDefaultHandlerExpand(parser, dflthndl);
)

It's unfortunately expanding charrefs (&#123;, &#x40;) and the five builtins amp/apos/quot/lt/gt,
but I can handle that. What I can't handle is how it treats entities inside attribute values:
it triggers my default handler for the entity before giving me the "start element" event,
and then 'expands' said entity into an empty string.
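
As an illustration of the kind of workaround "I can handle that" could mean for character data: a minimal sketch, assuming the text delivered by the character data handler is simply re-escaped on output. The function name xml_escape and its snprintf-like return convention are made up for this example and are not part of expat or XML::Parser. Note that a charref such as &#x40; comes back out as a literal '@', which is equivalent XML but not byte-identical to the input.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not an expat API: re-escape the five predefined
 * entities in already-expanded character data.
 * Writes at most outsz-1 bytes plus a terminating NUL into out and
 * returns the length the complete result needs, so a return value
 * >= outsz means the output was truncated. */
static size_t xml_escape(const char *in, char *out, size_t outsz)
{
    size_t needed = 0;   /* bytes the full result would take */
    size_t written = 0;  /* bytes actually stored in out */
    int full = 0;        /* set once a piece no longer fits */

    for (; *in != '\0'; in++) {
        char one[2] = { *in, '\0' };
        const char *rep;

        switch (*in) {
        case '&':  rep = "&amp;";  break;
        case '<':  rep = "&lt;";   break;
        case '>':  rep = "&gt;";   break;
        case '"':  rep = "&quot;"; break;
        case '\'': rep = "&apos;"; break;
        default:   rep = one;      break;
        }

        size_t len = strlen(rep);
        if (!full && written + len < outsz) {
            memcpy(out + written, rep, len);
            written += len;
        } else {
            full = 1;    /* never emit half of an escape sequence */
        }
        needed += len;
    }
    if (outsz > 0)
        out[written] = '\0';
    return needed;
}
```

This only makes sense for text that arrived through the character data handler; anything passed through unexpanded by the default handler would have to be written out untouched.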

If anyone's wondering who the heck would put entities into attribute values, go to
http://lxr.mozilla.org/seamonkey/find?string=.xul and pick any XUL file. The entire Mozilla UI depends
on entities in attribute values to make the locale stuff work. (Switch where the DTD reference
points and you switch languages!)

Obtaining charrefs and builtins in character sections unexpanded would be nice, but
I really need a way to get the attribute values with entities unexpanded -- and I need *all* of them unexpanded, including the builtins.  Consider this XML fragment:
    <xyz foo='bar&amp;baz;&quux;' />
If I get this back with the &amp; already expanded, I will see
    <xyz foo='bar&baz;&quux;' />
and attempt to expand &baz; (which is incorrect; there was no &baz; entityref in the original document).
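
To make the problem concrete, here is a rough sketch of the kind of naive scan a downstream tool might run over an attribute value to find entity references. The function list_entity_refs is invented for this illustration (it is not an expat call), and it does not validate entity names; it just takes whatever sits between '&' and the next ';'.

```c
#include <string.h>

/* Hypothetical helper, not an expat API: collect the names of all
 * general entity references (&name;) in value into out, separated by
 * single spaces. Character references (&#...;) are skipped.
 * Returns the number of names stored; stops early if out is too small. */
static int list_entity_refs(const char *value, char *out, size_t outsz)
{
    int count = 0;
    size_t used = 0;

    if (outsz == 0)
        return 0;
    out[0] = '\0';

    for (const char *p = value; (p = strchr(p, '&')) != NULL; ) {
        const char *semi = strchr(p, ';');
        if (semi == NULL)
            break;                          /* unterminated reference */

        size_t len = (size_t)(semi - p - 1);
        if (len > 0 && p[1] != '#') {
            size_t sep = (count > 0) ? 1 : 0;
            if (used + sep + len + 1 > outsz)
                break;                      /* no room for name + NUL */
            if (sep)
                out[used++] = ' ';
            memcpy(out + used, p + 1, len);
            used += len;
            out[used] = '\0';
            count++;
        }
        p = semi + 1;                       /* continue after the ';' */
    }
    return count;
}
```

Run on the raw value bar&amp;baz;&quux; this reports the references amp and quux, which is right. Run on the pre-expanded value bar&baz;&quux; it reports baz and quux, and baz is a reference that never existed in the file.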

Can anyone please help me at all? (I'm not afraid of patching stuff.)

-- 
Stevie-O

Real programmers use COPY CON PROGRAM.EXE
