[XML-SIG] Please resolve external parameter entity references

Steven Work steve@renlabs.com
01 Feb 2000 15:33:07 -0800


May I weigh in on the feature list question?  For many purposes the
core XML processor should resolve external parameter entity
references; expat currently doesn't.  The W3C recommendation only
*requires* this of a validating parser, and that appears to be expat's
justification for skipping them.  I'd like to argue that a good general-use
*non-validating* parser should do it too, at least optionally; and I
don't think it would bloat code measurably or slow things down any
when there are no external parameter entity references, or when the
option is turned off.

Why does this matter?  Here's one example.

I find myself logging (accumulating) information in XML-derived
formats pretty frequently these days.  The only way I know to do this
in a strictly append-only and atomic way is this:

1. Start with an (unchanging) top-level document like "log.xml" here:

  <?xml version="1.0" standalone="no"?>
  <!DOCTYPE log SYSTEM "log.dtd" [
  <!ENTITY % log.decls SYSTEM "log.decls">
  <!ENTITY   log.ents    SYSTEM "log.ents">
  %log.decls;
  ]>
  <log>
  &log.ents;
  </log>

2. For each "thing" to log, do these steps in order:

  a. Write a well-formed chunk of XML, valid within a <log> element, to
     a uniquely-named new file.

  b. Append something like this to "log.decls":

       <!ENTITY unique-name SYSTEM "unique-name">

  c. Append something like this to "log.ents":

       &unique-name;

If you can assume the writes in 2b and 2c are atomic (happen to
completion without other writes to the same file intervening; for
small writes on most systems this is an OK assumption) then "log.xml"
remains valid at all times -- no need for locks or other interprocess
communications to avoid scrambling the data, even with many processes
writing data "simultaneously."  The order of the steps matters: each
entity's file exists before it is declared, and it is declared before
it is referenced.
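
To make steps 2a-2c concrete, here is a minimal Python sketch of one
logging call.  The function names (append_atomically, log_chunk) and
the uuid-based naming scheme are my own illustrations, not part of any
library, and the sketch leans on the same assumption as above: that one
small write() to a file opened in append mode is not interleaved with
other writers.

```python
import os
import uuid


def append_atomically(path, text):
    """Append text using a single write() on an O_APPEND descriptor."""
    fd = os.open(path, os.O_WRONLY | os.O_APPEND | os.O_CREAT, 0o644)
    try:
        # One write call; atomicity of small appends is assumed, not checked.
        os.write(fd, text.encode("utf-8"))
    finally:
        os.close(fd)


def log_chunk(log_dir, chunk_xml):
    """Perform steps 2a, 2b and 2c in order; return the entity name used."""
    # Hypothetical naming scheme: a leading letter keeps it a legal XML name.
    name = "e" + uuid.uuid4().hex

    # 2a: write the chunk to a new file ("x" mode fails if the file exists).
    with open(os.path.join(log_dir, name), "x", encoding="utf-8") as f:
        f.write(chunk_xml)

    # 2b: declare the entity in log.decls.
    append_atomically(os.path.join(log_dir, "log.decls"),
                      '<!ENTITY %s SYSTEM "%s">\n' % (name, name))

    # 2c: reference the entity in log.ents.
    append_atomically(os.path.join(log_dir, "log.ents"), "&%s;\n" % name)
    return name
```

Calling log_chunk("/some/dir", "<entry>...</entry>") from several
processes at once should then need no locking, under the assumption
stated above.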

But to process "log.xml" I have to fall back from the very-fast expat,
usually to an ESIS parser chewing the data stream from nsgmls in a
separate process (validating xmlproc works too but it's even slower).
These systems don't need validating parsers, but to my knowledge
the XML developer community hasn't built any good non-validating
parsers that don't just ignore external parameter entity references.

Only they can't ignore them entirely (Section 5.1 of the W3C
recommendation requires a non-validating parser to notice when it has
chosen NOT to read an external parameter entity, so it can know at
what point it is absolved of its responsibility to process entity
declarations or attribute-list declarations that come later).  So
there's essentially no speed cost to having the *option* of reading
external parameter entities, and choosing *not* to.  And you're
already expanding internally-declared parameter entities, so it won't
add a measurable amount of code to do so from another file.
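
For illustration, here is roughly what such an option looks like from
the caller's side.  This sketch uses Python's xml.parsers.expat module,
which wraps later expat releases that do offer the option
(SetParamEntityParsing plus an ExternalEntityRefHandler); it is not the
expat discussed above.  The function name parse_log, the choice to
resolve system identifiers relative to the document's directory, and
the choice to silently skip missing files (such as an absent "log.dtd")
are assumptions of the sketch.

```python
import os
from xml.parsers import expat


def parse_log(path):
    """Parse a document like log.xml; return the element names seen."""
    base_dir = os.path.dirname(os.path.abspath(path))
    seen = []
    root = expat.ParserCreate()
    active = [root]  # stack of parsers; the last one is currently running

    def start_element(name, attrs):
        seen.append(name)

    def external_entity_ref(context, base, system_id, public_id):
        # Called for external parameter entities and the external DTD
        # subset (context is None) and for external general entities.
        target = os.path.join(base_dir, system_id)
        if not os.path.isfile(target):
            return 1  # assumption: skip missing files instead of failing
        # Child parsers inherit the handlers of the parser creating them.
        child = active[-1].ExternalEntityParserCreate(context)
        active.append(child)
        try:
            with open(target, "rb") as f:
                child.Parse(f.read(), True)
        finally:
            active.pop()
        return 1  # nonzero tells expat to continue

    # This call is the "option": without it, parameter entity
    # references are skipped.
    root.SetParamEntityParsing(expat.XML_PARAM_ENTITY_PARSING_ALWAYS)
    root.StartElementHandler = start_element
    root.ExternalEntityRefHandler = external_entity_ref
    with open(path, "rb") as f:
        root.Parse(f.read(), True)
    return seen
```

The stack is there because the entities nest: "log.ents" is itself an
external entity that refers to the per-chunk entities, and each child
parser is created from whichever parser hit the reference.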

I think I'm talking myself into patching expat.  Would some kind soul
please point out flaws in the above, so I can save myself the trouble?
-- 
Steven Work
Renaissance Labs
steve@renlabs.com
360 647-1833