[Expat-discuss] Expat and malicious XML

Tue Nov 19 19:55:54 2002

Recently we were notified by Sanctum Inc. about a security
vulnerability in Expat. As it turns out, any conforming parser
is vulnerable, so this is not a problem specific to Expat.

The attack consists of an ordered list of internal entity
declarations, each containing two or more references to the
previously declared entity. For instance, with two references per
declaration, N such declarations give a total of
sum(i=1 to N) 2^i = 2^(N+1) - 2 internal entity references to
resolve when the last entity is expanded.
This is at least a CPU hog, but it can also become a memory
hog if parameter entities are used, since parameter entity
references inside an entity value are expanded when the
declaration itself is parsed.
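To make the count concrete, here is a small C model of the attack. The layout is an assumption for illustration: entity 0 holds plain text and entity i holds two references to entity i - 1. The function names are hypothetical; the first walks the expansion reference by reference, the second is the closed form of the sum.

```c
#include <stdint.h>

/* Hypothetical model of the attack: entity 0 holds plain text and entity i
   (1 <= i <= N) holds REFS_PER_ENTITY references to entity i - 1. */
#define REFS_PER_ENTITY 2

/* Counts the entity references resolved while fully expanding entity
   `level`, visiting every reference one by one as a parser would. */
static uint64_t count_refs_by_expansion(int level)
{
    uint64_t count = 0;
    if (level == 0)
        return 0;                       /* entity 0 contains no references */
    for (int k = 0; k < REFS_PER_ENTITY; k++)
        count += 1 + count_refs_by_expansion(level - 1);
    return count;
}

/* Closed form of sum(i=1 to n) 2^i for two references per entity,
   valid for 0 <= n <= 62. */
static uint64_t count_refs_closed_form(int n)
{
    return ((uint64_t)1 << (n + 1)) - 2;
}
```

For N = 10 both functions give 2046 references; the single reference in the document body that starts the expansion adds one more.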

The question is whether anything should be done in Expat
to counter such an attack. Should the countermeasures live
outside of Expat? How important is it to have countermeasures
built into Expat, if the same can be achieved without touching Expat?

Here is what we have considered so far:

1) Have Expat limit the number of open internal entities
   (i.e. limit the nesting level of entities), adding an
   API to set that level.

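IMO, this excludes too many "proper" XML documents: the attack
only needs a nesting depth of roughly N, and an attacker can keep
N small by putting more references into each declaration, so the
limit would have to be set quite low to be effective. On the other
hand, it is not a big deal to implement.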
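A minimal sketch of option 1, assuming the entity layout of the attack (entity i holds two references to entity i - 1, entity 0 holds none). The max_depth parameter plays the role of the proposed nesting-level setting; neither it nor the functions below are existing Expat API.

```c
#include <stdbool.h>

/* Hypothetical entity layout: entity i refers to entity i - 1 twice,
   entity 0 contains no references. */
#define REFS_PER_ENTITY 2

/* Expands entity `level`, which is opened at nesting depth `depth`.
   Fails as soon as more than max_depth entities would be open at once. */
static bool expand_with_depth_limit(int level, int depth, int max_depth)
{
    if (depth > max_depth)
        return false;                   /* too many open entities: reject */
    if (level == 0)
        return true;
    for (int k = 0; k < REFS_PER_ENTITY; k++)
        if (!expand_with_depth_limit(level - 1, depth + 1, max_depth))
            return false;
    return true;
}

/* The document references entity n once, which opens it at depth 1. */
static bool document_passes(int n, int max_depth)
{
    return expand_with_depth_limit(n, 1, max_depth);
}
```

With N = 10 the expansion reaches depth 11, so only a limit of 10 or less rejects it. Conversely, N = 30 fits within a limit of 31 and still costs 2^31 - 2 reference resolutions, which is why the limit would have to be set low.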

2) Have Expat calculate the "fan-out" of an entity while
   parsing the DTD, according to this recursive algorithm:

   - start with fan-out = 0
   - for each entity reference in the replacement text,
     increment fan-out by (1 + fan-out of the referenced entity)

   Then add an API to set a fan-out threshold.

Depending on the order in which entity declarations appear
(general entity references don't require an existing declaration,
since they are not resolved at DTD parsing time), this may require
the implementation to keep a list of unresolved fan-outs
until the end of the DTD is reached. So the implementation
is more involved here, but it measures the true impact
of the attack more accurately.
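A sketch of the fan-out rule from option 2, with resolution deferred to the end of the DTD. The EntityDecl record, the by-name lookup and check_fanouts() are hypothetical; a real implementation would hook into Expat's own entity tables. Each entity's fan-out is computed only once, and a reference cycle (a well-formedness error anyway) is reported rather than followed forever.

```c
#include <stdint.h>
#include <string.h>

#define MAX_REFS 8  /* sketch limit on references per declaration */

/* Hypothetical record of an internal entity declaration collected while
   parsing the DTD. References are kept by name because a general entity
   may refer to an entity that is declared later. */
typedef struct {
    const char *name;
    const char *refs[MAX_REFS];
    int nrefs;
    int state;              /* 0 = not visited, 1 = in progress, 2 = done */
    uint64_t fanout;
} EntityDecl;

static EntityDecl *find_decl(EntityDecl *decls, int n, const char *name)
{
    for (int i = 0; i < n; i++)
        if (strcmp(decls[i].name, name) == 0)
            return &decls[i];
    return NULL;
}

/* Saturating addition, so huge fan-outs cannot wrap around. */
static uint64_t sat_add(uint64_t a, uint64_t b)
{
    return a > UINT64_MAX - b ? UINT64_MAX : a + b;
}

/* fan-out(e) = sum over the references r in e of (1 + fan-out(r)).
   Returns -1 on a reference cycle. An undeclared entity counts as a
   single reference. */
static int compute_fanout(EntityDecl *decls, int n, EntityDecl *e)
{
    uint64_t total = 0;
    if (e->state == 2)
        return 0;                       /* already computed (memoized) */
    if (e->state == 1)
        return -1;                      /* recursive reference */
    e->state = 1;
    for (int k = 0; k < e->nrefs; k++) {
        EntityDecl *r = find_decl(decls, n, e->refs[k]);
        uint64_t sub = 0;
        if (r != NULL) {
            if (compute_fanout(decls, n, r) < 0)
                return -1;
            sub = r->fanout;
        }
        total = sat_add(total, sat_add(1, sub));
    }
    e->fanout = total;
    e->state = 2;
    return 0;
}

/* Run once the end of the DTD is reached. Returns the index of the first
   declaration whose fan-out exceeds threshold, -1 if none does, or -2 if
   the entities reference each other in a cycle. */
static int check_fanouts(EntityDecl *decls, int n, uint64_t threshold)
{
    for (int i = 0; i < n; i++)
        if (compute_fanout(decls, n, &decls[i]) < 0)
            return -2;
    for (int i = 0; i < n; i++)
        if (decls[i].fanout > threshold)
            return i;
    return -1;
}
```

For the declarations e3 = "&e2;&e2;", e2 = "&e1;&e1;", e1 = "&e0;&e0;" and e0 = "lol", listed in that (reverse) order, e3 gets a fan-out of 14, matching the sum(i=1 to N) 2^i count for N = 3.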

For parameter entity references, the CPU cost is accompanied by
memory consumption, so in this case one may have to stop parsing
at the first entity declaration that exceeds the threshold.
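For parameter entities the check can be done eagerly, because a parameter entity must be declared before it is referenced: every reference in a new declaration already has a known fan-out. In this sketch (PETable and declare_pe() are hypothetical names, and references are given as table indices for simplicity) a declaration is rejected before its replacement text would be built.

```c
#include <stdint.h>

#define MAX_PES 64  /* sketch limit on the number of declarations */

/* Hypothetical table of the parameter entities declared so far. Since a
   parameter entity must be declared before it is referenced, each entry's
   fan-out is final as soon as its declaration has been read. */
typedef struct {
    uint64_t fanout[MAX_PES];
    int count;
    uint64_t threshold;
} PETable;

/* Records a declaration whose value references the entities at the given
   indices. Returns the new entity's index, -1 if its fan-out would exceed
   the threshold (stop parsing before building its replacement text), or
   -2 for an undeclared reference or a full table. */
static int declare_pe(PETable *t, const int *refs, int nrefs)
{
    uint64_t fanout = 0;                /* invariant: fanout <= threshold */
    if (t->count >= MAX_PES)
        return -2;
    for (int k = 0; k < nrefs; k++) {
        if (refs[k] < 0 || refs[k] >= t->count)
            return -2;
        uint64_t sub = t->fanout[refs[k]];
        /* fanout + 1 + sub > threshold, written without overflow */
        if (sub >= t->threshold - fanout)
            return -1;
        fanout += 1 + sub;
    }
    t->fanout[t->count] = fanout;
    return t->count++;
}
```

With a threshold of 100 and the doubling pattern of the attack, the first six declarations get fan-outs 0, 2, 6, 14, 30 and 62, and the seventh (fan-out 126) is rejected.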

3) Without touching Expat:
   Put the parser in its own thread, assign it a suitable thread
   priority, and kill the thread once a timeout has expired.

This can make coding trickier, but has the advantage of
working for any (as yet unknown) attack that makes the parser
eat CPU cycles or memory.
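A sketch of option 3 using POSIX threads. run_with_timeout() is a hypothetical helper, and the job callback stands in for the code that creates the parser and calls XML_Parse(). The worker enables asynchronous cancellation so that even a tight loop with no cancellation points can be stopped; the thread-priority part is left out.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

/* State shared between the watchdog (caller) and the worker thread. */
typedef struct {
    void (*job)(void *);    /* stand-in for the code that runs the parser */
    void *arg;
    pthread_mutex_t lock;
    pthread_cond_t finished;
    bool done;
} Watched;

static void *worker(void *p)
{
    Watched *w = p;
    /* Allow cancellation at any instruction, not just at cancellation points. */
    pthread_setcanceltype(PTHREAD_CANCEL_ASYNCHRONOUS, NULL);
    w->job(w->arg);
    /* Only the pthread_setcancel* calls are async-cancel-safe, so turn
       cancellation off before touching the mutex. */
    pthread_setcancelstate(PTHREAD_CANCEL_DISABLE, NULL);
    pthread_mutex_lock(&w->lock);
    w->done = true;
    pthread_cond_signal(&w->finished);
    pthread_mutex_unlock(&w->lock);
    return NULL;
}

/* Runs job(arg) in its own thread. Returns true if it finished within
   timeout_ms, false if it was cancelled (or could not be started). */
static bool run_with_timeout(void (*job)(void *), void *arg, long timeout_ms)
{
    Watched w;
    pthread_t tid;
    struct timespec deadline;
    void *result = NULL;
    bool finished;

    w.job = job;
    w.arg = arg;
    w.done = false;
    pthread_mutex_init(&w.lock, NULL);
    pthread_cond_init(&w.finished, NULL);

    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_sec += timeout_ms / 1000;
    deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
    if (deadline.tv_nsec >= 1000000000L) {
        deadline.tv_sec += 1;
        deadline.tv_nsec -= 1000000000L;
    }

    if (pthread_create(&tid, NULL, worker, &w) != 0) {
        pthread_mutex_destroy(&w.lock);
        pthread_cond_destroy(&w.finished);
        return false;
    }

    pthread_mutex_lock(&w.lock);
    while (!w.done)
        if (pthread_cond_timedwait(&w.finished, &w.lock, &deadline) == ETIMEDOUT)
            break;
    finished = w.done;
    pthread_mutex_unlock(&w.lock);

    if (!finished)
        pthread_cancel(tid);            /* kill the runaway parse */
    pthread_join(tid, &result);
    pthread_mutex_destroy(&w.lock);
    pthread_cond_destroy(&w.finished);
    return result != PTHREAD_CANCELED;
}
```

Asynchronous cancellation is only safe in async-cancel-safe code: a parser stopped in the middle of malloc() may leave the heap inconsistent, and whatever memory the parser held is leaked either way. This is part of what makes the coding tricky; running the parser in a separate process that can be killed avoids these problems at a higher cost.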

4) Like 3), but with a modified memory handler API that allows
   monitoring the memory allocations of individual parser instances.
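Expat already lets the caller supply allocator functions through XML_ParserCreate_MM() and XML_Memory_Handling_Suite, but those callbacks receive no user data, so they cannot tell parser instances apart. The sketch below assumes a hypothetical variant in which each callback also gets a per-parser context pointer; MemBudget and the budget_* functions are illustrative names only, not Expat API.

```c
#include <stddef.h>
#include <stdlib.h>

/* Per-parser allocation budget. The `ctx` argument on each callback below
   is the hypothetical API change; the existing callbacks have no such
   parameter. */
typedef struct {
    size_t used;    /* bytes currently allocated by this parser */
    size_t limit;   /* maximum bytes this parser may hold at once */
} MemBudget;

/* Each block carries a header recording its size, so free and realloc can
   update the running total. The union keeps the payload aligned. */
typedef union {
    size_t size;
    max_align_t align;
} BlockHeader;

static void *budget_malloc(void *ctx, size_t size)
{
    MemBudget *b = ctx;
    if (size > b->limit - b->used)
        return NULL;                    /* over budget: report out of memory */
    BlockHeader *h = malloc(sizeof *h + size);
    if (h == NULL)
        return NULL;
    h->size = size;
    b->used += size;
    return h + 1;
}

static void budget_free(void *ctx, void *ptr)
{
    MemBudget *b = ctx;
    if (ptr == NULL)
        return;
    BlockHeader *h = (BlockHeader *)ptr - 1;
    b->used -= h->size;
    free(h);
}

static void *budget_realloc(void *ctx, void *ptr, size_t size)
{
    MemBudget *b = ctx;
    if (ptr == NULL)
        return budget_malloc(ctx, size);
    BlockHeader *h = (BlockHeader *)ptr - 1;
    size_t old = h->size;
    if (size > b->limit - b->used + old)
        return NULL;                    /* original block stays valid */
    BlockHeader *nh = realloc(h, sizeof *nh + size);
    if (nh == NULL)
        return NULL;
    nh->size = size;
    b->used = b->used - old + size;
    return nh + 1;
}
```

A parser given such a budget would see a failed allocation as soon as it goes over its limit and could report it as an out-of-memory error, independently of any timeout.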

Does anybody have additional suggestions or comments on the above?
What would your preferences be (considering that someone has
to find the time to actually implement any new features in Expat)?

Karl