[Python-Dev] Algorithmic Complexity Attack on Python

Scott A Crosby scrosby@cs.rice.edu
29 May 2003 15:33:12 -0500


Hello. We have analyzed this software to determine its vulnerability
to a new class of DoS attacks described in a recent paper, ''Denial
of Service via Algorithmic Complexity Attacks.''

This paper discusses a new class of denial of service attacks that
work by exploiting the difference between average-case performance and
worst-case performance. In an adversarial environment, the data
structures used by an application may be forced to exhibit their
worst-case performance. For instance, hash table operations are usually
thought of as constant time, but with large numbers of collisions a
hash table degrades to a linked list, which may cause a 100-10,000
times performance degradation. Because hash tables are so widely used,
the potential for attack is extremely widespread. Fortunately, in many
cases, other limits in the system bound the impact of these attacks.
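A minimal sketch of that degradation, for illustration only: the class below is a hypothetical separate-chaining table with a fixed number of buckets that counts key comparisons (CPython's dict actually uses open addressing, but its worst case under full collisions is also quadratic). The constant hash function stands in for an attacker who has found many colliding keys: every key lands in one bucket, that bucket behaves as a linked list, and n insertions cost n(n-1)/2 comparisons instead of roughly none.

```python
from typing import Any, Callable, List, Tuple


class ChainedHashTable:
    """Hypothetical toy hash table: separate chaining, fixed bucket count."""

    def __init__(self, hash_fn: Callable[[Any], int], num_buckets: int = 1024) -> None:
        self.hash_fn = hash_fn
        self.buckets: List[List[Tuple[Any, Any]]] = [[] for _ in range(num_buckets)]
        self.comparisons = 0  # key-equality checks performed so far

    def insert(self, key: Any, value: Any) -> None:
        bucket = self.buckets[self.hash_fn(key) % len(self.buckets)]
        # Walk the chain; when many keys collide this is a linear scan.
        for i, (existing, _) in enumerate(bucket):
            self.comparisons += 1
            if existing == key:
                bucket[i] = (key, value)  # replace the existing entry
                return
        bucket.append((key, value))


def insertion_cost(hash_fn: Callable[[Any], int], n: int) -> int:
    """Insert keys 0..n-1 and return the number of key comparisons."""
    table = ChainedHashTable(hash_fn)
    for key in range(n):
        table.insert(key, None)
    return table.comparisons


print(insertion_cost(lambda k: k, 1000))  # spread-out hash: 0
print(insertion_cost(lambda k: 0, 1000))  # every key collides: 499500
```

At 10,000 colliding keys the same formula gives 49,995,000 comparisons, which is why the number of attack inputs an application will accept matters so much.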

To be attackable, an application must have a deterministic or
predictable hash function and accept untrusted input. In general, for
the attack to be significant, the application must be willing and
able to accept hundreds to tens of thousands of 'attack
inputs'. Because of that requirement, it is difficult to judge the
impact of these attacks without knowing the source code extremely well,
and knowing all the ways in which a program is used.

As part of this project, I have examined Python 2.3b1, and its string
hash function, 'string_hash', is deterministic. Thus any script that
may hash untrusted input may be vulnerable to our attack. Furthermore,
the structure of the hash function allows our fast collision generation
algorithm to work. This means that any script written in Python that
hashes a large number of keys from an untrusted source is potentially
subject to a severe performance degradation.
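To show what "deterministic" means here, below is a Python transcription of the string hash loop as we read it in the CPython 2.x sources (stringobject.c), assuming a 32-bit C long. The name py2_string_hash is ours, and this is a sketch for illustration, not code taken from Python. Nothing in it depends on a secret or a per-process value, so anyone can compute offline exactly the hash an interpreter will compute for a given string.

```python
def py2_string_hash(s: bytes) -> int:
    """Transcription of the CPython 2.x string hash, assuming a 32-bit C long."""
    if not s:
        return 0  # in C, *p is the terminating NUL, so x stays 0 and the loop never runs
    mask = 0xFFFFFFFF  # emulate 32-bit wraparound arithmetic
    x = (s[0] << 7) & mask
    for byte in s:
        x = ((1000003 * x) & mask) ^ byte
    x ^= len(s)
    # Reinterpret the 32-bit pattern as a signed C long.
    if x >= 1 << 31:
        x -= 1 << 32
    if x == -1:
        x = -2  # -1 is reserved as an error value in the C API
    return x


print(py2_string_hash(b"a"))  # -468864544 on every run
```

Because the output is fixed for a given input, an attacker can search for colliding keys at leisure and then send them all at once.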

Depending on the application or script, this could be a critical DoS.


The solution for these attacks on hash tables is to make the hash
function unpredictable via a technique known as universal
hashing. Universal hashing uses a keyed hash function: based on the
key, one of a large set of hash functions is chosen. In our benchmarks,
for short or medium-length inputs, it is comparable in performance to
simple predictable hash functions such as the ones in Python or Perl.
Our paper has graphs and charts of our benchmarked performance.
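As an illustration of the keyed-hash idea (not our library, and not its API), here is a minimal sketch: polynomial evaluation modulo the prime p = 2**61 - 1 with a secret random base chosen at startup. The class name KeyedStringHash is hypothetical. Strictly, this family is almost-universal rather than universal in the textbook sense: two distinct inputs of length at most L collide for at most L-1 of the p-1 possible keys, so an attacker who does not know the key cannot precompute collisions.

```python
import secrets

# Mersenne prime 2**61 - 1: large enough that random collisions are rare.
P = (1 << 61) - 1


class KeyedStringHash:
    """Hypothetical keyed hash: polynomial evaluation mod P at a secret point."""

    def __init__(self, key=None):
        # Choosing the key selects one function out of a family of P - 1.
        self.key = secrets.randbelow(P - 1) + 1 if key is None else key

    def __call__(self, data: bytes) -> int:
        h = 0
        for byte in data:
            # byte + 1 keeps every coefficient nonzero, so a leading zero byte
            # still changes the result (b"a" and b"\x00a" hash differently).
            h = (h * self.key + byte + 1) % P
        return h


h = KeyedStringHash(key=2)  # fixed key only to make the example reproducible
print(h(b"ab"))  # 295
```

A table would create one such object with a random key at startup and use h(key) % num_buckets as the bucket index; keys that collide under one secret base are unlikely to collide under another.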

I highly advise using a universal hashing library, either our own or
someone else's. As history has shown, it is very easy to make silly
mistakes when attempting to implement your own 'secure' algorithm.

The abstract, paper, and a library implementing universal hashing are
available at http://www.cs.rice.edu/~scrosby/hash/.

Scott