[Python-bugs-list] [ python-Bugs-231249 ] cgi.py opens too many (temporary) files
noreply@sourceforge.net
Fri, 29 Jun 2001 06:06:39 -0700
Bugs item #231249, was opened at 2001-02-06 04:59
You can respond by visiting:
http://sourceforge.net/tracker/?func=detail&atid=105470&aid=231249&group_id=5470
Category: Python Library
Group: None
>Status: Closed
>Resolution: Fixed
Priority: 5
Submitted By: Richard van de Stadt (stadt)
Assigned to: Guido van Rossum (gvanrossum)
Summary: cgi.py opens too many (temporary) files
Initial Comment:
cgi.FieldStorage() is used to get the contents of a webform. It turns out that
for each <input> line, a new temporary file is opened. This causes the script
that is using cgi.FieldStorage() to reach the webserver's limit of number of
opened files, as described by 'ulimit -n'. The standard value for Solaris systems
seems to be 64, so webforms with that many <input> fields cannot be dealt
with.
A solution would seem to be to use the same temporary filename, since at most
one file is (temporarily) in use at any time. I ran "ls | wc -l"
while the script was running, which showed only zeroes and ones.
(I'm using Python for CyberChair, an online paper submission and reviewing system.
The webform under discussion has one input field for each reviewer, stating the
papers he or she is supposed to be reviewing. One conference that is using CyberChair
has almost 140 reviewers. Their system's open file limit is 64. Using the same data on
a system with an open file limit of 260 _is_ able to deal with this.)
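The descriptor exhaustion the report describes can be reproduced without cgi.py at all: lower the soft fd limit with the resource module, then open one temporary file per simulated <input> field, as cgi.FieldStorage does per multipart part. A minimal sketch (the function name is illustrative; the limit of 64 and the 140-field count mirror the report):

```python
import resource
import tempfile

def fds_exhausted_after(limit, fields):
    """Lower the soft RLIMIT_NOFILE to `limit`, then open one temporary
    file per simulated form field; return how many opened before the OS
    refused (None if all `fields` files opened successfully)."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    resource.setrlimit(resource.RLIMIT_NOFILE, (limit, hard))
    files = []
    failed_at = None
    try:
        for _ in range(fields):
            files.append(tempfile.TemporaryFile())
    except OSError:  # EMFILE: "Too many open files"
        failed_at = len(files)
    finally:
        for f in files:
            f.close()
        resource.setrlimit(resource.RLIMIT_NOFILE, (soft, hard))
    return failed_at
```

Under a 64-descriptor limit a 140-field form fails partway through; raising the limit past the field count (as the sysadmins did) makes the same form parse.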
----------------------------------------------------------------------
>Comment By: Guido van Rossum (gvanrossum)
Date: 2001-06-29 06:06
Message:
Logged In: YES
user_id=6380
Thanks for reminding me of this patch. I've finally checked
it in, so I can close the bug report!
----------------------------------------------------------------------
Comment By: Richard Jones (richard)
Date: 2001-06-28 23:47
Message:
Logged In: YES
user_id=6405
Sorry for leaving this so long, but I wanted to say that I
tried hacking a solution myself and gave up after it got
too complex. I have ended up adopting the solution in the
patch here, and it's all working fine!
----------------------------------------------------------------------
Comment By: douglas bagnall (dbagnall)
Date: 2001-06-09 15:06
Message:
Logged In: YES
user_id=107204
This has been causing me trouble too, on various machines.
The patch from 2001-04-12 08:20 fixed the problem, but since
then I haven't been able to upload files bigger than about 1k. I
will try using 2.1 before I investigate that, though.
Guido mentioned another more complicated, less likable, patch
on 2001-04-13, which doesn't seem to have been uploaded.
Or do I just not know where to look?
----------------------------------------------------------------------
Comment By: Richard Jones (richard)
Date: 2001-06-07 22:19
Message:
Logged In: YES
user_id=6405
I've just encountered this bug myself on Mac OS X. The
default number for "ulimit -n" is 256, so you can imagine
that it's a little worrying that I ran out :)
As has been discussed, the multipart/form-data submission
sends a sub-part for every form name=value pair. I ran
into the bug in cgi.py because I have a select list with
>256 options, all of which I selected. That tipped me
over the 256 open file limit.
I have two half-baked alternative suggestions for a
solution:
1. use a single tempfile, opened when the multipart
parsing is started. That tempfile may then be passed to
the child FieldStorage instances and used by the
parse_single calls. The child instances just keep track of
their index and length in the tempfile.
2. use StringIO for parts of type "text/plain" and use a
tempfile for all the rest. This has the problem that
someone could cut-paste a core image into a text field
though.
I might have a crack at a patch for approach #1 this
weekend...
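Approach #1 can be sketched in a few lines. The class and method names below are hypothetical, not from any patch: one shared temporary file holds every part's bytes, and each child instance only keeps an (offset, length) handle, so the descriptor count stays at one no matter how many fields the form has.

```python
import tempfile

class SharedSpool:
    """Hypothetical sketch of approach #1: one temp file for all parts."""

    def __init__(self):
        self._file = tempfile.TemporaryFile()  # the single descriptor

    def add_part(self, data):
        """Append one form part; return its (offset, length) handle."""
        self._file.seek(0, 2)          # seek to end of the spool
        offset = self._file.tell()
        self._file.write(data)
        return offset, len(data)

    def read_part(self, handle):
        """Re-read a part given the handle a child instance stored."""
        offset, length = handle
        self._file.seek(offset)
        return self._file.read(length)

spool = SharedSpool()
h1 = spool.add_part(b"first field")
h2 = spool.add_part(b"second field")
assert spool.read_part(h1) == b"first field"
assert spool.read_part(h2) == b"second field"
```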
----------------------------------------------------------------------
Comment By: Kurt B. Kaiser (kbk)
Date: 2001-04-12 21:04
Message:
Logged In: YES
user_id=149084
The patch posted 11 Apr is a neat and compact solution!
The only thing I can imagine being a problem is a form with a large number of (small) fields
that set the content-length attribute. I don't have an example of such a form, though. Text fields,
perhaps? If that were a realistic problem, a solution might be for make_file() to maintain a pool of
temporary files; if the field (binary or not) turned out to be small, a StringIO could be created and
the temporary file returned to the pool.
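A sketch of that pool idea (the class name and the 1K threshold are illustrative, not from any patch): make_file() would acquire from the pool, and when a field turns out to be small its bytes move into an in-memory buffer and the descriptor is recycled.

```python
import io
import tempfile

class TempFilePool:
    """Hypothetical pool of reusable temporary files."""

    def __init__(self):
        self._free = []

    def acquire(self):
        # Reuse a recycled temp file if one is available.
        return self._free.pop() if self._free else tempfile.TemporaryFile()

    def release(self, f, threshold=1024):
        """Small fields come back as in-memory BytesIO and the temp file
        is recycled; large fields keep the temp file they occupy."""
        f.seek(0, 2)
        if f.tell() > threshold:
            f.seek(0)
            return f                  # large: stays on disk
        f.seek(0)
        data = f.read()
        f.seek(0)
        f.truncate()
        self._free.append(f)          # descriptor goes back to the pool
        return io.BytesIO(data)       # small: lives in memory
```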
There are a couple of things I've been thinking about in cgi.py; the patch doesn't seem to change the
situation one way or the other:
There doesn't seem to be any RFC requirement that a file upload be accompanied by a content-length
attribute, regardless of whether it is binary or ASCII. In fact, some of the RFC examples I've seen omit it. If
content-length is not specified, the upload will be processed by file.readline(). Can this cause problems for
arbitrary binary files?
----------------------------------------------------------------------
Comment By: Guido van Rossum (gvanrossum)
Date: 2001-04-12 11:59
Message:
Logged In: YES
user_id=6380
Uploading a new patch, more complicated. I don't like it as
much. But it works even if the caller uses
item.file.fileno().
----------------------------------------------------------------------
Comment By: Kurt B. Kaiser (kbk)
Date: 2001-04-12 10:05
Message:
Logged In: YES
user_id=149084
I have a thought on this, but it will be about 10 hours before I can submit it.
----------------------------------------------------------------------
Comment By: Guido van Rossum (gvanrossum)
Date: 2001-04-11 13:20
Message:
Logged In: YES
user_id=6380
Here's a proposed patch.
Can anyone think of a reason why this should not be checked
in as part of 2.1?
----------------------------------------------------------------------
Comment By: Guido van Rossum (gvanrossum)
Date: 2001-04-10 11:54
Message:
Logged In: YES
user_id=6380
I wish I'd heard about this sooner.
It does seem a problem and it does make sense to use StringIO unless there's a lot of data.
But we can't fix this in time for 2.1...
----------------------------------------------------------------------
Comment By: A.M. Kuchling (akuchling)
Date: 2001-04-10 10:54
Message:
Logged In: YES
user_id=11375
Unassigning so someone else can take a look at it.
----------------------------------------------------------------------
Comment By: Kurt B. Kaiser (kbk)
Date: 2001-02-18 23:32
Message:
In the particular HTML form referenced it appears that a workaround might be to eliminate the enctype attribute in the <form> tag and take the application/x-www-form-urlencoded default since no files are being uploaded.
When make_file creates the temporary files they are immediately unlinked. There is probably a brief period before the unlink is finalized during which the ls process might see a file; that would account for the output of ls | wc. It appears that the current cgi.py implementation leaves all the (hundreds of) files open until the cgi process releases the FieldStorage object or exits.
My first thought was, if the filename recovered from the header is None have make_file create a StringIO object instead of a temp file. That way a temp file is only created when a file is uploaded. This is not inconsistent with the cgi.py docs.
Unfortunately, RFC2388 4.4 states that a filename is not required to be sent, so it looks like your solution based on the size of the data received is the correct one. Below 1K you could copy the temp file contents to a StringIO and assign it to self.file, then explicitly close the temp file via its descriptor.
If only I understood the module better :-( and had a way of tunnel testing it, I might have had the temerity to offer a patch. (I'm away for a couple of weeks starting tomorrow.)
----------------------------------------------------------------------
Comment By: A.M. Kuchling (akuchling)
Date: 2001-02-18 14:08
Message:
Ah, I see; the traceback makes this much clearer. When you're uploading a file, everything in the form is sent as
a MIME document in the body; every field is accompanied by
a boundary separator and Content-Disposition header.
In multipart mode, cgi.py copies each field into a temporary file.
The first idea I had was to only use tempfiles for the actual upload field; unfortunately, that doesn't help because the upload field isn't special, and cgi.py has no way to know which it is ahead of time.
Possible second approach: measure the size of the resulting file; if it's less than some threshold (1K? 10K?), read its contents into memory and close the tempfile. This means only the largest fields will require that a file descriptor be kept open. I'll explore this more after beta1.
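That second approach amounts to a post-processing step on each spooled field; a minimal sketch (the helper name and 1K threshold are illustrative):

```python
import io
import tempfile

def shrink_field(f, threshold=1024):
    """If a spooled field is small, pull it into memory and close the
    temp file, freeing its descriptor; otherwise keep the file as-is."""
    f.seek(0, 2)                 # measure the field
    size = f.tell()
    f.seek(0)
    if size >= threshold:
        return f                 # large field: keep the descriptor
    data = f.read()
    f.close()
    return io.BytesIO(data)      # small field: no descriptor needed

small = tempfile.TemporaryFile()
small.write(b"name=value")
small = shrink_field(small)      # now an in-memory BytesIO, no fd held
```

Later Python versions packaged the same small-in-memory, large-on-disk idea generically as tempfile.SpooledTemporaryFile.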
----------------------------------------------------------------------
Comment By: Richard van de Stadt (stadt)
Date: 2001-02-17 18:37
Message:
I do *not* mean file upload fields. I stumbled upon this with a webform that
contains 141 'simple' input fields like the form you can see here (which 'only'
contains 31 of those input fields):
http://www.cyberchair.org/cgi-cyb/genAssignPageReviewerPapers.py
(use chair/chair to login)
When the maximum number of file descriptors used per process was increased
to 160 (by the sysadmins), the problem did not occur anymore, and the webform
could be processed.
This was the error message I got:
Traceback (most recent call last):
File "/usr/local/etc/httpd/DocumentRoot/ICML2001/cgi-bin/submitAssignRP.py", line 144, in main
File "/opt/python/2.0/sparc-sunos5.6/lib/python2.0/cgi.py", line 504, in __init__
File "/opt/python/2.0/sparc-sunos5.6/lib/python2.0/cgi.py", line 593, in read_multi
File "/opt/python/2.0/sparc-sunos5.6/lib/python2.0/cgi.py", line 506, in __init__
File "/opt/python/2.0/sparc-sunos5.6/lib/python2.0/cgi.py", line 603, in read_single
File "/opt/python/2.0/sparc-sunos5.6/lib/python2.0/cgi.py", line 623, in read_lines
File "/opt/python/2.0/sparc-sunos5.6/lib/python2.0/cgi.py", line 713, in make_file
File "/opt/python/2.0/sparc-sunos5.6/lib/python2.0/tempfile.py", line 144, in TemporaryFile
OSError: [Errno 24] Too many open files: '/home/yara/brodley/icml2001/tmp/@26048.61'
I understand why you assumed that it would concern *file* uploads, but this is
not the case. As I reported before, it turns out that for each 'simple' <input> field
a temporary file is used to transfer the contents to the script that uses
cgi.FieldStorage(), even if no files are being uploaded. The problem is not
that too many files are open at the same time (ls showed 1 at most); it is the
*number* of files that is causing the trouble. If the same temporary file were
reused, this problem would probably not have happened.
My colleague Fred Gansevles wrote a possible solution, but mentioned that this
might introduce the need for protection against a 'symlink attack' (whatever
that may be). This solution(?) concentrates on the open-file-descriptor problem,
while Fred suggests that a redesign of FieldStorage() would probably be better.
import os, tempfile

AANTAL = 50  # Dutch: "number"

class TemporaryFile:
    def __init__(self):
        # mktemp() only generates a name (hence Fred's warning about
        # symlink attacks); create the file, keeping no fd open
        self.name = tempfile.mktemp("")
        open(self.name, 'w').close()
        self.offset = 0

    def seek(self, offset):
        self.offset = offset

    def read(self):
        # 'r+b' opens for update without truncating; the file is only
        # open for the duration of the call
        fd = open(self.name, 'r+b', -1)
        fd.seek(self.offset)
        data = fd.read()
        self.offset = fd.tell()
        fd.close()
        return data

    def write(self, data):
        fd = open(self.name, 'r+b', -1)
        fd.seek(self.offset)
        fd.write(data)
        self.offset = fd.tell()
        fd.close()

    def __del__(self):
        os.unlink(self.name)

def add_fd(l, n):
    # each entry holds a real file descriptor open
    for i in range(n):
        l.append(open('/dev/null'))

def add_tmp(l, n):
    # 'smart' temporary files hold no descriptor between operations
    for i in range(n):
        l.append(TemporaryFile())

def main():
    import getopt, sys
    try:
        import resource
        soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
        resource.setrlimit(resource.RLIMIT_NOFILE, (hard, hard))
    except ImportError:
        soft, hard = 64, 1024
    opts, args = getopt.getopt(sys.argv[1:], 'n:t')
    aantal = AANTAL
    tmp = add_fd
    for o, a in opts:
        if o == '-n':
            aantal = int(a)
        elif o == '-t':
            tmp = add_tmp
    print "aantal te gebruiken fd's:", aantal  # Dutch: "number of fds to be used"
    print 'tmp:', tmp.func_name
    tmp_files = []
    files = []
    tmp(tmp_files, aantal)
    try:
        add_fd(files, hard)
    except IOError:
        pass
    print "aantal vrije gebruiken fd's:", len(files)  # Dutch: "number of free fds"

main()
Running the above code:
python ulimit.py [-n number] [-t]
The default number is 50, using 'real' fds for the temporary files.
With the '-t' flag, 'smart' temporary files are used instead.
Output:
$ python ulimit.py
aantal te gebruiken fd's: 50
tmp: add_fd
aantal vrije gebruiken fd's: 970
$ python ulimit.py -t
aantal te gebruiken fd's: 50
tmp: add_tmp
aantal vrije gebruiken fd's: 1020
$ python ulimit.py -n 1000
aantal te gebruiken fd's: 1000
tmp: add_fd
aantal vrije gebruiken fd's: 20
$ python ulimit.py -n 1000 -t
aantal te gebruiken fd's: 1000
tmp: add_tmp
aantal vrije gebruiken fd's: 1020
----------------------------------------------------------------------
Comment By: A.M. Kuchling (akuchling)
Date: 2001-02-16 21:41
Message:
I assume you mean 64 file upload fields, right?
Can you provide a small test program that triggers the problem?
----------------------------------------------------------------------