[Python-Dev] py3k, cgi, email, and form-data
MRAB
google at mrabarnett.plus.com
Mon May 11 20:28:20 CEST 2009
Robert Brewer wrote:
> There's a major change in functionality in the cgi module between Python
> 2 and Python 3 which I've just run across: the behavior of
> FieldStorage.read_multi, specifically when an HTTP app accepts a file
> upload within a multipart/form-data payload.
>
> In Python 2, each part would be read in sequence within its own
> FieldStorage instance. This allowed file uploads to be shunted to a
> TemporaryFile (via make_file) as needed:
>
> klass = self.FieldStorageClass or self.__class__
> part = klass(self.fp, {}, ib,
> environ, keep_blank_values, strict_parsing)
> # Throw first part away
> while not part.done:
> headers = rfc822.Message(self.fp)
> part = klass(self.fp, headers, ib,
> environ, keep_blank_values, strict_parsing)
> self.list.append(part)
>
> In Python 3 (svn revision 72466), the whole request body is read into
> memory first via fp.read(), and then broken into separate parts in a
> second step:
>
> klass = self.FieldStorageClass or self.__class__
> parser = email.parser.FeedParser()
> # Create bogus content-type header for proper multipart parsing
> parser.feed('Content-Type: %s; boundary=%s\r\n\r\n' % (self.type, ib))
> parser.feed(self.fp.read())
> full_msg = parser.close()
> # Get subparts
> msgs = full_msg.get_payload()
> for msg in msgs:
> fp = StringIO(msg.get_payload())
> part = klass(fp, msg, ib, environ, keep_blank_values,
> strict_parsing)
> self.list.append(part)
>
> This makes the cgi module in Python 3 somewhat crippled for handling
> multipart/form-data file uploads of any significant size (and since
> the client is the one determining the size, opens a server up for an
> unexpected Denial of Service vector).
>
> I *think* the FeedParser is designed to accept incremental writes,
> but I haven't yet found a way to do any kind of incremental reads
> from it in order to shunt the fp.read out to a tempfile again.
> I'm secretly hoping Barry has a one-liner fix for this. ;)
>
It think what it needs is for the email.parser.FeedParser class to have
a feed_from_file() method, supported by the class BufferedSubFile.
The BufferedSubFile class keeps an internal list of lines. Perhaps it
could also have a list of files, so that when the list of lines becomes
empty it can continue by reading lines from the files instead, dropping
a file from the list when it reaches the end, something like this:
[Module feedparser.py]
...
class BufferedSubFile(object):
...
def __init__(self):
# The last partial line pushed into this object.
self._partial = ''
# The list of full, pushed lines, in reverse order
self._lines = []
# The list of files.
self._files = []
...
...
def readline(self):
while not self._lines and self._files:
data = self._files[0].read(MAX_DATA_SIZE)
if data:
self.push(data)
else:
del self._files[0]
if not self._lines:
if self._closed:
return ''
return NeedMoreData
...
def push_file(self, data_file):
"""Push some new data from a file into this object."""
self._files.append(data_file)
...
and then:
...
class FeedParser:
...
def feed(self, data):
"""Push more data into the parser."""
self._input.push(data)
self._call_parse()
def feed_from_file(self, data_file):
"""Push more data from a file into the parser."""
self._input.push_file(data_file)
self._call_parse()
...
More information about the Python-Dev
mailing list