[Web-SIG] py3k, cgi, email, and form-data

Robert Brewer fumanchu at aminus.org
Wed May 13 05:43:21 CEST 2009


Graham Dumpleton wrote:
> 2009/5/12 Robert Brewer <fumanchu at aminus.org>:
> > There's a major change in functionality in the cgi module between
> > Python 2 and Python 3 which I've just run across: the behavior of
> > FieldStorage.read_multi, specifically when an HTTP app accepts a file
> > upload within a multipart/form-data payload.
> >
> > In Python 2, each part would be read in sequence within its own
> > FieldStorage instance. This allowed file uploads to be shunted to a
> > TemporaryFile (via make_file) as needed:
> >
> >     klass = self.FieldStorageClass or self.__class__
> >     part = klass(self.fp, {}, ib,
> >                  environ, keep_blank_values, strict_parsing)
> >     # Throw first part away
> >     while not part.done:
> >         headers = rfc822.Message(self.fp)
> >         part = klass(self.fp, headers, ib,
> >                      environ, keep_blank_values, strict_parsing)
> >         self.list.append(part)
> >
> > In Python 3 (svn revision 72466), the whole request body is read into
> > memory first via fp.read(), and then broken into separate parts in a
> > second step:
> >
> >     klass = self.FieldStorageClass or self.__class__
> >     parser = email.parser.FeedParser()
> >     # Create bogus content-type header for proper multipart parsing
> >     parser.feed('Content-Type: %s; boundary=%s\r\n\r\n' %
> >                 (self.type, ib))
> >     parser.feed(self.fp.read())
> >     full_msg = parser.close()
> >     # Get subparts
> >     msgs = full_msg.get_payload()
> >     for msg in msgs:
> >         fp = StringIO(msg.get_payload())
> >         part = klass(fp, msg, ib, environ, keep_blank_values,
> >                      strict_parsing)
> >         self.list.append(part)
> >
> > This makes the cgi module in Python 3 somewhat crippled for handling
> > multipart/form-data file uploads of any significant size (and since
> > the client is the one determining the size, opens a server up for an
> > unexpected Denial of Service vector).
> >
> > I *think* the FeedParser is designed to accept incremental writes,
> > but I haven't yet found a way to do any kind of incremental reads
> > from it in order to shunt the fp.read out to a tempfile again.
> > I'm secretly hoping Barry has a one-liner fix for this. ;)
> 
> FWIW, Werkzeug gave up on the 'cgi' module for form parsing and
> implements its own.
> 
> Not sure whether this issue in Python 3.0 was one of the reasons or
> not. I know one of the reasons was that cgi.FieldStorage is not WSGI
> 1.0 compliant. One of the main reasons that no one actually adheres to
> WSGI 1.0 is the 'cgi' module. This still hasn't been addressed by a
> proper amendment to the WSGI 1.0 specification, or by a new WSGI 1.1
> specification, to allow a size hint to readline().
> 
> The Werkzeug form processing module is properly WSGI 1.0 compliant,
> meaning that Werkzeug is possibly the only major WSGI framework to be
> WSGI compliant.

FWIW, I just added a replacement for the cgi module to CherryPy over the weekend for the same reasons. It's in the python3 branch but will get backported to CherryPy 3.2 for Python 2.x.
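A minimal sketch of what such a replacement has to do, assuming Python 3 and a WSGI `wsgi.input` stream: it touches the input only through `read(n)` (so it needs no size hint to `readline()`), never reads past `CONTENT_LENGTH`, caps each line at a fixed size, and writes each part to a `tempfile.SpooledTemporaryFile` instead of pulling the whole body into memory with `fp.read()`. `BoundedLineReader` and `parse_form_data` are hypothetical names; this is not the CherryPy or Werkzeug code, and it skips header folding, Content-Disposition parameter parsing, and charset handling. It also assumes the caller has already extracted the boundary (as bytes) from the request's Content-Type header.

```python
import io
import tempfile


class BoundedLineReader:
    """Hypothetical line reader that calls only stream.read(n).

    WSGI 1.0 does not support readline(size) on wsgi.input, so lines are
    split out of our own buffer and capped at `limit` bytes each.
    """

    def __init__(self, stream, content_length, chunk_size=64 * 1024):
        self.stream = stream
        self.remaining = content_length  # never read past Content-Length
        self.chunk_size = chunk_size
        self.buffer = b""

    def readline(self, limit=64 * 1024):
        # Refill until the buffer holds a newline, `limit` bytes, or all input.
        while (b"\n" not in self.buffer and len(self.buffer) < limit
               and self.remaining > 0):
            data = self.stream.read(min(self.chunk_size, self.remaining))
            if not data:  # client went away early
                self.remaining = 0
                break
            self.remaining -= len(data)
            self.buffer += data
        newline = self.buffer.find(b"\n", 0, limit)
        if newline != -1:
            cut = newline + 1
        else:
            cut = min(limit, len(self.buffer))
            if cut > 1 and self.buffer[cut - 1:cut] == b"\r":
                cut -= 1  # keep a CRLF pair together in the next piece
        line, self.buffer = self.buffer[:cut], self.buffer[cut:]
        return line


def parse_form_data(stream, boundary, content_length, spool_max=512 * 1024):
    """Hypothetical parser: return [(headers, file), ...] for each part.

    Part bodies go to SpooledTemporaryFile objects, which move to disk once
    they grow past `spool_max` bytes.
    """
    reader = BoundedLineReader(stream, content_length)
    sep = b"--" + boundary
    end = sep + b"--"

    # Skip any preamble before the first boundary line.
    while True:
        line = reader.readline()
        if not line:
            raise ValueError("opening boundary not found")
        if line.rstrip() == sep:
            break

    parts = []
    while True:
        headers = {}  # part headers run until the first empty line
        while True:
            raw = reader.readline()
            if not raw:
                raise ValueError("truncated part headers")
            text = raw.rstrip(b"\r\n").decode("latin-1")
            if not text:
                break
            name, _, value = text.partition(":")
            headers[name.strip().lower()] = value.strip()

        body = tempfile.SpooledTemporaryFile(max_size=spool_max)
        pending = b""  # held-back line ending; it may belong to the boundary
        at_line_start = True
        while True:
            line = reader.readline()
            if not line:
                raise ValueError("truncated part body")
            if at_line_start and line.rstrip() in (sep, end):
                break
            body.write(pending)
            if line.endswith(b"\r\n"):
                pending, line = b"\r\n", line[:-2]
            elif line.endswith(b"\n"):
                pending, line = b"\n", line[:-1]
            else:
                pending = b""  # partial piece of a long line
            body.write(line)
            at_line_start = bool(pending)

        body.seek(0)
        parts.append((headers, body))
        if line.rstrip() == end:
            return parts


if __name__ == "__main__":
    sample = b'--XyZ\r\nContent-Disposition: form-data; name="f"\r\n\r\nhi\r\n--XyZ--\r\n'
    for headers, f in parse_form_data(io.BytesIO(sample), b"XyZ", len(sample)):
        print(headers, f.read())
```

In a WSGI application the call would look like `parse_form_data(environ['wsgi.input'], boundary, int(environ['CONTENT_LENGTH']))`. The held-back `pending` line ending follows the same idea as the Python 2 `cgi` module's `read_lines_to_outerboundary`: the CRLF in front of a boundary belongs to the delimiter, not to the uploaded data, so it is written only once the next line turns out not to be a boundary.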


Robert Brewer
fumanchu at aminus.org

