[Python-Dev] py3k, cgi, email, and form-data

MRAB google at mrabarnett.plus.com
Mon May 11 20:28:20 CEST 2009


Robert Brewer wrote:

> There's a major change in functionality in the cgi module between Python 2 and Python 3 which I've just run across: the behavior of FieldStorage.read_multi, specifically when an HTTP app accepts a file upload within a multipart/form-data payload.
>
> In Python 2, each part would be read in sequence within its own FieldStorage instance. This allowed file uploads to be shunted to a TemporaryFile (via make_file) as needed:
>
>     klass = self.FieldStorageClass or self.__class__
>     part = klass(self.fp, {}, ib,
>                  environ, keep_blank_values, strict_parsing)
>     # Throw first part away
>     while not part.done:
>         headers = rfc822.Message(self.fp)
>         part = klass(self.fp, headers, ib,
>                      environ, keep_blank_values, strict_parsing)
>         self.list.append(part)
>
> In Python 3 (svn revision 72466), the whole request body is read into memory first via fp.read(), and then broken into separate parts in a second step:
>
>     klass = self.FieldStorageClass or self.__class__
>     parser = email.parser.FeedParser()
>     # Create bogus content-type header for proper multipart parsing
>     parser.feed('Content-Type: %s; boundary=%s\r\n\r\n' % (self.type, ib))
>     parser.feed(self.fp.read())
>     full_msg = parser.close()
>     # Get subparts
>     msgs = full_msg.get_payload()
>     for msg in msgs:
>         fp = StringIO(msg.get_payload())
>         part = klass(fp, msg, ib, environ, keep_blank_values,
>                      strict_parsing)
>         self.list.append(part)
>
> This makes the cgi module in Python 3 somewhat crippled for handling multipart/form-data file uploads of any significant size (and since the client is the one determining the size, opens a server up for an unexpected Denial of Service vector).
>
> I think the FeedParser is designed to accept incremental writes, but I haven't yet found a way to do any kind of incremental reads from it in order to shunt the fp.read out to a tempfile again. I'm secretly hoping Barry has a one-liner fix for this. ;)

I think what it needs is for the email.parser.FeedParser class to have a feed_from_file() method, supported by the class BufferedSubFile.
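For comparison, here is a rough sketch of what can already be done with the existing feed() method: reading the input in fixed-size chunks rather than with one fp.read(). The helper name parse_in_chunks and the chunk size are my own assumptions, not anything in cgi or email. It only bounds the size of each read; the parsed message still holds every part's payload in memory, so it does not solve the tempfile problem by itself.

```python
import io
from email.parser import FeedParser

CHUNK_SIZE = 8192  # assumed chunk size, not taken from the thread


def parse_in_chunks(fp, chunk_size=CHUNK_SIZE):
    """Feed a text file object to FeedParser one chunk at a time.

    Hypothetical helper: it uses only FeedParser.feed() and close().
    """
    parser = FeedParser()
    while True:
        data = fp.read(chunk_size)
        if not data:
            break
        parser.feed(data)
    return parser.close()


# A tiny multipart body with a single form field, for illustration only.
body = (
    "Content-Type: multipart/form-data; boundary=XYZ\r\n"
    "\r\n"
    "--XYZ\r\n"
    'Content-Disposition: form-data; name="field"\r\n'
    "\r\n"
    "value\r\n"
    "--XYZ--\r\n"
)
msg = parse_in_chunks(io.StringIO(body), chunk_size=16)
parts = msg.get_payload()
```

The deliberately small chunk_size in the usage lines just shows that chunk boundaries may fall anywhere, including in the middle of a line.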

The BufferedSubFile class keeps an internal list of lines. Perhaps it could also have a list of files, so that when the list of lines becomes empty it can continue by reading lines from the files instead, dropping a file from the list when it reaches the end, something like this:

[Module feedparser.py]
...
class BufferedSubFile(object):
 ...
 def __init__(self):
     # The last partial line pushed into this object.
     self._partial = ''
     # The list of full, pushed lines, in reverse order
     self._lines = []
     # The list of files.
     self._files = []
     ...

 ...
 def readline(self):
     while not self._lines and self._files:
         data = self._files[0].read(MAX_DATA_SIZE)
         if data:
             self.push(data)
         else:
             del self._files[0]
     if not self._lines:
         if self._closed:
             return ''
         return NeedMoreData
     ...

 def push_file(self, data_file):
     """Push some new data from a file into this object."""
     self._files.append(data_file)

 ...

and then:

...
class FeedParser:
 ...
 def feed(self, data):
     """Push more data into the parser."""
     self._input.push(data)
     self._call_parse()

 def feed_from_file(self, data_file):
     """Push more data from a file into the parser."""
     self._input.push_file(data_file)
     self._call_parse()

 ...
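
To try the idea outside the email package, here is a self-contained toy version of the buffer. It is only a sketch under assumptions: FileBackedLineBuffer is a made-up stand-in for BufferedSubFile, MAX_DATA_SIZE is given an arbitrary value (it is left undefined above), NeedMoreData is a local sentinel, and lines are kept oldest-first for simplicity rather than in reverse order as in the real class.

```python
import io

MAX_DATA_SIZE = 8192     # assumed value; the proposal leaves it undefined
NeedMoreData = object()  # local stand-in for the feedparser sentinel


class FileBackedLineBuffer:
    """Toy stand-in for BufferedSubFile with the proposed file list."""

    def __init__(self):
        self._partial = ''    # trailing text not yet ended by a newline
        self._lines = []      # complete lines, oldest first (simplified)
        self._files = []      # files that still have to be drained
        self._closed = False

    def push(self, data):
        """Split data into lines, holding back an unfinished tail."""
        parts = (self._partial + data).splitlines(True)
        self._partial = ''
        if parts and not parts[-1].endswith('\n'):
            self._partial = parts.pop()
        self._lines.extend(parts)

    def push_file(self, data_file):
        """Queue a file; nothing is read from it until lines run out."""
        self._files.append(data_file)

    def close(self):
        self._closed = True

    def readline(self):
        # Refill from the first pending file only when out of lines.
        while not self._lines and self._files:
            data = self._files[0].read(MAX_DATA_SIZE)
            if data:
                self.push(data)
            else:
                del self._files[0]
        if not self._lines:
            if self._closed:
                # Hand back any unterminated last line, then ''.
                tail, self._partial = self._partial, ''
                return tail
            return NeedMoreData
        return self._lines.pop(0)


buf = FileBackedLineBuffer()
buf.push('first\n')
buf.push_file(io.StringIO('second\nthird'))
buf.close()

lines = []
while True:
    line = buf.readline()
    if line == '':
        break
    lines.append(line)
```

The point of the design is that the file is read lazily, at most MAX_DATA_SIZE characters at a time, and only once the pushed lines have been used up. Whatever consumes the lines still decides how much ends up in memory.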

