Issue 24764: cgi.FieldStorage can't parse multipart part headers with Content-Length and no filename in Content-Disposition (original) (raw)

Created on 2015-07-31 16:07 by Peter Landry, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Files
File name Uploaded Description Edit
cgi_multipart.patch Peter Landry,2015-07-31 16:07 Test case and naive bugfix review
cgi_multipart.patch Peter Landry,2015-08-07 15:12 Improved patch that removes the header when present review
Messages (14)
msg247751 - (view) Author: Peter Landry (Peter Landry) * Date: 2015-07-31 16:07
`cgi.FieldStorage` can't parse a multipart with a `Content-Length` header set on a part: ```Python 3.4.3 (default, May 22 2015, 15:35:46) [GCC 4.2.1 Compatible Apple LLVM 6.1.0 (clang-602.0.49)] on darwin Type "help", "copyright", "credits" or "license" for more information. >>> import cgi >>> from io import BytesIO >>> >>> BOUNDARY = "JfISa01" >>> POSTDATA = """--JfISa01 ... Content-Disposition: form-data; name="submit-name" ... Content-Length: 5 ... ... Larry ... --JfISa01""" >>> env = { ... 'REQUEST_METHOD': 'POST', ... 'CONTENT_TYPE': 'multipart/form-data; boundary={}'.format(BOUNDARY), ... 'CONTENT_LENGTH': str(len(POSTDATA))} >>> fp = BytesIO(POSTDATA.encode('latin-1')) >>> fs = cgi.FieldStorage(fp, environ=env, encoding="latin-1") Traceback (most recent call last): File "", line 1, in File "/usr/local/Cellar/python3/3.4.3/Frameworks/Python.framework/Versions/3.4/lib/python3.4/cgi.py", line 571, in __init__ self.read_multi(environ, keep_blank_values, strict_parsing) File "/usr/local/Cellar/python3/3.4.3/Frameworks/Python.framework/Versions/3.4/lib/python3.4/cgi.py", line 726, in read_multi self.encoding, self.errors) File "/usr/local/Cellar/python3/3.4.3/Frameworks/Python.framework/Versions/3.4/lib/python3.4/cgi.py", line 573, in __init__ self.read_single() File "/usr/local/Cellar/python3/3.4.3/Frameworks/Python.framework/Versions/3.4/lib/python3.4/cgi.py", line 736, in read_single self.read_binary() File "/usr/local/Cellar/python3/3.4.3/Frameworks/Python.framework/Versions/3.4/lib/python3.4/cgi.py", line 758, in read_binary self.file.write(data) TypeError: must be str, not bytes >>> ``` This happens because of a mismatch between the code that creates a temp file to write to and the code that chooses to read in binary mode or not: * the presence of `filename` in the `Content-Disposition` header triggers creation of a binary mode file * the present of a `Content-Length` header for the part triggers a binary read When `Content-Length` is present but `filename` is absent, `bytes` are written to the non-binary temp file, causing the error above. I've reviewed the relevant RFCs, and I'm not really sure what the correct way to handle this is. I don't believe `Content-Length` is addressed for part bodies in the MIME spec[0], and HTTP has its own semantics[1]. At the very least, I think this behavior is confusing and unexpected. Some libraries, like Retrofit[2], will by default include `Content-Length`, and break when submitting POST data to a python server. I've made an attempt to work in the way I'd expect, and attached a patch, but I'm really not sure if it's the proper decision. My patch kind of naively accepts the existing semantics of `Content-Length` that presume bytes, and treats the creation of a non-binary file as the "bug". [0]: http://www.ietf.org/rfc/rfc2045.txt [1]: http://www.w3.org/Protocols/rfc2616/rfc2616-sec4.html#sec4.4 [2]: http://square.github.io/retrofit/
msg247752 - (view) Author: Peter Landry (Peter Landry) * Date: 2015-07-31 16:08
I realized my formatting was poor, making it hard to quickly test the issue. Here's a cleaner version: import cgi from io import BytesIO BOUNDARY = "JfISa01" POSTDATA = """--JfISa01 Content-Disposition: form-data; name="submit-name" Content-Length: 5 Larry --JfISa01""" env = { 'REQUEST_METHOD': 'POST', 'CONTENT_TYPE': 'multipart/form-data; boundary={}'.format(BOUNDARY), 'CONTENT_LENGTH': str(len(POSTDATA))} fp = BytesIO(POSTDATA.encode('latin-1')) fs = cgi.FieldStorage(fp, environ=env, encoding="latin-1")
msg247753 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2015-07-31 16:13
@Pierre Quentel: Hi! Are you still working on CGI? Can you please review this patch? Thanks. -- Previous large change in the cgi module: issue #4953. Pierre helped a lot on this one.
msg247799 - (view) Author: Pierre Quentel (quentel) * Date: 2015-08-01 07:37
Yes, I will be able to review the patch next week 2015-07-31 18:13 GMT+02:00 STINNER Victor <report@bugs.python.org>: > > STINNER Victor added the comment: > > @Pierre Quentel: Hi! Are you still working on CGI? Can you please review > this patch? Thanks. > > -- > > Previous large change in the cgi module: issue #4953. Pierre helped a lot > on this one. > > ---------- > nosy: +quentel > > _______________________________________ > Python tracker <report@bugs.python.org> > <http://bugs.python.org/issue24764> > _______________________________________ >
msg248185 - (view) Author: Pierre Quentel (quentel) * Date: 2015-08-07 10:47
I don't really see why there is a Content-Length in the headers of a multipart form data. The specification at http://www.w3.org/TR/html401/interact/forms.html#h-17.13.4.2 doesn't mention it, and it is absent in the example that looks like the one tested by Peter : Content-Type: multipart/form-data; boundary=AaB03x --AaB03x Content-Disposition: form-data; name="submit-name" Larry --AaB03x Content-Disposition: form-data; name="files"; filename=".txt" Content-Type: text/plain ... contents of .txt ... --AaB03x-- In case a user agent would insert it, I think the best would be to ignore it. That is, inside read_multi(), after headers = parser.close() add these lines : if 'content-length' in headers: del headers['content-length'] It's hard to see the potential side effects but I think it's cleaner than the proposed patch, which is not correct anyway for another reason : the attribute value is set to a bytes objects, instead of a string. Peter, does this make sense ? If so, can you submit another patch ?
msg248198 - (view) Author: Peter Landry (Peter Landry) * Date: 2015-08-07 15:06
Yeah, I think that makes the most sense to me as well. I tried to make a minimum-impact patch, but this feels cleaner. If we remove the Content-Length header, the `limit` kwarg might occur at an odd place in the part itself, but that feels unavoidable if someone submits an incorrect Content-Length for the request itself. I'll re-work the patch and make sure the tests I added still add value. Thanks!
msg248199 - (view) Author: Peter Landry (Peter Landry) * Date: 2015-08-07 15:12
A new patch that simply removes Content-Length from part headers when present.
msg248258 - (view) Author: Pierre Quentel (quentel) * Date: 2015-08-08 09:43
Victor, you can apply the patch and close the issue. Le 7 août 2015 17:12, "Peter Landry" <report@bugs.python.org> a écrit : > > Peter Landry added the comment: > > A new patch that simply removes Content-Length from part headers when > present. > > ---------- > Added file: http://bugs.python.org/file40145/cgi_multipart.patch > > _______________________________________ > Python tracker <report@bugs.python.org> > <http://bugs.python.org/issue24764> > _______________________________________ >
msg248306 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2015-08-08 22:56
Not today. I'm in holiday. Ping me in two weeks or find another core dev.
msg248720 - (view) Author: Pradeep (pradeep) Date: 2015-08-17 07:09
Hi Victor, I found similar problem at https://bugs.launchpad.net/barbican/+bug/1485452, is problem mentioned in the bug is same with mentioned bug?
msg248729 - (view) Author: Peter Landry (Peter Landry) * Date: 2015-08-17 14:20
Pradeep, that error seems to be in Barbican. This bug and patch only addresses content-length headers in MIME multipart messages.
msg248778 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2015-08-18 17:24
New changeset 11e9f34169d1 by Victor Stinner in branch '3.4': cgi.FieldStorage.read_multi ignores Content-Length https://hg.python.org/cpython/rev/11e9f34169d1 New changeset 5b9209e4c3e4 by Victor Stinner in branch '3.5': (Merge 3.4) cgi.FieldStorage.read_multi ignores Content-Length https://hg.python.org/cpython/rev/5b9209e4c3e4 New changeset 0ff1acc89cf0 by Victor Stinner in branch 'default': (Merge 3.5) cgi.FieldStorage.read_multi ignores Content-Length https://hg.python.org/cpython/rev/0ff1acc89cf0
msg248779 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2015-08-18 17:26
I applied the fix, thanks Peter for the report and the fix, thanks Pierre for the review. > https://bugs.launchpad.net/barbican/+bug/1485452, > is problem mentioned in the bug is same with mentioned bug? I don't know, but you can try to apply the patch locally if you want, or download the Python 3.4 using Mercurial.
msg281076 - (view) Author: Bert JW Regeer (X-Istence) * Date: 2016-11-18 05:21
I've uploaded a patchset to bug #27777 that fixes this issue by fixing make_file, and doesn't cause Python to throw out the content-length information. It also fixes FieldStorage for when you provide it a non-multipart form submission and there is no list in FS. Please see http://bugs.python.org/issue27777
History
Date User Action Args
2022-04-11 14:58:19 admin set github: 68952
2020-07-20 20:52:04 Rhodri James set nosy: - Rhodri James
2019-12-03 17:23:21 ethan.furman set assignee: ethan.furmannosy: + ethan.furman, Rhodri James
2016-11-18 05:21:51 X-Istence set nosy: + X-Istencemessages: +
2016-06-13 20:03:47 berker.peksag link issue27308 superseder
2015-08-18 17:45:31 vstinner set status: open -> closedresolution: fixed
2015-08-18 17:26:58 vstinner set messages: +
2015-08-18 17:24:39 python-dev set nosy: + python-devmessages: +
2015-08-17 14:20:35 Peter Landry set messages: +
2015-08-17 07:09:04 pradeep set nosy: + pradeepmessages: +
2015-08-10 21:16:26 berker.peksag set nosy: + berker.peksagstage: patch reviewtype: behaviorversions: - Python 3.2, Python 3.3
2015-08-08 22:56:01 vstinner set messages: +
2015-08-08 09:43:05 quentel set messages: +
2015-08-07 15:12:11 Peter Landry set files: + cgi_multipart.patchmessages: +
2015-08-07 15:06:48 Peter Landry set messages: +
2015-08-07 10:48:00 quentel set messages: +
2015-08-01 07:37:06 quentel set messages: +
2015-07-31 16:13:48 vstinner set nosy: + quentelmessages: +
2015-07-31 16:08:47 Peter Landry set messages: +
2015-07-31 16:07:21 Peter Landry create