Issue 12411: cgi.parse_multipart is broken on 3.x

Status: open Resolution: fixed
Dependencies: Superseder:
Assigned To: orsenthil Nosy List: MHordecki, efosmark, eric.araujo, flox, glyph, gvanrossum, iritkatriel, jonas.wagner, milesck, orsenthil, python-dev, quentel, r.david.murray
Priority: normal Keywords: patch

Created on 2011-06-25 15:09 by jonas.wagner, last changed 2022-04-11 14:57 by admin.

Files
File name Uploaded Description Edit
cgi-coverage.diff jonas.wagner,2011-06-25 15:09 review
cgi-coverage-2.diff jonas.wagner,2011-07-04 11:26 review
Messages (14)
msg139082 - (view) Author: Jonas Wagner (jonas.wagner) Date: 2011-06-25 15:09
While writing tests for the cgi module I came across what looks like a conversion bug. cgi.parse_multipart compares values it reads from a binary file-like object with a string literal: line = fp.readline() ... if line.startswith("--"): This patch fixes the issue and adds a test for it.
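To illustrate the mismatch reported above, here is a minimal sketch that does not use the cgi module itself. It only reproduces the comparison pattern, assuming a binary stream and a made-up boundary name ("frontier"); the helper name is hypothetical and is not part of the patch.

```python
import io

# Hypothetical multipart body; the boundary "frontier" is invented for illustration.
body = (
    b"--frontier\r\n"
    b'Content-Disposition: form-data; name="field"\r\n'
    b"\r\n"
    b"value\r\n"
    b"--frontier--\r\n"
)
fp = io.BytesIO(body)

# Reading from a binary stream yields bytes, not str.
line = fp.readline()


def starts_with_str_literal(data):
    """Mimic the reported comparison: bytes data tested against a str prefix."""
    try:
        return data.startswith("--")
    except TypeError as exc:
        # On Python 3, bytes.startswith() rejects a str argument.
        return exc


buggy_result = starts_with_str_literal(line)

# Using a bytes literal for the prefix makes the comparison valid.
fixed_result = line.startswith(b"--")
```

On Python 3 the first check ends in a TypeError rather than a boolean, while the bytes-literal version matches the boundary line as intended.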
msg139649 - (view) Author: Éric Araujo (eric.araujo) * (Python committer) Date: 2011-07-02 13:53
This looks like a conversion bug indeed; network I/O should use bytes. Strange that no-one caught this, but if there was no test and no users, then bugs can slip. See also #11066, #8077, #4953, #6234 (also adding some people from those bugs’ nosy fields).
msg139668 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2011-07-03 01:26
Indeed, Victor's comments on his patch say that he changed code that was in the posted patch to say 'line.startswith(b'--')', and the original patch did use b'--', but the code he checked in is missing the 'b'. He also asked for more tests. Victor, any chance you can review this patch, since you were the last one to work on the code in question?
msg139742 - (view) Author: Tal Einat (taleinat) * (Python committer) Date: 2011-07-04 07:55
The patch seems broken to me. In cgi.parse_multipart(), the 'boundary' variable can be a string even though it is concatenated to bytes. Its default value is a string, and a string can be given via the pdict argument. There is no validity check other than valid_boundary(), which allows both string and bytes. Most of the changes to test_cgi.py are entirely unrelated. The one test added which tests cgi.parse_multipart() should fail since it uses a string (not bytes) boundary, while the correct boundary for the test is commented out. In short, this patch seems half-baked. IMO reject this patch and fix just the bytes/strings issue with cgi.parse_multipart. Or, as mentioned in the comments, use FieldStorage to implement it and be done with it.
msg139752 - (view) Author: Jonas Wagner (jonas.wagner) Date: 2011-07-04 11:26
Hi Tal, Thanks a lot for your feedback. My primary objective was to increase the test coverage for cgi.py. If it is a problem to have the additional tests in this patch I'm happy to create a new issue with a separate patch. The default value for the boundary was an oversight, sorry for that. You are right regarding the commented out boundary as well, I forgot to refresh the patch. Again, sorry. Do you think valid_boundary should contain a check to ensure it is a byte object?
msg139757 - (view) Author: Tal Einat (taleinat) * (Python committer) Date: 2011-07-04 12:11
Yes, please submit the other additional tests in a separate issue. The default value for boundary should surely be b"". A simple test should be added where cgi.parse_multipart() uses the default boundary. If valid_boundary() is used only for cgi.parse_multipart() then it should be changed to validate that the boundary is a bytes instance (which would also make it simpler). Otherwise (if valid_boundary() is also used elsewhere) cgi.parse_multipart() should itself check that the boundary is indeed a bytes instance, throwing a TypeError otherwise. Tip: You should run the relevant tests, making sure they all pass, before submitting a patch. That way you really know that the patch actually works (as far as passing all of the tests). Thanks for adding more stdlib tests :)
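The type check suggested above could look roughly like the following sketch. It is an illustration only, not the code that was committed: check_boundary and next_part_marker are hypothetical names, and the real cgi.parse_multipart() does considerably more than build a marker.

```python
def check_boundary(boundary):
    """Hypothetical helper: reject any boundary that is not a bytes instance."""
    if not isinstance(boundary, bytes):
        raise TypeError(
            "boundary must be bytes, not %s" % type(boundary).__name__
        )
    return boundary


def next_part_marker(boundary=b""):
    """Build the '--boundary' marker; the default boundary is b"" as suggested."""
    boundary = check_boundary(boundary)
    # Safe concatenation: both operands are known to be bytes here.
    return b"--" + boundary


default_marker = next_part_marker()
named_marker = next_part_marker(b"frontier")  # "frontier" is a made-up boundary

try:
    next_part_marker("frontier")  # a str boundary is rejected up front
    str_rejected = False
except TypeError:
    str_rejected = True
```

Checking the type once, before any concatenation, turns a confusing failure deep in the parsing loop into an immediate and explicit TypeError.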
msg139803 - (view) Author: Pierre Quentel (quentel) * Date: 2011-07-04 20:44
When the FieldStorage class was fixed there was a discussion in issue 4953 about the module-level functions parse() and parse_multipart(). The code was very similar to methods of the FieldStorage class, so the idea was to use FieldStorage inside the functions. The patch proposed in issue 11066 replaced the code in parse_multipart with just: def parse_multipart(fp, pdict): return FieldStorage(fp, environ=pdict). Did anyone test it?
msg180417 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2013-01-22 18:39
Twisted would really like to see this bug fixed.
msg180425 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2013-01-22 19:16
Does anyone who was on this bug previously (e.g. the original author or the reviewers) know what was holding up the patch? Does it need more review? More tests? Is there any reason to reject fixing this at all? (I hope not.) As far as replacing the whole thing with a call into the other code goes, I'm hesitant if only because we don't have enough unit tests for the edge cases of the implementation that would be deleted, so if the wholesale replacement were to break user code we wouldn't find out until after it's been released. Fixing it seems less risky.
msg180428 - (view) Author: Senthil Kumaran (orsenthil) * (Python committer) Date: 2013-01-22 19:22
I personally think that the "grey area" of multipart form encoding, and trying to use email's updated features for parsing, was holding it up, not the tests. IMO this can be submitted after looking at the "related bugs"; I shall do a review on this one today.
msg180429 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2013-01-22 19:23
Thank you very much Senthil!
msg180456 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2013-01-23 11:01
New changeset a46a0dafcb7a by Senthil Kumaran in branch '3.2': Issue #12411: Fix to cgi.parse_multipart to correctly use bytes boundaries and http://hg.python.org/cpython/rev/a46a0dafcb7a
New changeset 59ea872d8b6b by Senthil Kumaran in branch '3.3': merge from 3.2 http://hg.python.org/cpython/rev/59ea872d8b6b
New changeset 3d7000549eb1 by Senthil Kumaran in branch 'default': merge from 3.3 http://hg.python.org/cpython/rev/3d7000549eb1
msg180457 - (view) Author: Senthil Kumaran (orsenthil) * (Python committer) Date: 2013-01-23 11:05
I updated the patch addressing Ezio's comments in the review system and also condensed the tests. This fixes parse_multipart's bytes handling at "some" level. The docstring of parse_multipart says that this should be deprecated in favor of FieldStorage completely. I will have to trace through the argument and see what should be done here and then I shall close this bug.
msg408461 - (view) Author: Irit Katriel (iritkatriel) * (Python committer) Date: 2021-12-13 16:26
Senthil, was this last part done in ?
History
Date User Action Args
2022-04-11 14:57:19 admin set status: pending -> open; github: 56620
2021-12-13 17:35:22 iritkatriel set status: open -> pending
2021-12-13 17:29:21 vstinner set status: pending -> open; nosy: - vstinner
2021-12-13 16:26:14 iritkatriel set status: open -> pending; nosy: + iritkatriel; messages: +
2014-02-04 12:15:19 taleinat set nosy: - taleinat
2013-01-23 11:05:33 orsenthil set assignee: orsenthil; resolution: fixed; messages: +
2013-01-23 11:01:36 python-dev set nosy: + python-dev; messages: +
2013-01-22 19:23:25 gvanrossum set messages: +
2013-01-22 19:22:17 orsenthil set messages: +
2013-01-22 19:16:07 gvanrossum set messages: +
2013-01-22 18:41:29 glyph set nosy: + glyph
2013-01-22 18:39:04 gvanrossum set nosy: + gvanrossum; messages: +
2012-07-21 14:15:03 flox set nosy: + orsenthil
2011-07-04 20:44:02 quentel set messages: +
2011-07-04 12:11:23 taleinat set messages: +
2011-07-04 11:26:35 jonas.wagner set files: + cgi-coverage-2.diff; messages: +
2011-07-04 07:55:58 taleinat set nosy: + taleinat; messages: +
2011-07-03 01:26:27 r.david.murray set nosy: + r.david.murray; messages: +
2011-07-02 13:53:27 eric.araujo set nosy: + eric.araujo, efosmark, milesck, MHordecki, flox, quentel; messages: +; versions: + Python 3.2
2011-06-25 15:16:37 r.david.murray set nosy: + vstinner
2011-06-25 15:09:07 jonas.wagner create