#6256 (Extraneous newlines in large amounts of POST data) – Django (original) (raw)

#6256 closed (fixed)

Reported by: Owned by: nobody
Component: HTTP handling Version: dev
Severity: Keywords:
Cc: Triage Stage: Accepted
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: yes
Easy pickings: no UI/UX: no

I ran into a problem doing some stress testing of newforms-admin. I'd bring up a change page for a model that has approx 700 other models edited inline, make no changes to the data (which was all valid), attempt to "save and continue editing", and get seemingly random validation failures for some of the inline-edited fields. I'd get messages about 1-character fields having data that was too long (3 chars), even though all I could see in the field was 1 character, or a message that my foreign key selection wasn't valid, even though it was, etc. Occasionally the save would work, but 4 out of 5 times it would fail with one or more validation errors. Eventually I tracked the problem down to parse_file_upload in django/http/__init__.py (​http://code.djangoproject.com/browser/django/trunk/django/http/__init__.py#L100). After adding some code to dump the contents of raw_message to a file, plus the sequence of items appended to POST on line 130, I found that while the post data in raw_message looked fine, sometimes the values returned by submessage.get_payload() had extraneous newlines at the beginning of the value.

Since Django is using Python's email routines here, I searched for relevant Python bugs and found this:

​http://bugs.python.org/issue1555570

(Someone else using Django already ran into this! But I can't find anything open in Django's trac that matches....)

The Python bug is still open. The last comment there seems to imply that while the provided patch is probably fine, the issue is really a doc problem and that the doc should make it clear that the Python code is expecting a file that has been opened in universal newline mode, where CRLF will be translated to just LF. For the routine that Django is using to construct the email message, I take that to mean that the string passed to email.message_from_string should have just LFs, not CRLFs as it does now. I tried replacing:

msg = email.message_from_string(raw_message)

with:

msg = email.message_from_string(raw_message.replace('\r\n', '\n')

and it does resolve the problem for my case: six successive saves of this model worked fine, whereas without the fix maybe only one would have worked.

Even though I ran into the problem using newforms-admin, the code here is identical on trunk so that is where I think it should be fixed.

Attachments(2)

Change History(14)

comment:1 by Simon Greenhill <dev@…>, 17 years ago

Triage Stage: Unreviewed → Design decision needed

A nice bit of detective work!

comment:2 by Malcolm Tredinnick, 17 years ago

Triage Stage: Design decision needed → Accepted

Remind me to flog Jdunck with some soggy noodles next time I see him. We have to fix this in Django so that it works with existing Python code, not just in future versions of Python. He should have opened a Django bug.

comment:3 by Karen Tracey <kmtracey@…>, 17 years ago

I've attached the patch described in the initial report. It has the benefit of working, but I rather dislike it from a performance point of view, since it introduces a copy of the whole post data string. But I don't see any way to avoid that, since I don't see any way of getting the file object constructed from the request socket to be opened in "universal newline mode" as described here: ​http://docs.python.org/lib/built-in-funcs.html#built-in-funcs ...I don't even know if that mode is available for file objects backed by sockets, and if it is I don't see that the python SocketServer (which I think is where the file object is created) provides any way to set the mode, nor do I know if using that mode would have adverse effects on other parts of the code. So, here's the simple one liner fix that might at least serve as a starting point. Perhaps it would be better to just do the replace on the post_data and avoid introducing the '\r's when creating the raw_message in the first place; someone more familiar with this code could better make that decision.

comment:6 by Karen Tracey <kmtracey@…>, 17 years ago

Sigh. Indeed, I completely overlooked the case of, say, uploading a (binary) file. Any CRLF sequences in that data should not be tampered with but will be corrupted by the patch. This would mean, then, that the comment in the Python bug that states that this issue is more of a doc problem and that the caller of the parser routines should ensure the 'file' has been opened in universal newline mode is incorrect, right? The 'file', in the case of a multipart mime message, may contain a mixture of text and binary data, and the only code with enough information to figure out which CRLFs are actually newline sequences is the code that is parsing based on boundary markers. Have I got that right? If so, someone should probably mention that in the Python issue tracker.

by Karen Tracey <kmtracey@…>, 17 years ago

Better patch that doesn't corrupt binary data.

comment:8 by Karen Tracey <kmtracey@…>, 17 years ago

Note that the (latest) patch for #2070 completely removes Django's use of the Python email parsing routines. So, assuming the parser provided by #2070 is bug-free, this problem goes away when #2070 goes in. FWIW from a quick look through the parsing code in #2070 and a few tests of that code in situations where I can easily hit this bug with current trunk code, the #2070 patch does indeed fix this problem.

comment:9 by Thomas Güttler, 17 years ago

I somehow don't like the long try: ...except block in this patch (6256-2.diff). I think the own
class should be used always.

comment:11 by Karen Tracey <kmtracey@…>, 17 years ago

Resolution: → fixed
Status: new → closed

r7814 (#2070) fixes this problem by completely removing Django's use of the problematic Parser.

Note: See TracTickets for help on using tickets.