Issue 19806: smtpd crashes when a multi-byte UTF-8 sequence is split between consecutive data packets (original) (raw)

Created on 2013-11-27 06:10 by petri.lehtinen, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Files
File name Uploaded Description Edit
smtpd.patch petri.lehtinen,2013-11-27 06:10 Patch from GitHub
smtpd_undecodable_data_does_not_raise.patch zvyn,2014-07-09 15:33 review
smtpd_undecodable_data_does_not_raiseV2.patch zvyn,2014-07-16 19:56 review
Messages (14)
msg204553 - (view) Author: Petri Lehtinen (petri.lehtinen) * (Python committer) Date: 2013-11-27 06:10
From Illirgway, https://github.com/akheron/cpython/pull/2: Fix utf-8 bug, when symbol byte chain splits on 2 tcp-packets: error: uncaptured python exception, closing channel (:'utf-8' codec can't decode byte 0xd0 in position 1474: unexpected end of data [/usr/lib/python3.3/asyncore.py|read 83] [/usr/lib/python3.3/asyncore.py handle_read_event
msg204577 - (view) Author: (Illirgway) Date: 2013-11-27 12:34
"base64", "quoted-printable" and "7bit" messages only use ASCII range of characters so raw_message.decode('ascii') == raw_message.decode('latin1') == etc. == raw_message.decode('utf-8') due to internal representation of utf-8 characters P.S. Python 3.4.0 beta 1 Lib/smtpd has the same bug and needs the same patch to fix this issue
msg204578 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2013-11-27 13:14
It occurs to me to wonder why smtpd is receiving 8bit data at all. It isn't advertising the 8BITMIME capability, so the client should be sending only 7bit data. See also issue 19662. I'm not sure that making utf-8 decoding work better is a good idea, but I suppose that since it is a data-dependent exception in a situation that would work with different data, we should go ahead and fix it.
msg204580 - (view) Author: Petri Lehtinen (petri.lehtinen) * (Python committer) Date: 2013-11-27 13:45
Is there a reason why this patch changes the decoding error handler to 'ignore' (from the default 'strict')?
msg204585 - (view) Author: (Illirgway) Date: 2013-11-27 14:17
because "strict" throws an exception (for example, on raw win1251) which brings down production smtp daemon and "replace" embeds "ugly" characters into received (and passed) email messages
msg221208 - (view) Author: Mark Lawrence (BreamoreBoy) * Date: 2014-06-21 23:11
@Petri/Illirgway could one of you produce a new patch including the code change and a test for this?
msg222622 - (view) Author: Milan Oberkirch (zvyn) * Date: 2014-07-09 15:33
I do not think that the proposed patch solves the bug because it silently changes binary input. With the patch for issue 19662 a proper solution to avoid this bug has been applied. The only thing left is to prevent the server to raise the exception when in legacy mode. Instead of deleting single bytes from the input (which is what .decode('utf-8', 'ignore') does) I would reply with an error to the client. The attached patch implements and tests this behaviour.
msg222735 - (view) Author: Walter Dörwald (doerwalter) * (Python committer) Date: 2014-07-11 10:31
I don't know anything about SMTP, but would it make sense to use an incremental decoder for decoding UTF-8?
msg223023 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2014-07-14 13:09
As Milan said, the problem doesn't arise in 3.5 with decode_data=False, since there's no decoding. His patch doesn't actually fix the bug for the decode_data=True case, though, since the bug is a *valid* utf-8 sequence getting split across tcp buffers. To fix it, we would need to change the implementation of decode_data. Instead of conditionally decoding in collect_data, we'd need to postpone decoding to found_terminator. This would have the undesirable affect of changing what is in the received_lines attribute, which is why we didn't do it in the decode_data patch. Using an incremental decoder won't solve that problem, since it too would change what gets stored in received_lines. Since decode_data=True is really not a legitimate mode for smtpd (it is an historical accident/bug) and we are planning on removing it eventually, I think we should go ahead and apply Milan's patch as is, since it does improve the error reporting. The message would need to be adjusted though, since it can trigger on valid utf-8 data. It should say that smtpd should be run with decode_data=False in order to fix the decode problem. That would leave the bug as-is in 3.4, but a similar patch with an error message suggesting an upgrade to 3.5/decode_data=True could be applied. That feels a little weird, though :).
msg223263 - (view) Author: Milan Oberkirch (zvyn) * Date: 2014-07-16 19:56
Note that in this (and my previous) patch the message is sent to the client (the idea was not to raise an exception). Maybe it would be better to raise an exception with the information you mentioned?
msg223267 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2014-07-16 20:30
No, I think the error return is better. Crashing the server on bad input data (which, worse, could be perfectly valid if the server were standards compliant) is a bad thing, I think. That is, if someone has an application that actually works with decode_data=True, I think it is better if it doesn't crash just because of different input data or TCP packet boundaries, as long as the error is reported. I'm willing to be convinced otherwise, but that's how I'm leaning.
msg223274 - (view) Author: Milan Oberkirch (zvyn) * Date: 2014-07-16 21:04
Agreed. It just feels a bit weird to send programming instructions for the server to the client (but it's probably the best we can do here).
msg223278 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2014-07-16 21:17
Oh, that's a good point, I hadn't thought of that. Maybe I can come up with a better wording.
msg309754 - (view) Author: Barry A. Warsaw (barry) * (Python committer) Date: 2018-01-10 00:56
I'm closing this as won't fix since smtpd.py is deprecated and will likely not get any future development. Please see aiosmtpd as a much better third party replacement.
History
Date User Action Args
2022-04-11 14:57:54 admin set github: 64005
2018-01-10 00:56:05 barry set status: open -> closednosy: + barrymessages: + resolution: wont fixstage: patch review -> resolved
2014-07-16 21:17:09 r.david.murray set messages: +
2014-07-16 21:04:20 zvyn set messages: +
2014-07-16 20:30:12 r.david.murray set messages: +
2014-07-16 19:56:13 zvyn set files: + smtpd_undecodable_data_does_not_raiseV2.patchmessages: +
2014-07-14 13:09:49 r.david.murray set messages: +
2014-07-11 10:31:13 doerwalter set nosy: + doerwaltermessages: +
2014-07-09 15:34:39 zvyn set nosy: + jesstess
2014-07-09 15:33:02 zvyn set files: + smtpd_undecodable_data_does_not_raise.patchnosy: + zvynmessages: +
2014-06-21 23:11:47 BreamoreBoy set nosy: + BreamoreBoymessages: + versions: + Python 3.5, - Python 3.3
2013-11-27 14:17:07 Illirgway set messages: +
2013-11-27 13:45:19 petri.lehtinen set messages: +
2013-11-27 13:14:24 r.david.murray set nosy: + r.david.murraymessages: +
2013-11-27 12:34:08 Illirgway set nosy: + Illirgwaymessages: + versions: + Python 3.4
2013-11-27 06:10:48 petri.lehtinen create