Issue 23321: Crash in str.decode() with special error handler

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/67510

classification

Title:	Crash in str.decode() with special error handler
Type:	crash	Stage:	patch review
Components:	Interpreter Core	Versions:	Python 3.4, Python 3.5

process

Status:	closed	Resolution:	fixed
Dependencies:		Superseder:
Assigned To:	serhiy.storchaka	Nosy List:	Arfrever, python-dev, serhiy.storchaka, vstinner
Priority:	normal	Keywords:	patch

Created on 2015-01-25 23:16 by serhiy.storchaka, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Files
File name	Uploaded	Description	Edit
unicode_decode_call_errorhandler_writer.patch	serhiy.storchaka,2015-01-25 23:16	review

Messages (7)
msg234705 - (view)	Author: Serhiy Storchaka (serhiy.storchaka) *	Date: 2015-01-25 23:16
A debug build crashes in some circumstances in str.decode() when the error handler produces a replacement string longer than the malformed data. For example, the backslashreplace error handler produces a 4-character string for every illegal byte. All other standard error handlers produce at most 1 character for every illegal unit. Here is a patch which fixes this issue. I'll commit it without review because the buildbots are broken without it. This issue is open for reference and post-commit review.
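For reference, a minimal sketch of the kind of handler described above: one whose replacement is longer than the malformed input it replaces. The handler name 'demo_long_replace' and the function long_replace are hypothetical, chosen only for illustration; codecs.register_error() is the standard registration API. On a fixed interpreter this simply decodes without crashing.

```python
import codecs

def long_replace(exc):
    # Hypothetical handler for illustration: only decoding errors are handled.
    if not isinstance(exc, UnicodeDecodeError):
        raise exc
    bad = exc.object[exc.start:exc.end]
    # Emit 6 characters per malformed byte, i.e. more than the input length.
    replacement = ''.join('<0x%02X>' % b for b in bad)
    # Resume decoding right after the malformed bytes.
    return replacement, exc.end

# 'demo_long_replace' is an assumed name, not a standard error handler.
codecs.register_error('demo_long_replace', long_replace)

long_result = b'\x80abc'.decode('utf-8', 'demo_long_replace')
print(long_result)
```

Here one malformed byte expands to six characters, so the output is longer than the four input bytes.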
msg234707 - (view)	Author: Roundup Robot (python-dev)	Date: 2015-01-25 23:27
New changeset 2de90090e486 by Serhiy Storchaka in branch '3.4': Issue #23321: Fixed a crash in str.decode() when error handler returned https://hg.python.org/cpython/rev/2de90090e486
New changeset 1cd68b3c46aa by Serhiy Storchaka in branch 'default': Issue #23321: Fixed a crash in str.decode() when error handler returned https://hg.python.org/cpython/rev/1cd68b3c46aa
msg234725 - (view)	Author: STINNER Victor (vstinner) *	Date: 2015-01-26 08:40
> Debugging build crashes in some circumstances in str.decode() (...) buildbots are broken without it
Is it a regression? Would it be possible to identify the changeset responsible for the regression?
msg234731 - (view)	Author: Serhiy Storchaka (serhiy.storchaka) *	Date: 2015-01-26 10:26
I think the changeset which made the decoders use _PyUnicodeWriter is responsible for the regression. For example, consider b'\x80abc'.decode('utf-8', 'backslashreplace'). The writer reserves a string buffer of size 4 (every byte produces at most 1 character). The first byte is invalid and is replaced by the 4-character string '\\x80'. The writer increases min_length but doesn't resize the buffer, because its current size is enough to hold the replacement string. But the following writes of ASCII characters overflow the buffer.
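The arithmetic in this example can be checked from Python. This is only a sketch of the size accounting described above, not of the C-level _PyUnicodeWriter itself; the variable names are made up for illustration, and it assumes an interpreter where backslashreplace is accepted for decoding (Python 3.5 or later).

```python
data = b'\x80abc'

# Initial estimate described above: at most one character per input byte.
initial_estimate = len(data)

# The invalid byte 0x80 is replaced by the 4-character string '\x80'
# (a backslash, 'x', '8', '0'); the ASCII bytes 'abc' decode one-to-one.
decoded = data.decode('utf-8', 'backslashreplace')

# Characters that do not fit into a buffer sized by the initial estimate.
shortfall = len(decoded) - initial_estimate

print(repr(decoded), initial_estimate, len(decoded), shortfall)
```

The replacement alone exactly fills the 4-character estimate, which is why the overflow only shows up when the three trailing ASCII characters are written.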
msg234783 - (view)	Author: Roundup Robot (python-dev)	Date: 2015-01-26 22:27
New changeset 1e8937861ee3 by Victor Stinner in branch 'default': Issue #22286, #23321: Fix failing test on Windows code page 932 https://hg.python.org/cpython/rev/1e8937861ee3
msg235160 - (view)	Author: Serhiy Storchaka (serhiy.storchaka) *	Date: 2015-02-01 11:01
If you have no enhancements to my quick fix, Victor, maybe this issue can be closed.
msg235242 - (view)	Author: STINNER Victor (vstinner) *	Date: 2015-02-02 11:25
I closed the issue.

History
Date	User	Action	Args
2022-04-11 14:58:12	admin	set	github: 67510
2015-02-02 11:25:23	vstinner	set	messages: +
2015-02-02 11:25:17	vstinner	set	status: pending -> closed; resolution: fixed
2015-02-01 11:01:24	serhiy.storchaka	set	status: open -> pending; messages: +
2015-01-26 22:27:25	python-dev	set	messages: +
2015-01-26 10:26:42	serhiy.storchaka	set	messages: +
2015-01-26 08:40:12	vstinner	set	messages: +
2015-01-26 06:30:29	Arfrever	set	nosy: + Arfrever
2015-01-25 23:27:45	python-dev	set	nosy: + python-dev; messages: +
2015-01-25 23:16:13	serhiy.storchaka	create