Debugging build crashes in some circumstances in str.decode() with error handler which produces replacement string with length larger than malformed data. For example the backslashreplace error handler produces 4-character string for every illegal byte. All other standard error handlers produce no longer than 1 character for every illegal unit. Here is a patch which fixes this issue. I'll commit it without review because buildbots are broken without it. This issue is open for reference and post-commit review.
> Debugging build crashes in some circumstances in str.decode() (...) buildbots are broken without it Is it a regression? Would it be possible to identify the changeset responsible of the regression?
I think the changeset which made decoders to use _PyUnicodeWriter () is responsible of the regression. For example consider b'\x80abc'.decode('utf-8', 'backslashreplace'). The writer reserves string buffer with size 4 (every byte produces at most 1 character). First byte is incorrect and replaced by 4-character string '\\x80'. The writer increases min_length but doesn't resize the buffer because its size is enough to write replacement string. But following writes of ASCII characters cause buffer overflow.