msg300795 - (view) |
Author: Oren Milman (Oren Milman) * |
Date: 2017-08-24 18:06 |
currently, the following causes an assertion in Modules/_io/textio.c in _io_TextIOWrapper_write_impl() to fail: import codecs import io class BadEncoder(): def encode(self, dummy): return 42 def _get_bad_encoder(dummy): return BadEncoder() quopri = codecs.lookup("quopri") quopri._is_text_encoding = True quopri.incrementalencoder = _get_bad_encoder t = io.TextIOWrapper(io.BytesIO(b'foo'), encoding="quopri") t.write('bar') this is because _io_TextIOWrapper_write_impl() doesn't check whether the value returned by encoder's encode() is a bytes object. (I would open a PR to fix that soon.) |
|
|
msg300801 - (view) |
Author: Vedran Čačić (veky) * |
Date: 2017-08-24 20:11 |
I can't reproduce this on 3.6.0. Is this on 3.7 only? |
|
|
msg300805 - (view) |
Author: Oren Milman (Oren Milman) * |
Date: 2017-08-24 21:20 |
Just checked on current 3.6 on my Windows 10. The assertion failes, and it is in line 1337. oh my. |
|
|
msg300829 - (view) |
Author: Oren Milman (Oren Milman) * |
Date: 2017-08-25 07:48 |
As Serhiy pointed out on github, the assertion failure can be easily reproduced by the following: import codecs import io rot13 = codecs.lookup("rot13") rot13._is_text_encoding = True t = io.TextIOWrapper(io.BytesIO(b'foo'), encoding="rot13") t.write('bar') |
|
|
msg300837 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) *  |
Date: 2017-08-25 13:20 |
Nick can be interested in this. |
|
|
msg300838 - (view) |
Author: Alyssa Coghlan (ncoghlan) *  |
Date: 2017-08-25 13:36 |
The proposed fix looks good to me, but it did make me wonder if we might have a missing check in the other direction as well. However, it looks like that case is already fine: ``` >>> hex_codec = codecs.lookup("hex") >>> hex_codec._is_text_encoding = True >>> t = io.TextIOWrapper(io.BytesIO(b'foo'), encoding="hex") >>> t.buffer.write(b'abcd') 4 >>> t.read() Traceback (most recent call last): File "", line 1, in TypeError: decoder should return a string result, not 'bytes' ``` |
|
|
msg300840 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) *  |
Date: 2017-08-25 14:14 |
May be make an error message more symmetric to the one in the decoder case? |
|
|
msg300853 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) *  |
Date: 2017-08-25 18:14 |
New changeset a5b4ea15b61e3f3985f4f0748a18f8b888a63532 by Serhiy Storchaka (Oren Milman) in branch 'master': bpo-31271: Fix an assertion failure in io.TextIOWrapper.write. (#3201) https://github.com/python/cpython/commit/a5b4ea15b61e3f3985f4f0748a18f8b888a63532 |
|
|
msg300863 - (view) |
Author: Oren Milman (Oren Milman) * |
Date: 2017-08-26 08:09 |
all three versions do 'self->pending_bytes_count += PyBytes_GET_SIZE(b);', while 'b' is the object the encoder returned. in 3.6 and 3.7, the implementation of PyBytes_GET_SIZE() includes 'assert(PyBytes_Check(op))', but in 2.7, the implementation is 'PyString_GET_SIZE', which is just 'Py_SIZE(op)'. and so, in 2.7 there isn't an assertion failure. Moreover, 'self->pending_bytes_count' is used only to determine whether a flush is needed. so ISTM that probably the bug's only risk is not flushing automatically after writing. note that whenever _textiowrapper_writeflush() is finally called (when the encoder returned a non-string object), it would raise a TypeError by calling string_join() on a non-string object. do you still think we should backport to 2.7? |
|
|
msg300884 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) *  |
Date: 2017-08-26 15:47 |
> do you still think we should backport to 2.7? This is not trivial question. On one side, using PyString_GET_SIZE() with non-bytes object definitely is a bug. It is better to catch it earlier rather than hope on failing in the following code. On other side, adding a check for bytes can break existing user code that is passed now by accident. _PyBytes_Join() in 2.7 supports unicode objects. I think that the fix should be backported, and the proper fix should allow bytes and unicode objects. But I left the decision on Benjamin. |
|
|
msg300887 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) *  |
Date: 2017-08-26 17:29 |
New changeset 9bcbc6cba385a83cac8f1ff430cad7617184d2bc by Serhiy Storchaka (Oren Milman) in branch '3.6': [3.6] bpo-31271: Fix an assertion failure in io.TextIOWrapper.write. (GH-3201) (#3209) https://github.com/python/cpython/commit/9bcbc6cba385a83cac8f1ff430cad7617184d2bc |
|
|
msg303548 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) *  |
Date: 2017-10-02 16:07 |
Oren, do you mind to create a backport to 2.7? miss-islington can not handle it. |
|
|
msg303558 - (view) |
Author: Oren Milman (Oren Milman) * |
Date: 2017-10-02 18:50 |
sure |
|
|
msg303579 - (view) |
Author: Oren Milman (Oren Milman) * |
Date: 2017-10-03 06:57 |
I am not sure, but ISTM that it isn't possible for the encoder to return a unicode and not fail later. This is because _textiowrapper_writeflush() would call _io.BytesIO.write() (after it called _PyBytes_Join()), and bytesio_write() calls PyObject_GetBuffer(), which would raise "TypeError: 'unicode' does not have the buffer interface". |
|
|
msg303899 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) *  |
Date: 2017-10-08 07:30 |
You are right. But in any case the test should be fixed for 2.7. |
|
|
msg305695 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2017-11-07 00:17 |
New changeset 30537698b607d53fa9ce18522abb88469d5814b6 by Victor Stinner (Oren Milman) in branch '2.7': [2.7] bpo-31271: Fix an assertion failure in io.TextIOWrapper.write. (GH-3201) (#3951) https://github.com/python/cpython/commit/30537698b607d53fa9ce18522abb88469d5814b6 |
|
|
msg305696 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2017-11-07 00:19 |
Serhiy: "You are right. But in any case the test should be fixed for 2.7." Done: I merged Oren's PR 3951. It seems like the bug was fixed in all (maintained) branches, so I close the issue. Thank you again Oren Milman for *reporting* and *fixing* the bug! |
|
|