Issue 13056: test_multibytecodec.py:TestStreamWriter is skipped after PEP393

Created on 2011-09-29 01:38 by ezio.melotti, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Messages (6)
msg144583 - (view)	Author: Ezio Melotti (ezio.melotti) *	Date: 2011-09-29 01:38
The test at Lib/test/test_multibytecodec.py:178 checks for len('\U00012345') == 2, and with PEP393 this is always False.
I tried to run the tests with a few changes and they seem to work, but the code doesn't raise any exception on c.reset():

---->8-------->8-------->8-------->8----
import io, codecs
s = io.BytesIO()
c = codecs.getwriter('gb18030')(s)
c.write('123'); s.getvalue()
c.write('\U00012345'); s.getvalue()
c.write('\U00012345' + '\uac00\u00ac'); s.getvalue()
c.write('\uac00'); s.getvalue()
c.reset()
s.getvalue()
---->8-------->8-------->8-------->8----

Result:

>>> import io, codecs
>>> s = io.BytesIO()
>>> c = codecs.getwriter('gb18030')(s)
>>> c.write('123'); s.getvalue()
b'123'
>>> c.write('\U00012345'); s.getvalue()
b'123\x907\x959'
>>> # '\U00012345'[0] is the same as '\U00012345' now
>>> c.write('\U00012345' + '\uac00\u00ac'); s.getvalue()
b'123\x907\x959\x907\x959\x827\xcf5\x810\x851'
>>> c.write('\uac00'); s.getvalue()
b'123\x907\x959\x907\x959\x827\xcf5\x810\x851\x827\xcf5'
>>> c.reset()  # is this supposed to raise an error?
>>> s.getvalue()
b'123\x907\x959\x907\x959\x827\xcf5\x810\x851\x827\xcf5'

Victor suggested waiting until multibytecodec gets ported to the new API before fixing this.
msg171346 - (view)	Author: Ezio Melotti (ezio.melotti) *	Date: 2012-09-26 16:47
Victor, do you know if multibytecodec has been ported to the new API yet? If I remove the "if", I still get a failure:

test test_multibytecodec failed -- Traceback (most recent call last):
  File "/home/wolf/dev/py/py3k/Lib/test/test_multibytecodec.py", line 187, in test_gb18030
    self.assertEqual(s.getvalue(), b'123\x907\x959')
AssertionError: b'123\x907\x959\x907\x959' != b'123\x907\x959'
msg171347 - (view)	Author: STINNER Victor (vstinner) *	Date: 2012-09-26 16:57
> Victor, do you know if multibytecodec has been ported to the new API yet?
No, it has not. CJK codecs still use the legacy API (Py_UNICODE).
msg184186 - (view)	Author: Serhiy Storchaka (serhiy.storchaka) *	Date: 2013-03-14 20:14
I think these tests make no sense after PEP393. They test that StreamWriter works with non-BMP characters split inside a surrogate pair, i.e. that c.write(s[:i]); c.write(s[i:]) is always the same as c.write(s), even if i breaks s inside a surrogate pair. This case is impossible after PEP393.
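The point above can be checked directly on a PEP393 build (Python 3.3 or later). The following is only a minimal sketch: the helper name encode_in_two_writes and the sample string are made up for illustration and are not part of the test suite. It splits a string at every possible index and compares the two-write output with a one-shot encode.

```python
import codecs
import io


def encode_in_two_writes(text, split_at, encoding="gb18030"):
    """Hypothetical helper: write text[:split_at], then text[split_at:]."""
    buffer = io.BytesIO()
    writer = codecs.getwriter(encoding)(buffer)
    writer.write(text[:split_at])
    writer.write(text[split_at:])
    writer.reset()
    return buffer.getvalue()


# With PEP393 a non-BMP character is a single code point, so no index
# can fall inside a surrogate pair.
assert len("\U00012345") == 1

sample = "123\U00012345\uac00\u00ac"
expected = sample.encode("gb18030")

# Try every split point, including the empty prefix and empty suffix.
results = [encode_in_two_writes(sample, i) for i in range(len(sample) + 1)]
all_equal = all(result == expected for result in results)
print(all_equal)
```

Since every index now falls between whole code points, the split writes and the single encode should agree for each i.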
msg186591 - (view)	Author: Roundup Robot (python-dev)	Date: 2013-04-11 20:41
New changeset 78cd09d2f908 by Victor Stinner in branch 'default':
Issue #13056: Reenable test_multibytecodec.Test_StreamWriter tests
http://hg.python.org/cpython/rev/78cd09d2f908
msg186592 - (view)	Author: STINNER Victor (vstinner) *	Date: 2013-04-11 20:45
CJK decoders have used the new Unicode API since changeset bcecf3910162.

> I think these tests make no sense after PEP393. They test that StreamWriter works with non-BMP characters split inside a surrogate pair, i.e. that c.write(s[:i]); c.write(s[i:]) is always the same as c.write(s), even if i breaks s inside a surrogate pair. This case is impossible after PEP393.

I reenabled the tests, but I simplified them to remove the parts related to surrogate pairs. The tests are shorter than before, but that's better than no test at all. Can I close the issue, or does someone want to improve these tests?
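For readers without the changeset at hand, a simplified test of this kind might look like the sketch below. It is not the code from changeset 78cd09d2f908: the class name SimplifiedStreamWriterTest is hypothetical, and the expected byte strings are taken from the interpreter session in the first message.

```python
import codecs
import io
import unittest


class SimplifiedStreamWriterTest(unittest.TestCase):
    """Hypothetical sketch of a StreamWriter test without surrogate handling."""

    def test_gb18030(self):
        s = io.BytesIO()
        c = codecs.getwriter('gb18030')(s)

        # ASCII passes through unchanged.
        c.write('123')
        self.assertEqual(s.getvalue(), b'123')

        # A non-BMP character is written whole, as one four-byte sequence.
        c.write('\U00012345')
        self.assertEqual(s.getvalue(), b'123\x907\x959')

        # Two BMP characters that also need four bytes each in GB18030.
        c.write('\uac00\u00ac')
        self.assertEqual(s.getvalue(),
                         b'123\x907\x959\x827\xcf5\x810\x851')
```

Saved as a module, it can be run with python -m unittest.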

History
Date	User	Action	Args
2022-04-11 14:57:22	admin	set	github: 57265
2013-04-11 20:58:25	ezio.melotti	set	status: open -> closed; stage: needs patch -> resolved; resolution: fixed; versions: - Python 3.3
2013-04-11 20:45:48	vstinner	set	messages: +
2013-04-11 20:41:22	python-dev	set	nosy: + python-dev; messages: +
2013-03-14 20:14:47	serhiy.storchaka	set	messages: +
2013-03-14 03:49:04	ezio.melotti	set	nosy: + serhiy.storchaka; versions: + Python 3.4
2012-09-26 16:57:10	vstinner	set	messages: +
2012-09-26 16:47:21	ezio.melotti	set	keywords: + 3.3regression; messages: +
2011-09-29 01:47:23	vstinner	set	components: + Unicode
2011-09-29 01:38:20	ezio.melotti	create