Issue 27154: Regression in file.writelines behavior

Created on 2016-05-29 17:41 by snaury, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Messages (6)
msg266605 - Author: Alexey Borzenkov (snaury) Date: 2016-05-29 17:41
There's a regression in file.writelines behavior for binary files when writing unicode strings, which seems to have first appeared in Python 2.7.7. The problem is that when writing unicode strings, the internal representation (UCS2 or UCS4) is written instead of the actual text. This also directly contradicts the documentation, which states "This is equivalent to calling write() for each string". However, on Python 2.7.7+ they are no longer equivalent:

>>> open('testfile.bin', 'wb').writelines([u'Hello, world!'])
>>> open('testfile.bin', 'rb').read()
'H\x00e\x00l\x00l\x00o\x00,\x00 \x00w\x00o\x00r\x00l\x00d\x00!\x00'
>>> open('testfile.bin', 'wb').write(u'Hello, world!')
>>> open('testfile.bin', 'rb').read()
'Hello, world!'

This code worked correctly on Python 2.7.6.
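To make the two results in the report easier to compare, here is a minimal sketch that reproduces both byte strings without touching a file. It assumes a narrow (UCS2) build on a little-endian machine, where the internal buffer has the same layout as UTF-16-LE, and it assumes write() used the default ASCII encoding. The variable names are illustrative only.

```python
# -*- coding: utf-8 -*-
text = u'Hello, world!'

# Bytes matching the write() result in the report: the text encoded
# with the (assumed) default ASCII encoding, one byte per character.
encoded = text.encode('ascii')

# Bytes matching the writelines() result in the report: two bytes per
# character, the same layout as UTF-16 little-endian (assumed here to
# mirror a narrow build's internal UCS2 buffer).
raw_ucs2 = text.encode('utf-16-le')

print(repr(encoded))
print(repr(raw_ucs2))
```

On a wide (UCS4) build the dumped buffer would presumably use four bytes per character instead, which 'utf-32-le' would approximate.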
msg266628 - Author: R. David Murray (r.david.murray) * (Python committer) Date: 2016-05-29 20:21
Any chance you could bisect to figure out which changeset caused the problem? I'm surprised that something like this would happen; in general we aren't making changes at that level to python2 any more.
msg266630 - Author: Alexey Borzenkov (snaury) Date: 2016-05-29 20:28
Didn't need to bisect; it's very easy to find the problematic commit, since writelines doesn't change that often: https://hg.python.org/releases/2.7.11/rev/db842f730432

The old code was buggy in the sense that it always called PyObject_AsCharBuffer because of the way the condition was structured, but this bugginess was what allowed it to work correctly with unicode objects. After the commit, unicode objects are treated like any other buffer, and that's why the internal UCS2 or UCS4 representation gets written to the file.
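Since the difference comes down to whether a unicode object is encoded or passed along as a raw buffer, one way to sidestep it is to encode each line explicitly before it reaches writelines. The sketch below is only an illustration and not something proposed in this thread: the helper name write_text_lines and the choice of UTF-8 are assumptions.

```python
# -*- coding: utf-8 -*-
import os
import tempfile


def write_text_lines(path, lines, encoding='utf-8'):
    """Hypothetical helper: encode each unicode line, then write the bytes."""
    with open(path, 'wb') as f:
        # writelines only ever sees byte strings here, so it never has
        # to decide how to interpret a unicode object's buffer.
        f.writelines(line.encode(encoding) for line in lines)


# Write to a scratch directory so the example leaves the working dir alone.
path = os.path.join(tempfile.mkdtemp(), 'testfile.bin')
write_text_lines(path, [u'Hello, world!'])

with open(path, 'rb') as f:
    data = f.read()

print(repr(data))
```

Encoding up front keeps the bytes on disk independent of the interpreter's internal string representation.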
msg266636 - Author: R. David Murray (r.david.murray) * (Python committer) Date: 2016-05-29 20:40
Thanks.
msg372535 - Author: Zackery Spytz (ZackerySpytz) * (Python triager) Date: 2020-06-28 21:36
Python 2 is EOL, so I think this issue should be closed.
msg373073 - Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2020-07-06 07:43
Removing 'b' and 'u', writelines([s]) and write(s) both now read as s.
History
Date User Action Args
2022-04-11 14:58:31 admin set github: 71341
2020-07-06 07:43:51 terry.reedy set status: open -> closed; nosy: + terry.reedy; messages: +; resolution: out of date; stage: resolved
2020-06-28 21:36:05 ZackerySpytz set nosy: + ZackerySpytz; messages: +
2018-09-23 15:16:52 xtreak set nosy: + xtreak
2016-05-29 21:02:52 serhiy.storchaka set nosy: + serhiy.storchaka
2016-05-29 20:40:11 r.david.murray set nosy: + pitrou; messages: +
2016-05-29 20:28:46 snaury set messages: +
2016-05-29 20:21:36 r.david.murray set nosy: + r.david.murray; messages: +
2016-05-29 18:33:57 socketpair set nosy: + socketpair
2016-05-29 17:41:19 snaury create