Issue 27154: Regression in file.writelines behavior
Created on 2016-05-29 17:41 by snaury, last changed 2022-04-11 14:58 by admin. This issue is now closed.
Messages (6) | ||
---|---|---|
msg266605 - (view) | Author: Alexey Borzenkov (snaury) | Date: 2016-05-29 17:41 |
There's a regression in file.writelines behavior for binary files when writing unicode strings, which seems to have first appeared in Python 2.7.7. When unicode strings are written, the internal representation (UCS2 or UCS4) ends up in the file instead of the actual text. This directly contradicts the documentation, which states "This is equivalent to calling write() for each string". On Python 2.7.7+ the two are no longer equivalent: >>> open('testfile.bin', 'wb').writelines([u'Hello, world!']) >>> open('testfile.bin', 'rb').read() 'H\x00e\x00l\x00l\x00o\x00,\x00 \x00w\x00o\x00r\x00l\x00d\x00!\x00' >>> open('testfile.bin', 'wb').write(u'Hello, world!') >>> open('testfile.bin', 'rb').read() 'Hello, world!' This code worked correctly on Python 2.7.6. | ||
msg266628 - (view) | Author: R. David Murray (r.david.murray) | Date: 2016-05-29 20:21 |
Any chance you could bisect to figure out which changeset caused the problem? I'm surprised that something like this would happen; in general we aren't making changes at that level to Python 2 any more. | ||
msg266630 - (view) | Author: Alexey Borzenkov (snaury) | Date: 2016-05-29 20:28 |
Didn't need to bisect; it's very easy to find the problematic commit, since writelines doesn't change that often: https://hg.python.org/releases/2.7.11/rev/db842f730432 The old code was buggy in the sense that it always called PyObject_AsCharBuffer due to the way the condition is structured, but this bugginess was what allowed it to work correctly with unicode objects. After the commit, unicode objects are treated like any other buffer, and that's why the internal UCS2 or UCS4 representation gets written to the file. | ||
msg266636 - (view) | Author: R. David Murray (r.david.murray) | Date: 2016-05-29 20:40 |
Thanks. | ||
msg372535 - (view) | Author: Zackery Spytz (ZackerySpytz) | Date: 2020-06-28 21:36 |
Python 2 is EOL, so I think this issue should be closed. | ||
msg373073 - (view) | Author: Terry J. Reedy (terry.reedy) | Date: 2020-07-06 07:43 |
Removing 'b' and 'u', writelines([s]) and write(s) both now read as s. |
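
The observations in this thread can be checked on a current interpreter. The following is a minimal Python 3 sketch, not the Python 2 code path discussed above. It assumes the reporter's build was a little-endian UCS2 build (inferred from the two-byte pattern in the output), and the helper names `matches_ucs2_layout`, `write_encoded_lines`, and `roundtrip_text` are hypothetical, introduced only for illustration. It shows that the reported bytes coincide with 2-byte little-endian code units, that encoding text explicitly before calling writelines() produces the expected bytes, and that in text mode writelines([s]) and write(s) read back identically.

```python
import os
import tempfile

TEXT = "Hello, world!"

# Bytes reported in msg266605 for writelines() on Python 2.7.7+.
REPORTED = b"H\x00e\x00l\x00l\x00o\x00,\x00 \x00w\x00o\x00r\x00l\x00d\x00!\x00"


def matches_ucs2_layout(text, raw):
    """Return True if raw equals text as 2-byte little-endian code units.

    For characters in the Basic Multilingual Plane, UTF-16-LE matches the
    layout a little-endian UCS2 build would hold in memory (an assumption
    about the reporter's build, not something stated in the thread).
    """
    return text.encode("utf-16-le") == raw


def write_encoded_lines(path, lines, encoding="ascii"):
    """Hypothetical helper: encode text explicitly before writelines().

    Passing bytes avoids relying on any implicit conversion of text.
    The ASCII default is an assumption chosen for this example.
    """
    with open(path, "wb") as f:
        f.writelines(
            line.encode(encoding) if isinstance(line, str) else line
            for line in lines
        )


def roundtrip_text(path, text):
    """Write text with writelines() and with write() in text mode.

    Returns a tuple of what each variant reads back.
    """
    results = []
    for use_writelines in (True, False):
        with open(path, "w", encoding="utf-8") as f:
            if use_writelines:
                f.writelines([text])
            else:
                f.write(text)
        with open(path, "r", encoding="utf-8") as f:
            results.append(f.read())
    return tuple(results)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "testfile.bin")
        print(matches_ucs2_layout(TEXT, REPORTED))
        write_encoded_lines(path, [TEXT])
        with open(path, "rb") as f:
            print(f.read())
        print(roundtrip_text(path, TEXT))
```

On Python 2.7.7+ the same idea applies as a workaround: encoding each unicode string to a byte string before passing it to writelines() sidesteps the buffer-based code path described in msg266630.
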
History | |||
---|---|---|---|
Date | User | Action | Args |
2022-04-11 14:58:31 | admin | set | github: 71341 |
2020-07-06 07:43:51 | terry.reedy | set | status: open -> closed; nosy: + terry.reedy; messages: +; resolution: out of date; stage: resolved |
2020-06-28 21:36:05 | ZackerySpytz | set | nosy: + ZackerySpytz; messages: + |
2018-09-23 15:16:52 | xtreak | set | nosy: + xtreak |
2016-05-29 21:02:52 | serhiy.storchaka | set | nosy: + serhiy.storchaka |
2016-05-29 20:40:11 | r.david.murray | set | nosy: + pitrou; messages: + |
2016-05-29 20:28:46 | snaury | set | messages: + |
2016-05-29 20:21:36 | r.david.murray | set | nosy: + r.david.murray; messages: + |
2016-05-29 18:33:57 | socketpair | set | nosy: + socketpair |
2016-05-29 17:41:19 | snaury | create |