Issue 6788: codecs.open on Win32 does not force binary mode

Created on 2009-08-27 04:17 by EnigmaCurry, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
File name | Uploaded | Description
codecs_bug.py | EnigmaCurry, 2009-08-27 13:29 | Doctests for codecs.open
Messages (4)
msg91995 - Author: Ryan McGuire (EnigmaCurry) Date: 2009-08-27 04:17
Opening a UTF-8 encoded file with Unix newlines ("\n") on Win32:

    codecs.open("whatever.txt", "r", "utf-8").read()

replaces the newlines ("\n") with CR+LF ("\r\n").

The docs specifically say:

"Files are always opened in binary mode, even if no binary mode was specified. This is done to avoid data loss due to encodings using 8-bit values. This means that no automatic conversion of '\n' is done on reading and writing."

And yet, opening the file with an explicit binary mode resolves the situation:

    codecs.open("whatever.txt", "rb", "utf-8").read()

This reads the file with the original newlines unmodified. The implementation of codecs.open and the documentation are out of sync.
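A platform-independent way to check the documented claim is sketched below. It assumes Python 3 (the report used Python 2.6) and a throwaway temporary file. The file's bytes are written in binary mode, so the on-disk newlines are known to be LF, and then the file is read back through codecs.open both with and without 'b'. If codecs.open forces binary mode as documented, both reads should return the same LF-only text.

```python
import codecs
import os
import tempfile
import warnings

# Write known bytes in binary mode so the on-disk newlines are exactly LF,
# regardless of platform.
fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "wb") as f:
    f.write("line1\nline2\n".encode("utf-8"))

with warnings.catch_warnings():
    # Newer Python 3 releases deprecate codecs.open(); silence that here.
    warnings.simplefilter("ignore", DeprecationWarning)
    # Mode without 'b': codecs.open should still open the file as "rb".
    with codecs.open(path, "r", "utf-8") as f:
        text_r = f.read()
    # Explicit binary mode, as in the workaround above.
    with codecs.open(path, "rb", "utf-8") as f:
        text_rb = f.read()

os.remove(path)

# If no newline translation happens on read, both strings are identical.
print(repr(text_r), repr(text_rb))
```

Because the bytes on disk are controlled directly, any "\r\n" in the result could only come from codecs.open itself.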
msg91999 - Author: Marc-Andre Lemburg (lemburg) (Python committer) Date: 2009-08-27 09:36
The implementation looks like this:

    if encoding is not None and \
       'b' not in mode:
        # Force opening of the file in binary mode
        mode = mode + 'b'

in both Python 2 and 3, so I'm not sure what could be causing this.
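The forcing rule quoted in this message can be restated as a small pure function. `effective_mode` is a hypothetical name used only for illustration and is not part of the codecs module. The cross-check at the end is a sketch that assumes Python 3. It relies on the object returned by codecs.open (a StreamReaderWriter) forwarding unknown attribute lookups to the underlying file, so its `mode` attribute reports the mode the file was actually opened with.

```python
import codecs
import os
import tempfile
import warnings


def effective_mode(mode, encoding):
    """Hypothetical helper mirroring the check in codecs.open:
    when an encoding is given and 'b' is absent, append 'b'."""
    if encoding is not None and "b" not in mode:
        # Force opening of the file in binary mode
        mode = mode + "b"
    return mode


print(effective_mode("r", "utf-8"))   # prints rb
print(effective_mode("rb", "utf-8"))  # prints rb
print(effective_mode("r", None))      # prints r (no encoding, no forcing)

# Cross-check against the real codecs.open on a throwaway file.
fd, path = tempfile.mkstemp()
os.close(fd)
with warnings.catch_warnings():
    # Newer Python 3 releases deprecate codecs.open(); silence that here.
    warnings.simplefilter("ignore", DeprecationWarning)
    with codecs.open(path, "r", "utf-8") as f:
        # StreamReaderWriter has no `mode` of its own, so this is the
        # underlying file object's mode.
        actual_mode = f.mode
os.remove(path)
print(actual_mode)                    # prints rb
```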
msg92001 - Author: Ryan McGuire (EnigmaCurry) Date: 2009-08-27 13:29
Uploading a doctest for this. The tests pass on Linux with Python 2.6 but fail on Win32 with Python 2.6.
msg92101 - Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) (Python committer) Date: 2009-08-31 06:33
I think your test is invalid: it creates the file in "w" mode, so on Win32 each \n is written to disk as the two bytes \r\n. codecs.open just reads them back.
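This explanation can be reproduced on any platform with Python 3. Passing newline="\r\n" to the built-in open() makes text-mode writes translate each "\n" to "\r\n", which is what "w" mode does on Win32. This simulation is an assumption standing in for actually running on Windows. Reading the file back through codecs.open shows that the CR+LF pairs were already on disk. Writing with newline="\n" (no translation) keeps the file LF-only. `write_then_read` is a hypothetical helper for this sketch.

```python
import codecs
import os
import tempfile
import warnings


def write_then_read(newline):
    """Write two LF-terminated lines in text mode with the given newline
    translation, then return (raw bytes on disk, text via codecs.open)."""
    fd, path = tempfile.mkstemp(suffix=".txt")
    os.close(fd)
    try:
        # newline="\r\n" mimics Win32 text mode: each "\n" becomes "\r\n" on disk.
        with open(path, "w", encoding="utf-8", newline=newline) as f:
            f.write("a\nb\n")
        with open(path, "rb") as f:
            raw = f.read()
        with warnings.catch_warnings():
            # Newer Python 3 releases deprecate codecs.open(); silence that here.
            warnings.simplefilter("ignore", DeprecationWarning)
            # codecs.open forces binary mode, so it returns the bytes as stored.
            with codecs.open(path, "r", "utf-8") as f:
                text = f.read()
    finally:
        os.remove(path)
    return raw, text


# Like the uploaded doctest on Win32: the CR+LF pairs are already on disk.
print(write_then_read("\r\n"))
# Writing without translation keeps the file LF-only.
print(write_then_read("\n"))
```

In both cases the text from codecs.open matches the bytes on disk, which is consistent with codecs.open performing no newline conversion of its own.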
History
Date | User | Action | Args
2022-04-11 14:56:52 | admin | set | github: 51037
2010-04-29 18:00:18 | terry.reedy | set | status: pending -> closed
2009-08-31 06:33:35 | amaury.forgeotdarc | set | status: open -> pending; nosy: + amaury.forgeotdarc; messages: +; resolution: not a bug
2009-08-27 13:29:29 | EnigmaCurry | set | files: + codecs_bug.py; messages: +
2009-08-27 09:36:43 | lemburg | set | nosy: + lemburg; messages: +
2009-08-27 04:17:07 | EnigmaCurry | create |