Issue 6788: codecs.open on Win32 does not force binary mode

Created on 2009-08-27 04:17 by EnigmaCurry, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
File name	Uploaded	Description
codecs_bug.py	EnigmaCurry, 2009-08-27 13:29	Doctests for codecs.open

Messages (4)
msg91995	Author: Ryan McGuire (EnigmaCurry)	Date: 2009-08-27 04:17
Opening a UTF-8 encoded file with Unix newlines ("\n") on Win32:

    codecs.open("whatever.txt","r","utf-8").read()

replaces the newlines ("\n") with CR+LF ("\r\n").

The docs specifically say:

"Files are always opened in binary mode, even if no binary mode was specified. This is done to avoid data loss due to encodings using 8-bit values. This means that no automatic conversion of '\n' is done on reading and writing."

And yet, opening the file with an explicit binary mode resolves the situation:

    codecs.open("whatever.txt","rb","utf-8").read()

This reads the file with the original newlines unmodified.

The implementation of codecs.open and the documentation are out of sync.
msg91999	Author: Marc-Andre Lemburg (lemburg) *	Date: 2009-08-27 09:36
Ryan McGuire wrote:
> New submission from Ryan McGuire <python.org@enigmacurry.com>:
>
> Opening a UTF-8 encoded file with unix newlines ("\n") on Win32:
>
> codecs.open("whatever.txt","r","utf-8").read()
>
> replaces the newlines ("\n") with CR+LF ("\r\n").
>
> The docs specifically say that :
>
> "Files are always opened in binary mode, even if no binary mode was
> specified. This is done to avoid data loss due to encodings using 8-bit
> values. This means that no automatic conversion of '\n' is done on
> reading and writing."
>
> And yet, opening the file with an explicit binary mode resolves the
> situation:
>
> codecs.open("whatever.txt","rb","utf-8").read()
>
> This reads the file with the original newlines unmodified.
>
> The implementation of codecs.open and the documentation are out of sync.

The implementation looks like this:

    if encoding is not None and \
       'b' not in mode:
        # Force opening of the file in binary mode
        mode = mode + 'b'

in both Python 2 and 3, so I'm not sure what could be causing this.
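
One way to check the mode-forcing behaviour quoted above is to look at the file object that codecs.open wraps. The following is only a sketch for Python 3: the helper name effective_mode is made up for illustration, and it relies on the returned wrapper exposing the underlying file as its stream attribute. Newer Python versions may emit a DeprecationWarning for codecs.open, so the sketch silences it.

```python
import codecs
import os
import tempfile
import warnings


def effective_mode(requested_mode):
    """Return the mode of the underlying file that codecs.open actually opened.

    Hypothetical helper for illustration; not part of the codecs module.
    """
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        with warnings.catch_warnings():
            # Newer Python versions may warn that codecs.open is deprecated.
            warnings.simplefilter("ignore", DeprecationWarning)
            f = codecs.open(path, requested_mode, "utf-8")
        try:
            # f.stream is the plain file object wrapped by the codec layer.
            return f.stream.mode
        finally:
            f.close()
    finally:
        os.remove(path)


if __name__ == "__main__":
    print(effective_mode("r"))
    print(effective_mode("rb"))
```

If the 'b' flag is appended as in the snippet above, both calls should report a binary mode, which would point away from the reading side as the source of the "\r\n".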
msg92001	Author: Ryan McGuire (EnigmaCurry)	Date: 2009-08-27 13:29
Uploading a doctest for this. The tests are successful on Linux using Python 2.6. They fail on Win32 with Python 2.6.
msg92101	Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) *	Date: 2009-08-31 06:33
I think your test is invalid: it creates the file in "w" mode, so on Win32 each \n is written to disk as the two bytes \r\n. codecs.open just reads them back.
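
The explanation above can be reproduced on any platform with a short sketch. It assumes Python 3, where passing newline="\r\n" to the built-in open makes a text-mode write translate "\n" the way Win32 text mode does by default. The helper name roundtrip and the sample text are made up for illustration and are not taken from codecs_bug.py.

```python
import codecs
import os
import tempfile
import warnings


def roundtrip(write_newline):
    """Write two lines in text mode, then read them back with codecs.open.

    Hypothetical helper for illustration. write_newline="\r\n" imitates
    Win32 text mode; write_newline="\n" writes the newlines untranslated.
    """
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        # Text-mode write: every "\n" is translated to write_newline on disk.
        with open(path, "w", encoding="utf-8", newline=write_newline) as f:
            f.write("line1\nline2\n")
        with warnings.catch_warnings():
            # Newer Python versions may warn that codecs.open is deprecated.
            warnings.simplefilter("ignore", DeprecationWarning)
            # codecs.open reads in binary mode, so nothing is translated back.
            with codecs.open(path, "r", "utf-8") as f:
                return f.read()
    finally:
        os.remove(path)


if __name__ == "__main__":
    print(repr(roundtrip("\r\n")))
    print(repr(roundtrip("\n")))
```

The "\r\n" in the first result comes from the writing side. codecs.open returns the bytes on disk unchanged, which is consistent with the "not a bug" resolution.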

History
Date	User	Action	Args
2022-04-11 14:56:52	admin	set	github: 51037
2010-04-29 18:00:18	terry.reedy	set	status: pending -> closed
2009-08-31 06:33:35	amaury.forgeotdarc	set	status: open -> pending; nosy: + amaury.forgeotdarc; messages: +; resolution: not a bug
2009-08-27 13:29:29	EnigmaCurry	set	files: + codecs_bug.py; messages: +
2009-08-27 09:36:43	lemburg	set	nosy: + lemburg; messages: +
2009-08-27 04:17:07	EnigmaCurry	create