Issue 29817: File IO r+ read, write, read causes garbage data write.


Created on 2017-03-15 09:47 by jan, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Messages (3)
msg289657 - (view) Author: Jan Pijpers (jan) Date: 2017-03-15 09:47
In Python 2.7.12, when reading, writing and subsequently reading again from a file, Python seems to write garbage. For example, when running this in the Python IDLE:

    import os
    testPath = r"myTestFile.txt"

    ## Make sure the file exists and it's empty
    with open(testPath, "w") as tFile:
        tFile.write("")

    print "Our Test File: ", os.path.abspath(testPath)

    with open(testPath, "r+") as tFile:
        ## First we read the file
        data = tFile.read()
        ## Now we write some data
        tFile.write('Some Data')
        ## Now we read the file again
        tFile.read()

When now looking at the file, the data is the following:

    Some Data @ sb d Z d d l m Z d d d ・ ・ YZ e d k r^ d d l m Z e d d d d e ・n d S( s9 Implement Idle Shell history mechanism with History ...

As mentioned in the comments on Stack Overflow (see link), this might be a buffer overrun, but I am not sure. I also guess this could be used as a security vulnerability...

http://stackoverflow.com/questions/40373457/python-r-read-write-read-writes-garbage-to-a-file?noredirect=1#comment72580538_40373457
msg289675 - (view) Author: Eryk Sun (eryksun) * (Python triager) Date: 2017-03-15 13:39
This is a bug in the C runtime's handling of "r+" mode with buffering. The CRT FILE stream's internal _cnt field, from the POV of the write() call, is the number of bytes that can be written to the internal buffer before it's full. The default buffer size is 4096 bytes. Thus after writing "Some Data", _cnt is at 4096 - 9 == 4087 bytes.

On the other hand, from the POV of the subsequent read() call, this means there are 4087 bytes in the buffer available to be read. If you change your code to keep the result, you'll see that it is indeed 4087 bytes. After the read, _cnt is at 0, and the stream's internal _ptr and _base pointers indicate a full buffer, which gets flushed to disk when the file is closed.

If you change your code to print os.path.getsize(testPath) after the file is closed, you should see that the size is 4096 bytes -- exactly one buffer. If you open the file with buffering=512, this changes to 503 bytes read and creates a 512-byte file.

Can and should Python do anything to work around this problem in the CRT? Or should this issue simply be closed as third party? I'm inclined to close it.
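[Editor's note: not part of the original thread.] The usual workaround for this class of CRT bug is the rule ANSI C imposes on update ("r+") streams: a file-positioning call must separate a read from a following write, and vice versa. A minimal sketch of that pattern (written in Python 3 syntax so it runs today; the temp-file path is illustrative, not from the report, and the same seek() discipline is what the Python 2 file object needed on Windows):

    import os
    import tempfile

    # Illustrative path; any writable location works.
    path = os.path.join(tempfile.mkdtemp(), "myTestFile.txt")

    # Make sure the file exists and is empty.
    with open(path, "w") as f:
        f.write("")

    with open(path, "r+") as f:
        data = f.read()           # read to EOF
        f.seek(0, os.SEEK_END)    # reposition before switching to writing
        f.write("Some Data")
        f.seek(0)                 # reposition again before reading back
        result = f.read()

    print(result)                 # "Some Data" -- no stray buffer contents
    print(os.path.getsize(path))  # 9 bytes, not a full 4096-byte buffer

With the explicit seek() calls, the stream's buffer accounting is reset at each read/write transition, so the stale-buffer flush described above never happens.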
msg289676 - (view) Author: Paul Moore (paul.moore) * (Python committer) Date: 2017-03-15 14:32
Also, this is a Python 2 only issue. The problem doesn't happen in Python 3.6 (at least in my quick experiment). I'm not 100% sure if this is because the internal implementation of IO changed in 3.x, or if it's just because we're now using a newer CRT which has fixed the issue. I agree that there's no point in Python trying to work around this behaviour.
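[Editor's note: not part of the original thread.] Paul's observation is easy to reproduce: on Python 3, the same read/write/read sequence in "r+" mode, with no intervening seek, leaves a clean file. This is consistent with the io rewrite in 3.x doing its own buffering rather than delegating to CRT FILE streams, though, as the message notes, a newer CRT may also have fixed the underlying bug. The temp-file path below is illustrative:

    import os
    import tempfile

    # Illustrative path; any writable location works.
    path = os.path.join(tempfile.mkdtemp(), "myTestFile.txt")

    # Make sure the file exists and is empty.
    with open(path, "w") as f:
        f.write("")

    with open(path, "r+") as f:
        f.read()                  # read the (empty) file
        f.write("Some Data")      # write with no intervening seek
        f.read()                  # read again

    print(os.path.getsize(path))  # 9 on Python 3: just "Some Data", no garbage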
History
Date User Action Args
2022-04-11 14:58:44 admin set github: 74003
2017-03-15 14:32:50 paul.moore set status: open -> closed; stage: resolved
2017-03-15 14:32:25 paul.moore set resolution: third party; messages: +
2017-03-15 13:39:51 eryksun set nosy: + eryksun; messages: +
2017-03-15 09:48:09 jan set title: File IO read, write, read causes garbage data write. -> File IO r+ read, write, read causes garbage data write.
2017-03-15 09:47:30 jan create