[Python-Dev] Why do we flush before truncating?

Guido van Rossum guido at python.org
Sat Sep 6 12:53:36 EDT 2003


http://www.python.org/sf/801631

gives a failing program on Windows, paraphrased:

    f = file('test.dat', 'wb')
    f.write('1234567890')   # 10 bytes
    f.close()
    f = file('test.dat', 'rb+')
    f.read(5)
    print f.tell()          # prints 5, as expected
    f.truncate()            # leaves the file at 10 bytes
    print f.tell()          # prints 10
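For comparison, here is a sketch of the same scenario against Python 3's io module. This is an assumption about a later Python, not the 2003 file object: io does its own buffering over a raw descriptor and does not go through C stdio. According to the io documentation, truncate() with no argument cuts the file at the current position and leaves that position unchanged, which is what the user expected here. The helper name truncate_after_read() is hypothetical.

```python
import os
import tempfile


def truncate_after_read():
    """Replay the report's steps; return (tell before, tell after, file size)."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "test.dat")
        with open(path, "wb") as f:
            f.write(b"1234567890")  # 10 bytes
        with open(path, "rb+") as f:
            f.read(5)
            before = f.tell()
            f.truncate()  # no size argument: truncate at the current position
            after = f.tell()
        # Evaluated before the temporary directory is cleaned up.
        return before, after, os.path.getsize(path)


print(truncate_after_read())
```

On a current CPython this should report a position of 5 both before and after the call, and a file of 5 bytes.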

The problem is that fileobject.c's file_truncate() calls fflush() before truncating. The C standard says that the effect of calling fflush() is undefined if the most recent operation on a stream opened for update was an input operation. The stream is indeed opened for update here, and the most recent operation performed by the user was indeed a read. It so happens that MS's fflush() changes the file position then. But the user didn't call fflush(), Python did, so we can't blame the user for relying on undefined behavior here.

The problem can be repaired inside file_truncate() by seeking back to the original file position after the fflush() call -- but the original file position isn't always available now, so I'd also have to add another call to _portable_ftell() before the fflush() to find it. So that gets increasingly complicated. Much simpler would be to remove this block of code (which does fix the test program's problem on Windows, by simply getting rid of the undefined operation):

    /* Flush the file. */
    Py_BEGIN_ALLOW_THREADS
    errno = 0;
    ret = fflush(f->f_fp);
    Py_END_ALLOW_THREADS
    if (ret != 0)
        goto onioerror;

I don't understand why we're flushing the file. ftruncate() isn't a standard C function, so the standard sheds no light on why we might be doing that. AFAICT, POSIX/SUS doesn't give a reason to flush either:

http://www.opengroup.org/onlinepubs/007904975/functions/ftruncate.html

ftruncate() is not a standard C function; it's a standard Unix system call. It works on a file descriptor (i.e. a small int), not on a stream (i.e. a FILE *). The fflush() call is necessary if the last call was a write, because in that case the stream's buffer may contain data that the OS file descriptor doesn't have yet.
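The buffering point can be seen from Python 3, which serves here only as a convenient harness. os.ftruncate() works on the descriptor returned by fileno(), so bytes still sitting in the file object's user-space buffer are invisible to it and get written out later, when the file is closed. The function name cut_to_five() is hypothetical.

```python
import os
import tempfile


def cut_to_five(flush_first):
    """Write 10 bytes through a buffered file, ftruncate() its descriptor to 5,
    and return what ends up on disk."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "test.dat")
        with open(path, "wb") as f:
            f.write(b"1234567890")  # 10 bytes, still in the user-space buffer
            if flush_first:
                f.flush()  # hand the buffered bytes to the OS descriptor first
            os.ftruncate(f.fileno(), 5)  # acts on the descriptor, not the buffer
        # Leaving the with-block closes the file, flushing anything still buffered.
        with open(path, "rb") as f:
            return f.read()


print(cut_to_five(flush_first=True))
print(cut_to_five(flush_first=False))
```

With the flush, the file ends up as the 5 bytes b'12345'. Without it, the descriptor's offset is still 0 when the file is closed, so the 10 buffered bytes are written there and the truncation is silently undone.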

But ftruncate() is irrelevant, because on Windows it is never called; there's a huge #ifdef MS_WINDOWS block containing Windows-specific code, starting with the comment

/* MS _chsize doesn't work if newsize doesn't fit in 32 bits,
   so don't even try using it. */

and the ftruncate() call is made in the #else part.

It also looks like the MS_WINDOWS-specific code block does attempt to record the current file position and seek back to it -- however, it does this after fflush() has already messed with it. So perhaps the right solution is to move the fflush() call into the #else part and, inside the MS_WINDOWS part, do something Windows-specific instead of calling fflush() to ensure the buffer is flushed.
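The ordering problem can be sketched with an in-memory stream. simulated_fflush() is a hypothetical stand-in for the undefined behavior described above. It simply moves the position to the end of the data, which matches the 10 printed in the bug report but is an assumption about what MS's fflush() actually does. If the position is recorded after that call, nothing gets truncated. If it is recorded before the call and restored afterwards, the truncation works as intended.

```python
import io


def simulated_fflush(stream):
    # Hypothetical stand-in: pretend that flushing after a read jumps to the end.
    stream.seek(0, io.SEEK_END)


def truncate_recording_after_flush(stream):
    simulated_fflush(stream)
    pos = stream.tell()  # too late: the position has already moved
    stream.seek(pos)
    stream.truncate(pos)


def truncate_recording_before_flush(stream):
    pos = stream.tell()  # remember where the user left off
    simulated_fflush(stream)
    stream.seek(pos)  # undo whatever the flush did to the position
    stream.truncate(pos)


def run(truncate):
    """Replay the report's steps on 10 bytes; return (final position, final length)."""
    stream = io.BytesIO(b"1234567890")
    stream.read(5)
    truncate(stream)
    return stream.tell(), len(stream.getvalue())


print(run(truncate_recording_after_flush))
print(run(truncate_recording_before_flush))
```

The first call reproduces the symptom from the report: position 10, length 10. The second gives position 5, length 5.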

I just realized that I have always worked under the assumption that fflush() after a read is a no-op; I just checked the C89 standard and it says the behavior is undefined. (I must have picked up that misunderstanding from some platform-specific man page.) This can be fixed by doing an ftell() followed by an fseek() call; the fseek() is required to flush the buffer if there was unwritten output data in it, and is always allowed.
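Below is a sketch of the ftell()/fseek() idea applied to a real C stdio stream and driven from Python through ctypes. It rests on a few assumptions. It needs a POSIX system where ctypes.CDLL(None) exposes the C library, so it will not run on Windows. The helper truncate_stream() is hypothetical and is not Python's actual file_truncate(). Seeking to the current position takes the place of fflush(): POSIX requires fseek() to write out unwritten output, and a positioning call is allowed right after input on an update stream.

```python
import ctypes
import os
import tempfile

# Assumption: POSIX; CDLL(None) does dlopen(NULL), exposing the C library's symbols.
libc = ctypes.CDLL(None, use_errno=True)
libc.fopen.restype = ctypes.c_void_p
libc.fopen.argtypes = [ctypes.c_char_p, ctypes.c_char_p]
libc.fclose.argtypes = [ctypes.c_void_p]
libc.fread.restype = ctypes.c_size_t
libc.fread.argtypes = [ctypes.c_void_p, ctypes.c_size_t, ctypes.c_size_t, ctypes.c_void_p]
libc.fwrite.restype = ctypes.c_size_t
libc.fwrite.argtypes = [ctypes.c_char_p, ctypes.c_size_t, ctypes.c_size_t, ctypes.c_void_p]
libc.ftell.restype = ctypes.c_long
libc.ftell.argtypes = [ctypes.c_void_p]
libc.fseek.argtypes = [ctypes.c_void_p, ctypes.c_long, ctypes.c_int]
libc.fileno.argtypes = [ctypes.c_void_p]


def _check(ok, what):
    if not ok:
        err = ctypes.get_errno()
        raise OSError(err, f"{what}: {os.strerror(err)}")


def truncate_stream(fp, size=None):
    """Hypothetical helper: truncate FILE *fp without ever calling fflush()."""
    pos = libc.ftell(fp)
    _check(pos >= 0, "ftell")
    # Seek to where we already are: writes out pending output, legal after a read.
    _check(libc.fseek(fp, pos, os.SEEK_SET) == 0, "fseek")
    os.ftruncate(libc.fileno(fp), pos if size is None else size)
    return pos


def read_then_truncate():
    """The bug report's scenario; return (position after truncate, file size)."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "test.dat")
        with open(path, "wb") as f:
            f.write(b"1234567890")
        fp = libc.fopen(os.fsencode(path), b"rb+")
        _check(fp, "fopen")
        try:
            buf = ctypes.create_string_buffer(5)
            _check(libc.fread(buf, 1, 5, fp) == 5, "fread")
            truncate_stream(fp)  # truncate at the current position (5)
            pos_after = libc.ftell(fp)
        finally:
            libc.fclose(fp)
        return pos_after, os.path.getsize(path)


def write_then_truncate_shorter():
    """Buffered output must reach the descriptor before ftruncate() shrinks the file."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "test.dat")
        fp = libc.fopen(os.fsencode(path), b"wb+")
        _check(fp, "fopen")
        try:
            data = b"1234567890"
            _check(libc.fwrite(data, 1, len(data), fp) == len(data), "fwrite")
            truncate_stream(fp, 5)  # explicit size smaller than what was written
        finally:
            libc.fclose(fp)
        with open(path, "rb") as f:
            return f.read()


print(read_then_truncate())
print(write_then_truncate_shorter())
```

read_then_truncate() replays the bug report and should give a position of 5 and a size of 5. write_then_truncate_shorter() checks the other half of the argument: the fseek() pushes the buffered bytes to the descriptor before it is cut, so fclose() has nothing left to write back.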

--Guido van Rossum (home page: http://www.python.org/~guido/)


