[Python-Dev] Why do we flush before truncating?

Tim Peters tim.one at comcast.net
Sat Sep 6 23:40:16 EDT 2003

[Neil Schemenauer]

> The fflush call has been there forever. The truncate method was added in 2.36 by Guido. I think the code was actually from Jim Roskind:
>
>   http://groups.google.com/groups?selm=199412070213.SAA06932%40infoseek.com
>
> He says:
>
>     Note that since the underlying ftruncate operates on a file descriptor (believe it or not), it was necessary to fflush() the stream before performing the truncate. I thought about doing a seek() as well, but could not find compelling reason to move the stream pointer.
>
> That still gives me no clue as to why the fflush() was deemed necessary.

Ack, I glossed over the fileno() call in our file_truncate(). It's usually a Very Bad Idea to mix stream I/O and lower-level I/O operations without flushing your pants off, but I'm having a hard time thinking of a specific reason for doing so in the truncate case. Better safe than trying to out-think all possible implementations, though!

> I found this posting:
>
>   http://groups.google.com/groups?hl=en&lr=&ie=UTF-8&oe=UTF-8&selm=35E0DB62.1BDD2D30%40taraz.kz&rnum=1
>
> but, AFAIK, the reason the program is not working the way the poster expects is the missing fflush() call before the lseek() call (not the fflush() before the ftruncate()).

I think that's right. In the Python case, I verified in a debugger that the file position is 5 immediately before the fflush() call, and 10 immediately after it. It's surprising, but apparently OK by the C std.
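A loosely analogous effect shows up with Python 3's buffered file objects. This is only a sketch under stated assumptions: it uses the modern `io` layer and `os.lseek()`, not the C stdio code path discussed here, and the helper name `positions_after_read` is made up for illustration. After reading 5 bytes of a 10-byte file, the stream-level position and the descriptor-level position disagree, because the buffer has read ahead:

```python
import os
import tempfile

def positions_after_read(n=5):
    """Return (stream position, descriptor position) after reading n bytes."""
    with tempfile.TemporaryFile() as f:      # opened "w+b", buffered
        f.write(b"0123456789")
        f.flush()
        f.seek(0)
        f.read(n)                            # the buffer reads ahead past n
        stream_pos = f.tell()                # what the stream reports
        fd_pos = os.lseek(f.fileno(), 0, os.SEEK_CUR)  # what the OS sees
        return stream_pos, fd_pos

stream_pos, fd_pos = positions_after_read()
```

Any descriptor-level call made at that point works from the descriptor's notion of the position, not the stream's.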

[Guido]

> ftruncate() is not a standard C function;

I suppose that clarifies my immediately preceding

    ftruncate() isn't a standard C function,

> it's a standard Unix system call.

Yes, and I gave a link to the current POSIX/SUS ftruncate() specification.

> It works on a file descriptor (i.e. a small int), not on a stream (i.e. a FILE *).

Right, and I missed that, primarily because Windows doesn't have ftruncate() so I wasn't looking at that part of the code.

> The fflush() call is necessary if the last call was a write, because in that case the stream's buffer may contain data that the OS file descriptor doesn't have yet.

I'm not really clear on why that should matter in the specific case of truncating a file, but will just live with it.
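One way unflushed output can interfere with a descriptor-level truncate, sketched with Python 3's buffered files rather than C stdio (an assumption: the `io` buffering here only stands in for a `FILE *` buffer, and `size_after_fd_truncate` is a hypothetical helper). If the bytes are still sitting in the user-space buffer when the descriptor is truncated, they get written out afterwards and the file ends up larger than the size that was asked for:

```python
import os
import tempfile

def size_after_fd_truncate(flush_first):
    """Write 10 buffered bytes, truncate the descriptor to 4, return final size."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        with open(path, "wb") as f:
            f.write(b"0123456789")       # still in the user-space buffer
            if flush_first:
                f.flush()                # push the bytes down to the descriptor
            os.ftruncate(f.fileno(), 4)  # descriptor-level truncate
        # leaving the with-block closes f, which writes out anything still buffered
        return os.path.getsize(path)
    finally:
        os.remove(path)

size_unflushed = size_after_fd_truncate(flush_first=False)
size_flushed = size_after_fd_truncate(flush_first=True)
```

With the flush, the truncate sees all 10 bytes and cuts them to 4; without it, the late write undoes the truncation.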

> But ftruncate() is irrelevant, because on Windows, it is never called; there's a huge #ifdef MSWINDOWS block containing Windows specific code ...

Right, I wrote that code. Windows has no way to say "here's a file, change the size to such-and-such"; the only way is to set the file pointer to the desired size and then call the no-argument Win32 SetEndOfFile(). Python used to use the MS C _chsize() function, but that did insane things when passed a "large" size; the SetEndOfFile() code was introduced as part of fixing Python's Windows largefile support.

> ... It also looks like the MSWINDOWS specific code block does attempt to record the current file position and seek back to it

Yes, because the file position must be changed on Windows in order to change the file size, but Python's docs promise that file.truncate() doesn't change the current position (which is natural behavior under POSIX ftruncate() but strained on Windows).
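The save, seek, truncate, restore sequence can be sketched at the Python level. This is an illustration only, not the actual C code in file_truncate(): the no-argument `truncate()` (which cuts at the current position) stands in for SetEndOfFile(), and `truncate_keeping_position` is a hypothetical helper name.

```python
import tempfile

def truncate_keeping_position(f, size):
    """Resize f to `size` bytes without disturbing its current position."""
    saved = f.tell()        # remember where the caller was
    f.seek(size)            # move the pointer to the desired end of file
    f.truncate()            # no-argument form: cut at the current pointer
    f.seek(saved)           # restore the caller's position

with tempfile.TemporaryFile() as f:
    f.write(b"0123456789")
    f.seek(7)
    truncate_keeping_position(f, 4)
    pos_after = f.tell()    # position as the caller sees it afterwards
    f.seek(0, 2)            # seek to end to measure the new size
    size_after = f.tell()
```

Both halves have to work: the size must actually change, and the position must come back to where it started.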

> -- however it does this after fflush() has already messed with it.

Note that in the Windows test case, it's not simply that the current position wasn't preserved across the file.truncate() call, it's also that the file didn't change size. It's very easy to fix the former while leaving the latter broken.

> So perhaps moving the fflush() call into the #else part and doing something Windows-specific instead of calling fflush() to ensure the buffer is flushed inside the MSWINDOWS part would be the right solution.
>
> I just realized that I have always worked under the assumption that fflush() after a read is a no-op; I just checked the 89 std and it says it is undefined. (I must have picked up that misunderstanding from some platform-specific man page.) This can be fixed by doing an ftell() followed by an fseek() call; this is required to flush the buffer if there was unwritten output data in the buffer, and is always allowed.

That's what I was hoping to avoid, but I don't care anymore: after staring at it some more, I'm convinced that the current file_truncate() endures a ridiculous amount of complexity trying to gain a tiny bit of speed in what has to be a rare operation.
