Issue 15723: Python breaks OS' append guarantee on file writes
Created on 2012-08-18 20:19 by bsdphk, last changed 2022-04-11 14:57 by admin. This issue is now closed.
Messages (10) | ||
---|---|---|
msg168528 - (view) | Author: Poul-Henning Kamp (bsdphk) | Date: 2012-08-18 20:19 |
When a file is opened in append mode, the operating system guarantees that every write(2) system call atomically appends its payload to the file.

At least on FreeBSD, Python breaks this guarantee by chopping up large writes into multiple write(2) syscalls to the OS.

Try running this program using ktrace/truss/strace or a similar system-call tracing facility:

    fo = open("/tmp/_bogus", "ab", 0)
    fo.write(bytearray(1024*1024))
    fo.close()

Instead of one single megabyte write, I see 1024 kilobyte writes.

(BTW: Why only one kilobyte? That is an incredible pessimisation these days...)

I leave it to the Python community to decide if this should be fixed, or merely pointed out in documentation (os.write() is a workaround). | ||
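A minimal sketch of the os.write() workaround mentioned in the report, assuming a local POSIX filesystem. The helper name `append_once` and the temporary path are hypothetical and only serve the illustration. os.write() maps to a single write(2) call, so the payload is not chopped up by a C stdio buffer; the call can still return a short count, which the caller has to check.

```python
import os
import tempfile


def append_once(path, payload):
    """Append payload with one write(2) call and return the byte count.

    The return value can be smaller than len(payload) if the kernel
    performs a partial write; this helper does not retry.
    """
    # O_APPEND asks the kernel to position at end-of-file before each write.
    fd = os.open(path, os.O_WRONLY | os.O_APPEND | os.O_CREAT, 0o644)
    try:
        return os.write(fd, payload)
    finally:
        os.close(fd)


if __name__ == "__main__":
    # Hypothetical scratch location used only for this demonstration.
    scratch = os.path.join(tempfile.mkdtemp(), "append_demo")
    written = append_once(scratch, bytes(1024 * 1024))
    print("bytes written in one call:", written)
```

Running the script under strace, ktrace, or truss should show one write call for the whole megabyte rather than many small ones.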
msg168529 - (view) | Author: Antoine Pitrou (pitrou) | Date: 2012-08-18 20:22 |
> When a file is opened in append mode, the operating system guarantees
> that all write(2) system calls atomically appended their payload to the
> file.

Does it? I don't see such strong guarantees in http://pubs.opengroup.org/onlinepubs/9699919799/functions/write.html

In any case, Python 2 uses fwrite(), not write(), so that may be the explanation. Do you observe the same behaviour when using io.open() instead of open()? (io.open() is the Python 3 I/O stack backported to Python 2.) | ||
msg168530 - (view) | Author: Poul-Henning Kamp (bsdphk) | Date: 2012-08-18 20:24 |
Yes, it does: If the O_APPEND flag of the file status flags is set, the file offset shall be set to the end of the file prior to each write and no intervening file modification operation shall occur between changing the file offset and the write operation. | ||
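The quoted O_APPEND rule can be observed with a small experiment. This is only a sketch, assuming a local POSIX filesystem; the function name `interleaved_appends` is hypothetical. Two descriptors opened with O_APPEND write in turn, and one of them seeks back to offset 0 before its second write. Because the offset is moved to end-of-file before each write, no earlier data is overwritten.

```python
import os
import tempfile


def interleaved_appends(path):
    """Write through two O_APPEND descriptors and return the file contents."""
    fd_a = os.open(path, os.O_WRONLY | os.O_APPEND | os.O_CREAT, 0o644)
    fd_b = os.open(path, os.O_WRONLY | os.O_APPEND)
    try:
        os.write(fd_a, b"aaaa")
        os.write(fd_b, b"bbbb")
        # Seeking has no effect on where the next write lands under O_APPEND.
        os.lseek(fd_a, 0, os.SEEK_SET)
        os.write(fd_a, b"cccc")
    finally:
        os.close(fd_a)
        os.close(fd_b)
    with open(path, "rb") as f:
        return f.read()


if __name__ == "__main__":
    scratch = os.path.join(tempfile.mkdtemp(), "o_append_demo")
    print(interleaved_appends(scratch))
```

This shows only the positioning part of the rule. It does not show that a single large write is free of interleaving with other writers, which is the point debated in the rest of the thread.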
msg168531 - (view) | Author: Antoine Pitrou (pitrou) | Date: 2012-08-18 20:25 |
Ah, sorry. I was stupidly looking for "atomic" and only found the pipe-specific remarks. (the other points remain, though :-)) | ||
msg168532 - (view) | Author: Poul-Henning Kamp (bsdphk) | Date: 2012-08-18 20:30 |
I have not tried io.open(), nor would I expect most users to realize that they needed to do so in order to get the canonical behaviour from an operation called "write" on a file opened in "append" mode.

IMO: If Python's file.write() does not give the guarantee POLA would indicate, it's either a bug or a doc-issue, no matter how many workarounds might exist.

But I have neither a clue to the aspirational goals of Python, nor to what it might take to fix this, so it's entirely your call. | ||
msg168533 - (view) | Author: Antoine Pitrou (pitrou) | Date: 2012-08-18 20:36 |
> I have not tried io.open(), nor would I suspect most users would
> realize that they needed to do so, in order to get the canonical
> behaviour from an operation called "write" on a file opened in
> "append" mode.

The reason I'm asking is that open() is the same as io.open() in Python 3.x, which is currently the main development line. That said, I can find the results myself. Python 2 is in bugfix mode, so it's impossible to rewrite the I/O routines to use unbuffered I/O instead of C buffered I/O.

> IMO: If pythons file.write() does not give the guarantee POLA would
> indicate, it's either a bug or a doc-issue, no matter how many
> workarounds might exist.

What do you call POLA?

> But I have neither a clue to the aspirational goals of python, nor to
> what it might take to fix this, so it's entirely your call.

Well, as I said, Python 2 will be pretty much impossible to fix (we call fwrite() with the argument, not write()). Python 3 is a different story since we use our own buffering layer and then C's unbuffered API.

As a sidenote, do you know if writev() has the same guarantee as write()? POSIX doesn't seem to say so. | ||
msg168534 - (view) | Author: Poul-Henning Kamp (bsdphk) | Date: 2012-08-18 20:50 |
POLA = Principle Of Least Astonishment

We use that a lot in architectural decisions in FreeBSD :-)

As I said: You deal with this as you see fit. If all python2 gets is a doc- or errata-notice, that's perfectly fine with me.

I interpret "The writev() function shall be equivalent to write(), except as described below." as writev() giving the same atomic append guarantee. In FreeBSD, write() is implemented using writev() and I expect that is the obvious and thus common way it is done.

(You seem to be right with respect to the 1024: That is indeed still the BUFSIZ on FreeBSD, I'll work on getting that changed.) | ||
msg168539 - (view) | Author: R. David Murray (r.david.murray) | Date: 2012-08-18 23:14 |
Even if we write in chunks, if we are calling the OS write function and O_APPEND is set, wouldn't we be satisfying the condition? Or, rather, the OS would be. That is, I don't really see a guarantee of an *atomic* write in the quoted description. | ||
msg168540 - (view) | Author: Antoine Pitrou (pitrou) | Date: 2012-08-18 23:20 |
> Even if we write in chunks, if we are calling the OS write function
> and O_APPEND is set, wouldn't be satisfying the condition? Or,
> rather, the OS would be. That is, I don't really see a guarantee of
> an *atomic* write in the quoted description.

I'm not sure it's guaranteed to be atomic at the hardware level, but AFAIU the updates should be atomic as seen from other processes on the same machine (i.e. filesystem cache coherency).

As a side-note, I've just tested under Linux with the following script:

    with open("foo", "ab") as f:
        f.write(b"abcd")
        f.write(b"x" * (1024 ** 2))

Results:

- on 2.7, the write buffers get sliced up (glibc's fwrite() doesn't care about atomicity):

    write(3, "abcdxxxxxxxxxxxxxxxxxxxxxxxxxxxx"..., 4096) = 4096
    write(3, "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"..., 1044480) = 1044480

- on 3.2 and 3.3, our home-grown buffering respects the original buffers:

    write(3, "abcd", 4) = 4
    write(3, "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"..., 1048576) = 1048576

(but that's purely out of luck, since we didn't design it with that goal :-)) | ||
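Since the buffered result on Python 3 is described above as luck rather than design, one way to avoid depending on the buffering layer is to open the file unbuffered. The following is a sketch under that assumption, for Python 3 only; the helper name `append_unbuffered` is hypothetical. With `buffering=0` in binary mode, open() returns a raw io.FileIO object, and the io documentation describes a raw write() as making at most one system call and returning the number of bytes actually written.

```python
import io
import os
import tempfile


def append_unbuffered(path, payload):
    """Append payload through a raw FileIO object; return bytes written."""
    with open(path, "ab", buffering=0) as raw:
        # No BufferedWriter sits between this object and the descriptor.
        if not isinstance(raw, io.FileIO):
            raise TypeError("expected a raw FileIO object")
        # May be a short count; the caller decides how to handle that.
        return raw.write(payload)


if __name__ == "__main__":
    scratch = os.path.join(tempfile.mkdtemp(), "unbuffered_demo")
    count = append_unbuffered(scratch, b"x" * (1024 ** 2))
    print("raw write returned:", count)
```

The returned count matters here: a raw write is allowed to write fewer bytes than requested, and it does not retry on its own.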
msg168991 - (view) | Author: Charles-François Natali (neologix) | Date: 2012-08-24 09:48 |
I wouldn't rely on O_APPEND too much:
- it won't work on NFS, and probably other non-local filesystems
- it doesn't actually guarantee atomicity, because even though setting the file offset and performing the write are done with locking, there is still the possibility of a partial write |
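The partial-write caveat can be made concrete with a retry loop. This is a sketch only, assuming a descriptor opened with O_APPEND on a local POSIX filesystem; the helper name `append_all` is hypothetical. It returns how many write calls were needed. Any result above 1 means the payload went out in several pieces, and another appender could have written in between them.

```python
import os
import tempfile


def append_all(fd, payload):
    """Write the whole payload to fd; return the number of write calls used."""
    view = memoryview(payload)
    calls = 0
    while len(view) > 0:
        written = os.write(fd, view)  # may write fewer bytes than requested
        view = view[written:]
        calls += 1
    return calls


if __name__ == "__main__":
    scratch = os.path.join(tempfile.mkdtemp(), "partial_write_demo")
    fd = os.open(scratch, os.O_WRONLY | os.O_APPEND | os.O_CREAT, 0o644)
    try:
        calls = append_all(fd, b"x" * (1024 ** 2))
    finally:
        os.close(fd)
    print("write calls needed:", calls)
```

A caller that needs each record to land contiguously can treat a count above 1 as a signal that the append guarantee did not hold for that record.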
History | |||
---|---|---|---|
Date | User | Action | Args |
2022-04-11 14:57:34 | admin | set | github: 59928 |
2019-12-29 05:04:43 | gvanrossum | set | status: open -> closed; resolution: not a bug; stage: resolved |
2012-09-10 01:13:41 | jcea | set | nosy: + jcea |
2012-08-24 09:55:08 | schmir | set | nosy: + schmir |
2012-08-24 09:48:24 | neologix | set | nosy: + neologix; messages: + |
2012-08-18 23:20:05 | pitrou | set | messages: + |
2012-08-18 23:14:01 | r.david.murray | set | nosy: + r.david.murray; messages: + |
2012-08-18 20:50:31 | bsdphk | set | messages: + |
2012-08-18 20:36:39 | pitrou | set | messages: + |
2012-08-18 20:30:08 | bsdphk | set | messages: + |
2012-08-18 20:25:13 | pitrou | set | messages: + |
2012-08-18 20:24:23 | bsdphk | set | messages: + |
2012-08-18 20:22:37 | pitrou | set | nosy: + pitrou; messages: + |
2012-08-18 20:19:36 | bsdphk | create |