Issue 30228: Open a file in text mode requires too many syscalls (original) (raw)
Example:
with open("x", "w", encoding="utf-8") as fp: fp.write("HERE") fp.close()
syscalls:
14249 open("x", O_WRONLY|O_CREAT|O_TRUNC|O_CLOEXEC, 0666) = 3 14249 fstat(3, {st_mode=S_IFREG|0664, st_size=0, ...}) = 0 14249 ioctl(3, TCGETS, 0x7fff07d43330) = -1 ENOTTY (Inappropriate ioctl for device) 14249 lseek(3, 0, SEEK_CUR) = 0 14249 lseek(3, 0, SEEK_CUR) = 0 14249 lseek(3, 0, SEEK_CUR) = 0 14249 write(3, "HERE", 4) = 4 14249 close(3) = 0
I only expected 3 syscalls: open, write, close.
- fstat() is used by the FileIO constructor to check if the file is a directory or not, and to get the block size
- ioctl() comes from open() which checks if the file is a TTY or not, to decide how to configure buffering
- the first lseek() is used by the BuffererWriter constructor to initialize the private abs_pos attribute
- the second lseek() is used by the TextIOWrapper constructor to check if the underlying file object (buffered writer) is seekable or not
- the last lseek() is used to create the cookie object in TextIOWrapper constructor
Can we maybe reduce the number of lseek() to a single syscall?
For example, BuffererWriter constructor calls FileIO.tell(): can't this method set the seekable attribute depending on lseek() success, as the FileIO.seekable property?
You could call buffer.seek(0, SEEK_CUR) rather than buffer.tell() for avoiding a system call for readable stream. But this looks as a shamanism too.
Note: Buffered.seek(0, SEEK_CUR) only has a fast-path for readable file: it cannot be used to optimize open(filename, "w") (BufferedWriter.seek() isn't optimized).
Microbenchmark on Fedora 26 for https://github.com/python/cpython/pull/1385
Working directly uses ext4, the filesystem operations are likely cached in memory, so syscalls should be very fast.
$ ./python -m perf timeit --inherit=PYTHONPATH 'open("x.txt", "w").close()' -o open_ref.json -v $ ./python -m perf timeit --inherit=PYTHONPATH 'open("x.txt", "w").close()' -o open_patch.json -v $ ./python -m perf compare_to open_ref.json open_patch.json Mean +- std dev: [open_ref] 18.6 us +- 0.2 us -> [open_patch] 18.2 us +- 0.2 us: 1.02x faster (-2%)
Microbenchmark using a btrfs filesystem mounted on NFS over wifi: not significant!
$ ./python -m perf timeit --inherit=PYTHONPATH 'open("nfs/x.txt", "w").close()' --append open_patch.json -v $ ./python -m perf timeit --inherit=PYTHONPATH 'open("nfs/x.txt", "w").close()' --append open_patch.json -v haypo@selma$ ./python -m perf compare_to open_ref.json open_patch.json -v Mean +- std dev: [open_ref] 17.8 ms +- 1.0 ms -> [open_patch] 17.8 ms +- 1.0 ms: 1.00x faster (-0%) Not significant!
Note: open().close() is 1000x slower over NFS!
According to strace, on NFS, open() and close() are slow, but syscalls in the middle are as fast as syscalls on a local filesystem.
Well, it's hard to see a significant speedup, even on NFS. So I abandon my change.