Issue 4194: default subprocess.Popen buffer size (original) (raw)

Issue4194

process

Status: closed Resolution: out of date
Dependencies: Superseder:
Assigned To: georg.brandl Nosy List: LambertDW, bgh, georg.brandl, giampaolo.rodola, gregory.p.smith, guettli, l0nwlf, pitrou, sameerd, vstinner, wplappert
Priority: normal Keywords:

Created on 2008-10-24 17:54 by skip.montanaro, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
File name Uploaded Description Edit
popentest.py skip.montanaro,2008-10-24 17:54
Messages (27)
msg75173 - (view) Author: Skip Montanaro (skip.montanaro) * (Python triager) Date: 2008-10-24 17:54
I noticed a colleague at work today checked in a change to his code to switch back from subprocess.Popen to os.popen. I asked him about it and he reported that subprocess.Popen was about 10x slower than os.popen. I asked him for a simple test case, which is attached. Here are my results with Python 2.4 through 2.7 (aka CVS HEAD): tmp% python2.4 popentest.py time with os.popen : 0.09 time with subprocess.Popen : 2.27 tmp% python2.5 popentest.py time with os.popen : 0.03 time with subprocess.Popen : 1.52 tmp% python2.6 popentest.py time with os.popen : 0.038824 time with subprocess.Popen : 1.517056 tmp% python2.7 popentest.py time with os.popen : 0.033746 time with subprocess.Popen : 1.512331 These times are on my Mac laptop, all writing to the local disk. It seems there was a bit of improvement in the 2.5 release but that it is still miserably slow when compared with os.popen. His original test used time.clock() as the timer. I changed to time.time() but got essentially the same result. Skip
msg75177 - (view) Author: Winfried Plappert (wplappert) Date: 2008-10-24 18:55
Hi Skip, I find different measurements om Windows/XP: I copied the script and ran it under Python 2.5.2 and Python 2.6 (as downloaded from http://python.org/ftp/python/2.6/python-2.6.msi): Plappert@action-time /cygdrive/e/tmp $ python ./timing.py # 2.5.2 time with os.popen : 0.296999931335 time with subprocess.Popen : 0.203000068665 in other words: identical or subprocess.Popen even faster. and it is getting even better with Python2.6: Plappert@action-time /cygdrive/e/tmp $ python26 timing.py # 2.6 time with os.popen : 0.266000032425 time with subprocess.Popen : 0.18700003624 So could that be a Mac OS issue?
msg75178 - (view) Author: Skip Montanaro (skip.montanaro) * (Python triager) Date: 2008-10-24 19:16
Good question. I don't think it's MacOSX-specific. The original problem raised its ugly head on Solaris 10. I don't have quite as many versions of Python available there, but the relative performance is still bad, roughly 0.05 for os.popen and 0.82 for subprocess.Popen. Not as bad as on the Mac, but certainly not as good as what you saw.
msg75179 - (view) Author: Sameer (sameerd) Date: 2008-10-24 19:25
The subprocess module does different things depending on whether the systems are Windows or Posix. This could explain the bad performance on Solaris and the Mac and the good performance on Windows. If this is correct, then we should see poor performance on Linux also. Also, I don't have a box to test that out.
msg75180 - (view) Author: Skip Montanaro (skip.montanaro) * (Python triager) Date: 2008-10-24 19:35
Good suggestion Sameer. I tried it out with Python 2.5 on a Linux host here and saw essentially identical results for the two alternatives (~ 0.08s). S
msg75181 - (view) Author: David W. Lambert (LambertDW) Date: 2008-10-24 19:58
cygwin Python 2.5.1 (similar) time with os.popen : 0.439999818802 time with subprocess.Popen : 0.361000061035 linux python 2.4.2 (similar) time with os.popen : 0.0344939231873 time with subprocess.Popen : 0.0354421138763 linux python3K (both awful) time with os.popen : 0.715471029282 time with subprocess.Popen : 1.58627915382
msg75182 - (view) Author: Skip Montanaro (skip.montanaro) * (Python triager) Date: 2008-10-24 20:44
I don't expect Python3 to be all that great io performance-wise yet. Still, for me on the Mac os.popen beats subprocess.Popen pretty handily: % python3.0 popentest.py time with os.popen : 0.874988 time with subprocess.Popen : 2.112976
msg75184 - (view) Author: Winfried Plappert (wplappert) Date: 2008-10-24 21:52
Here are my figures from a different processor on Linux (Ubuntu): wplapper@lin-wpl:~/python$ python2.5 popentest.py time with os.popen : 0.0478880405426 time with subprocess.Popen : 0.0596849918365 wplapper@lin-wpl:~/python$ python2.6 popentest.py time with os.popen : 0.0481159687042 time with subprocess.Popen : 0.0592260360718 uname -a: ... 2.6.24-19-generic #1 SMP Wed Aug 20 22:56:21 UTC 2008 i686 GNU/Linux
msg75190 - (view) Author: Winfried Plappert (wplappert) Date: 2008-10-24 22:17
Yes, I can confirm that the performance is lousy on Solaris. Solaris-9/Python 2.5.1: time with os.popen : 0.124045133591 time with subprocess.Popen : 1.60335588455 Solaris-9/Python 2.6: time with os.popen : 0.115752220154 time with subprocess.Popen : 1.61768198013 Solaris-10/Python 2.4.4: time with os.popen : 0.124130010605 time with subprocess.Popen : 1.45624995232 Solaris-10/Python 2.6: time with os.popen : 0.113790988922 time with subprocess.Popen : 1.42739701271 All machines are idendital in processor speed: v240.
msg75204 - (view) Author: Winfried Plappert (wplappert) Date: 2008-10-25 08:38
Hi is the dramatic difference on Solaris-10 / Python2.6: I dtraced the popentest.py and counted syscalls: with os_popen: read = 243 with process:Popen read = 589018 That explains a lot! The rest of the system calls are similir in count and appearance. I have not looked into the C-source (yet).
msg75205 - (view) Author: Winfried Plappert (wplappert) Date: 2008-10-25 08:39
s/Hi is/Hi, here is/ :)
msg75208 - (view) Author: Winfried Plappert (wplappert) Date: 2008-10-25 13:23
The created testfile size is 588890 bytes, which implies that subprocess.Popen reads the file in completely unbuffered mode, one byte at a time. If I modify the popentest.py programme by specifying a bufsize of 1_000_000, the execution time drops quite a bit: time with os.popen 0.069 sec time with subprocess.Popen 0.118 sec This Solaris 9/Python 2.6. I think I have got it: Lib/subprocess.py should have a default bufsize of -1, not 0! I tested it by modifying the subprocess.Popen call and here is the result: time with os.popen 0.086 sec time with subprocess.Popen 0.080 sec See also Modules/posixmodule.c.
msg75209 - (view) Author: Skip Montanaro (skip.montanaro) * (Python triager) Date: 2008-10-25 13:50
Using a nonzero bufsize parameter makes all the difference in the world: Using the default (bufsize=0 ==> unbuffered): % python popentest.py time with os.popen : 0.035032 time with subprocess.Popen : 1.496455 Creating the Popen object with bufsize=8192: % python popentest.py time with os.popen : 0.034642 time with subprocess.Popen : 0.041376 Creating the Popen object with bufsize=1 (==> line buffered): % python popentest.py time with os.popen : 0.034658 time with subprocess.Popen : 0.04211 Maybe the default for that parameter shouldn't be zero? Skip
msg75211 - (view) Author: Skip Montanaro (skip.montanaro) * (Python triager) Date: 2008-10-25 14:11
>> Using a nonzero bufsize parameter makes all the difference in the >> world: ... In fact, looking at posix_popen in posixmodule.c it appears the default value for bufsize there is -1, implying that I/O is fully buffered. Even if the default bufsize value for subprocess.Popen is not changed its documentation should be corrected to reflect the different default values for bufsize between os.popen and subprocess.Popen: Replacing os.popen* ------------------- pipe = os.popen(cmd, mode='r', [bufsize]) ==> pipe = Popen(cmd, shell=True, bufsize=bufsize, stdout=PIPE).stdout If you use the default bufsize with os.popen you must set bufsize to -1 when creating a Popen object. Otherwise your I/O will be unbuffered and you will be disappointed at the I/O performance.
msg75218 - (view) Author: Skip Montanaro (skip.montanaro) * (Python triager) Date: 2008-10-25 19:03
I've been thinking about it, and I think even though it would be a slight change to the API, I agree with Winfried that the default value for bufsize should be -1, not 0. In my own use of os.popen and friends, almost all the time I use them to process the output of a traditional Unix pipeline, which means the output will be plain text, line buffered. In those cases it makes absolutely no sense to read from the pipe one character at a time. S
msg75219 - (view) Author: Winfried Plappert (wplappert) Date: 2008-10-25 21:04
On the other hand, we will silently break all those applications which are out there relying on the fact that a pipe is an unbuffered device. You might consider it for Python 3.0, but I don't know if it is a good idea for Python 2.x. The documentation of subprocess.Popen clear states that the default is unbuffered, bufsize=0. I quote from http://docs.python.org/library/subprocess.html: bufsize, if given, has the same meaning as the corresponding argument to the built-in open() function: 0 means unbuffered, 1 means line buffered, any other positive value means use a buffer of (approximately) that size. A negative bufsize means to use the system default, which usually means fully buffered. The default value for bufsize is 0 (unbuffered).
msg76288 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2008-11-24 00:59
About Python3, os.popen() is more than two times faster (0.20 sec vs 0.50 sec) than subprocess.Popen()! It's amazing because popen() opens the standard output as unicode file whereas Popen() creates a binary file! Another funny thing: os.popen() calls subprocess.Popen() :-) The difference is just this instruction: stdout = io.TextIOWrapper(stdout)
msg76291 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2008-11-24 01:30
Summary of unchanged Python (2.4 .. 2.7): * Mac: subprocess is 25 .. 50 times SLOWER than os.popen * Solaris : subprocess is 13 times SLOWER than os.popen * Windows XP : subprocess is 1.5 times FASTER than os.popen * Linux : (results are very close) With a different buffer size: * Solaris : Popen(bufsize=-1) is FASTER than os.popen() * Mac : Popen(bufsize=1) and Popen(bufsize=8192) are a little bit slower than os.popen(), but much FASTER than Popen(bufsize=0) Notes: - PyFile_SetBufSize(bufsize) does nothing if bufsize < 0: keep system default (buffer of BUFSIZE bytes) - On Ubuntu Gutsy, system default (BUFSIZ) is 8192 bytes
msg76292 - (view) Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2008-11-24 01:47
If anything for 2.6 lets just highlight this in the docs and mention that bufsize needs to be set to non-zero for good performance on many platforms such as Mac OS X and Solaris. We can consider changing the default for 2.7/3.1. 3.x having poor performance is pretty much another issue entirely of its own..
msg76325 - (view) Author: Skip Montanaro (skip.montanaro) * (Python triager) Date: 2008-11-24 15:11
Victor> About Python3, os.popen() is more than two times faster (0.20 Victor> sec vs 0.50 sec) than subprocess.Popen()! It's amazing because Victor> popen() opens the standard output as unicode file whereas Victor> Popen() creates a binary file! Another funny thing: os.popen() Victor> calls subprocess.Popen() :-) The difference is just this Victor> instruction: Victor> stdout = io.TextIOWrapper(stdout) This is a known issue. The default for bufsize in os.popen is -1 (fully buffered? line buffered?). The default for bufsize in subprocess.Popen is 0 (unbuffered). I think it should have been changed but was voted down. I think the best you can do at this point is document the change, perhaps in the "Replacing os.popen" section. Skip
msg76328 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2008-11-24 15:32
> Victor> About Python3, os.popen() is more than two times faster (...) > Victor> The difference is just this instruction: > Victor> stdout = io.TextIOWrapper(stdout) > > This is a known issue. The default for bufsize in os.popen is -1 (fully > buffered? line buffered?). The default for bufsize in subprocess.Popen is > 0 (unbuffered). Wrong, it's not related to the buffer size. With Python3 trunk on Linux, os.popen time is ~0.10 sec whereas subprocess.Popen is ~0.25 sec. Change the buffer size of subprocess doesn't help: - (default) 0.25 - buffering = (-1): 0.25 - buffering = 1: 0.25 - buffering = 8192: 0.26 - buffering = 16384: 0.26 (it's a little big slower with a bigger buffer...) You get the same speed (than os.popen) using TextIOWrapper() adapter: [i for i in read_handle] => 0.25 sec [i for i in io.TextIOWrapper(read_handle)] => 0.10 sec WTF? Unicode is *FASTER* than raw bytes?
msg83723 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2009-03-18 00:21
I just wanna say that buffering can be a problem when writing, but not when reading. If you read() from a buffered file, you still get the available contents immediately, you don't have to wait for the buffer to be full.
msg83725 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2009-03-18 01:18
The strange performance between bytes/text (BufferedReader/TextIOWrapper) may be explained by the issue #5502.
msg98054 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010-01-19 13:50
Can someone check if this still applies to Python 3.1/3.2? By the way, I see no reason not to buffer at least the subprocess's stdout and stderr streams.
msg98062 - (view) Author: Skip Montanaro (skip.montanaro) * (Python triager) Date: 2010-01-19 16:14
Looks good to me: tmp% python3.1 popentest.py time with os.popen : 0.035565 time with subprocess.Popen : 0.031796 tmp% python3.2 popentest.py time with os.popen : 0.03501 time with subprocess.Popen : 0.031168 tmp% python3.1 Python 3.1.1+ (release31-maint:77485M, Jan 13 2010, 19:53:41) [GCC 4.0.1 (Apple Inc. build 5490)] on darwin Type "help", "copyright", "credits" or "license" for more information. >>> tmp% python3.2 Python 3.2a0 (py3k:77484:77485, Jan 13 2010, 19:49:33) [GCC 4.0.1 (Apple Inc. build 5490)] on darwin Type "help", "copyright", "credits" or "license" for more information.
msg98970 - (view) Author: Shashwat Anand (l0nwlf) Date: 2010-02-06 23:20
Tested it on mac OSX (Snow leopard) Shashwat-Anands-MacBook-Pro:Desktop l0nwlf$ python2.5 popentest.py time with os.popen : 0.0342061519623 time with subprocess.Popen : 0.0421631336212 Shashwat-Anands-MacBook-Pro:Desktop l0nwlf$ python2.6 --version Python 2.6.1 Shashwat-Anands-MacBook-Pro:Desktop l0nwlf$ python2.6 popentest.py time with os.popen : 0.0282561779022 time with subprocess.Popen : 0.0366630554199 Python 2.5.4s os.popen was faster than subprocess.Popen, the same being the case with Python 2.6.1 I do not think it is a mac issue.
msg115645 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010-09-05 13:08
The subprocess doc now has a note about buffering and performance issues, closing.
History
Date User Action Args
2022-04-11 14:56:40 admin set github: 48444
2010-09-05 13:08:43 pitrou set status: open -> closedresolution: out of datemessages: +
2010-05-20 20:38:33 skip.montanaro set nosy: - skip.montanaro
2010-02-06 23:20:35 l0nwlf set nosy: + l0nwlfmessages: +
2010-01-19 16:14:53 skip.montanaro set messages: +
2010-01-19 13:50:31 pitrou set messages: + versions: - Python 2.6, Python 2.5, Python 2.4, Python 3.0
2010-01-19 09:59:20 guettli set nosy: + guettli
2009-03-18 01🔞16 vstinner set messages: +
2009-03-18 00:21:48 pitrou set nosy: + pitroumessages: +
2008-11-24 15:32:24 vstinner set messages: +
2008-11-24 15:11:56 skip.montanaro set messages: +
2008-11-24 01:47:41 gregory.p.smith set nosy: + gregory.p.smithmessages: +
2008-11-24 01:30:33 vstinner set messages: +
2008-11-24 00:59:16 vstinner set nosy: + vstinnermessages: +
2008-11-03 17:08:08 vstinner set title: Miserable subprocess.Popen performance -> default subprocess.Popen buffer size
2008-11-02 06:24:30 bgh set nosy: + bgh
2008-10-25 21:04:31 wplappert set messages: +
2008-10-25 19:03:06 skip.montanaro set messages: +
2008-10-25 18:26:10 giampaolo.rodola set nosy: + giampaolo.rodola
2008-10-25 14:11:33 skip.montanaro set assignee: georg.brandlmessages: + nosy: + georg.brandlcomponents: + Documentation, Library (Lib)versions: + Python 3.0, Python 2.7
2008-10-25 13:50:32 skip.montanaro set messages: +
2008-10-25 13:23:38 wplappert set messages: +
2008-10-25 08:39:41 wplappert set messages: +
2008-10-25 08:38:40 wplappert set messages: +
2008-10-24 22:37:31 wplappert set type: performanceversions: + Python 2.6, Python 2.5, Python 2.4
2008-10-24 22:17:13 wplappert set messages: +
2008-10-24 21:52:22 wplappert set messages: +
2008-10-24 20:44:33 skip.montanaro set messages: +
2008-10-24 19:58:06 LambertDW set nosy: + LambertDWmessages: +
2008-10-24 19:35:03 skip.montanaro set messages: +
2008-10-24 19:25:35 sameerd set messages: +
2008-10-24 19:16:39 skip.montanaro set messages: +
2008-10-24 18:55:25 wplappert set nosy: + wplappertmessages: +
2008-10-24 18:30:29 sameerd set nosy: + sameerd
2008-10-24 17:54:41 skip.montanaro create