subprocess.py (Python 2.5, current SVN, probably all versions) contains this O(N**2) code: bytes_written = os.write(self.stdin.fileno(), input[:512]) input = input[bytes_written:] For large but reasonable "input" the second line is rate limiting. Luckily, it is very easy to remove this bottleneck. I'll upload a simple patch. Below is a small script that demonstrates the huge speed difference. The output on my machine is: creating input 0.888417959213 slow slicing input 61.1553330421 creating input 0.863168954849 fast slicing input 0.0163860321045 done The numbers are times in seconds. This is the source: import time import sys size = 1000000 t0 = time.time() print "creating input" input = "\n".join([str(i) for i in xrange(size)]) print time.time()-t0 t0 = time.time() print "slow slicing input" n_out_slow = 0 while True: out = input[:512] n_out_slow += 1 input = input[512:] if not input: break print time.time()-t0 t0 = time.time() print "creating input" input = "\n".join([str(i) for i in xrange(size)]) print time.time()-t0 t0 = time.time() print "fast slicing input" n_out_fast = 0 input_done = 0 while True: out = input[input_done:input_done+512] n_out_fast += 1 input_done += 512 if input_done >= len(input): break print time.time()-t0 assert n_out_fast == n_out_slow print "done"
I reviewed the patch--the proposed fix looks good. Minor comments: - "input_done" sounds like a flag, not a count of written bytes - buffer() could be used to avoid the 512-byte copy created by the slice