msg29680 - (view) |
Author: GaryD (gazzadee) |
Date: 2006-08-25 05:52 |
When I use an existing file object as stdin for a call to subprocess.Popen, then Popen cannot read the file if I have called seek on it more than once.

eg. in the following python code:

>>> import subprocess
>>> rawfile = file('hello.txt', 'rb')
>>> rawfile.readline()
'line 1\n'
>>> rawfile.seek(0)
>>> rawfile.readline()
'line 1\n'
>>> rawfile.seek(0)
>>> process_object = subprocess.Popen(["cat"], stdin=rawfile, stdout=subprocess.PIPE, stderr=subprocess.PIPE)

process_object.stdout now contains nothing, implying that nothing was on process_object.stdin.

Note that this only applies for a non-trivial seek (ie. where the file-pointer actually changes). Calling seek(0) multiple times in a row does not change anything (obviously).

I have not investigated whether this reveals a problem with seek not changing the underlying file descriptor, or a problem with Popen not handling the file descriptor properly.

I have attached some complete python scripts that demonstrate this problem. One shows cat working after calling seek once, the other shows cat failing after calling seek twice.

Python version being used:
Python 2.4.2 (#1, Nov 3 2005, 12:41:57)
[GCC 3.4.3-20050110 (Gentoo Linux 3.4.3.20050110, ssp-3.4.3.20050110-0, pie-8.7 on linux2 |
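For readers on a current interpreter, the report can be approximated with the sketch below. It is not one of the attached scripts and makes several assumptions: it targets Python 3 (`open` rather than `file`), writes a temporary file in place of hello.txt, and uses a small Python child process as a stand-in for `cat` so that it does not depend on a Unix tool. The names `CAT` and `child_sees` are hypothetical.

```python
import os
import subprocess
import sys
import tempfile

# Stand-in for `cat` (an assumption of this sketch): copy stdin to stdout.
CAT = [sys.executable, "-c",
       "import sys; sys.stdout.buffer.write(sys.stdin.buffer.read())"]


def child_sees(path):
    """Read one line through a buffered file object, then pass it as stdin."""
    with open(path, "rb") as rawfile:
        first = rawfile.readline()
        # Popen takes rawfile.fileno(), so the child reads from the
        # descriptor's own offset. With a file this small, readline() has
        # already pulled the whole file into the parent's buffer.
        result = subprocess.run(CAT, stdin=rawfile, stdout=subprocess.PIPE)
    return first, result.stdout


with tempfile.NamedTemporaryFile("wb", delete=False) as tmp:
    tmp.write(b"line 1\nline 2\nline 3\n")
    path = tmp.name
try:
    first_line, child_output = child_sees(path)
finally:
    os.remove(path)

print(repr(first_line), repr(child_output))
```

In this sketch the child receives nothing even though the parent has consumed only the first line, which is the same symptom as in the report.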
|
|
msg29681 - (view) |
Author: lplatypus (ldeller) * |
Date: 2006-08-25 07:13 |
I found the cause of this bug:

A libc FILE* (used by python file objects) may hold a different file offset than the underlying OS file descriptor. The posix version of Popen._get_handles does not take this into account, resulting in this bug.

The following patch against svn trunk fixes the problem. I don't have permission to attach files to this item, so I'll have to paste the patch here:

Index: subprocess.py
===================================================================
--- subprocess.py (revision 51581)
+++ subprocess.py (working copy)
@@ -907,6 +907,12 @@
             else:
                 # Assuming file-like object
                 p2cread = stdin.fileno()
+                # OS file descriptor's file offset does not necessarily match
+                # the file offset in the file-like object, so do an lseek:
+                try:
+                    os.lseek(p2cread, stdin.tell(), 0)
+                except OSError:
+                    pass # file descriptor does not support seek/tell
 
             if stdout is None:
                 pass
@@ -917,6 +923,12 @@
             else:
                 # Assuming file-like object
                 c2pwrite = stdout.fileno()
+                # OS file descriptor's file offset does not necessarily match
+                # the file offset in the file-like object, so do an lseek:
+                try:
+                    os.lseek(c2pwrite, stdout.tell(), 0)
+                except OSError:
+                    pass # file descriptor does not support seek/tell
 
             if stderr is None:
                 pass
@@ -929,6 +941,12 @@
             else:
                 # Assuming file-like object
                 errwrite = stderr.fileno()
+                # OS file descriptor's file offset does not necessarily match
+                # the file offset in the file-like object, so do an lseek:
+                try:
+                    os.lseek(errwrite, stderr.tell(), 0)
+                except OSError:
+                    pass # file descriptor does not support seek/tell
 
             return (p2cread, p2cwrite,
                     c2pread, c2pwrite, |
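The offset mismatch described in this message, and the lseek call the patch adds, can also be tried from application code. The sketch below is only an illustration under assumptions: it uses Python 3, whose file objects buffer in the `io` module rather than in a libc FILE*, and the helper `sync_fd_offset`, the `CAT` stand-in for `cat`, and the temporary file are all hypothetical names introduced here.

```python
import os
import subprocess
import sys
import tempfile

# Stand-in for `cat` (an assumption of this sketch): copy stdin to stdout.
CAT = [sys.executable, "-c",
       "import sys; sys.stdout.buffer.write(sys.stdin.buffer.read())"]


def sync_fd_offset(fileobj):
    """Move the descriptor's offset to the file object's logical position.

    Same idea as os.lseek(fd, fileobj.tell(), 0) in the patch, but done by
    the caller. Returns False when the object does not support seek/tell.
    The file object should not be read again afterwards, because its
    buffer no longer matches the descriptor.
    """
    try:
        os.lseek(fileobj.fileno(), fileobj.tell(), os.SEEK_SET)
    except OSError:
        return False
    return True


def demo(path):
    with open(path, "rb") as f:
        f.readline()
        # (position reported by the file object, position of the descriptor)
        before = (f.tell(), os.lseek(f.fileno(), 0, os.SEEK_CUR))
        synced = sync_fd_offset(f)
        out = subprocess.run(CAT, stdin=f, stdout=subprocess.PIPE).stdout
    return before, synced, out


with tempfile.NamedTemporaryFile("wb", delete=False) as tmp:
    tmp.write(b"line 1\nline 2\nline 3\n")
    path = tmp.name
try:
    before, synced, out = demo(path)
finally:
    os.remove(path)

print(before, synced, repr(out))
```

After the resync the child starts reading where the file object left off. As the patch's except clause implies, this only helps for descriptors that support seeking.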
|
|
msg29682 - (view) |
Author: Peter Åstrand (astrand) *  |
Date: 2007-01-21 19:43 |
It's not obvious that the subprocess module is doing anything wrong here. Mixing streams and file descriptors is always problematic and is best avoided (http://ftp.gnu.org/gnu/Manuals/glibc-2.2.3/html_node/libc_232.html). However, the subprocess module *does* accept a file object (based on a libc stream), for convenience. For things to work correctly, the application and the subprocess module need to cooperate. I admit that the documentation needs improvement on this topic, though.

It's quite easy to demonstrate the problem; you don't need to use seek at all. Here's a simple test case:

import subprocess
rawfile = file('hello.txt', 'rb')
rawfile.readline()
p = subprocess.Popen(["cat"], stdin=rawfile, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
print "File contents from Popen() call to cat:"
print p.stdout.read()
p.wait()

The descriptor offset is at the end of the file, since the stream reads ahead into its buffer.

http://ftp.gnu.org/gnu/Manuals/glibc-2.2.3/html_node/libc_233.html describes the need for "cleaning up" a stream when you switch from stream functions to descriptor functions. The cleanup is described at http://ftp.gnu.org/gnu/Manuals/glibc-2.2.3/html_node/libc_235.html#SEC244. The documentation recommends the fclean() function, but it's only available on GNU systems and not in Python. As I understand it, fflush() works well for cleaning an output stream. For input streams, however, things are difficult. fflush() might work sometimes, but to be sure, you must set the file pointer as well. And this does not work for files that are not random access, since there's no way to move the buffered data back to the operating system.

So, since subprocess cannot reliably deal with this situation, I believe it shouldn't try. I think it makes more sense for the application to prepare the file object for low-level operations.
There are many other Python modules that use the .fileno() method, for example the select module, and as far as I understand, that module doesn't try to clean streams or anything like that.

To summarize: I'm leaning towards a documentation solution. |
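The output-stream case mentioned above, where a flush is said to be enough, can be sketched from the application side as follows. This is an illustration under assumptions, not a documented recipe: it uses Python 3, where `flush()` on the file object plays the role of fflush(), and `CHILD` and `write_header_then_run` are hypothetical names.

```python
import os
import subprocess
import sys
import tempfile

# Hypothetical child process that writes one line to its stdout.
CHILD = [sys.executable, "-c",
         "import sys; sys.stdout.buffer.write(b'from child\\n')"]


def write_header_then_run(path):
    """Write through a file object, flush, then let a child append output."""
    with open(path, "wb") as log:
        log.write(b"header\n")
        # Push the buffered bytes to the descriptor before the child
        # starts writing through the same descriptor.
        log.flush()
        subprocess.run(CHILD, stdout=log, check=True)
    with open(path, "rb") as f:
        return f.read()


fd, path = tempfile.mkstemp()
os.close(fd)
try:
    content = write_header_then_run(path)
finally:
    os.remove(path)

print(repr(content))
```

Without the flush, the header would still sit in the parent's buffer while the child writes, so the two pieces of output could end up in the wrong order.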
|
|
msg29683 - (view) |
Author: lplatypus (ldeller) * |
Date: 2007-01-22 01:23 |
Fair enough, that's probably cleaner and more efficient than playing games with fflush and lseek anyway. If file objects are not supported properly then maybe they shouldn't be accepted at all, forcing the application to call fileno() if that's what is wanted. That might break a lot of existing code though. Then again it may be beneficial to get everyone to review code which passes file objects to Popen in light of this behaviour. |
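One way to read the suggestion above is that the application works with a descriptor from the start and positions it explicitly, so no buffered file object is involved. The sketch below assumes Python 3; `run_from_offset` and the `CAT` stand-in for `cat` are hypothetical, and the offset of 7 is simply the length of the first line of the sample file.

```python
import os
import subprocess
import sys
import tempfile

# Stand-in for `cat` (an assumption of this sketch): copy stdin to stdout.
CAT = [sys.executable, "-c",
       "import sys; sys.stdout.buffer.write(sys.stdin.buffer.read())"]


def run_from_offset(path, offset):
    """Open a raw descriptor, seek it, and pass the integer fd as stdin."""
    flags = os.O_RDONLY | getattr(os, "O_BINARY", 0)  # O_BINARY: Windows only
    fd = os.open(path, flags)
    try:
        os.lseek(fd, offset, os.SEEK_SET)
        return subprocess.run(CAT, stdin=fd, stdout=subprocess.PIPE).stdout
    finally:
        os.close(fd)


with tempfile.NamedTemporaryFile("wb", delete=False) as tmp:
    tmp.write(b"line 1\nline 2\nline 3\n")
    path = tmp.name
try:
    rest = run_from_offset(path, len(b"line 1\n"))
finally:
    os.remove(path)

print(repr(rest))
```

Because the only offset in play is the descriptor's, what the child reads is determined entirely by the lseek call.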
|
|
msg84478 - (view) |
Author: Daniel Diniz (ajaksu2) *  |
Date: 2009-03-30 03:23 |
Not a bug, leaving open for the doc RFE (but suggest closing anyway). |
|
|
msg113204 - (view) |
Author: Terry J. Reedy (terry.reedy) *  |
Date: 2010-08-07 21:10 |
In the absence of a doc patch, I am following Daniel's suggestion to close this. |
|
|