Issue 1706039: Added clearerr() to clear EOF state (original) (raw)
Created on 2007-04-23 17:21 by josm, last changed 2022-04-11 14:56 by admin. This issue is now closed.
Messages (39)
Author: John Smith (josm)
Date: 2007-04-23 17:21
This patch is to fix bug 1523853. Now file.read() works as written in the doc, which says "An empty string is returned when EOF is encountered immediately." Before this fix, An empty string was returned even when the last read() encountered EOF.
Author: Alexander Belopolsky (belopolsky) *
Date: 2007-04-23 20:22
I was not able to reproduce the problem. I tried to read through the end of the file, append to the file from an external process and read again. It worked as expected: more data was read.
The proposed patch seems to be redundant: once ferror(f->f_fp) is known to be 0, clearerr will do nothing and in the alternative branch clearerr is already there.
Author: John Smith (josm)
Date: 2007-04-23 23:27
I can easily reproduce this problem on OS X 10.4.9 using Python 2.4.3 and Python 2.5.
Here's the code ################# import os, sys filename = "spam"
f = open(filename, "w+") f.seek(0, 2)
line = f.read() # EOF
Writing the same file using another fd
f2 = open(filename, "a+") f2.write("Spam") f2.flush() f2.close()
statinfo = os.stat(filename) print "file size: %d" % statinfo.st_size print "position : %d" % f.tell() line = f.read() # read() returns emtpy!! readlines?() works ok print "line : [%s]" % line #################
On my machine, it outputs the following ### file size: 4 position : 0 line : [] ###
And just now I found that on ubuntu on the same machine (vmware), it works collect. ### file size: 4 position : 0 line : [Spam] ###
My patched version python outputs the same result as above on OS X.
We need clearerr() there, too because clearerr()'s job is not only only clearing error indicators but EOF indicators also.
Author: Alexander Belopolsky (belopolsky) *
Date: 2007-04-24 04:28
Hmm. It looks like glibc's fread will attempt reading even if EOF flag is already set.
I've reproduced the problem on OS X 10.4.9 and Python 2.6a0 (trunk:54926M).
This patch fixes it.
Author: John Smith (josm)
Date: 2007-04-24 17:15
I confirmed that fread can read data from fp in EOF state on Linux. On OS X, freading from fp in EOF state doesn't work, just return 0.
I tested this behavior using the following code.
#include <stdio.h> #include <stdlib.h>
main(void) { FILE *fp1, *fp2; int n; char buf[128];
fp1 = fopen("spam.txt", "w+"); fputs("foo", fp1);
fseek(fp1, 0L, SEEK_END); if (fread(buf, 1, sizeof(buf), fp1) == 0) fprintf(stderr, "first fread failed\n");
fp2 = fopen("spam.txt", "a+"); fputs("bar", fp2); fclose(fp2);
if (feof(fp1)) fprintf(stderr, "after appending some text, fp1 is still eof\n");
if (fread(buf, 1, sizeof(buf), fp1) == 0) fprintf(stderr, "second fread failed\n"); printf("buf: %s\n", buf);
fclose(fp1);
return 0; }
============= On Linux
first fread failed after appending some text, fp1 is still eof buf: bar
============= On OS X
first fread failed after appending some text, fp1 is still eof second fread failed buf:
Anyway, I think it's safe and preferable to clear EOF indicator in this case.
Author: Alexander Belopolsky (belopolsky) *
Date: 2007-04-25 03:14
I have just verified that this issue affects readlines and readinto methods on Mac OS X as well. This tells me that rather than patching all the individual methods, maybe Py_UniversalNewlineFread should be fixed instead. I believe that the current situation where a function that looks like Python API reports errors by setting flags on a FILE object instead of settin the an exception is against python's design. I suggest changing Py_UniversalNewlineFgets so that if fread returns 0, it checks whether that was due to EOF or an error and sets an exception in a case of an error and always calls clearerr.
Author: John Smith (josm)
Date: 2007-04-28 17:03
New patch Attached. I believe this one includes all belopolsky's suggestions.
Now all dirty jobs are done in Py_UniversalNewlineFread:
- calls clearerr if ferror or feof
- set IOErrorr exception and return
The caller just checks whether IOError is occurred or not and does appropriate jobs.
The attachment also contains unittest for this problem.
File Added: eof.diff
Author: Alexander Belopolsky (belopolsky) *
Date: 2007-05-01 14:38
The latest patch works on Mac OS 10.4 . I would suggest a few cosmetic changes:
- Put the flag checking logic in a separate function and call it instead of fread. This will eliminate code duplication.
- "else" is redundant in
if (ferror(stream)) {
clearerr(stream);
PyErr_SetFromErrno(PyExc_IOError);
return 0;
} else if (feof(stream)) {
clearerr(stream);
}
The flow will be more obvious if you write
if (ferror(stream)) {
clearerr(stream);
PyErr_SetFromErrno(PyExc_IOError);
return 0;
}
if (feof(stream)) {
clearerr(stream);
}
- Please mention bug 1523853 in a comment near the test.
otherwise, the patch looks fine.
Author: John Smith (josm)
Date: 2007-05-03 10:03
The flag checking is revised. I removed feof() so the checking logic become shorter. I didn't factor out that logic because I think the logic is simple enough
Added a comment that mentions bug 1523853.
File Added: eof2.diff
Author: John Smith (josm)
Date: 2007-06-05 14:56
Could someone please review my patch and give me some feedback?
Author: John Smith (josm)
Date: 2007-12-01 12:19
Any chance to get this fix commmited in?
Author: Raghuram Devarakonda (draghuram)
Date: 2007-12-03 17:47
I get an error when I try to read 1523853. Is this not moved to the new tracker or others are able to access it?
Author: Martin v. Löwis (loewis) *
Date: 2007-12-03 18:05
Yes, a number of items were not moved, as SF failed to provide them on export.
Author: Martin v. Löwis (loewis) *
Date: 2007-12-04 09:02
jos, can you please provide a real name in "Your Details" of this tracker? We cannot accept anonymous/pseudonymous patches.
Author: John Smith (josm)
Date: 2007-12-04 13:02
What's in a name? :p Done, anyway.
Author: John Smith (josm)
Date: 2008-11-29 19:44
Hi, when will this patch be checkedin?
Author: Martin v. Löwis (loewis) *
Date: 2008-12-13 15:37
Thanks for the patch. Committed into the 2.5 branch as r67740. This still needs to be applied to the other branches.
Author: Martin v. Löwis (loewis) *
Date: 2008-12-21 23:35
Unfortunately, the patch is broken. The program
fname='test123' f=open(fname,'w') f.read()
crashes with the patch applied.
I think I will revert the patch in 2.5.3, release 2.5.4, and reject the patch.
Author: Guilherme Polo (gpolo) *
Date: 2008-12-21 23:42
It isn't being careful when calling PyErr_SetFromErrno inside the Py_UniversalNewlineFread function since this function is being called all over fileobject after releasing the GIL.. so, isn't this just a matter of adding pairs of PyGILState_Ensure/PyGILState_Release around these calls to PyErr_SetFromErrno in this specific function ?
Author: Martin v. Löwis (loewis) *
Date: 2008-12-21 23:52
It isn't being careful when calling PyErr_SetFromErrno inside the Py_UniversalNewlineFread function since this function is being called all over fileobject after releasing the GIL.. so, isn't this just a matter of adding pairs of PyGILState_Ensure/PyGILState_Release around these calls to PyErr_SetFromErrno in this specific function ?
Perhaps that could fix this problem (or perhaps not - is PyGILState_Ensure guaranteed to do the right thing, even in the presence of multiple interpreters?)
My concern is that this patch may contain other bugs. I am apparently incapable of reviewing it correctly, and nobody else has offered a careful review.
Author: Guilherme Polo (gpolo) *
Date: 2008-12-21 23:55
On Sun, Dec 21, 2008 at 9:52 PM, Martin v. Löwis <report@bugs.python.org> wrote:
Martin v. Löwis <martin@v.loewis.de> added the comment:
It isn't being careful when calling PyErr_SetFromErrno inside the Py_UniversalNewlineFread function since this function is being called all over fileobject after releasing the GIL.. so, isn't this just a matter of adding pairs of PyGILState_Ensure/PyGILState_Release around these calls to PyErr_SetFromErrno in this specific function ?
Perhaps that could fix this problem (or perhaps not - is PyGILState_Ensure guaranteed to do the right thing, even in the presence of multiple interpreters?)
It is said to be unsupported in that case, but I guess you knew that.
Author: Scott Dial (scott.dial)
Date: 2008-12-22 14:42
I believe the original patch is overreaching. I believe the changes to fileobject.c should've been much simpler, just as the title describes (adding clearerr() just before performing reads on the FILE handles).
I've attached a much simpler patch that I believe corrects the issue, but I don't have an OS X platform to try it on.
Author: Martin v. Löwis (loewis) *
Date: 2008-12-23 15:07
I have now reverted the patch in r67914. I won't reject the patch because of Scott's alternative, but leave it open for review. Since Scott's patch is not approved yet, this is not a release blocker anymore.
Author: Alexander Belopolsky (belopolsky) *
Date: 2008-12-23 15:47
I've combined Scott's patch with John's (?) test case in .diff. Confirmed that the patch fixes the issue on Mac OS X 10.5.6 (Intel).
Author: Alexander Belopolsky (belopolsky) *
Date: 2008-12-23 16:03
Maybe not a problem, but an inconsistency: in Py_UniversalNewlineFgets clearerr is added inside univ_newline conditional and then again before the loop over characters, but in Py_UniversalNewlineFread it is added only once before "if (!f->f_univ_newline)". I am not sure which approach is right, but I don't think there is a reason for the two functions to be different.
Author: John Smith (josm)
Date: 2008-12-23 17:05
Sorry for inconvenience caused with the patch. I should've got more reviews before asking it checked in.
I verified eof2.diff-applied-python gets "Bus error" and Scott Dial's fileobject.diff fixes this. (Test was on 9.6.0 Darwin Kernel Version 9.6.0)
Author: Scott Dial (scott.dial)
Date: 2008-12-23 17:13
They differ because in Py_UniversalNewlineFgets() there is a call to FLOCKFILE(), and it seemed appropriate to ensure that clearerr() was called after locking the FILE stream. I certainly pondered over whether it made a difference to do it before or after, and it would seem to me that if we are bothering to lock the FILE stream then we would care to ensure that the first GETC() after locking reaps the benefit of having called clearerr(). In the cases that we fall into fread and fgets, we may actually need to be doing:
FLOCKFILE(stream)
clearerr(stream);
result = fgets(buf, n, stream);
FUNLOCKFILE(stream);
return result;
One nitpick is that the testcase was appended to the StdoutTests class instead of OtherFileTests class, which would seem more appropriate.
Author: Antoine Pitrou (pitrou) *
Date: 2009-05-15 12:55
Does it apply to 3.1?
Author: Scott Dial (scott.dial)
Date: 2009-12-10 09:18
I've attached a patch that applies cleanly against py3k.
I do not have an in-depth understanding of the new io module, but at a cursory glance, it seems to not use FILE streams, and would not be subject to this bug. However, Py_UniversalNewlineFgets() still remains, and I have patched it accordingly. Additionally, I have added the test to test_fileio.
py3k doesn't fail any of these tests for me, but the original issue was only exposed on OS X.
Author: Mark Lawrence (BreamoreBoy) *
Date: 2010-07-17 17:48
The patch adds one line and moves one line in fileobject.c. The rest of the patch is new tests. I've successfully applied the patch on Windows and the tests were fine. Can someone kindly repeat the tests on OS X as this is where the problem was originally reported.
Author: Mark Lawrence (BreamoreBoy) *
Date: 2010-07-24 18:59
Could somebody with an OS X system please try out the py3k patch, thanks.
Author: Alexander Belopolsky (belopolsky) *
Date: 2010-07-25 01:11
The patch passes tests on OSX, but the old code passes the new tests as well, so I am not sure what this patch fixes anymore. Looking back through the messages I see that I had trouble reproducing the problem, then I succeeded, but I don't remember the details.
In any case, I am replacing -py3k.patch with the same patch against recent SVN revision and with tabs replaced by spaces in C code.
Author: Alexander Belopolsky (belopolsky) *
Date: 2010-11-30 17:12
I don't know how to reproduce the issue and without unit tests this patch cannot be committed.
Author: Scott Dial (scott.dial)
Date: 2010-11-30 17:26
The patch includes unittests; the issue is that the tests pass without the changes. That is an entirely different state to be in. The tests should be committed unless somebody wants to object to the behavior that they are testing (although I believe this behavior is already codified in documentation). If the tests start failing on someone's platform, then you have a patch already waiting to be applied. Nevertheless, I don't see the harm in the patch as-is, because it is a quite innocuous change to Py_UniversalNewlineFgets(), unless (again) you want to argue about the correctness of that change despite no test failing.
Author: Alexander Belopolsky (belopolsky) *
Date: 2010-12-09 05:45
The tests need to be cleaned up a little. The setup code should go to setUp() method and instead of calling different methods in a loop with a switch over method names, it should just have a separate test_ method for each method tested. A "with" statement can also reduce some boilerplate.
Author: Scott Dial (scott.dial)
Date: 2011-03-09 07:32
I've updated the patch to apply to the current tip. (This patch was an opportunity for me to update to an Hg workflow.)
Alexander, I disagree with you about the tests. The unittests use the exact same pattern/model that testIteration uses. I find your complaint that a unittest, which does test the feature under question, is not good enough, despite the prevailing unittests being designed in the same manner, to be absurd. Practicality beats purity -- a test that works is better than no test at all.
By rejecting unittests on the merits of its coding style, you are creating a double-standard for people like me (outside of the core committers), which eventually wears out my interest in helping you get this improvement into your project. I have been chased around this obstacle course before. For example, , when I was asked to provide unittests for a module that had none, for a 3-line patch that nobody disagreed it being correct. I had a vested interest in jumping through the obstacles of getting that patch in, but here I am again being blocked from making a 3-line patch, except this time, purely for stylistic reasons.
Author: Antoine Pitrou (pitrou) *
Date: 2011-03-09 22:44
I'm afraid there's a misunderstanding here about the scope of the issue. fileobject.c is not really used in py3k (except for a couple very specific uses such as the tokenizer or the import machinery). The standard I/O stack uses object in the _io module (that is, in Modules/_io/*.c). That's also why you won't see the tests failing, even without the patch...
Author: Scott Dial (scott.dial)
Date: 2011-03-10 02:09
I'm well aware of the limited use of Py_UniversalNewlineFgets() in py3k, but it remains the case that it is a public API that fails to work correctly under the conditions specified by the reporter, and Alexander confirmed the original patch fixed the issue. AFAICT, there no longer are any test cases (beyond the indirect testing of the dependent code) for Py_UniversalNewlineFgets().
One of the issues with applying the patch to tip was due to commenting out some code in it:
/* if ( c == EOF && skipnextlf )
newlinetypes |= NEWLINE_CR; */
For , which really should've just deleted those two lines, but there was not such an extensive review of that change.
The unittests have there own merit. The file object API is supposed to behave in the manner that the tests exercise them. There are currently no tests that would inform us if any change broke this documented behavior. If you want to split the patch in two to treat them as independent items, then fine.
Otherwise, close the issue as WONTFIX due to obsolescence.
Author: Alexander Belopolsky (belopolsky) *
Date: 2011-03-11 15:56
On Wed, Mar 9, 2011 at 2:32 AM, Scott Dial <report@bugs.python.org> wrote: ..
By rejecting unittests on the merits of its coding style, you are creating a double-standard for people like me (outside of the core committers), which eventually wears out my interest in helping you get this improvement into your project.
I don't think there is a double-standard. What you see is more of an evolving standard in which we seek higher quality for the new code than that of the code already in. The readability of the test code is as important if not more important than the readability of the main code. It may be more important because in some cases you can trust obscure code if it is clear that tests cover all corner cases and are easy to understand.
I don't think there are overly strict standards for how unit tests should be written, but when they are written in a style that is unfamiliar to the reviewer, it makes it harder to review. Note that a reviewer needs to worry about tests passing on multiple platforms, their behavior in failure scenarios, as well as other issues that contributors typically overlook. Stylistic issues often point out to real problems, but are easier to spot. For example, try/finally is more error prone than with statement. In fact, in your test "append" stream will not get closed if writing fails. You may think the resource leak on failure is not a big deal, but for some runners that execute tests repeatedly this may lead to spurious failures.
The acceptance rate for a patch is proportional to the severity of an issue that it fixes and obviousness of the fix (including tests.) This patch fails on both counts. Looking at the history of the issue, I see that the patch was applied once but reverted because it caused a crash. See . This shows that the tests were not sufficiently thorough. Next, a patch was suggested with tests that did not fail without a patch. See . This probably shows that additional tests are needed for 3.2.
These are the reasons I unassigned this issue. In my opinion it would take a committer too much effort to bring the patch to commit ready shape for the marginal benefit of fixing a platform dependent corner case. If the tests were better written so that it would be more obvious what is fixed and that nothing gets broken, my calculation would probably be different.
History
Date
User
Action
Args
2022-04-11 14:56:23
admin
set
github: 44887
2020-05-31 11:55:40
serhiy.storchaka
set
status: open -> closed
resolution: out of date
stage: patch review -> resolved
2011-03-11 15:56:21
belopolsky
set
nosy:loewis, belopolsky, scott.dial, pitrou, josm, benjamin.peterson, gpolo
messages: +
2011-03-10 02:09:07
scott.dial
set
nosy:loewis, belopolsky, scott.dial, pitrou, josm, benjamin.peterson, gpolo
messages: +
2011-03-09 22:44:30
pitrou
set
nosy:loewis, belopolsky, scott.dial, pitrou, josm, benjamin.peterson, gpolo
versions: - Python 3.1, Python 3.2
messages: +
stage: test needed -> patch review
2011-03-09 07:32:22
scott.dial
set
files: + issue1706039-py3k.patch
nosy:loewis, belopolsky, scott.dial, pitrou, josm, benjamin.peterson, gpolo
messages: +
2010-12-09 05:45:44
belopolsky
set
nosy: - BreamoreBoy
messages: +
2010-11-30 17:26:06
scott.dial
set
messages: +
2010-11-30 17:12:47
belopolsky
set
assignee: belopolsky ->
messages: +
2010-07-25 01:11:11
belopolsky
set
files: + issue1706039-py3k.patch
messages: +
keywords: + needs review
resolution: accepted -> (no value)
stage: commit review -> test needed
2010-07-24 18:59:06
BreamoreBoy
set
messages: +
2010-07-17 21:30:23
belopolsky
set
assignee: belopolsky
resolution: accepted
stage: patch review -> commit review
2010-07-17 17:48:10
BreamoreBoy
set
nosy: + BreamoreBoy
messages: +
versions: + Python 2.7, Python 3.2, - Python 2.6
2009-12-10 09:21:52
scott.dial
set
files: - issue1706039-py3k.patch
2009-12-10 09:21:48
scott.dial
set
files: + issue1706039-py3k.patch
2009-12-10 09🔞22
scott.dial
set
files: + issue1706039-py3k.patch
messages: +
2009-05-15 12:55:17
pitrou
set
messages: +
2009-05-15 02:10:20
ajaksu2
set
versions: + Python 3.1, - Python 3.0
nosy: + benjamin.peterson, pitrou
priority: normal
components: + IO
stage: patch review
2008-12-23 17:13:22
scott.dial
set
messages: +
2008-12-23 17:05:12
josm
set
messages: +
2008-12-23 16:03:25
belopolsky
set
messages: +
2008-12-23 15:47:23
belopolsky
set
files: + issue1706039.diff
messages: +
2008-12-23 15:07:50
loewis
set
priority: release blocker -> (no value)
assignee: loewis -> (no value)
messages: +
2008-12-22 14:42:40
scott.dial
set
files: + fileobject.diff
nosy: + scott.dial
messages: +
2008-12-21 23:55:25
gpolo
set
messages: +
2008-12-21 23:52:12
loewis
set
messages: +
2008-12-21 23:42:45
gpolo
set
nosy: + gpolo
messages: +
2008-12-21 23:35:21
loewis
set
messages: +
2008-12-20 14:37:04
loewis
set
versions: - Python 2.5.3
2008-12-20 02:41:45
loewis
set
priority: deferred blocker -> release blocker
2008-12-13 15:37:23
loewis
set
priority: release blocker -> deferred blocker
messages: +
2008-12-10 09:10:00
loewis
set
priority: normal -> release blocker
2008-11-29 19:44:48
josm
set
messages: +
versions: + Python 2.6, Python 3.0, Python 2.5.3
2008-01-21 21:36:44
draghuram
set
nosy: - draghuram
2007-12-04 13:02:06
josm
set
messages: +
2007-12-04 09:02:08
loewis
set
messages: +
2007-12-03 18:05:17
loewis
set
nosy: + loewis
messages: +
2007-12-03 17:47:56
draghuram
set
nosy: + draghuram
messages: +
2007-12-01 12:19:34
josm
set
type: behavior
messages: +
2007-08-24 19:39:00
georg.brandl
set
assignee: loewis
2007-04-23 17:21:41
josm
create