msg70658 - (view) |
Author: vadim suvorov (bad) * |
Date: 2008-08-03 19:35 |
The result of the attached script execution is extremely unstable. The change in the print statement in line 146 from 'print "1"' to 'print 1' (string literal to numerical) changes the result of execution to the correct (presumably) one. The results are also affected by commenting out calls to empty method in lines 143 or 227. The use of dis.dis() affects results of the execution as well. More than, the location of error (the triggered assert) shifts in the code, or it is possible that an Exception is raised. The script was tested in Python 2.5.2 in Windows XP SP2 and SP3, 2.6beta2 in Windows XP SP3 and in 2.5.2 Ubuntu 8.0.4 with consistent results. (I mean that behavior of the script is changed by calling of empty methods or irrelevant print statements. The assert shifts.) Sorry for the long script. I failed to further reduce its size because effect disappears. Expected output: ~~~~~~~~~~~~~~~~~~~~~ 1 1 1 1 1 1 1 1 1 1 1 1 Congratulations: grid solved! 1 2 3 4 5 6 7 8 910 1 B B B B B . B . . B 2 . B . . B B B B . B 3 . . . . . B B . B B 4 . . . B B B B B B . 5 B . B B B . . B B B ~~~~~~~~~~~~~~~~~~~~~ Received output (example): ~~~~~~~~~~~~~~~~~~~~~ 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 Traceback (most recent call last): File "E:\pet projects\nonograms\nonogramsolver10.py", line 244, in g.solve() File "E:\pet projects\nonograms\nonogramsolver10.py", line 230, in solve seq.simplescan() File "E:\pet projects\nonograms\nonogramsolver10.py", line 99, in simplescan assert self.clues, "a seq with no clues" AssertionError: a seq with no clues ~~~~~~~~~~~~~~~~~~~~~ |
|
|
msg70660 - (view) |
Author: vadim suvorov (bad) * |
Date: 2008-08-03 19:59 |
tested Python 2.4.4 on WinXP SP3. It required minor modifications of the code (removing conditional expressions), but the effect stayed. |
|
|
msg70661 - (view) |
Author: Tim Peters (tim.peters) *  |
Date: 2008-08-03 21:50 |
I doubt this is a bug in Python. It's more likely an error (one or more ;-)) in the logic of the script, triggered by the inherently unpredictable order of set iteration. Some evidence: I added a `.counter` member to the Sequence class, and changed Sequence.__init__ like so: counter = 0 ... global counter ... self.counter = counter counter += 1 This gives a way to identify the order in which Sequence objects are created. Then I replaced: seq = self.dirty.pop() with: seq = min((s.counter, s) for s in self.dirty)[1] self.dirty.remove(seq) This forces removal of the earliest-created Sequence object (instead of an unpredictable one). After those changes, the script runs the same way every time, regardless of whether `print 1` or `print "1"` is used, etc. Then change `min` to `max` above, deterministically forcing removal of the most recently created Sequence object instead, and again the script runs the same way every time, but with different output: AssertionError: an attempt to re-paint cell:(2, 8) This all strongly suggests that the algorithm itself is supremely sensitive to the order in which Sequence objects are removed from the self.dirty set. And that can indeed change across runs depending on all sorts of seemingly irrelevant details (unless, as above, the order is forced in some way). |
|
|
msg70662 - (view) |
Author: vadim suvorov (bad) * |
Date: 2008-08-03 22:13 |
Thank you very much, Tim. I am still very much confused, how change in print statement changes order in which items are removed from a set. I presumed it to be undefined but deterministic (similar to dict()): while I cannot tell which order it is going to be, I can be sure it does not depend on changes in other objects. For example, it does not depend on things like if something is printed, or some dummy method called. I have little doubts that the script contains (or might contain) bugs. I am now trying to use 3.0b2, which shows deterministic (in the sense above) behaviour. |
|
|
msg70663 - (view) |
Author: Guilherme Polo (gpolo) *  |
Date: 2008-08-03 22:17 |
Just run it a couple of times and you will hit the problem in both "failing" and "running" examples you added. |
|
|
msg70664 - (view) |
Author: Martin v. Löwis (loewis) *  |
Date: 2008-08-03 22:18 |
> I am still very much confused, how change in print statement changes > order in which items are removed from a set. I presumed it to be > undefined but deterministic (similar to dict()): while I cannot tell > which order it is going to be, I can be sure it does not depend on > changes in other objects. It will certainly depend on other objects, like dict(). Order of objects in a set or dict depends on the hash values. The hash values depend on the addresses of the objects in main memory, unless __hash__ is overridden. In many modern operating systems, the main memory addresses change with each program start (a feature called address space randomization). Changing a string literal to an int will mean that a different object is created, or that objects get created in a different order, and thus have different addresses. |
|
|
msg70665 - (view) |
Author: vadim suvorov (bad) * |
Date: 2008-08-03 22:39 |
Tim, Martin, thank you very much. I think now it is not an issue, and the record can be closed. Thank you very much again, and please accept my apology. 2 Guilherme Polo: I ran it many times. The results in WinXP (with constant script) are very consistent. 2 Martin v. Löwis: Thank you very much for the excellent explanation. It was exactly what I needed. I did not realize that hash depends on memory location. |
|
|
msg70666 - (view) |
Author: Georg Brandl (georg.brandl) *  |
Date: 2008-08-03 22:40 |
I think we have enough evidence now to close this bug as invalid. :) |
|
|
msg70668 - (view) |
Author: Tim Peters (tim.peters) *  |
Date: 2008-08-03 23:37 |
Vadim, to see how memory addresses change, simply add something like print "id", id(self) to Sequence.__init__. On Windows (Vista, Python 2.5.1), I see different addresses printed each time the program is run. Then, as Martin said, the default implementation of __hash__ for a type uses the object's memory address, and that's both unpredictable across runs and influenced within a run by just about everything the script does (indeed, adding the `print` statement above may well change the addresses of Sequence objects created after the print executes!). What is deterministic: within a single program run, if you iterate over a given set multiple times, the set elements will be delivered in the same order each time, provided you don't mutate the set for the duration. More than that isn't guaranteed. For example, if you have two sets x and y, it's not even the case that x == y implies list(x) == list(y). For example, >>> x = set(range(-10, 11)) >>> y = set(reversed(range(-10, 11))) >>> x == y True >>> list(x) == list(y) False If I were you, I'd change the script to use lists instead of sets here until the algorithm is debugged. Then you'll get wholly deterministic behavior, which will make debugging easier. You can make it run faster later ;-) |
|
|