msg106793
Author: Alexander Belopolsky (belopolsky)
Date: 2010-05-31 16:39

From , :

"""
>>> from datetime import timedelta as d
>>> [d(microseconds=i + .5)//d.resolution for i in range(-10,10)]
[-10, -9, -8, -7, -6, -5, -4, -3, -2, -1, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

Should this be considered a bug? For comparison,

>>> [d.resolution*(i+0.5)//d.resolution for i in range(-10,10)]
[-10, -8, -8, -6, -6, -4, -4, -2, -2, 0, 0, 2, 2, 4, 4, 6, 6, 8, 8, 10]

and

>>> [round(i+0.5) for i in range(-10,10)]
[-10, -8, -8, -6, -6, -4, -4, -2, -2, 0, 0, 2, 2, 4, 4, 6, 6, 8, 8, 10]

I checked the documentation, and while it says:

"If any argument is a float and there are fractional microseconds, the fractional microseconds left over from all arguments are combined and their sum is rounded to the nearest microsecond."

it does not specify how half-integers should be handled. While it may not be a bug in the strict sense, it looks like the code in question can be improved.
"""
|
|
msg106802
Author: Alexander Belopolsky (belopolsky)
Date: 2010-05-31 19:09

Here is a shorter example of the inconsistent behavior:

>>> 0.5 * timedelta(microseconds=1)
datetime.timedelta(0)
>>> timedelta(microseconds=0.5)
datetime.timedelta(0, 0, 1)
|
|
msg106806
Author: Mark Dickinson (mark.dickinson)
Date: 2010-05-31 19:42

I agree it would be nice to fix this. We could either:

(1) alter delta_new so that the final round uses round-half-to-even; this would give the desired behaviour in most cases, but there would still be a small possibility of rounding going in the wrong direction as a result of accumulated errors in 'leftover_us'; or

(2) rewrite delta_new to do correct rounding in all cases, by using integer arithmetic where necessary.

(2) seems like overkill to me. For (1), it looks like it would be enough just to replace the round_to_long function with:

static long
round_to_long(double x)
{
    return (long)round(x);
}

Actually, at this point, one might as well just inline round_to_long. Note that round_to_long is also called from datetime_from_timestamp.
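For reference, here is a rough Python model of the rounding rule that option (1) is after (an illustrative sketch of round-half-to-even on a single float; the real change belongs in the C code of delta_new):

import math

def round_half_to_even(x):
    # Round x to the nearest integer; exact ties go to the even neighbor.
    f = math.floor(x)
    diff = x - f  # exact for doubles: the fractional part is representable
    if diff > 0.5:
        return f + 1
    if diff < 0.5:
        return f
    # Exactly halfway between two integers: pick the even one.
    return f if f % 2 == 0 else f + 1

Python 3's built-in round() already behaves this way for a single float, so round_half_to_even(i + 0.5) reproduces the round(i+0.5) list from the first message.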
|
|
msg106808
Author: Mark Dickinson (mark.dickinson)
Date: 2010-05-31 19:49

Aargh! No, I take that back. round() also does round-half-away-from-zero, of course.
|
|
msg106809
Author: Mark Dickinson (mark.dickinson)
Date: 2010-05-31 20:24

Here's a first stab at a patch. It still needs tests.
|
|
msg106824
Author: Alexander Belopolsky (belopolsky)
Date: 2010-06-01 03:38

Mark> (2) seems like overkill to me.

I agree; however, it would be interesting to figure out when accumulated errors can produce an inaccurate result. ISTM that leftover is the sum of up to 7 doubles, each between 0 and 1, which is then rounded to the nearest integer. I don't see how the accumulated error can exceed 1 and affect the result.
|
|
msg106825
Author: Alexander Belopolsky (belopolsky)
Date: 2010-06-01 03:54

I wonder if it would be justified to expose something like

int
_PyLong_IsOdd(PyObject *self)
{
    PyLongObject *lo = (PyLongObject *)self;
    return Py_SIZE(lo) != 0 && ((lo->ob_digit[0] & 1) != 0);
}

in longobject.h?
|
|
msg106828
Author: Mark Dickinson (mark.dickinson)
Date: 2010-06-01 06:59

The accumulated error doesn't need to exceed 1; it just needs to be enough to make (e.g.) leftover appear to be >= 0.5 when it's actually just less than 0.5. It shouldn't be too hard to find examples where this happens, but I haven't thought about it.

> I wonder if it would be justified to expose something like
> int _PyLong_IsOdd(PyObject *self) {...}

Yes, that could work. I'd be happier with this if there are other uses (even within longobject.c); there might well be, but I haven't looked. I also have a mild preference for _PyLong_IsEven over _PyLong_IsOdd. :)
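Not a timedelta call, but a minimal sketch of the mechanism, with inputs chosen by hand so that a single addition crosses the 0.5 boundary (the values are illustrative, not taken from delta_new):

x = 0.25               # exactly representable
y = 0.25 - 2.0**-55    # exactly representable; the true sum is 0.5 - 2**-55
print(x + y)           # the addition rounds the sum up: prints 0.5
print(x + y >= 0.5)    # looks like a halfway case: prints True

Here the exact sum is strictly less than 0.5, but the rounded floating-point sum is exactly 0.5, so any tie-breaking rule applied afterwards is answering the wrong question.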
|
|
msg106831
Author: Mark Dickinson (mark.dickinson)
Date: 2010-06-01 08:31

An example where getting correct rounding is awkward: the following rounds up, with or without the patch:

>>> datetime.timedelta(seconds=0.6112295)
datetime.timedelta(0, 0, 611230)

But in this case the result should actually be rounded down (with either rounding mode), since the exact value of float('0.6112295') is a little less than 0.6112295:

>>> print(Decimal(0.6112295))
0.61122949999999998116351207499974407255649566650390625

The problem is that leftover_us ends up being exactly 0.5 in this case. Again, this could be worked around if we really care, but I'm not convinced it's worth it.
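To see how leftover_us lands on exactly 0.5 here, note what happens when the seconds value is scaled to microseconds in double precision (a sketch of the effect, assuming the conversion is essentially a multiplication by 1e6):

>>> 0.6112295 * 1e6    # the product itself rounds up to an exact half
611229.5

The true product is 611229.49999999998..., but the nearest double is exactly 611229.5, so the fractional part left to round is a manufactured tie.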
|
|
msg107094
Author: Alexander Belopolsky (belopolsky)
Date: 2010-06-04 20:14

The timedelta(seconds=0.6112295) example is handled correctly because 0.6112295 sec is not halfway between the two nearest microseconds:

>>> abs(0.6112295 - 0.6112290) == abs(0.6112295 - 0.6112300)
False

The fact that it displays as if it is does not make timedelta rounding wrong. I am still not sure that it is possible to accumulate enough rounding error by adding seven doubles, each < 1, to affect the rounded result. While proving that the rounding is always correct or finding a counter-example is an interesting puzzle, I think it has little practical value.

I will add unit tests and get this patch ready for commit review, but I am setting the priority to "low".
|
|
msg107095
Author: Alexander Belopolsky (belopolsky)
Date: 2010-06-04 20:18

The half-way check should be done with decimal arithmetic, but the result is the same:

>>> x = Decimal(0.6112295)
>>> abs(x - Decimal('0.6112290')) == abs(x - Decimal('0.6112300'))
False
|
|
msg107097
Author: Mark Dickinson (mark.dickinson)
Date: 2010-06-04 20:21

> The timedelta(seconds=0.6112295) example is handled correctly

No, it's not! It's being rounded *up* where it should be rounded *down*.

> because 0.6112295 sec is not half way between two nearest microseconds

Exactly. The actual value stored by the C double is a little closer to 0.611229 than to 0.611230.
|
|
msg108551
Author: Alexander Belopolsky (belopolsky)
Date: 2010-06-24 21:18

A similar problem affects the fromtimestamp() constructors:

>>> datetime.fromtimestamp(0.0078125) - datetime.fromtimestamp(0)
datetime.timedelta(0, 0, 7813)
>>> datetime.utcfromtimestamp(0.0078125) - datetime.utcfromtimestamp(0)
datetime.timedelta(0, 0, 7813)

Both are rounded to odd, but

>>> 0.0078125*timedelta(seconds=1)
datetime.timedelta(0, 0, 7812)

is correctly rounded to even.
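This timestamp makes a clean test case because it is a true halfway point, not a display artifact (an aside spelling out why the example works):

>>> 0.0078125 == 2.0**-7   # exactly representable in binary
True
>>> 0.0078125 * 1e6        # exactly 7812.5 microseconds
7812.5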
|
|
msg108601 - (view) |
Author: Alexander Belopolsky (belopolsky) *  |
Date: 2010-06-25 14:41 |
>> The timedelta(seconds=0.6112295) example is handled correctly > No, it's not! It's being rounded *up* where it should be > being rounded *down*. Let me try to reformulate the issue. When use is entering 0.6112295, she means 0.6112295, not 0x1.38f312b1b36bdp-1 or 0.61122949999999998116351207499974407255649566650390625 which are exact values of the underlying binary representation of 0.6112295. Modern Python is able to round 0.6112295 to 6 decimal places correctly: >>> round(0.6112295, 6) 0.611229 and timedelta should do the same, but it does not: >>> timedelta(seconds=0.6112295) datetime.timedelta(0, 0, 611230) With respect to accumulation, I believe the invariant that we want to have should be timedelta(x=a, y=b, ...) == timedelta(x=a) + timedelta(y=b) while the alternative (using higher precision addition with rounding at the end) may be more accurate in some cases, the above looks least surprising. |
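A hypothetical illustration of where accumulate-then-round and the per-argument invariant part ways, with values chosen so that each argument contributes exactly half a microsecond (the expected outputs assume the round-half-to-even patch and the documented accumulation of leftover microseconds):

>>> timedelta(milliseconds=0.0005) + timedelta(microseconds=0.5)
datetime.timedelta(0)
>>> timedelta(milliseconds=0.0005, microseconds=0.5)
datetime.timedelta(0, 0, 1)

Rounded separately, each 0.5 us tie goes to even, i.e. to zero; accumulated first, the two halves combine into a full microsecond, so the invariant fails.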
|
|
msg184529
Author: Yu Tomita (nekobon)
Date: 2013-03-18 21:29

I updated the unit test for the patch by Mark. The test fails without the patch and passes with it. I also updated the patch to work in Python 3.4 (_datetime.c instead of datetime.c).
|
|
msg194308
Author: Alexander Belopolsky (belopolsky)
Date: 2013-08-03 22:03

I cleaned up the patch a little:

1. Removed the now-unused static round_to_long() function.
2. Removed commented-out code.

Mark, any reason not to apply this? Do we need a NEWS entry for something like this?
|
|
msg194311
Author: Alexander Belopolsky (belopolsky)
Date: 2013-08-03 22:20

With the current patch we still have the following quirks:

>>> timedelta(seconds=0.6112295) == timedelta(seconds=1)*0.6112295
False
>>> timedelta(seconds=0.6112295) == timedelta(seconds=round(0.6112295, 6))
False

This is not a reason to hold the patch, though.
|
|
msg194325
Author: Mark Dickinson (mark.dickinson)
Date: 2013-08-04 07:02

Alexander: applying this is fine by me. As a user-visible change, yes, I think it should have a Misc/NEWS entry. (It's too small a change to be worth mentioning in the 'whatsnew' documents, though.)
|
|
msg194409
Author: Roundup Robot (python-dev)
Date: 2013-08-04 18:52

New changeset f7c84ef35b00 by Alexander Belopolsky in branch 'default':
Fixes #8860: Round half-microseconds to even in the timedelta constructor.
http://hg.python.org/cpython/rev/f7c84ef35b00
|
|