BUG: Floating point accuracy with DatetimeIndex.round (#14440) by mroeschke · Pull Request #15568 · pandas-dev/pandas
@@ -175,6 +175,17 @@ def test_round(self):
tm.assertRaisesRegexp(ValueError, msg, rng.round, freq='M')
tm.assertRaisesRegexp(ValueError, msg, elt.round, freq='M')
# GH 14440
Can you also round to us/ns here (the result should equal the original)?
As an aside, you can now write parametrized tests (but you have to move them to separate functions rather than class-based tests).
So rounding works correctly for microseconds but not for nanoseconds:
In [7]: pd.DatetimeIndex(['2016-10-17 12:00:00.0015']).round('ns')
Out[7]: DatetimeIndex(['2016-10-17 12:00:00.001499904'], dtype='datetime64[ns]', freq=None)
The rounding methodology seems sound. I am unsure whether this is a limitation of the value going from int64 to float64 and back to int64, which is essentially what is happening:
(Pdb) np.round(np.array([1476705600001500000])/1.).astype('i8')
array([1476705600001499904])
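To make the precision loss above concrete, here is a minimal standard-library sketch. It shows that float64 values near 1.5e18 are spaced 256 apart, so the nanosecond timestamp cannot survive an int64 to float64 to int64 round trip. The helper `round_to_multiple` is a hypothetical name used only for illustration; it is one possible integer-only approach, not necessarily what pandas does or should do.

```python
import math

# 2016-10-17 12:00:00.0015 expressed as nanoseconds since the epoch
ns = 1476705600001500000

as_float = float(ns)           # int64 -> float64 (53-bit significand)
round_tripped = int(as_float)  # float64 -> int64
spacing = math.ulp(as_float)   # gap between adjacent float64 values here (Python 3.9+)

print(round_tripped)  # 1476705600001499904
print(spacing)        # 256.0


def round_to_multiple(value: int, unit: int) -> int:
    """Hypothetical helper: round `value` to the nearest multiple of `unit`
    using integer arithmetic only. Exact ties are rounded up."""
    quotient, remainder = divmod(value, unit)
    if 2 * remainder >= unit:
        quotient += 1
    return quotient * unit


# Rounding to 1 ns leaves the value untouched; rounding to 1 ms (1_000_000 ns)
# moves .0015 up to .002, with no float conversion involved.
print(round_to_multiple(ns, 1))          # 1476705600001500000
print(round_to_multiple(ns, 1_000_000))  # 1476705600002000000
```

The original value is 96 above the nearest representable float64, which is why the round trip lands on ...499904. Staying in integer arithmetic avoids the issue, at the cost of having to pick a tie-breaking rule explicitly (ties go up in this sketch).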
Yes, this is just a loss of precision; I'm not sure much can be done.
We could potentially warn or raise, though.
Would that be useful?
Created a new issue: #15578
index = pd.DatetimeIndex(['2016-10-17 12:00:00.0015'], tz=tz)
result = index.round('ms')
expected = pd.DatetimeIndex(['2016-10-17 12:00:00.002000'], tz=tz)
tm.assert_index_equal(result, expected)
index = pd.DatetimeIndex(['2016-10-17 12:00:00.00149'], tz=tz)
result = index.round('ms')
expected = pd.DatetimeIndex(['2016-10-17 12:00:00.001000'], tz=tz)
tm.assert_index_equal(result, expected)
def test_repeat_range(self):
rng = date_range('1/1/2000', '1/1/2001')