Feature: NaN in float64 series should become NaT with astype() · Issue #2984 · pandas-dev/pandas
To get an auto-diff on a timeseries I do:
In [47]: ts
Out[47]:
0   2011-04-16 00:00:00.025000
1   2011-04-16 00:00:00.152999
2   2011-04-16 00:00:00.280999
3   2011-04-16 00:00:00.409000
4   2011-04-16 00:00:00.537000
5   2011-04-16 00:00:00.665000
6   2011-04-16 00:00:00.792999
7   2011-04-16 00:00:00.921000
8   2011-04-16 00:00:01.049000
9   2011-04-16 00:00:01.177000
10   2011-04-16 00:00:01.297000
11   2011-04-16 00:00:01.424999
12   2011-04-16 00:00:01.552999
13   2011-04-16 00:00:01.680999
14   2011-04-16 00:00:01.809000
...
25646   2011-04-16 00:59:58.143001
25647   2011-04-16 00:59:58.270999
25648   2011-04-16 00:59:58.398998
25649   2011-04-16 00:59:58.527000
25650   2011-04-16 00:59:58.654998
25651   2011-04-16 00:59:58.783000
25652   2011-04-16 00:59:58.910999
25653   2011-04-16 00:59:59.039001
25654   2011-04-16 00:59:59.166999
25655   2011-04-16 00:59:59.294998
25656   2011-04-16 00:59:59.423000
25657   2011-04-16 00:59:59.550998
25658   2011-04-16 00:59:59.679000
25659   2011-04-16 00:59:59.806999
25660   2011-04-16 00:59:59.935001
Length: 25661, dtype: datetime64[ns]

In [48]: ts.diff()
Out[48]:
0   NaN
1   127999000
2   128000000
3   128001000
4   128000000
5   128000000
6   127999000
7   128001000
8   128000000
9   128000000
10   120000000
11   127999000
12   128000000
13   128000000
14   128001000
...
25646   128002000
25647   127998000
25648   127999000
25649   128002000
25650   127998000
25651   128002000
25652   127999000
25653   128002000
25654   127998000
25655   127999000
25656   128002000
25657   127998000
25658   128002000
25659   127999000
25660   128002000
Length: 25661, dtype: float64
- It would be nice if a diff on a time series automatically became a timedelta.
- When converting the result of ts.diff() with astype('timedelta64[ns]'), the NaN should be converted into a NaT, if I understand things correctly?
Instead this happens and I have to cut off the NaN to make it work:
In [49]: ts.diff().astype('timedelta[ns]')
TypeError                                 Traceback (most recent call last)
in ()
----> 1 ts.diff().astype('timedelta[ns]')

/usr/local/epd/lib/python2.7/site-packages/pandas/core/series.pyc in astype(self, dtype)
830 See numpy.ndarray.astype
831 """
--> 832 casted = com._astype_nansafe(self.values, dtype)
833 return self._constructor(casted, index=self.index, name=self.name,
834 dtype=casted.dtype)

/usr/local/epd/lib/python2.7/site-packages/pandas/core/common.pyc in _astype_nansafe(arr, dtype, copy)
1361 """ return a view if copy is False """
1362 if not isinstance(dtype, np.dtype):
-> 1363 dtype = np.dtype(dtype)
1364
1365 if issubclass(arr.dtype.type, np.datetime64):

TypeError: data type not understood
In [50]: ts.diff().astype('timedelta64[ns]')
ValueError                                Traceback (most recent call last)
in ()
----> 1 ts.diff().astype('timedelta64[ns]')

/usr/local/epd/lib/python2.7/site-packages/pandas/core/series.pyc in astype(self, dtype)
830 See numpy.ndarray.astype
831 """
--> 832 casted = com._astype_nansafe(self.values, dtype)
833 return self._constructor(casted, index=self.index, name=self.name,
834 dtype=casted.dtype)

/usr/local/epd/lib/python2.7/site-packages/pandas/core/common.pyc in _astype_nansafe(arr, dtype, copy)
1370
1371 if np.isnan(arr).any():
-> 1372 raise ValueError('Cannot convert NA to integer')
1373 elif arr.dtype == np.object and np.issubdtype(dtype.type, np.integer):
1374 # work around NumPy brokenness, #1987

ValueError: Cannot convert NA to integer
In [51]: ts.diff()[1:].astype('timedelta64[ns]')
Out[51]:
1   00:00:00.127999
2   00:00:00.128000
3   00:00:00.128001
4   00:00:00.128000
5   00:00:00.128000
6   00:00:00.127999
7   00:00:00.128001
8   00:00:00.128000
9   00:00:00.128000
10   00:00:00.120000
11   00:00:00.127999
12   00:00:00.128000
13   00:00:00.128000
14   00:00:00.128001
15   00:00:00.128000
...
25646   00:00:00.128002
25647   00:00:00.127998
25648   00:00:00.127999
25649   00:00:00.128002
25650   00:00:00.127998
25651   00:00:00.128002
25652   00:00:00.127999
25653   00:00:00.128002
25654   00:00:00.127998
25655   00:00:00.127999
25656   00:00:00.128002
25657   00:00:00.127998
25658   00:00:00.128002
25659   00:00:00.127999
25660   00:00:00.128002
Length: 25660, dtype: timedelta64[ns]
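
For anyone landing here later, a minimal sketch of the behaviour asked for above that keeps the first row instead of slicing it off. It assumes a pandas version newer than the one in the tracebacks, where `pd.to_timedelta` is available and maps NaN to NaT. The small `diffs` and `ts_small` series are hypothetical stand-ins built from the first few values shown above, not the original data.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the float64 result of ts.diff() shown above:
# nanosecond counts with a leading NaN.
diffs = pd.Series([np.nan, 127999000.0, 128000000.0, 128001000.0])

# Interpret the floats as nanoseconds; the NaN comes out as NaT,
# so the series keeps its full length.
deltas = pd.to_timedelta(diffs, unit="ns")
print(deltas)

# Hypothetical stand-in for ts, using the first three timestamps above.
ts_small = pd.Series(pd.to_datetime([
    "2011-04-16 00:00:00.025000",
    "2011-04-16 00:00:00.152999",
    "2011-04-16 00:00:00.280999",
]))

# In recent pandas, diff() on a datetime64 series should already give
# a timedelta series whose first element is NaT.
direct = ts_small.diff()
print(direct)
```

If your version behaves like this, neither the `[1:]` slice nor the `astype` call should be needed.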