Strange Behavior of mean() with timedelta64 · Issue #9442 · pandas-dev/pandas
xref #6549
When a DataFrame with a timedelta64 column is very large, mean() returns an incorrect value for that column. In the following code, the mean reported by describe() is wrong, but all the other statistics are fine:
import numpy as np
import pandas as pd

dates1 = pd.date_range(start='20000101T000000', end='20150101T230000', freq='1H').values
# Add a random offset between 0 and 10 days (in nanoseconds) to each timestamp
dates2 = dates1 + np.random.randint(0*60*60*1000000000, 10*24*60*60*1000000000, len(dates1))
df = pd.DataFrame({'dates1':dates1, 'dates2':dates2})
df['tdiff'] = dates2-dates1
# The same difference expressed as a float number of days
df['fdiff'] = df['tdiff'].apply(lambda x: float(x.item())/1000000000/3600/24)
df.describe()
When the data is made shorter by changing the start date in the example above:
dates1 = pd.date_range(start='20140101T000000', end='20150101T230000', freq='1H').values
The correct result is obtained. (The mean of the timedelta should be about 5 days, since the random offsets are drawn uniformly between 0 and 10 days.) Is this an open bug?
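One possible explanation, which is an assumption and not something confirmed in this report, is that the nanosecond values are summed as int64 before dividing, so the sum wraps around once the column is long enough. The sketch below uses plain NumPy with hypothetical names (`offsets`, `n_rows`) and a seeded generator to stand in for the `tdiff` column. The long range above has 131,520 hourly rows and the short one has 8,784; with a mean near 5 days (about 4.32e14 ns), only the long one pushes the sum past the int64 maximum of about 9.22e18.

```python
import numpy as np

NS_PER_DAY = 24 * 60 * 60 * 10**9

# Hypothetical stand-in for the 'tdiff' column: one offset per hourly row
# over the long date range, drawn uniformly from 0 to 10 days in nanoseconds.
n_rows = 131520
rng = np.random.default_rng(0)
offsets = rng.integers(0, 10 * NS_PER_DAY, size=n_rows, dtype=np.int64)

# Python integers have arbitrary precision, so this sum is exact.
exact_sum = sum(int(v) for v in offsets)

# NumPy accumulates int64 values in int64, which wraps silently on overflow.
wrapped_sum = int(offsets.sum())

int64_max = np.iinfo(np.int64).max
print("exact sum exceeds int64 max:", exact_sum > int64_max)
print("mean from wrapped sum (days):", wrapped_sum / n_rows / NS_PER_DAY)

# Converting to float64 before averaging avoids the wraparound, at the cost
# of some precision in the lowest digits.
float_mean_days = offsets.astype(np.float64).mean() / NS_PER_DAY
print("mean from float64 (days):", float_mean_days)
```

This matches what the `fdiff` column in the example does: it converts each value to a float number of days first, which would explain why its mean comes out right while the `tdiff` mean does not.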
Version:
pd.show_versions()
INSTALLED VERSIONS
commit: None
python: 2.7.9.final.0
python-bits: 64
OS: Darwin
OS-release: 13.4.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: C
LANG: en_US.UTF-8

pandas: 0.15.2
nose: 1.3.4
Cython: 0.21.2
numpy: 1.9.1
scipy: 0.15.1
statsmodels: None
IPython: 2.3.1
sphinx: 1.2.3
patsy: None
dateutil: 2.4.0
pytz: 2014.10
bottleneck: None
tables: None
numexpr: None
matplotlib: 1.4.2
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: 4.3.2
html5lib: None
httplib2: None
apiclient: None
rpy2: None
sqlalchemy: None
pymysql: None
psycopg2: None