append : NaT should be the default for missing values for datetime64 columns

Hello,

Here is an issue I discovered in pandas version '0.12.0' (I think it was already present in previous versions).

Issue:
When appending a DataFrame (with a new datetime64 column) to an existing one, the default for missing values in that column should be pandas.tslib.NaT.

Example:

import pandas as pd
import datetime as dt
from pandas.tslib import NaT

df1 = pd.DataFrame(index=[1,2],
                   data=[dt.datetime(2013,1,1,0,0),dt.datetime(2013,1,2,0,0)],
                   columns=['start_time'])

df1

df2 = pd.DataFrame(index=[4,5],
                   data=[[dt.datetime(2013,1,3,0,0),dt.datetime(2013,1,3,6,10)],
                         [dt.datetime(2013,1,4,0,0),dt.datetime(2013,1,4,7,10)]],
                   columns=['start_time','end_time'])

df2

df3=df1.append(df2,ignore_index=True)

[Screenshot df3_nan: df3, where the missing end_time values are NaN instead of NaT]

While in reality we want:
[Screenshot df3_wanted: df3 with NaT for the missing end_time values]

A simple workaround is:

df3['end_time']=df3['end_time'].apply(pd.to_datetime)
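
For illustration, here is a minimal, self-contained sketch of the same idea. It does not reproduce the append itself: the Series `end_time` is a hypothetical stand-in for df3['end_time'], built by hand as an object column mixing NaN and datetimes. It also passes the whole column to pd.to_datetime in one call instead of using apply, which is a variant of the workaround above and not the exact line.

```python
import datetime as dt

import numpy as np
import pandas as pd

# Hypothetical stand-in for df3['end_time'] after the append: an object
# column holding NaN for the rows from df1 and datetimes for the rows from df2.
end_time = pd.Series(
    [np.nan, np.nan, dt.datetime(2013, 1, 3, 6, 10), dt.datetime(2013, 1, 4, 7, 10)],
    dtype=object,
)

# Convert the whole column in one call; the NaN entries become NaT and the
# column gets a datetime64 dtype instead of object.
fixed = pd.to_datetime(end_time)

print(fixed.dtype)
print(fixed.isna().tolist())
```

After the conversion the column has a single datetime64 dtype, with NaT marking the rows that had no end_time.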

It would be nice if, by default, when a new "datetime64" column is added, the default for missing values were NaT. Otherwise this creates problems when, for example, saving as HDF5 using PyTables, which does not accept mixed types per column.
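
As a rough way to spot such columns before saving, one can list the columns that ended up with object dtype. This is only a sketch under assumptions: the frame `df` below is a hypothetical example built by hand, and an object dtype is just a hint that a column may hold mixed types, not proof.

```python
import datetime as dt

import numpy as np
import pandas as pd

# Hypothetical frame: 'start_time' is a proper datetime64 column, while
# 'end_time' is an object column mixing NaN and a datetime.
df = pd.DataFrame({
    "start_time": pd.to_datetime(["2013-01-01", "2013-01-03"]),
    "end_time": pd.Series([np.nan, dt.datetime(2013, 1, 3, 6, 10)], dtype=object),
})

# Columns with object dtype are candidates for the mixed-type problem.
object_cols = df.select_dtypes(include=["object"]).columns.tolist()
print(object_cols)
```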

Have a nice day,

Patrick