BUG: unwanted numeric coercion after groupby-apply · Issue #14423 · pandas-dev/pandas (original) (raw)

xref #14873 (boolean casts)
xref #14849 (datetime)

A small, complete example of the issue

import pandas as pd

def predictions(tool): out = pd.Series(index=['p1', 'p2', 'useTime'], dtype=object) if 'step1' in list(tool.State): out['p1'] = str(tool[tool.State == 'step1'].Machine.values[0]) if 'step2' in list(tool.State): out['p2'] = str(tool[tool.State == 'step2'].Machine.values[0]) out['useTime'] = str(tool[tool.State == 'step2'].oTime.values[0]) return out

df1 = pd.DataFrame({'Key': ['B', 'B', 'A', 'A'], 'State': ['step1', 'step2', 'step1', 'step2'], 'oTime': ['', '2016-09-19 05:24:33', '', '2016-09-19 23:59:04'], 'Machine': ['23', '36L', '36R', '36R']})

df2 = df1.copy() df2.oTime = pd.to_datetime(df2.oTime)

pred1 = df1.groupby('Key').apply(predictions) pred2 = df2.groupby('Key').apply(predictions)

print(pred1) print(pred2)

Actual Output:

      p1   p2              useTime
Key                               
A    36R  36R  2016-09-19 23:59:04
B     23  36L  2016-09-19 05:24:33
       p1   p2                        useTime
Key                                          
A     NaN  36R  2016-09-19T23:59:04.000000000
B    23.0  36L  2016-09-19T05:24:33.000000000

Expected Output

pred1 and pred2 should have the same values in column p1.
pred1 is correct whereas pred2 is changing type to float64.

Output of pd.show_versions()

## INSTALLED VERSIONS

commit: None
python: 3.5.2.final.0
python-bits: 64
OS: Darwin
OS-release: 14.5.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: en_US.UTF-8
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.19.0
nose: None
pip: 8.1.2
setuptools: 27.2.0
Cython: None
numpy: 1.11.2
scipy: 0.18.1
statsmodels: None
xarray: None
IPython: 5.1.0
sphinx: None
patsy: None
dateutil: 2.5.3
pytz: 2016.7
blosc: None
bottleneck: None
tables: 3.2.3.1
numexpr: 2.6.1
matplotlib: 1.5.3
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: 1.1.0
pymysql: None
psycopg2: None
jinja2: 2.8
boto: None
pandas_datareader: None