BUG/API: Data assignment dtype inference inconsistencies · Issue #14001 · pandas-dev/pandas
Code Sample, a copy-pastable example if possible
Assume we have an int DataFrame and assign float values to it.
- Assigning a 2-d float array keeps int columns where possible.
import numpy as np
import pandas as pd

df = pd.DataFrame(np.zeros((3, 2), dtype=np.int64))
df.iloc[0:2, 0:2] = np.array([[1, 2.1], [2, 2.2]], dtype=np.float64)
df
#    0    1
# 0  1  2.1
# 1  2  2.2
# 2  0  0.0
- But assigning a 1-d float array coerces the column to float rather than keeping int.
df = pd.DataFrame(np.zeros((3, 2), dtype=np.int64))
df.iloc[0:2, 0] = np.array([1, 2], dtype=np.float64)
df
#      0  1
# 0  1.0  0
# 1  2.0  0
# 2  0.0  0
Expected Output
I think the first case should also result in all-float columns, since we are assigning values of float dtype.
df = pd.DataFrame(np.zeros((3, 2), dtype=np.int64))
df.iloc[0:2, 0:2] = np.array([[1, 2.1], [2, 2.2]], dtype=np.float64)
df
#      0    1
# 0  1.0  2.1
# 1  2.0  2.2
# 2  0.0  0.0
The inference occurs around here, and it should be something like:
if hasattr(value, 'dtype'):
    if is_dtype_equal(values.dtype, value.dtype):
        dtype = value.dtype
    else:
        dtype = _find_common_type(values.dtype, value.dtype)
elif is_scalar(value):
    dtype, _ = _infer_dtype_from_scalar(value)
else:
    # not sure this branch can be reached...
    dtype = 'infer'
...
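For anyone who wants to try the proposed rule outside of pandas internals, here is a rough standalone sketch using only NumPy. It is an illustration under assumptions, not the actual pandas code: `infer_assignment_dtype` is a hypothetical name, `np.result_type` stands in for `_find_common_type`, `np.isscalar` stands in for `is_scalar`, and `np.asarray(value).dtype` stands in for `_infer_dtype_from_scalar`.

```python
import numpy as np


def infer_assignment_dtype(values, value):
    """Hypothetical helper mirroring the proposed inference.

    values: the existing ndarray being assigned into.
    value:  the new data (array-like with a dtype, or a scalar).
    """
    if hasattr(value, "dtype"):
        if values.dtype == value.dtype:
            # Same dtype on both sides: nothing to coerce.
            return value.dtype
        # Assumption: np.result_type plays the role of _find_common_type.
        return np.result_type(values.dtype, value.dtype)
    if np.isscalar(value):
        # Assumption: let NumPy infer the dtype of a Python scalar.
        return np.asarray(value).dtype
    # Fallback kept from the snippet above; may not be reachable.
    return "infer"


existing = np.zeros((3, 2), dtype=np.int64)

# Float data assigned into int storage resolves to float, for both
# the 2-d and the 1-d case.
dtype_2d = infer_assignment_dtype(existing, np.array([[1, 2.1], [2, 2.2]]))
dtype_1d = infer_assignment_dtype(existing, np.array([1.0, 2.0]))
print(dtype_2d, dtype_1d)
```

Under this rule the 1-d and 2-d assignments from the examples above resolve to the same dtype, which is the consistency this issue asks for.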
output of pd.show_versions()
0.18.1