pd.eval division operation upcasts float32 to float64 · Issue #12388 · pandas-dev/pandas
The current behavior is inconsistent with normal Python division of two DataFrames (see code sample).
Pandas upcasts both terms to 64-bit floats when it detects a division, see:
I think numexpr can handle different types too, and upcast automatically, though I am not 100% sure. I can submit a PR, but how do you recommend fixing this? Something like the following?
if truediv or PY3:
    for term in com.flatten(self):
        try:
            dt = term.values.dtype  # can .values be expensive?
        except AttributeError:
            dt = type(term)
        if dt == np.float32:
            continue
        else:
            _cast_inplace([term], np.float_)
The downside is that if someone does 2 + df, they'll probably still end up upcasting it. But this proposal is still better than what we have today.
I might re-write the above using filter too, but at this time I just wanted to discuss the general approach.
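To make the general approach concrete outside of pandas internals, here is a minimal standalone sketch using only NumPy. The helper name cast_terms_for_division is hypothetical and is not part of pandas. It only mirrors the idea of skipping the cast for terms that are already float32. It also shows the downside mentioned above: as soon as one term is not float32, that term is cast to float64 and the result is upcast.

```python
import numpy as np


def cast_terms_for_division(terms):
    """Hypothetical helper (not pandas API): upcast every term to float64
    unless it is already float32, in which case leave it untouched."""
    out = []
    for term in terms:
        arr = np.asarray(term)
        if arr.dtype == np.float32:
            out.append(arr)  # keep float32 as-is
        else:
            out.append(arr.astype(np.float64))  # everything else is upcast
    return out


f32 = np.arange(1, 4, dtype=np.float32)
i64 = np.arange(1, 4, dtype=np.int64)

# Both terms are float32, so nothing is cast and the quotient stays float32.
a, b = cast_terms_for_division([f32, f32])
same_dtype_result = a / b

# One term is int64, so it is cast to float64 and the quotient is float64.
c, d = cast_terms_for_division([f32, i64])
mixed_dtype_result = c / d

print(same_dtype_result.dtype, mixed_dtype_result.dtype)
```

This is only an illustration of the dtype logic. The real fix would have to operate on the expression terms inside pd.eval, as in the snippet above.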
Code Sample, a copy-pastable example if possible
import numpy as np
import pandas as pd
df = pd.DataFrame(np.arange(3, dtype=np.float32))
print('normal', (df/df).values.dtype)
print('pd_eval', pd.eval('df/df').values.dtype)
assert ((df/df).dtypes == pd.eval('df/df').dtypes).all()
Expected Output
normal float32
pd_eval float32
output of pd.show_versions()
INSTALLED VERSIONS
------------------
commit: None
python: 3.5.1.final.0
python-bits: 64
OS: Linux
OS-release: 3.16.0-4-amd64
machine: x86_64
processor:
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
pandas: 0.17.1
nose: None
pip: 8.0.2
setuptools: 19.6.2
Cython: 0.23.4
numpy: 1.10.4
scipy: None
statsmodels: None
IPython: None
sphinx: None
patsy: None
dateutil: 2.4.2
pytz: 2015.7
blosc: None
bottleneck: 1.0.0
tables: 3.2.2
numexpr: 2.5
matplotlib: None
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
httplib2: 0.9.2
apiclient: 1.4.2
sqlalchemy: 1.0.9
pymysql: 0.6.7.None
psycopg2: 2.6.1 (dt dec pq3 ext lo64)
Jinja2: None