to_stata + read_stata results in NaNs (close to double precision limit) · Issue #14618 · pandas-dev/pandas
Explanation
Writing a DataFrame with to_stata and reading it back with read_stata turns many values into NaN when the values are close to the double-precision limit.
I think the code and output below are self-explanatory; if not, please ask.
I have not been able to test this on other systems yet.
If this is somehow expected behaviour, a more prominent warning would be in order.
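As a rough illustration of what such a warning could look like, here is a minimal sketch of a check that runs before writing. It assumes, without confirmation from the pandas or Stata sources, that the largest double a .dta file stores as a regular number has the bit pattern 0x7fdfffffffffffff and that anything larger is read back as missing. The names `STATA_DOUBLE_MAX`, `columns_outside_stata_range` and `warn_if_lossy` are hypothetical and not part of pandas.

```python
import struct
import warnings

# Assumption: the largest double that a .dta file stores as a regular number.
# Bit patterns above 0x7fdfffffffffffff are taken to be missing-value codes.
STATA_DOUBLE_MAX = struct.unpack('<d', b'\xff\xff\xff\xff\xff\xff\xdf\x7f')[0]


def columns_outside_stata_range(columns):
    """Map column name -> number of values above STATA_DOUBLE_MAX.

    `columns` is a plain dict of column name -> list of floats.
    NaN never compares greater than the limit, so it is not counted.
    """
    counts = {}
    for name, values in columns.items():
        too_large = sum(1 for v in values if v > STATA_DOUBLE_MAX)
        if too_large:
            counts[name] = too_large
    return counts


def warn_if_lossy(columns):
    """Emit one warning listing the columns that would lose values."""
    bad = columns_outside_stata_range(columns)
    if bad:
        warnings.warn('values above {0!r} would be read back as NaN: {1}'
                      .format(STATA_DOUBLE_MAX, sorted(bad.items())))
    return bad


# Three values taken from the output in this report.
demo = {'c000': [1.502566e+308, -3.418435e+307], 'c002': [-1.169342e+308]}
print(warn_if_lossy(demo))
```

With a real DataFrame, something like `warn_if_lossy(frame.to_dict('list'))` could be called before `frame.to_stata(...)`. The sketch works on plain dicts only so that it does not depend on pandas internals.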
A small, complete example of the issue
from numpy.random.mtrand import RandomState
from pandas import DataFrame, read_stata

pth = '/tmp/demo.dta'
rs = RandomState(seed=123456789)
# Uniform values in [-1, 1) scaled by the largest finite float64.
data = (2 * rs.rand(1000, 400).astype('float64') - 1) * 1.7976931348623157e+308
colnames = tuple('c{0:03d}'.format(k) for k in range(data.shape[1]))
frame = DataFrame(data=data, columns=colnames)
# .dta is a binary format, so open the file in binary mode.
with open(pth, 'wb') as fh:
    frame.to_stata(fh)
with open(pth, 'rb') as fh:
    frame2 = read_stata(fh)
print(frame2.tail())
Expected Output
index c000 c001 c002 c003 \
995 995 1.502566e+308 1.019238e+308 -1.169342e+308 6.845363e+307
996 996 -3.418435e+307 -8.113486e+307 2.544741e+306 5.771775e+307
997 997 1.507324e+308 4.610183e+307 -1.016633e+308 -1.632862e+308
998 998 -8.138620e+307 6.312126e+307 -6.557370e+307 6.342690e+307
999 999 -1.179032e+308 1.554709e+308 -1.175680e+308 1.921731e+307
c004 c005 c006 c007 \
995 1.611898e+308 -5.171776e+307 -8.918000e+307 -5.322720e+307
996 3.693405e+307 -1.480267e+308 1.586053e+308 7.489689e+306
997 1.060605e+308 -6.826590e+307 1.644990e+308 -1.379562e+308
998 1.379642e+308 1.005632e+307 -1.206948e+308 -1.198931e+308
999 -5.965607e+307 8.844623e+307 2.727894e+307 -5.433995e+307
c008 c009 ... c390 \
995 -6.580851e+306 1.284482e+308 ... -1.770789e+308
996 -9.312612e+307 -1.778315e+308 ... 7.410784e+307
997 -9.415141e+307 9.058828e+307 ... -5.451829e+305
998 1.651712e+308 4.435415e+307 ... 5.220773e+307
999 -1.747738e+308 -1.603248e+308 ... 1.415798e+307
c391 c392 c393 c394 \
995 7.360232e+307 -3.850417e+307 1.453624e+308 5.690363e+307
996 -6.943490e+307 1.047268e+308 4.026712e+307 9.161669e+305
997 4.406343e+306 1.617739e+308 4.218585e+307 1.573892e+307
998 -2.390131e+307 -6.649416e+307 6.548489e+307 1.000078e+307
999 -1.239203e+308 -5.038284e+307 -1.340608e+307 -1.193758e+308
c395 c396 c397 c398 c399
995 8.371989e+307 3.491895e+307 7.344525e+307 -9.260950e+307 1.032120e+308
996 9.200510e+307 -1.729595e+308 4.021503e+307 2.274318e+307 5.856302e+307
997 -7.624901e+307 -1.206386e+308 -6.164537e+306 -7.634148e+307 -1.462809e+308
998 -9.399560e+307 9.697224e+307 -6.963726e+307 -1.655656e+308 1.513218e+308
999 -1.476121e+308 1.187603e+308 1.402195e+308 -1.584051e+308 -1.232190e+308
[5 rows x 401 columns]
Actual Output
index c000 c001 c002 c003 \
995 995 NaN NaN -1.169342e+308 6.845363e+307
996 996 -3.418435e+307 -8.113486e+307 2.544741e+306 5.771775e+307
997 997 NaN 4.610183e+307 -1.016633e+308 -1.632862e+308
998 998 -8.138620e+307 6.312126e+307 -6.557370e+307 6.342690e+307
999 999 -1.179032e+308 NaN -1.175680e+308 1.921731e+307
c004 c005 c006 c007 \
995 NaN -5.171776e+307 -8.918000e+307 -5.322720e+307
996 3.693405e+307 -1.480267e+308 NaN 7.489689e+306
997 NaN -6.826590e+307 NaN -1.379562e+308
998 NaN 1.005632e+307 -1.206948e+308 -1.198931e+308
999 -5.965607e+307 8.844623e+307 2.727894e+307 -5.433995e+307
c008 ... c390 c391 \
995 -6.580851e+306 ... -1.770789e+308 7.360232e+307
996 -9.312612e+307 ... 7.410784e+307 -6.943490e+307
997 -9.415141e+307 ... -5.451829e+305 4.406343e+306
998 NaN ... 5.220773e+307 -2.390131e+307
999 -1.747738e+308 ... 1.415798e+307 -1.239203e+308
c392 c393 c394 c395 \
995 -3.850417e+307 NaN 5.690363e+307 8.371989e+307
996 NaN 4.026712e+307 9.161669e+305 NaN
997 NaN 4.218585e+307 1.573892e+307 -7.624901e+307
998 -6.649416e+307 6.548489e+307 1.000078e+307 -9.399560e+307
999 -5.038284e+307 -1.340608e+307 -1.193758e+308 -1.476121e+308
c396 c397 c398 c399
995 3.491895e+307 7.344525e+307 -9.260950e+307 NaN
996 -1.729595e+308 4.021503e+307 2.274318e+307 5.856302e+307
997 -1.206386e+308 -6.164537e+306 -7.634148e+307 -1.462809e+308
998 NaN -6.963726e+307 -1.655656e+308 NaN
999 NaN NaN -1.584051e+308 -1.232190e+308
[5 rows x 401 columns]
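Comparing the two outputs, every NaN sits where the expected value is positive and larger than roughly 8.99e+307, while large negative values survive. The sketch below checks that pattern against a few cells copied from the tables above. The threshold is an assumption: it takes 0x7fdfffffffffffff to be the largest double that the .dta format treats as a regular number, which I have not verified against the pandas source. `STATA_DOUBLE_MAX` and `exceeds_stata_max` are names made up for this example.

```python
import struct

# Assumption: bit patterns above 0x7fdfffffffffffff are reserved for Stata
# missing-value codes, so this is the largest double that round-trips.
STATA_DOUBLE_MAX = struct.unpack('<d', b'\xff\xff\xff\xff\xff\xff\xdf\x7f')[0]

# (expected value, became NaN after the round trip), copied from the output above
samples = [
    (1.502566e+308, True),    # row 995, c000
    (1.019238e+308, True),    # row 995, c001
    (-1.169342e+308, False),  # row 995, c002
    (6.845363e+307, False),   # row 995, c003
    (8.844623e+307, False),   # row 999, c005
    (9.697224e+307, True),    # row 998, c396
    (-1.655656e+308, False),  # row 998, c398
]


def exceeds_stata_max(value):
    """True if the value is above the assumed upper limit for Stata doubles."""
    return value > STATA_DOUBLE_MAX


# Cells where the threshold rule disagrees with what was actually observed.
mismatches = [v for v, was_nan in samples if exceeds_stata_max(v) != was_nan]
print(STATA_DOUBLE_MAX)
print(mismatches)
```

If the assumption holds, `mismatches` is empty for these samples, which would mean the NaNs come from a narrower valid range for doubles in the file format and not from random corruption.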
Output of pd.show_versions()
INSTALLED VERSIONS
------------------
commit: None
python: 2.7.12.final.0
python-bits: 64
OS: Linux
OS-release: 4.4.0-45-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: en_US.UTF-8
LANG: en_US.UTF-8
pandas: 0.18.1
nose: None
pip: 9.0.1
setuptools: 26.1.0
Cython: None
numpy: 1.11.2
scipy: 0.18.1
statsmodels: None
xarray: None
IPython: 5.1.0
sphinx: None
patsy: None
dateutil: 2.5.3
pytz: 2016.6.1
blosc: None
bottleneck: None
tables: None
numexpr: None
matplotlib: 1.5.2
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.8
boto: None
pandas_datareader: None