BUG: SAS datetime column with null values cannot be parsed. · Issue #39725 · pandas-dev/pandas


Code Sample

import pandas as pd
pd.read_sas('dates_null.sas7bdat')

Problem description

When a datetime column contains null values, read_sas raises an exception instead of converting the file to a DataFrame.

Traceback

Traceback (most recent call last):
  File "/home/wertha/source/pandas/pandas/tests/io/sas/data/.test/lib/python3.9/site-packages/pandas/io/sas/sas7bdat.py", line 52, in _convert_datetimes
    return pd.to_datetime(sas_datetimes, unit=unit, origin="1960-01-01")
  File "/home/wertha/source/pandas/pandas/tests/io/sas/data/.test/lib/python3.9/site-packages/pandas/core/tools/datetimes.py", line 805, in to_datetime
    values = convert_listlike(arg._values, format)
  File "/home/wertha/source/pandas/pandas/tests/io/sas/data/.test/lib/python3.9/site-packages/pandas/core/tools/datetimes.py", line 345, in _convert_listlike_datetimes
    result, tz_parsed = tslib.array_with_unit_to_datetime(
  File "pandas/_libs/tslib.pyx", line 249, in pandas._libs.tslib.array_with_unit_to_datetime
pandas._libs.tslibs.np_datetime.OutOfBoundsDatetime: cannot convert input with unit 'd'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "", line 1, in
  File "/home/wertha/source/pandas/pandas/tests/io/sas/data/.test/lib/python3.9/site-packages/pandas/io/sas/sasreader.py", line 152, in read_sas
    return reader.read()
  File "/home/wertha/source/pandas/pandas/tests/io/sas/data/.test/lib/python3.9/site-packages/pandas/io/sas/sas7bdat.py", line 723, in read
    rslt = self._chunk_to_dataframe()
  File "/home/wertha/source/pandas/pandas/tests/io/sas/data/.test/lib/python3.9/site-packages/pandas/io/sas/sas7bdat.py", line 771, in _chunk_to_dataframe
    rslt[name] = _convert_datetimes(rslt[name], "d")
  File "/home/wertha/source/pandas/pandas/tests/io/sas/data/.test/lib/python3.9/site-packages/pandas/io/sas/sas7bdat.py", line 59, in _convert_datetimes
    return sas_datetimes.apply(
  File "/home/wertha/source/pandas/pandas/tests/io/sas/data/.test/lib/python3.9/site-packages/pandas/core/series.py", line 4135, in apply
    mapped = lib.map_infer(values, f, convert=convert_dtype)
  File "pandas/_libs/lib.pyx", line 2467, in pandas._libs.lib.map_infer
  File "/home/wertha/source/pandas/pandas/tests/io/sas/data/.test/lib/python3.9/site-packages/pandas/io/sas/sas7bdat.py", line 60, in
    lambda sas_float: datetime(1960, 1, 1) + timedelta(days=sas_float)
ValueError: cannot convert float NaN to integer
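The final ValueError can be reproduced without pandas or a SAS file. The sketch below mirrors the lambda shown in the traceback; the names SAS_EPOCH, sas_days_to_datetime and fails_on_nan are illustrative only and do not appear in pandas.

```python
from datetime import datetime, timedelta

# SAS counts dates as days since 1960-01-01 (the origin used in the traceback).
SAS_EPOCH = datetime(1960, 1, 1)


def sas_days_to_datetime(sas_float):
    """Same arithmetic as the fallback lambda in the traceback (hypothetical name)."""
    return SAS_EPOCH + timedelta(days=sas_float)


def fails_on_nan():
    """Return True if a NaN day count raises ValueError, as in the report."""
    try:
        sas_days_to_datetime(float("nan"))
    except ValueError:
        return True
    return False


print(sas_days_to_datetime(1.0))
print(fails_on_nan())
```

This suggests the fallback path has no handling for missing values: timedelta cannot be built from NaN, so a single null in the column aborts the whole read.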

Expected Output

A DataFrame in which null values in the datetime column appear as NaT.
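One way to get that behaviour is to check for missing values before doing the date arithmetic. The following is a minimal sketch under that assumption, not the pandas implementation; convert_sas_days and SAS_EPOCH are hypothetical names, and only day-based values (unit "d") are handled.

```python
from datetime import datetime, timedelta

import pandas as pd

# Assumed origin, taken from the traceback: SAS day counts start at 1960-01-01.
SAS_EPOCH = datetime(1960, 1, 1)


def convert_sas_days(sas_days: pd.Series) -> pd.Series:
    """Convert SAS day counts to datetimes, mapping missing values to NaT."""

    def parse(value):
        # Guard against NaN before it reaches timedelta, which rejects it.
        if pd.isna(value):
            return pd.NaT
        return SAS_EPOCH + timedelta(days=value)

    return sas_days.apply(parse)


converted = convert_sas_days(pd.Series([0.0, float("nan"), 1.0]))
print(converted)
```

The null entry comes back as NaT while the other rows are converted as before.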

Output of pd.show_versions()

INSTALLED VERSIONS

commit : 7d32926
python : 3.9.1.final.0
python-bits : 64
OS : Linux
OS-release : 5.10.14-arch1-1
Version : #1 SMP PREEMPT Sun, 07 Feb 2021 22:42:17 +0000
machine : x86_64
processor :
byteorder : little
LC_ALL : en_US.UTF-8
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8

pandas : 1.2.2
numpy : 1.20.1
pytz : 2021.1
dateutil : 2.8.1
pip : 20.2.3
setuptools : 49.2.1
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : None
IPython : None
pandas_datareader: None
bs4 : None
bottleneck : None
fsspec : None
fastparquet : None
gcsfs : None
matplotlib : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pyxlsb : None
s3fs : None
scipy : None
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
xlwt : None
numba : None