BUG: pandas.to_datetime produces wrong/strange results on 32-bit float data for 6-column format · Issue #60506 · pandas-dev/pandas

Pandas version checks

Reproducible Example

from datetime import datetime, UTC
import pandas as pd

start = datetime(2024, 1, 1)
end = datetime(2025, 1, 1)
samples = 10

df = pd.DataFrame([
    [2024, 1, 7, 11, 42, 13],
    [2024, 9, 19, 11, 54, 20],
    [2024, 9, 17, 1, 22, 0],
    [2024, 1, 24, 21, 59, 55],
    [2024, 6, 15, 12, 27, 30],
    [2024, 9, 26, 23, 58, 26],
    [2024, 6, 6, 0, 19, 59],
    [2024, 1, 8, 2, 7, 43],
    [2024, 2, 16, 16, 20, 13],
    [2024, 12, 22, 23, 54, 4],
])

df.columns = ['year', 'month', 'day', 'hour', 'minute', 'second']

ts = pd.to_datetime(df, utc=True)
ts32 = pd.to_datetime(df.astype('float32'), utc=True)
ts64 = pd.to_datetime(df.astype('float64'), utc=True)

print(ts - ts32)

assert ts.equals(ts64)
assert ts.equals(ts32)

Issue Description

When constructing datetimes from the 6-column format (year, month, day, hour, minute, second) with the data stored as 32-bit floats, pandas.to_datetime silently produces wrong results that are off by one day.
pandas.to_datetime should either produce correct results or raise an exception. Correct results would be preferred :)
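
A possible explanation, sketched under an assumption that this report does not confirm: if to_datetime first combines year, month and day into a single YYYYMMDD number in the input dtype, then for 2024 that number (about 2.0e7) exceeds 2**24 = 16777216, beyond which float32 can only represent even integers. An odd value such as 20240107 would then round to a neighbouring even value, which is a neighbouring day. The snippet below emulates float32 arithmetic with the standard library only; `to_float32` and `pack_ymd_float32` are hypothetical helpers for illustration, not pandas functions.

```python
import struct


def to_float32(x: float) -> float:
    """Round a Python float (float64) to the nearest float32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def pack_ymd_float32(year: int, month: int, day: int) -> float:
    """Hypothetical helper: year*10000 + month*100 + day,
    rounded to float32 after every operation (assumed packing scheme)."""
    y = to_float32(year)
    m = to_float32(month)
    d = to_float32(day)
    total = to_float32(y * 10000.0)
    total = to_float32(total + to_float32(m * 100.0))
    return to_float32(total + d)


exact = 2024 * 10000 + 1 * 100 + 7      # 20240107 in integer arithmetic
packed = pack_ymd_float32(2024, 1, 7)   # same sum in emulated float32

print(exact > 2**24)                    # the packed date is above the exact-integer range
print(exact, int(packed))               # the float32 result differs from the exact value
print(int(pack_ymd_float32(2024, 1, 24)))  # an even day survives unchanged
```

Under this assumption, rows with an even day would be unaffected, while rows with an odd day would shift by one day.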

Expected Behavior

from datetime import datetime, UTC
import pandas as pd

start = datetime(2024, 1, 1)
end = datetime(2025, 1, 1)
samples = 10

df = pd.DataFrame([
    [2024, 1, 7, 11, 42, 13],
    [2024, 9, 19, 11, 54, 20],
    [2024, 9, 17, 1, 22, 0],
    [2024, 1, 24, 21, 59, 55],
    [2024, 6, 15, 12, 27, 30],
    [2024, 9, 26, 23, 58, 26],
    [2024, 6, 6, 0, 19, 59],
    [2024, 1, 8, 2, 7, 43],
    [2024, 2, 16, 16, 20, 13],
    [2024, 12, 22, 23, 54, 4],
])

df.columns = ['year', 'month', 'day', 'hour', 'minute', 'second']

ts = pd.to_datetime(df, utc=True)
ts32 = pd.to_datetime(df.astype('float32'), utc=True)
ts64 = pd.to_datetime(df.astype('float64'), utc=True)

# Expected: every difference is a zero Timedelta.
print(ts - ts32)

# Expected: both assertions pass.
assert ts.equals(ts64)
assert ts.equals(ts32)
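
Until float32 input is handled correctly, one possible workaround is to widen the columns before calling to_datetime. This is only a sketch: it assumes the float64 path is correct, as the first assertion in the example implies, and it relies on small integers such as 2024 or 59 being stored exactly in float32, so that the cast to float64 loses nothing. The two-row frame `df32` is made up for illustration.

```python
import pandas as pd

# Illustrative float32 frame in the 6-column layout used in this report.
df32 = pd.DataFrame(
    {
        "year": [2024, 2024],
        "month": [1, 9],
        "day": [7, 19],
        "hour": [11, 11],
        "minute": [42, 54],
        "second": [13, 20],
    }
).astype("float32")

# Widen to float64 before conversion; each column value is a small
# integer, so the cast itself is exact.
ts_fixed = pd.to_datetime(df32.astype("float64"), utc=True)

print(ts_fixed)
```

If the columns are known to contain no missing values, casting to int64 instead would avoid floating point entirely.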

Installed Versions

INSTALLED VERSIONS
------------------
commit : 0691c5c
python : 3.12.7
python-bits : 64
OS : Windows
OS-release : 10
Version : 10.0.19045
machine : AMD64
processor : Intel64 Family 6 Model 186 Stepping 2, GenuineIntel
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : English_Denmark.1252

pandas : 2.2.3
numpy : 2.1.2
pytz : 2024.2
dateutil : 2.9.0.post0
pip : 24.3.1
Cython : None
sphinx : None
IPython : 8.28.0
adbc-driver-postgresql: None
adbc-driver-sqlite : None
bs4 : None
blosc : None
bottleneck : None
dataframe-api-compat : None
fastparquet : None
fsspec : None
html5lib : None
hypothesis : None
gcsfs : None
jinja2 : None
lxml.etree : None
matplotlib : None
numba : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
psycopg2 : None
pymysql : None
pyarrow : 17.0.0
pyreadstat : None
pytest : 8.3.3
python-calamine : None
pyxlsb : None
s3fs : None
scipy : None
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
xlsxwriter : None
zstandard : None
tzdata : 2024.2
qtpy : None
pyqt5 : None