Failing to parse date given as integers from a (MS)SQL query · Issue #17855 · pandas-dev/pandas (original) (raw)
Code Sample
SQL = """ SELECT TOP 10 submitted_date_id FROM my_table WHERE submitted_date_id BETWEEN %(mindate)s AND %(maxdate)s """
df = pd.read_sql( SQL, con, params={'mindate': 20171008, 'maxdate': '20171010'}, parse_dates={'submitted_date_id': '%Y%m%d'})
df
The underlaying database is MSSQL and I'm using pymssql
to build the connection con
.
Problem description
The column submitted_date_id
has integers representing the date YYYYMMDD
. The code above is expected to parse the column and populate the resulting column in the dataframe with Timestamp
objects. However, this code yields an error:
ValueError: cannot cast unit %Y%m%d
The current behavior is counter intuitive and seems to assume that the date column contains stings. A work around is to cast the column in the query to VARCHAR
, but this is cumbersome. Moreover, it adds unneeded complexity to the query.
Expected Output
When providing the format of the dates to parse_dates
, it is expected that the column will hold Timestamp objects.
Output of pd.show_versions()
INSTALLED VERSIONS
commit: None
python: 3.6.2.final.0
python-bits: 64
OS: Darwin
OS-release: 16.7.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: en_US.UTF-8
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
pandas: 0.20.3
pytest: 3.2.1
pip: 9.0.1
setuptools: 36.3.0
Cython: None
numpy: 1.13.1
scipy: 0.19.1
xarray: None
IPython: 6.2.0
sphinx: 1.6.4
patsy: None
dateutil: 2.6.1
pytz: 2017.2
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: 2.0.2
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: 0.999999999
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.9.6
s3fs: None
pandas_gbq: None
pandas_datareader: None