DataFrame() returns an empty DataFrame, if DatetimeIndex with timezone-info and column label are passed · Issue #19157 · pandas-dev/pandas

Problem description

Converting a DatetimeIndex to a DataFrame behaves differently depending on whether the DatetimeIndex is timezone-naive or timezone-aware. The problem only occurs when the columns argument is provided.

Input of Example 1: DatetimeIndex with timezone-info.

    import pandas as pd

    start_dt2 = pd.to_datetime('20180101T10:00:00')
    start_dt2 = start_dt2.tz_localize('Asia/Singapore')
    end_dt2 = pd.to_datetime('20180101T10:03:00')
    end_dt2 = end_dt2.tz_localize('Asia/Singapore')
    date_range2 = pd.date_range(start_dt2, end_dt2, freq='T')
    print(date_range2, '\n')

    df2 = pd.DataFrame(date_range2, columns=['timestamps'])
    print(df2)

Output of Example 1: an empty DataFrame.

    DatetimeIndex(['2018-01-01 10:00:00+08:00', '2018-01-01 10:01:00+08:00',
                   '2018-01-01 10:02:00+08:00', '2018-01-01 10:03:00+08:00'],
                  dtype='datetime64[ns, Asia/Singapore]', freq='T')

    Empty DataFrame
    Columns: [timestamps]
    Index: []

Input of Example 2: DatetimeIndex is timezone-naive.

    start_dt = pd.to_datetime('20180101T10:00:00')
    end_dt = pd.to_datetime('20180101T10:03:00')
    date_range1 = pd.date_range(start_dt, end_dt, freq='T')
    print(date_range1, '\n')

    df1 = pd.DataFrame(date_range1, columns=['timestamps'])
    print(df1)

Output of Example 2: DataFrame returned.

    DatetimeIndex(['2018-01-01 10:00:00', '2018-01-01 10:01:00',
                   '2018-01-01 10:02:00', '2018-01-01 10:03:00'],
                  dtype='datetime64[ns]', freq='T')

               timestamps
    0 2018-01-01 10:00:00
    1 2018-01-01 10:01:00
    2 2018-01-01 10:02:00
    3 2018-01-01 10:03:00

Examples 1 and 2 both pass the argument columns=['timestamps']. However, if the timezone-aware DatetimeIndex is instead passed as a value in a dictionary, a populated DataFrame is returned, just as in Example 2.

Input of Example 3: DatetimeIndex with timezone-info, passed as part of a dictionary.

    start_dt3 = pd.to_datetime('20180101T10:00:00')
    start_dt3 = start_dt3.tz_localize('Asia/Singapore')
    end_dt3 = pd.to_datetime('20180101T10:03:00')
    end_dt3 = end_dt3.tz_localize('Asia/Singapore')
    date_range3 = pd.date_range(start_dt3, end_dt3, freq='T')
    print(date_range3, '\n')

    df3 = pd.DataFrame({"timestamps": date_range3})
    print(df3)

Output of Example 3: DataFrame returned.

    DatetimeIndex(['2018-01-01 10:00:00+08:00', '2018-01-01 10:01:00+08:00',
                   '2018-01-01 10:02:00+08:00', '2018-01-01 10:03:00+08:00'],
                  dtype='datetime64[ns, Asia/Singapore]', freq='T')

                     timestamps
    0 2018-01-01 10:00:00+08:00
    1 2018-01-01 10:01:00+08:00
    2 2018-01-01 10:02:00+08:00
    3 2018-01-01 10:03:00+08:00
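Until the constructor behaves consistently, Example 3 suggests a possible workaround: avoid passing the timezone-aware index positionally together with columns. The sketch below shows the dictionary form from Example 3 next to a second route through a named Series. The Series route is an assumption on my part and has not been checked against pandas 0.22.0; the variable names are illustrative, and 'min' is used as the minute alias in place of 'T'.

```python
import pandas as pd

# Same timezone-aware range as Example 1 ('min' is the minute alias, like 'T').
idx = pd.date_range(
    start=pd.Timestamp("2018-01-01 10:00:00", tz="Asia/Singapore"),
    end=pd.Timestamp("2018-01-01 10:03:00", tz="Asia/Singapore"),
    freq="min",
)

# Workaround A: dictionary input, as in Example 3.
df_a = pd.DataFrame({"timestamps": idx})

# Workaround B (assumption, untested on 0.22.0): wrap the index in a
# named Series first, then turn that Series into a one-column frame.
df_b = pd.Series(idx, name="timestamps").to_frame()

print(df_a)
print(df_a.equals(df_b))
```

Both routes name the column through the data itself rather than through the columns argument, which is the argument that appears to trigger the empty result.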

Question

Shouldn't Example 1 return a non-empty DataFrame that matches the output of Example 3? I would expect the result to be consistent regardless of whether the DatetimeIndex is timezone-aware or timezone-naive. Thank you.
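The expectation above can be written as a small check. This is only a sketch of the behavior I would expect, not a statement of how pandas is specified: check_constructor_consistency is a hypothetical helper name, and 'min' stands in for the 'T' alias. On the pandas 0.22.0 install reported below, the timezone-aware case would be expected to report False, given the empty frame in Example 1.

```python
import pandas as pd


def check_constructor_consistency(index, label="timestamps"):
    """Return True if the positional+columns form and the dict form agree.

    Hypothetical helper for illustration; not part of pandas.
    """
    from_columns = pd.DataFrame(index, columns=[label])
    from_dict = pd.DataFrame({label: index})
    return from_columns.equals(from_dict)


# Timezone-naive range (Example 2) and its timezone-aware counterpart (Example 1).
naive = pd.date_range("2018-01-01 10:00", "2018-01-01 10:03", freq="min")
aware = naive.tz_localize("Asia/Singapore")

print("naive consistent:", check_constructor_consistency(naive))
print("aware consistent:", check_constructor_consistency(aware))
```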

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.5.2.final.0
python-bits: 64
OS: Darwin
OS-release: 16.7.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: en_US.UTF-8
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.22.0
pytest: 3.0.5
pip: 9.0.1
setuptools: 27.2.0
Cython: 0.25.2
numpy: 1.12.1
scipy: 0.19.0
pyarrow: None
xarray: None
IPython: 5.1.0
sphinx: 1.5.1
patsy: 0.4.1
dateutil: 2.6.0
pytz: 2016.10
blosc: 1.5.1
bottleneck: 1.2.1
tables: 3.3.0
numexpr: 2.6.2
feather: None
matplotlib: 2.0.2
openpyxl: 2.4.0
xlrd: 1.0.0
xlwt: 1.1.2
xlsxwriter: 0.9.6
lxml: 3.6.4
bs4: 4.5.1
html5lib: None
sqlalchemy: 1.1.4
pymysql: None
psycopg2: None
jinja2: 2.9.5
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: 0.2.1
None