pandas-dev/pandas
I found that when using pandas.concat on DataFrames whose columns carry timezone information, the timezone can be lost when using an 'inner' join:
In [1]: import pandas as pd
In [2]: import numpy as np
In [3]: from pandas.util.print_versions import show_versions
In [4]: show_versions()
INSTALLED VERSIONS
------------------
commit: None
python: 2.7.7.final.0
python-bits: 64
OS: Linux
OS-release: 3.2.0-0.bpo.4-amd64
machine: x86_64
processor:
byteorder: little
LC_ALL: None
LANG: en_US
pandas: 0.14.0
nose: 1.3.3
Cython: 0.20.1
numpy: 1.8.1
scipy: 0.14.0
statsmodels: 0.5.0
IPython: 2.1.0
sphinx: 1.2.2
patsy: 0.2.1
scikits.timeseries: None
dateutil: 1.5
pytz: 2014.3
bottleneck: None
tables: 3.1.1
numexpr: 2.3.1
matplotlib: 1.3.1
openpyxl: 1.8.5
xlrd: 0.9.3
xlwt: 0.7.5
xlsxwriter: 0.5.5
lxml: 3.3.5
bs4: 4.3.1
html5lib: None
bq: None
apiclient: None
rpy2: None
sqlalchemy: 0.9.4
pymysql: None
psycopg2: None
In [5]: dr1 = pd.date_range('20110101 0000','20110101 2359',freq='T',tz='Asia/Tokyo')
In [6]: dr2 = pd.date_range('20110101 0000','20110102 2359',freq='T',tz='Asia/Tokyo')
In [7]: df1 = pd.DataFrame(np.random.randn(5,1440),columns=dr1)
In [8]: df2 = pd.DataFrame(np.random.randn(5,2880),columns=dr2)
In [9]: outer = pd.concat([df1,df2],join='outer')
In [10]: inner = pd.concat([df1,df2],join='inner')
In [11]: print outer.axes[1]
<class 'pandas.tseries.index.DatetimeIndex'>
[2011-01-01 00:00:00+09:00, ..., 2011-01-02 23:59:00+09:00]
Length: 2880, Freq: T, Timezone: Asia/Tokyo
In [12]: print inner.axes[1]
<class 'pandas.tseries.index.DatetimeIndex'>
[2010-12-31 15:00:00, ..., 2011-01-01 14:59:00]
Length: 1440, Freq: T, Timezone: None
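Output [11] shows that the timezone information is kept after concat() with join='outer'. With join='inner', however, the timezone information is lost: in output [12] the index is timezone-naive (Timezone: None) and its values are shifted back by nine hours, which suggests they were converted to UTC.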
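Until this is fixed, one possible workaround is to re-attach the timezone after the inner join. The sketch below is only an illustration: restore_tz is a hypothetical helper (not part of pandas), and it assumes the naive values are in UTC, as output [12] suggests. It uses freq='min' instead of 'T' so that it also runs on pandas versions where 'T' is deprecated.

```python
import numpy as np
import pandas as pd


def restore_tz(index, tz):
    """Hypothetical helper: re-attach `tz` to an index that lost its timezone.

    Assumes a naive index holds UTC values (as in output [12]).
    An index that is already timezone-aware is returned unchanged.
    """
    if index.tz is None:
        return index.tz_localize('UTC').tz_convert(tz)
    return index


dr1 = pd.date_range('2011-01-01 00:00', '2011-01-01 23:59', freq='min', tz='Asia/Tokyo')
dr2 = pd.date_range('2011-01-01 00:00', '2011-01-02 23:59', freq='min', tz='Asia/Tokyo')
df1 = pd.DataFrame(np.random.randn(5, len(dr1)), columns=dr1)
df2 = pd.DataFrame(np.random.randn(5, len(dr2)), columns=dr2)

inner = pd.concat([df1, df2], join='inner')

# No-op if concat kept the timezone; otherwise localize to UTC and convert back.
inner.columns = restore_tz(inner.columns, 'Asia/Tokyo')
print(inner.columns.tz)
```

The check on index.tz keeps the helper safe to call on versions where the inner join already preserves the timezone.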