BUG: Pandas cannot create DataFrame from Numpy Array of TimeStamps · Issue #13287 · pandas-dev/pandas
I have the following array of Timestamps:
ts_array = np.array([[Timestamp('2016-05-02 15:50:00+0000', tz='UTC', offset='5T'), Timestamp('2016-05-02 15:50:00+0000', tz='UTC', offset='5T'), Timestamp('2016-05-02 15:50:00+0000', tz='UTC', offset='5T')], [Timestamp('2016-05-02 17:10:00+0000', tz='UTC', offset='5T'), Timestamp('2016-05-02 17:10:00+0000', tz='UTC', offset='5T'), Timestamp('2016-05-02 17:10:00+0000', tz='UTC', offset='5T')], [Timestamp('2016-05-02 20:25:00+0000', tz='UTC', offset='5T'), Timestamp('2016-05-02 20:25:00+0000', tz='UTC', offset='5T'), Timestamp('2016-05-02 20:25:00+0000', tz='UTC', offset='5T')]], dtype=object)
I can't create a DataFrame from this array using the DataFrame constructor:
Traceback (most recent call last):
  File "/Users/jkelleher/anaconda/lib/python2.7/site-packages/IPython/core/interactiveshell.py", line 2885, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-46-ae20c6b6248f>", line 1, in <module>
    pd.DataFrame(ts_array)
  File "/Users/jkelleher/anaconda/lib/python2.7/site-packages/pandas/core/frame.py", line 255, in __init__
    copy=copy)
  File "/Users/jkelleher/anaconda/lib/python2.7/site-packages/pandas/core/frame.py", line 432, in _init_ndarray
    return create_block_manager_from_blocks([values], [columns, index])
  File "/Users/jkelleher/anaconda/lib/python2.7/site-packages/pandas/core/internals.py", line 3986, in create_block_manager_from_blocks
    mgr = BlockManager(blocks, axes)
  File "/Users/jkelleher/anaconda/lib/python2.7/site-packages/pandas/core/internals.py", line 2591, in __init__
    (block.ndim, self.ndim))
AssertionError: Number of Block dimensions (1) must equal number of axes (2)
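For anyone trying to check this on their own installation, here is a minimal self-contained sketch of the failing call. It is an approximation of the report above, not the exact session: the `offset='5T'` keyword shown in the repr is left out because newer pandas releases do not accept it, and whether the failure depends on it is not established here. The helper name `try_construct` is made up for illustration.

```python
import numpy as np
import pandas as pd

# Three distinct UTC timestamps, each repeated across a row, mirroring the
# 3x3 object array in the report (without the `offset` keyword).
times = ["2016-05-02 15:50:00", "2016-05-02 17:10:00", "2016-05-02 20:25:00"]
ts_array = np.array(
    [[pd.Timestamp(t, tz="UTC")] * 3 for t in times], dtype=object
)


def try_construct(values):
    """Hypothetical helper: return (frame, None) on success, (None, message) on failure."""
    try:
        return pd.DataFrame(values), None
    except AssertionError as exc:
        return None, str(exc)


frame, error = try_construct(ts_array)
print("pandas", pd.__version__, "->", "ok" if error is None else error)
```

On an affected version the message printed should be the AssertionError text from the traceback; on a version where the constructor handles this input, it prints "ok".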
I can create the DataFrame from the array using from_records:
ts_df = pd.DataFrame.from_records(ts_array)
However, when I attempt to transpose this DataFrame, I wind up with the same AssertionError as before.
AssertionError: Number of Block dimensions (1) must equal number of axes (2)
If I convert the Timestamps to Python datetime objects, the error persists. I can, however, convert the Timestamps to NumPy datetime64 objects, and this fixes the problem.
dt64_array = np.array([[ts.to_datetime64() for ts in sublist] for sublist in ts_array])
pd.DataFrame(dt64_array)
Out[56]:
                    0                   1                   2
0 2016-05-02 15:50:00 2016-05-02 15:50:00 2016-05-02 15:50:00
1 2016-05-02 17:10:00 2016-05-02 17:10:00 2016-05-02 17:10:00
2 2016-05-02 20:25:00 2016-05-02 20:25:00 2016-05-02 20:25:00
pd.DataFrame(dt64_array).transpose()
Out[57]:
                    0                   1                   2
0 2016-05-02 15:50:00 2016-05-02 17:10:00 2016-05-02 20:25:00
1 2016-05-02 15:50:00 2016-05-02 17:10:00 2016-05-02 20:25:00
2 2016-05-02 15:50:00 2016-05-02 17:10:00 2016-05-02 20:25:00
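The datetime64 workaround can be wrapped up as a small function. This is only a sketch: `timestamps_to_frame` is a hypothetical name, the sample array omits the `offset='5T'` keyword from the original repr, and the conversion via `Timestamp.to_datetime64()` drops the timezone, leaving naive values expressed in UTC.

```python
import numpy as np
import pandas as pd


def timestamps_to_frame(ts_array):
    """Build a DataFrame from a 2-D object array of pandas Timestamps.

    Hypothetical helper: each element is converted with
    Timestamp.to_datetime64(), so the result is timezone-naive (UTC values).
    """
    dt64 = np.array(
        [[ts.to_datetime64() for ts in row] for row in ts_array],
        dtype="datetime64[ns]",
    )
    return pd.DataFrame(dt64)


# Sample input shaped like the array in the report.
times = ["2016-05-02 15:50:00", "2016-05-02 17:10:00", "2016-05-02 20:25:00"]
ts_array = np.array(
    [[pd.Timestamp(t, tz="UTC")] * 3 for t in times], dtype=object
)

df = timestamps_to_frame(ts_array)
transposed = df.transpose()  # the step that failed on the object-dtype frame
```

If the timezone matters downstream, one option would be to re-localize each column afterwards, for example with `Series.dt.tz_localize("UTC")`.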
Though I found a suitable workaround, I feel like pandas should be able to construct and operate on DataFrames of Timestamps as easily as other objects.
output of pd.show_versions()
INSTALLED VERSIONS
------------------
commit: None
python: 2.7.11.final.0
python-bits: 64
OS: Darwin
OS-release: 15.5.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: None
pandas: 0.18.1
nose: 1.3.7
pip: 8.1.2
setuptools: 20.3
Cython: 0.24
numpy: 1.11.0
scipy: 0.17.1
statsmodels: 0.8.0.dev0+970e99e
xarray: None
IPython: 4.1.2
sphinx: 1.3.5
patsy: 0.4.0
dateutil: 2.5.3
pytz: 2016.4
blosc: None
bottleneck: 1.0.0
tables: 3.2.2
numexpr: 2.5
matplotlib: 1.5.1
openpyxl: 2.3.2
xlrd: 0.9.4
xlwt: 1.0.0
xlsxwriter: 0.8.4
lxml: 3.6.0
bs4: 4.4.1
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: 1.0.12
pymysql: None
psycopg2: None
jinja2: 2.8
boto: 2.39.0
pandas_datareader: None