The feather example from docs segfault with pyarrow 0.8.0 · Issue #24767 · pandas-dev/pandas
The example from the docs writes and reads the feather file without error, but the DataFrame that is read back is corrupted.
This seems to be due to the Categorical column (reduced example):
In [1]: pd.__version__
Out[1]: '0.24.0rc1+12.g453fa85a8'
In [2]: import pyarrow
In [3]: pyarrow.__version__
Out[3]: '0.8.0'
In [6]: df = pd.DataFrame({'a': [1, 2, 3], 'b': pd.Categorical(['a', 'b', 'a'])})
In [7]: df.to_feather('example.feather')
In [8]: result = pd.read_feather('example.feather')
/home/joris/miniconda3/envs/dev/lib/python3.5/site-packages/pyarrow/pandas_compat.py:628: FutureWarning: .labels was deprecated in version 0.24.0. Use .codes instead.
labels, = index.labels
In [9]: result._data
Out[9]:
BlockManager
Items: Index(['a', 'b'], dtype='object')
Axis 1: RangeIndex(start=0, stop=3, step=1)
IntBlock: slice(0, 1, 1), 1 x 3, dtype: int64
CategoricalBlock: slice(1, 2, 1), 1 x 3, dtype: category
In [10]: result._data.blocks[1].values
Out[10]: Segmentation fault (core dumped)
In general, displaying the result also gives a segfault.
The above is with the latest master of pandas and pyarrow 0.8.0 installed from conda-forge (quite old, but still in the supported range). I still need to try with a newer version of pyarrow.
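
To make it easier to check other pyarrow versions, here is a rough sketch that runs the reduced example in a child interpreter, so that a crash shows up as an exit status instead of killing the current session. The helper names `run_isolated` and `describe` are made up for this illustration, and the interpretation of negative exit codes as signals assumes a POSIX system.

```python
import subprocess
import sys
import textwrap

# Reduced example from the report, as a standalone script.
REPRO = textwrap.dedent("""
    import pandas as pd

    df = pd.DataFrame({'a': [1, 2, 3], 'b': pd.Categorical(['a', 'b', 'a'])})
    df.to_feather('example.feather')
    result = pd.read_feather('example.feather')
    # Displaying the frame accesses the categorical values, which is
    # where the crash was observed.
    print(result)
""")


def run_isolated(code):
    """Run `code` in a fresh interpreter and return its exit status."""
    proc = subprocess.run([sys.executable, "-c", code])
    return proc.returncode


def describe(returncode):
    """Turn an exit status into a short label (hypothetical helper)."""
    if returncode == 0:
        return "ok"
    if returncode < 0:
        # On POSIX, a negative value means the child was killed by that signal.
        return "killed by signal {}".format(-returncode)
    return "failed with exit code {}".format(returncode)


if __name__ == "__main__":
    print(describe(run_isolated(REPRO)))
```

On Linux a segfault in the child would be expected to be reported as signal 11 (SIGSEGV). A plain non-zero exit code would more likely point to an import error or an ordinary Python exception.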