The feather example from docs segfaults with pyarrow 0.8.0 · Issue #24767 · pandas-dev/pandas

The example from the docs successfully writes and reads the feather file, but the read-in DataFrame is corrupted.
This seems to be due to the Categorical column (reduced example):

In [1]: pd.__version__
Out[1]: '0.24.0rc1+12.g453fa85a8'

In [2]: import pyarrow

In [3]: pyarrow.__version__
Out[3]: '0.8.0'

In [6]: df = pd.DataFrame({'a': [1, 2, 3], 'b': pd.Categorical(['a', 'b', 'a'])})

In [7]: df.to_feather('example.feather')

In [8]: result = pd.read_feather('example.feather')
/home/joris/miniconda3/envs/dev/lib/python3.5/site-packages/pyarrow/pandas_compat.py:628: FutureWarning: .labels was deprecated in version 0.24.0. Use .codes instead.
  labels, = index.labels

In [9]: result._data
Out[9]: 
BlockManager
Items: Index(['a', 'b'], dtype='object')
Axis 1: RangeIndex(start=0, stop=3, step=1)
IntBlock: slice(0, 1, 1), 1 x 3, dtype: int64
CategoricalBlock: slice(1, 2, 1), 1 x 3, dtype: category

In [10]: result._data.blocks[1].values
Out[10]: Segmentation fault (core dumped)

Simply displaying the result also causes a segfault.

The above is with the latest pandas master and pyarrow 0.8.0 installed from conda-forge (quite old, but still within the supported range). I still need to try a newer version of pyarrow.
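Retrying against other pyarrow versions is easier if the crash cannot take down the interactive session. One option is to run the same reduced round-trip in a child interpreter and inspect its exit status. This is a minimal sketch, assuming Python 3.7+ and a POSIX system, where a child killed by a signal reports a negative return code. The helper `check_feather_categorical_roundtrip` and the `REPRO` snippet are hypothetical names for illustration, not part of pandas or pyarrow.

```python
import signal
import subprocess
import sys
import textwrap

# The reduced example from this issue, plus an explicit read of the
# categorical values (the step that segfaulted with pyarrow 0.8.0).
# Writes to a temporary directory instead of the current one.
REPRO = textwrap.dedent("""
    import os
    import tempfile

    import pandas as pd

    df = pd.DataFrame({'a': [1, 2, 3], 'b': pd.Categorical(['a', 'b', 'a'])})
    path = os.path.join(tempfile.mkdtemp(), 'example.feather')
    df.to_feather(path)
    result = pd.read_feather(path)
    values = list(result['b'])  # touching the categorical data crashed in the report
    assert values == ['a', 'b', 'a'], values
    print('ok')
""")


def check_feather_categorical_roundtrip(python=sys.executable, code=REPRO):
    """Run `code` in a child interpreter; return 'ok', 'segfault' or 'error: ...'.

    Hypothetical helper: the child process absorbs any segfault, so the
    caller's session survives and can report the outcome.
    """
    proc = subprocess.run([python, "-c", code], capture_output=True, text=True)
    if proc.returncode == 0:
        return "ok"
    # On POSIX, a child killed by a signal reports returncode == -signum.
    if proc.returncode == -signal.SIGSEGV:
        return "segfault"
    # Otherwise report the last stderr line (usually the exception message).
    lines = proc.stderr.strip().splitlines()
    return "error: " + (lines[-1] if lines else "exit code %d" % proc.returncode)
```

Pass the interpreter of each conda environment being compared as `python` to run the same check across pyarrow versions. An `"error: ..."` result, for example from a missing pyarrow install, is distinguished from an actual crash.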