BUG: Load ORC-format data failed when pandas version>1.2.0.dev0 · Issue #40918 · pandas-dev/pandas
- I have checked that this issue has not already been reported.
- I have confirmed this bug exists on the latest version of pandas.
- (optional) I have confirmed this bug exists on the master branch of pandas.
https://github.com/pandas-dev/pandas/blob/master/pandas/io/orc.py#L15-L54
Code Sample, a copy-pastable example
...
import pandas as pd
orc_data = pd.read_orc(orc_file_path)
Problem description
pandas uses the PyArrow package to load ORC and Parquet data.
For the ORC format, it reads the data with pyarrow.orc.ORCFile
(see orc.py). However, PyArrow does not import the orc submodule
in its __init__.py file, so the attribute lookup fails and pandas raises AttributeError: module 'pyarrow' has no attribute 'orc'
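The mechanism described above can be reproduced without PyArrow. The following is a minimal sketch using only the standard library; the package name `fakearrow` and its empty `ORCFile` class are hypothetical stand-ins, not part of PyArrow or pandas. It shows that a submodule which the package's `__init__.py` does not import is not reachable as an attribute until something imports it explicitly.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Build a throwaway package named "fakearrow" (hypothetical stand-in for
# pyarrow) whose __init__.py does NOT import its "orc" submodule.
pkg_root = Path(tempfile.mkdtemp())
pkg_dir = pkg_root / "fakearrow"
pkg_dir.mkdir()
(pkg_dir / "__init__.py").write_text("")
(pkg_dir / "orc.py").write_text("class ORCFile:\n    pass\n")
sys.path.insert(0, str(pkg_root))

import fakearrow

# The submodule exists on disk but has not been imported yet, so it is
# not an attribute of the package object.
try:
    fakearrow.orc
    attribute_error_raised = False
except AttributeError as exc:
    attribute_error_raised = True
    print(exc)

# Importing the submodule explicitly binds it onto the parent package,
# after which attribute access works.
importlib.import_module("fakearrow.orc")
orc_file_class = fakearrow.orc.ORCFile
```

If the same thing happens with the real `pyarrow` package, any earlier `import pyarrow.orc` in the process would hide the error, which may explain why it does not reproduce in every environment.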
The bug occurs in pandas versions later than v1.2.0.dev0 (after commit 6d1541e). Before that commit, pandas/io/orc.py
ran import pyarrow.orc
before using pyarrow to load ORC data (see pandas/io/orc.py in v1.1.5).
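Until this is fixed in pandas, a possible workaround is to do in user code what the older pandas/io/orc.py did, namely import pyarrow.orc before calling pd.read_orc. The sketch below assumes pandas and pyarrow are installed and that the cause is as described in this report; the helper name `read_orc_with_explicit_import` is hypothetical.

```python
def read_orc_with_explicit_import(path, columns=None):
    """Hypothetical helper: load an ORC file after importing pyarrow.orc."""
    # Assumption: pyarrow is installed. Importing the submodule binds it as
    # the attribute pyarrow.orc, which the affected pandas code looks up.
    import pyarrow.orc  # noqa: F401

    import pandas as pd

    return pd.read_orc(path, columns=columns)
```

With this helper, `orc_data = read_orc_with_explicit_import(orc_file_path)` would replace the failing call in the code sample. Placing a plain `import pyarrow.orc` at the top of the script should have the same effect.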
Testing environment:
- Ubuntu 18.04
- python 3.7
- pandas v1.2.1
- pyarrow v3.0.0 (installed via pip; I haven't tested a Conda install of pyarrow yet)