BUG: read_excel with openpyxl results in empty data frame · Issue #39001 · pandas-dev/pandas (original) (raw)
- [ X ] I have checked that this issue has not already been reported.
- [ X ] I have confirmed this bug exists on the latest version of pandas.
- (optional) I have confirmed this bug exists on the master branch of pandas.
Old behavior with xlrd 1.2.0 (last version with XLSX support)
>>> import pandas as pd
>>> df=pd.read_excel('infile.xlsx', engine='xlrd')
<stdin>:1: FutureWarning: Your version of xlrd is 1.2.0. In xlrd >= 2.0, only the xls format is supported. As a result, the openpyxl engine will be used if it is installed and the engine argument is not specified. Install openpyxl instead.
>>> df.shape
(843, 11)
File is read perfectly and dataframe is ok, with all 11 columns.
Problem description
When using openpyxl as engine for read_excel:
>>> import pandas as pd
>>> df=pd.read_excel('infile.xlsx', engine='openpyxl')
>>> print(df)
Empty DataFrame
Columns: [Column1, Column2, Column3]
Index: []
>>> df.shape
(0, 3)
Only the 3 first headers are read, and it stops.
However, if the file is opened directly with openpyxl, it works fine, I'm using it to open and save the file with a temporary name to make openpyxl work as engine on Pandas:
>>> wb = load_workbook('infile.xlsx')
>>> wb.save(filename = 'temp.xlsx')
>>> pd.read_excel('temp.xlsx', engine='openpyxl')
>>> df.shape
(843, 11)
INSTALLED VERSIONS
commit : 3e89b4c
python : 3.8.6.final.0
python-bits : 64
OS : Linux
OS-release : 4.19.128-microsoft-standard
Version : #1 SMP Tue Jun 23 12:58:10 UTC 2020
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : C.UTF-8
LOCALE : en_US.UTF-8
pandas : 1.2.0
numpy : 1.19.5
pytz : 2020.5
dateutil : 2.8.1
pip : 20.1.1
setuptools : 49.3.1
Cython : 0.29.21
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : 1.3.7
lxml.etree : 4.6.2
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 2.11.2
IPython : 7.19.0
pandas_datareader: None
bs4 : None
bottleneck : None
fsspec : 0.8.5
fastparquet : None
gcsfs : None
matplotlib : 3.3.3
numexpr : None
odfpy : None
openpyxl : 3.0.5
pandas_gbq : None
pyarrow : None
pyxlsb : None
s3fs : None
scipy : 1.6.0
sqlalchemy : None
tables : None
tabulate : 0.8.7
xarray : None
xlrd : 1.2.0
xlwt : 1.3.0
numba : None