BUG: Error in read_excel() function with some ods files · Issue #45598 · pandas-dev/pandas

Pandas version checks

Reproducible Example

import pandas as pd

pd.read_excel('exported_with_qgis.ods')

Issue Description

When an ods file is generated by QGIS (by exporting a vector layer in ods format), its content.xml (a file inside the ods archive) has newlines between XML elements, like this:

<table:table-row>
<table:table-cell office:value-type="string">
<text:p>CODIGO</text:p>
</table:table-cell>
...

instead of being like this:

<table:table-row><table:table-cell office:value-type="string"><text:p>CODIGO</text:p></table:table-cell>
...

LibreOffice Calc opens that file without problems, but calling pd.read_excel() on it, as in the reproducible example above, produces this traceback:

AttributeError: 'Text' object has no attribute 'qname'
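
A plausible reading of the traceback is that the newlines between elements are parsed as whitespace-only text nodes, which then sit among the row's child nodes next to the real cells. The sketch below illustrates that effect with the standard library's xml.dom.minidom rather than odfpy, so it is only an analogue of what the reader sees; the PRETTY and COMPACT strings and the child_node_types helper are made up for this illustration.

```python
from xml.dom import minidom

# Namespace declarations so the prefixed tags parse; URIs follow the OpenDocument spec.
NS = (
    'xmlns:table="urn:oasis:names:tc:opendocument:xmlns:table:1.0" '
    'xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0" '
    'xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"'
)

# Row written with newlines between elements (as in the QGIS export).
PRETTY = f"""<table:table-row {NS}>
<table:table-cell office:value-type="string">
<text:p>CODIGO</text:p>
</table:table-cell>
</table:table-row>"""

# Same row written without any whitespace between elements.
COMPACT = (
    f"<table:table-row {NS}>"
    '<table:table-cell office:value-type="string">'
    "<text:p>CODIGO</text:p>"
    "</table:table-cell>"
    "</table:table-row>"
)


def child_node_types(xml):
    """Return the class names of the direct children of the root row element."""
    row = minidom.parseString(xml).documentElement
    return [type(node).__name__ for node in row.childNodes]


pretty_types = child_node_types(PRETTY)
compact_types = child_node_types(COMPACT)
print(pretty_types)   # text nodes surround the cell element
print(compact_types)  # only the cell element
```

In the newline-separated variant the cell element is surrounded by text nodes, so any code that assumes every child is an element will trip over them.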

Proposed solution:
Rewrite line 103 of https://github.com/pandas-dev/pandas/blob/main/pandas/io/excel/_odfreader.py as follows:

-sheet_cells = [x for x in sheet_row.childNodes if x.qname in cell_names]
+sheet_cells = [x for x in sheet_row.childNodes if hasattr(x, "qname") and x.qname in cell_names]
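
The effect of guarding the filter can be sketched without odfpy installed. In the example below, FakeElement and FakeText are hypothetical stand-ins for an element node and a whitespace text node, and cell_names is assumed to hold (namespace, local name) tuples; none of this is the actual pandas or odfpy code. It uses getattr with a default, which is one possible way to write the same guard.

```python
from dataclasses import dataclass

TABLE_NS = "urn:oasis:names:tc:opendocument:xmlns:table:1.0"


@dataclass
class FakeElement:
    """Hypothetical stand-in for an element node that carries a qname."""
    qname: tuple


class FakeText:
    """Hypothetical stand-in for a whitespace text node; it has no qname."""
    data = "\n"


# Assumed shape of the names the reader looks for: (namespace, local name) tuples.
cell_names = {(TABLE_NS, "table-cell"), (TABLE_NS, "covered-table-cell")}

cell = FakeElement(qname=(TABLE_NS, "table-cell"))
child_nodes = [FakeText(), cell, FakeText()]  # newline, cell, newline


def select_cells_unguarded(nodes):
    # Mirrors the original filter: fails on any node without a qname.
    return [x for x in nodes if x.qname in cell_names]


def select_cells_guarded(nodes):
    # Nodes without a qname yield None, which is never in cell_names.
    return [x for x in nodes if getattr(x, "qname", None) in cell_names]


try:
    select_cells_unguarded(child_nodes)
    unguarded_error = None
except AttributeError as exc:
    unguarded_error = exc

guarded_cells = select_cells_guarded(child_nodes)
print(repr(unguarded_error))
print(guarded_cells)
```

The guarded version simply skips the text nodes, which matches the expectation that newlines between elements carry no data.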

Expected Behavior

The expected behaviour is to ignore the newlines between XML elements. LibreOffice Calc opens this kind of ods file correctly.

Installed Versions

INSTALLED VERSIONS
------------------
commit           : 73c68257545b5f8530b7044f56647bd2db92e2ba
python           : 3.9.7.final.0
python-bits      : 64
OS               : Linux
OS-release       : 5.13.0-22-generic
Version          : #22-Ubuntu SMP Fri Nov 5 13:21:36 UTC 2021
machine          : x86_64
processor        : x86_64
byteorder        : little
LC_ALL           : None
LANG             : es_ES.UTF-8
LOCALE           : es_ES.UTF-8

pandas           : 1.3.3
numpy            : 1.19.5
pytz             : 2021.1
dateutil         : 2.8.1
pip              : 20.3.4
setuptools       : 52.0.0
Cython           : None
pytest           : 6.0.2
hypothesis       : None
sphinx           : None
blosc            : None
feather          : None
xlsxwriter       : None
lxml.etree       : 4.6.3
html5lib         : 1.1
pymysql          : None
psycopg2         : 2.8.6 (dt dec pq3 ext lo64)
jinja2           : 3.0.0
IPython          : None
pandas_datareader: None
bs4              : 4.9.3
bottleneck       : None
fsspec           : None
fastparquet      : None
gcsfs            : None
matplotlib       : 3.3.4
numexpr          : None
odfpy            : None
openpyxl         : 3.0.3
pandas_gbq       : None
pyarrow          : None
pyxlsb           : None
s3fs             : None
scipy            : 1.7.3
sqlalchemy       : 1.3.22
tables           : None
tabulate         : 0.8.9
xarray           : None
xlrd             : 1.2.0
xlwt             : 1.2.0
numba            : 0.54.1