BUG: Comment in ODS-file gets included in string cells · Issue #55200 · pandas-dev/pandas (original) (raw)

Pandas version checks

Reproducible Example

import pandas as pd df = pd.read_excel('TableWithComment.ods', engine='odf', header=0) print(df)

Issue Description

Here is the example *.ods file for the example code: TableWithComment.ods. The file was create using LibreOffice Calc 7.4.7.2
TableWithComment

The print(df) command produces the following output

   Quantity 1 2023-09-19T00:00:00Comment IQuantity 2
0          10                                     20
1          11                                     21
2          12                                     22
3          13                                     23
4          14                                     24
5          15      2023-09-19T00:00:00Comment IIIABC
6          16                                     26
7          17                                     27
8          18                                     28

If a cell in the *.ods file has a comment and the cell content is a string (B1 and B7), the comment text and timestamp are appended in front of the cell content. This also applies to the header line. For cells containing numbers, the comment is ignored (A6 and B10).

(Note, that I am referring to comments inserted using LibreOffice Calc itself. This is unrelated to the comment argument of read_excel.)

Expected Behavior

The ods-comments should be ignored and only the cell content read into the dataframe. The expected output from print(df) is

   Quantity 1 Quantity 2
0          10         20
1          11         21
2          12         22
3          13         23
4          14         24
5          15        ABC
6          16         26
7          17         27
8          18         28

(This is the output when all comments in the *.ods file are deleted)

Installed Versions

INSTALLED VERSIONS

commit : ba1cccd
python : 3.10.12.final.0
python-bits : 64
OS : Windows
OS-release : 10
Version : 10.0.17763
machine : AMD64
processor : Intel64 Family 6 Model 165 Stepping 3, GenuineIntel
byteorder : little
LC_ALL : None
LANG : en
LOCALE : de_DE.cp1252

pandas : 2.1.0
numpy : 1.26.0
pytz : 2023.3.post1
dateutil : 2.8.2
setuptools : 68.2.2
pip : 23.2.1
Cython : None
pytest : None
hypothesis : None
sphinx : 7.2.6
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 3.1.2
IPython : 8.15.0
pandas_datareader : None
bs4 : 4.12.2
bottleneck : None
dataframe-api-compat: None
fastparquet : None
fsspec : None
gcsfs : None
matplotlib : None
numba : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pyreadstat : None
pyxlsb : None
s3fs : None
scipy : None
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
zstandard : None
tzdata : 2023.3
qtpy : 2.4.0
pyqt5 : None

(The installed odfpy version is 1.4.1, which somehow is not shown in the above output)

Libre Office Version

Version: 7.4.7.2 (x64) / LibreOffice Community
Build ID: 723314e595e8007d3cf785c16538505a1c878ca5
CPU threads: 12; OS: Windows 10.0 Build 17763; UI render: Skia/Raster; VCL: win
Locale: de-DE (de_DE); UI: de-DE
Calc: threaded