BUG: df.to_parquet() fails for a PyFilesystem2 file handle · Issue #48944 · pandas-dev/pandas
Pandas version checks
- I have checked that this issue has not already been reported.
- I have confirmed this bug exists on the latest version of pandas.
- I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
def test_pandas_parquet_writer_bug():
    import pandas as pd
    from fs import open_fs
    df = pd.DataFrame(data={"A": [0, 1], "B": [1, 0]})
    with open_fs('osfs://./').open('.test.parquet', 'wb') as f:
        print(f.name)
        df.to_parquet(f)
Issue Description
Running the example above with pandas 1.5.0 raises the exception below; the same code works fine in 1.4.3.
  File ".venv\lib\site-packages\pandas\io\parquet.py", line 202, in write
    self.api.parquet.write_table(
  File ".venv\lib\site-packages\pyarrow\parquet.py", line 1970, in write_table
    with ParquetWriter(
  File ".venv\lib\site-packages\pyarrow\parquet.py", line 655, in __init__
    self.writer = _parquet.ParquetWriter(
  File "pyarrow\_parquet.pyx", line 1387, in pyarrow._parquet.ParquetWriter.__cinit__
  File "pyarrow\io.pxi", line 1579, in pyarrow.lib.get_writer
TypeError: Unable to read from object of type: <class 'bytes'>
This seems to be caused by this new block added to pandas/io/parquet.py:
if (
    isinstance(path_or_handle, io.BufferedWriter)
    and hasattr(path_or_handle, "name")
    and isinstance(path_or_handle.name, (str, bytes))
):
    path_or_handle = path_or_handle.name
This assumes that path_or_handle.name will always be of type str, but it is actually bytes in this example.
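For anyone who wants to see the str/bytes difference without installing PyFilesystem2, here is a minimal stdlib-only sketch. It assumes the handle in the example was opened from a bytes path (which is what the printed f.name suggests); the helper name name_type_of_handle is hypothetical and only for illustration.

```python
import io
import os
import tempfile


def name_type_of_handle(path):
    """Open `path` for binary writing and return the type of the handle's .name."""
    with open(path, "wb") as f:
        # A plain binary-mode open() returns an io.BufferedWriter,
        # the same type the block in parquet.py checks for.
        if not isinstance(f, io.BufferedWriter):
            raise TypeError("expected an io.BufferedWriter")
        return type(f.name)


with tempfile.TemporaryDirectory() as tmpdir:
    str_path = os.path.join(tmpdir, "a.parquet")
    # Assumption: a bytes path stands in for what the PyFilesystem2 handle does.
    bytes_path = os.fsencode(os.path.join(tmpdir, "b.parquet"))

    str_type = name_type_of_handle(str_path)
    bytes_type = name_type_of_handle(bytes_path)

print(str_type, bytes_type)
```

The .name attribute simply reflects whatever was passed to open(), so a handle opened from a bytes path satisfies the isinstance(path_or_handle.name, (str, bytes)) check and then hands bytes on to pyarrow.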
A solution that works for me is to simply append this below the block:
if isinstance(path_or_handle, bytes):
    path_or_handle = path_or_handle.decode("utf-8")
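To show the two pieces together, here is a rough standalone sketch of the check plus the proposed decode step. The function name resolve_path_or_handle is hypothetical and not part of pandas; this only mirrors the logic quoted above and is not the actual pandas implementation. Note that os.fsdecode might be a safer choice than a hard-coded "utf-8" on some platforms, but that is an untested assumption on my part.

```python
import io
import os
import tempfile


def resolve_path_or_handle(path_or_handle):
    """Hypothetical helper mirroring the block from parquet.py plus the proposed fix."""
    if (
        isinstance(path_or_handle, io.BufferedWriter)
        and hasattr(path_or_handle, "name")
        and isinstance(path_or_handle.name, (str, bytes))
    ):
        path_or_handle = path_or_handle.name
    # Proposed fix: make sure a bytes name is turned into str before it is used as a path.
    if isinstance(path_or_handle, bytes):
        path_or_handle = path_or_handle.decode("utf-8")
    return path_or_handle


with tempfile.TemporaryDirectory() as tmpdir:
    # A bytes path stands in for the PyFilesystem2 handle from the example.
    bytes_path = os.fsencode(os.path.join(tmpdir, "test.parquet"))
    with open(bytes_path, "wb") as f:
        resolved = resolve_path_or_handle(f)

# Objects that are not a BufferedWriter are passed through untouched.
buffer = io.BytesIO()
passthrough = resolve_path_or_handle(buffer)
```

With this in place the BufferedWriter opened from a bytes path resolves to a str path, while an in-memory buffer such as io.BytesIO is left as-is.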
Alternatively, just passing the BufferedWriter through to the subsequent self.api.parquet.write_to_dataset call works as well, but I'm not sure of the implications of reverting to this old behaviour (which is why I haven't submitted a pull request, but I'm happy to do so if someone can confirm).
Cheers,
Sam.
Expected Behavior
No exception is thrown
Installed Versions
INSTALLED VERSIONS
commit : 87cfe4e
python : 3.9.13.final.0
python-bits : 64
OS : Windows
OS-release : 10
Version : 10.0.19042
machine : AMD64
processor : Intel64 Family 6 Model 158 Stepping 13, GenuineIntel
byteorder : little
LC_ALL : None
LANG : None
LOCALE : English_Australia.1252
pandas : 1.5.0
numpy : 1.23.3
pytz : 2022.4
dateutil : 2.8.2
setuptools : 59.8.0
pip : 22.2.2
Cython : None
pytest : 6.2.5
hypothesis : None
sphinx : 4.5.0
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : 2.9.3
jinja2 : 3.1.2
IPython : None
pandas_datareader: None
bs4 : 4.11.1
bottleneck : None
brotli : None
fastparquet : None
fsspec : 2022.8.2
gcsfs : None
matplotlib : 3.6.0
numba : 0.56.2
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : 5.0.0
pyreadstat : None
pyxlsb : None
s3fs : 2022.8.2
scipy : 1.9.1
snappy : None
sqlalchemy : 1.4.41
tables : None
tabulate : 0.8.10
xarray : None
xlrd : None
xlwt : None
zstandard : None
tzdata : None