pandas.DataFrame.to_orc — pandas 2.2.3 documentation (original) (raw)
DataFrame.to_orc(path=None, *, engine='pyarrow', index=None, engine_kwargs=None)[source]#
Write a DataFrame to the ORC format.
Added in version 1.5.0.
Parameters:
pathstr, file-like object or None, default None
If a string, it will be used as Root Directory path when writing a partitioned dataset. By file-like object, we refer to objects with a write() method, such as a file handle (e.g. via builtin open function). If path is None, a bytes object is returned.
engine{‘pyarrow’}, default ‘pyarrow’
ORC library to use.
indexbool, optional
If True
, include the dataframe’s index(es) in the file output. If False
, they will not be written to the file. If None
, similar to infer
the dataframe’s index(es) will be saved. However, instead of being saved as values, the RangeIndex will be stored as a range in the metadata so it doesn’t require much space and is faster. Other indexes will be included as columns in the file output.
engine_kwargsdict[str, Any] or None, default None
Additional keyword arguments passed to pyarrow.orc.write_table().
Returns:
bytes if no path argument is provided else None
Raises:
NotImplementedError
Dtype of one or more columns is category, unsigned integers, interval, period or sparse.
ValueError
engine is not pyarrow.
Notes
- Before using this function you should read the user guide about ORC and install optional dependencies.
- This function requires pyarrowlibrary.
- For supported dtypes please refer to supported ORC features in Arrow.
- Currently timezones in datetime columns are not preserved when a dataframe is converted into ORC files.
Examples
df = pd.DataFrame(data={'col1': [1, 2], 'col2': [4, 3]}) df.to_orc('df.orc')
pd.read_orc('df.orc')
col1 col2 0 1 4 1 2 3
If you want to get a buffer to the orc content you can write it to io.BytesIO
import io b = io.BytesIO(df.to_orc())
b.seek(0)
0 content = b.read()