pyarrow.orc.ORCFile — Apache Arrow v20.0.0
class pyarrow.orc.ORCFile(source)
Bases: object
Reader interface for a single ORC file
Parameters:
source : str or pyarrow.NativeFile
Readable source. For passing Python file objects or byte buffers, see pyarrow.io.PythonFileInterface or pyarrow.io.BufferReader.
Attributes
property compression
Compression codec of the file
property compression_size
Number of bytes to buffer for the compression codec in the file
property content_length
Length of the data stripes in the file in bytes
property file_footer_length
The number of compressed bytes in the file footer
property file_length
The number of bytes in the file
property file_postscript_length
The number of bytes in the file postscript
property file_version
Format version of the ORC file, must be 0.11 or 0.12
property metadata
The file metadata, as an arrow KeyValueMetadata
property nrows
The number of rows in the file
property nstripe_statistics
Number of stripe statistics
property nstripes
The number of stripes in the file
read(columns=None)
Read the whole file.
Parameters:
columns : list
If not None, only these columns will be read from the file. A column name may be a prefix of a nested field, e.g. ‘a’ will select ‘a.b’, ‘a.c’, and ‘a.d.e’. Output always follows the ordering of the file and not the columns list.
Returns:
Content of the file as a Table.
read_stripe(n, columns=None)
Read a single stripe from the file.
Parameters:
n : int
The stripe index
columns : list
If not None, only these columns will be read from the stripe. A column name may be a prefix of a nested field, e.g. ‘a’ will select ‘a.b’, ‘a.c’, and ‘a.d.e’.
Returns:
Content of the stripe as a RecordBatch.
property row_index_stride
Number of rows per entry in the row index, or 0 if there is no row index
property schema
The file schema, as an arrow schema
property software_version
Software instance and version that wrote this file
property stripe_statistics_length
The number of compressed bytes in the file stripe statistics
property writer
Name of the writer that wrote this file. If the writer is unknown then its Writer ID (a number) is returned
property writer_version
Version of the writer