pyarrow.orc.ORCFile — Apache Arrow v20.0.0

class pyarrow.orc.ORCFile(source)

Bases: object

Reader interface for a single ORC file

Parameters:

source : str or pyarrow.NativeFile

Readable source: a file path or an open pyarrow.NativeFile. To pass a Python file object or a byte buffer, wrap it in pyarrow.PythonFile or pyarrow.BufferReader.

__init__(source)


property compression

Compression codec of the file

property compression_size

Number of bytes to buffer for the compression codec in the file

property content_length

Length of the data stripes in the file in bytes

property file_footer_length

The number of compressed bytes in the file footer

property file_length

The number of bytes in the file

property file_postscript_length

The number of bytes in the file postscript

property file_version

Format version of the ORC file, either 0.11 or 0.12

property metadata

The file metadata, as an Arrow KeyValueMetadata

property nrows

The number of rows in the file

property nstripe_statistics

Number of stripe statistics

property nstripes

The number of stripes in the file

read(columns=None)

Read the whole file.

Parameters:

columns : list

If not None, only these columns will be read from the file. A column name may be a prefix of a nested field, e.g. ‘a’ will select ‘a.b’, ‘a.c’, and ‘a.d.e’. Output always follows the ordering of the file and not the columns list.

Returns:

pyarrow.Table

Content of the file as a Table.

read_stripe(n, columns=None)

Read a single stripe from the file.

Parameters:

n : int

The stripe index

columns : list

If not None, only these columns will be read from the stripe. A column name may be a prefix of a nested field, e.g. ‘a’ will select ‘a.b’, ‘a.c’, and ‘a.d.e’.

Returns:

pyarrow.RecordBatch

Content of the stripe as a RecordBatch.

property row_index_stride

Number of rows per entry in the row index, or 0 if there is no row index

property schema

The file schema, as an Arrow schema

property software_version

Software instance and version that wrote this file

property stripe_statistics_length

The number of compressed bytes in the file stripe statistics

property writer

Name of the writer that wrote this file. If the writer is unknown, its Writer ID (a number) is returned instead.

property writer_version

Version of the writer