pyarrow.RecordBatchReader — Apache Arrow v20.0.0 (original) (raw)

class pyarrow.RecordBatchReader#

Bases: _Weakrefable

Base class for reading stream of record batches.

Record batch readers function as iterators of record batches that also provide the schema (without the need to get any batches).

Warning

Do not call this class’s constructor directly, use one of theRecordBatchReader.from_* functions instead.

Notes

To import and export using the Arrow C stream interface, use the_import_from_c and _export_to_c methods. However, keep in mind this interface is intended for expert users.

Examples

import pyarrow as pa schema = pa.schema([('x', pa.int64())]) def iter_record_batches(): ... for i in range(2): ... yield pa.RecordBatch.from_arrays([pa.array([1, 2, 3])], schema=schema) reader = pa.RecordBatchReader.from_batches(schema, iter_record_batches()) print(reader.schema) x: int64 for batch in reader: ... print(batch) pyarrow.RecordBatch x: int64


x: [1,2,3] pyarrow.RecordBatch x: int64

x: [1,2,3]

__init__(*args, **kwargs)#

Methods

Attributes

cast(self, target_schema)#

Wrap this reader with one that casts each batch lazily as it is pulled. Currently only a safe cast to target_schema is implemented.

Parameters:

target_schemaSchema

Schema to cast to, the names and order of fields must match.

Returns:

RecordBatchReader

close(self)#

Release any resources associated with the reader.

static from_batches(Schema schema, batches)#

Create RecordBatchReader from an iterable of batches.

Parameters:

schemaSchema

The shared schema of the record batches

batchesIterable[RecordBatch]

The batches that this reader will return.

Returns:

readerRecordBatchReader

static from_stream(data, schema=None)#

Create RecordBatchReader from a Arrow-compatible stream object.

This accepts objects implementing the Arrow PyCapsule Protocol for streams, i.e. objects that have a __arrow_c_stream__ method.

Parameters:

dataArrow-compatible stream object

Any object that implements the Arrow PyCapsule Protocol for streams.

schemaSchema, default None

The schema to which the stream should be casted, if supported by the stream object.

Returns:

RecordBatchReader

iter_batches_with_custom_metadata(self)#

Iterate over record batches from the stream along with their custom metadata.

Yields:

RecordBatchWithMetadata

read_all(self)#

Read all record batches as a pyarrow.Table.

Returns:

Table

read_next_batch(self)#

Read next RecordBatch from the stream.

Returns:

RecordBatch

Raises:

StopIteration:

At end of stream.

read_next_batch_with_custom_metadata(self)#

Read next RecordBatch from the stream along with its custom metadata.

Returns:

batchRecordBatch

custom_metadataKeyValueMetadata

Raises:

StopIteration:

At end of stream.

read_pandas(self, **options)#

Read contents of stream to a pandas.DataFrame.

Read all record batches as a pyarrow.Table then convert it to a pandas.DataFrame using Table.to_pandas.

Parameters:

**options

Arguments to forward to Table.to_pandas().

Returns:

dfpandas.DataFrame

schema#

Shared schema of the record batches in the stream.

Returns:

Schema