pyarrow.ExtensionType (Apache Arrow v20.0.0)

class pyarrow.ExtensionType(DataType storage_type, extension_name)

Bases: BaseExtensionType

Concrete base class for Python-defined extension types.

Parameters:

storage_type : DataType

The underlying storage type for the extension type.

extension_name : str

A unique name distinguishing this extension type. The name will be used when deserializing IPC data.

Examples

Define a RationalType extension type subclassing ExtensionType:

```python
>>> import pyarrow as pa
>>> class RationalType(pa.ExtensionType):
...     def __init__(self, data_type: pa.DataType):
...         if not pa.types.is_integer(data_type):
...             raise TypeError(f"data_type must be an integer type not {data_type}")
...         super().__init__(
...             pa.struct(
...                 [
...                     ("numer", data_type),
...                     ("denom", data_type),
...                 ],
...             ),
...             # N.B. This name does not reference data_type so deserialization
...             # will work for any integer data_type after registration
...             "my_package.rational",
...         )
...     def __arrow_ext_serialize__(self) -> bytes:
...         # No parameters are necessary
...         return b""
...     @classmethod
...     def __arrow_ext_deserialize__(cls, storage_type, serialized):
...         # return an instance of this subclass
...         return RationalType(storage_type[0].type)
```

Register the extension type:

```python
>>> pa.register_extension_type(RationalType(pa.int64()))
```

Create an instance of the RationalType extension type:

```python
>>> rational_type = RationalType(pa.int32())
```

Inspect the extension type:

```python
>>> rational_type.extension_name
'my_package.rational'
>>> rational_type.storage_type
StructType(struct<numer: int32, denom: int32>)
```

Wrap an array as an extension array:

```python
>>> storage_array = pa.array(
...     [
...         {"numer": 10, "denom": 17},
...         {"numer": 20, "denom": 13},
...     ],
...     type=rational_type.storage_type
... )
>>> rational_array = rational_type.wrap_array(storage_array)
>>> rational_array
<pyarrow.lib.ExtensionArray object at ...>
-- is_valid: all not null
-- child 0 type: int32
  [
    10,
    20
  ]
-- child 1 type: int32
  [
    17,
    13
  ]
```

Or do the same by creating an ExtensionArray directly:

```python
>>> rational_array = pa.ExtensionArray.from_storage(rational_type, storage_array)
>>> rational_array
<pyarrow.lib.ExtensionArray object at ...>
-- is_valid: all not null
-- child 0 type: int32
  [
    10,
    20
  ]
-- child 1 type: int32
  [
    17,
    13
  ]
```

Unregister the extension type:

```python
>>> pa.unregister_extension_type("my_package.rational")
```

Note that even though we registered the concrete type RationalType(pa.int64()), PyArrow will be able to deserialize RationalType(integer_type) for any integer_type, because the deserializer looks up the name my_package.rational and then calls the @classmethod __arrow_ext_deserialize__ with the actual storage type.

__init__()

Initialize an extension type instance.

This should be called at the end of the subclass's __init__ method.


bit_width

The bit width of the extension type.

byte_width

The byte width of the extension type.

equals(self, other, *, check_metadata=False)

Return True if this type is equivalent to the passed value.

Parameters:

other : DataType or str convertible to DataType

check_metadata : bool

Whether nested Field metadata equality should be checked as well.

Returns:

is_equal : bool

Examples

```python
>>> import pyarrow as pa
>>> pa.int64().equals(pa.string())
False
>>> pa.int64().equals(pa.int64())
True
```

extension_name

The extension type name.

field(self, i) → Field

Parameters:

i : int

Returns:

pyarrow.Field

has_variadic_buffers

If True, the number of expected buffers is only lower-bounded by num_buffers.

Examples

```python
>>> import pyarrow as pa
>>> pa.int64().has_variadic_buffers
False
>>> pa.string_view().has_variadic_buffers
True
```

id

num_buffers

Number of data buffers required to construct an Array of this type, excluding children.

Examples

```python
>>> import pyarrow as pa
>>> pa.int64().num_buffers
2
>>> pa.string().num_buffers
3
```

num_fields

The number of child fields.

Examples

```python
>>> import pyarrow as pa
>>> pa.int64()
DataType(int64)
>>> pa.int64().num_fields
0
>>> pa.list_(pa.string())
ListType(list<item: string>)
>>> pa.list_(pa.string()).num_fields
1
>>> struct = pa.struct({'x': pa.int32(), 'y': pa.string()})
>>> struct.num_fields
2
```

storage_type

The underlying storage type.

to_pandas_dtype(self)

Return the equivalent NumPy / Pandas dtype.

Examples

```python
>>> import pyarrow as pa
>>> pa.int64().to_pandas_dtype()
<class 'numpy.int64'>
```

wrap_array(self, storage)

Wrap the given storage array as an extension array.

Parameters:

storage : Array or ChunkedArray

Returns:

array : Array or ChunkedArray

Extension array wrapping the storage array.