pyarrow.Schema — Apache Arrow v20.0.0
class pyarrow.Schema
Bases: _Weakrefable
A named collection of types, a.k.a. a schema. A schema defines the column names and types in a record batch or table data structure. Schemas also contain metadata about the columns. For example, schemas converted from pandas contain metadata about their original pandas types so they can be converted back to the same types.
Warning
Do not call this class’s constructor directly. Instead, use the pyarrow.schema() factory function, which creates a new Arrow Schema object.
Examples
Create a new Arrow Schema object:
>>> import pyarrow as pa
>>> pa.schema([
...     ('some_int', pa.int32()),
...     ('some_string', pa.string())
... ])
some_int: int32
some_string: string
Create Arrow Schema with metadata:
>>> pa.schema([
...     pa.field('n_legs', pa.int64()),
...     pa.field('animals', pa.string())],
...     metadata={"n_legs": "Number of legs per animal"})
n_legs: int64
animals: string
-- schema metadata --
n_legs: 'Number of legs per animal'
__init__(*args, **kwargs)
add_metadata(self, metadata)
DEPRECATED: use with_metadata() instead.
Parameters:
metadata: dict
Keys and values must be string-like / coercible to bytes
append(self, Field field)
Append a field at the end of the schema.
In contrast to Python’s list.append(), it returns a new object, leaving the original Schema unmodified.
Parameters:
field: Field
Returns:
schema: Schema
New object with appended field.
Examples
>>> import pyarrow as pa
>>> schema = pa.schema([
...     pa.field('n_legs', pa.int64()),
...     pa.field('animals', pa.string())])
Append a field ‘extra’ at the end of the schema:
>>> schema_new = schema.append(pa.field('extra', pa.bool_()))
>>> schema_new
n_legs: int64
animals: string
extra: bool
Original schema is unmodified:
>>> schema
n_legs: int64
animals: string
empty_table(self)
Provide an empty table according to the schema.
Returns:
table: pyarrow.Table
Examples
>>> import pyarrow as pa
>>> schema = pa.schema([
...     pa.field('n_legs', pa.int64()),
...     pa.field('animals', pa.string())])
Create an empty table with schema’s fields:
>>> schema.empty_table()
pyarrow.Table
n_legs: int64
animals: string
----
n_legs: [[]]
animals: [[]]
equals(self, Schema other, bool check_metadata=False)
Test if this schema is equal to the other.
Parameters:
other: pyarrow.Schema
check_metadata: bool, default False
Key/value metadata must be equal too
Returns:
is_equal: bool
Examples
>>> import pyarrow as pa
>>> schema1 = pa.schema([
...     pa.field('n_legs', pa.int64()),
...     pa.field('animals', pa.string())],
...     metadata={"n_legs": "Number of legs per animal"})
>>> schema2 = pa.schema([
...     ('some_int', pa.int32()),
...     ('some_string', pa.string())
... ])
Test two equal schemas:
>>> schema1.equals(schema1)
True
Test two unequal schemas:
>>> schema1.equals(schema2)
False
field(self, i)
Select a field by its column name or numeric index.
Parameters:
i: int or str
Returns:
pyarrow.Field
Examples
>>> import pyarrow as pa
>>> schema = pa.schema([
...     pa.field('n_legs', pa.int64()),
...     pa.field('animals', pa.string())])
Select the second field:
>>> schema.field(1)
pyarrow.Field<animals: string>
Select the field of the column named ‘n_legs’:
>>> schema.field('n_legs')
pyarrow.Field<n_legs: int64>
field_by_name(self, name)
DEPRECATED: use field() instead.
Parameters:
name: str
Returns:
field: pyarrow.Field
classmethod from_pandas(cls, df, preserve_index=None)
Returns the schema implied by a pandas DataFrame.
Parameters:
df: pandas.DataFrame
preserve_index: bool, optional (default None)
Whether to store the index as an additional column (or columns, for MultiIndex) in the resulting Table. The default of None will store the index as a column, except for RangeIndex which is stored as metadata only. Use preserve_index=True to force it to be stored as a column.
Returns:
pyarrow.Schema
Examples
>>> import pandas as pd
>>> import pyarrow as pa
>>> df = pd.DataFrame({
...     'int': [1, 2],
...     'str': ['a', 'b']
... })
Create an Arrow Schema from the schema of a pandas dataframe:
>>> pa.Schema.from_pandas(df)
int: int64
str: string
-- schema metadata --
pandas: '{"index_columns": [{"kind": "range", "name": null, ...
get_all_field_indices(self, name)
Return sorted list of indices for the fields with the given name.
Parameters:
name: str
The name of the field to look up.
Returns:
indices: List[int]
Examples
>>> import pyarrow as pa
>>> schema = pa.schema([
...     pa.field('n_legs', pa.int64()),
...     pa.field('animals', pa.string()),
...     pa.field('animals', pa.bool_())])
Get the indexes of the fields named ‘animals’:
>>> schema.get_all_field_indices("animals")
[1, 2]
get_field_index(self, name)
Return index of the unique field with the given name.
Parameters:
name: str
The name of the field to look up.
Returns:
index: int
The index of the field with the given name; -1 if the name isn’t found or there are several fields with the given name.
Examples
>>> import pyarrow as pa
>>> schema = pa.schema([
...     pa.field('n_legs', pa.int64()),
...     pa.field('animals', pa.string())])
Get the index of the field named ‘animals’:
>>> schema.get_field_index("animals")
1
Index in case of several fields with the given name:
>>> schema = pa.schema([
...     pa.field('n_legs', pa.int64()),
...     pa.field('animals', pa.string()),
...     pa.field('animals', pa.bool_())],
...     metadata={"n_legs": "Number of legs per animal"})
>>> schema.get_field_index("animals")
-1
insert(self, int i, Field field)
Add a field at position i to the schema.
Parameters:
i: int
field: Field
Returns:
schema: Schema
Examples
>>> import pyarrow as pa
>>> schema = pa.schema([
...     pa.field('n_legs', pa.int64()),
...     pa.field('animals', pa.string())])
Insert a new field at the second position:
>>> schema.insert(1, pa.field('extra', pa.bool_()))
n_legs: int64
extra: bool
animals: string
metadata
The schema’s metadata (if any is set).
Returns:
metadata: dict
Examples
>>> import pyarrow as pa
>>> schema = pa.schema([
...     pa.field('n_legs', pa.int64()),
...     pa.field('animals', pa.string())],
...     metadata={"n_legs": "Number of legs per animal"})
Get the schema’s metadata:
>>> schema.metadata
{b'n_legs': b'Number of legs per animal'}
names
The schema’s field names.
Returns:
list of str
Examples
>>> import pyarrow as pa
>>> schema = pa.schema([
...     pa.field('n_legs', pa.int64()),
...     pa.field('animals', pa.string())])
Get the names of the schema’s fields:
>>> schema.names
['n_legs', 'animals']
pandas_metadata
Return the pandas metadata field, deserialized from JSON (if it exists).
Examples
>>> import pyarrow as pa
>>> import pandas as pd
>>> df = pd.DataFrame({'n_legs': [2, 4, 5, 100],
...                    'animals': ["Flamingo", "Horse", "Brittle stars", "Centipede"]})
>>> schema = pa.Table.from_pandas(df).schema
Select pandas metadata field from Arrow Schema:
>>> schema.pandas_metadata
{'index_columns': [{'kind': 'range', 'name': None, 'start': 0, 'stop': 4, 'step': 1}], ...
remove(self, int i)
Remove the field at index i from the schema.
Parameters:
i: int
Returns:
schema: Schema
Examples
>>> import pyarrow as pa
>>> schema = pa.schema([
...     pa.field('n_legs', pa.int64()),
...     pa.field('animals', pa.string())])
Remove the second field of the schema:
>>> schema.remove(1)
n_legs: int64
remove_metadata(self)
Create a new schema without metadata, if any.
Returns:
schema: pyarrow.Schema
Examples
>>> import pyarrow as pa
>>> schema = pa.schema([
...     pa.field('n_legs', pa.int64()),
...     pa.field('animals', pa.string())],
...     metadata={"n_legs": "Number of legs per animal"})
>>> schema
n_legs: int64
animals: string
-- schema metadata --
n_legs: 'Number of legs per animal'
Create a new schema without the original’s metadata:
>>> schema.remove_metadata()
n_legs: int64
animals: string
serialize(self, memory_pool=None)
Write Schema to Buffer as encapsulated IPC message.
Parameters:
memory_pool: MemoryPool, default None
Uses default memory pool if not specified
Returns:
serialized: Buffer
Examples
>>> import pyarrow as pa
>>> schema = pa.schema([
...     pa.field('n_legs', pa.int64()),
...     pa.field('animals', pa.string())])
Write schema to Buffer:
>>> schema.serialize()
<pyarrow.Buffer address=0x... size=... is_cpu=True is_mutable=True>
set(self, int i, Field field)
Replace a field at position i in the schema.
Parameters:
i: int
field: Field
Returns:
schema: Schema
Examples
>>> import pyarrow as pa
>>> schema = pa.schema([
...     pa.field('n_legs', pa.int64()),
...     pa.field('animals', pa.string())])
Replace the second field of the schema with a new field ‘replaced’:
>>> schema.set(1, pa.field('replaced', pa.bool_()))
n_legs: int64
replaced: bool
to_string(self, truncate_metadata=True, show_field_metadata=True, show_schema_metadata=True)
Return human-readable representation of Schema.
Parameters:
truncate_metadata: bool, default True
Limit metadata key/value display to a single line of ~80 characters or less
show_field_metadata: bool, default True
Display Field-level KeyValueMetadata
show_schema_metadata: bool, default True
Display Schema-level KeyValueMetadata
Returns:
str: the formatted output
types
The schema’s field types.
Returns:
list of DataType
Examples
>>> import pyarrow as pa
>>> schema = pa.schema([
...     pa.field('n_legs', pa.int64()),
...     pa.field('animals', pa.string())])
Get the types of the schema’s fields:
>>> schema.types
[DataType(int64), DataType(string)]
with_metadata(self, metadata)
Add metadata as dict of string keys and values to Schema.
Parameters:
metadata: dict
Keys and values must be string-like / coercible to bytes
Returns:
schema: pyarrow.Schema
Examples
>>> import pyarrow as pa
>>> schema = pa.schema([
...     pa.field('n_legs', pa.int64()),
...     pa.field('animals', pa.string())])
Add metadata to an existing schema:
>>> schema.with_metadata({"n_legs": "Number of legs per animal"})
n_legs: int64
animals: string
-- schema metadata --
n_legs: 'Number of legs per animal'