NumPy Integration (Apache Arrow v22.0.0)
PyArrow can convert NumPy arrays to Arrow arrays and back.
NumPy to Arrow
To convert a NumPy array to Arrow, call the pyarrow.array() factory function.
```python
import numpy as np
import pyarrow as pa

data = np.arange(10, dtype='int16')
arr = pa.array(data)
arr
# <pyarrow.lib.Int16Array object at 0x7fb1d1e6ae58>
# [
#   0,
#   1,
#   2,
#   3,
#   4,
#   5,
#   6,
#   7,
#   8,
#   9
# ]
```
Conversion from NumPy supports a wide range of input dtypes, including strings and structured dtypes.
Arrow to NumPy
In the reverse direction, the to_numpy() method produces a view of an Arrow Array for use with NumPy, without copying the data. A view is only possible for primitive types whose physical representation is the same in NumPy and Arrow, and only when the Arrow data contains no nulls.
```python
import numpy as np
import pyarrow as pa

arr = pa.array([4, 5, 6], type=pa.int32())
view = arr.to_numpy()
view
# array([4, 5, 6], dtype=int32)
```
For more complex data types, use the to_pandas() method instead, which constructs a NumPy array with pandas semantics (for example, for how null values are represented).