ak.to_buffers — Awkward Array 2.8.2 documentation

Defined in awkward.operations.ak_to_buffers on line 16.

ak.to_buffers(array, container=None, buffer_key='{form_key}-{attribute}', form_key='node{id}', *, id_start=0, backend=None, byteorder='<')

Parameters:

Decomposes an Awkward Array into a Form and a collection of memory buffers, so that data can be losslessly written to file formats and storage devices that only map names to binary blobs (such as a filesystem directory).

This function returns a 3-tuple:

(form, length, container)

where the form is an ak.forms.Form (whose string representation is JSON), the length is an integer (len(array)), and the container is either the MutableMapping you passed in or a new dict containing the buffers (as NumPy arrays).

These are also the first three arguments of ak.from_buffers, so a full round-trip is

reconstituted = ak.from_buffers(*ak.to_buffers(original))

The container argument lets you specify your own MutableMapping, which might be an interface to some storage format or device (e.g. h5py). It’s okay if the container drops NumPy’s dtype and shape information, leaving raw bytes, since dtype and shape can be reconstituted from the ak.forms.NumpyForm.

The buffer_key and form_key arguments let you configure the names of the buffers added to the container and the string labels on each Form node, so that the two can be uniquely matched later. buffer_key and form_key are distinct arguments to allow for more indirection (buffer keys can differ from Form keys, as long as there’s a way to map them to each other) and because some Form nodes, such as ak.forms.ListForm and ak.forms.UnionForm, have more than one attribute (starts and stops for ak.forms.ListForm and tags and index for ak.forms.UnionForm).

Awkward 1.x also included partition numbers ("part0-", "part1-", …) in the buffer keys. In version 2.x onward, partitioning is handled externally by Dask, but partition numbers can be emulated by prepending a fixed "partN-" string to the buffer_key. The array represents exactly one partition.

Here is a simple example:

>>> original = ak.Array([[1, 2, 3], [], [4, 5]])
>>> form, length, container = ak.to_buffers(original)
>>> print(form)
{
    "class": "ListOffsetArray",
    "offsets": "i64",
    "content": {
        "class": "NumpyArray",
        "primitive": "int64",
        "form_key": "node1"
    },
    "form_key": "node0"
}
>>> length
3
>>> container
{'node0-offsets': array([0, 3, 3, 5]), 'node1-data': array([1, 2, 3, 4, 5])}

which may be read back with

>>> ak.from_buffers(form, length, container)
<Array [[1, 2, 3], [], [4, 5]] type='3 * var * int64'>

If you intend to use this function for saving data, you may want to pack it first with ak.to_packed.

See also ak.from_buffers and ak.to_packed.