ak.cartesian — Awkward Array 2.8.2 documentation

Defined in awkward.operations.ak_cartesian on line 31.

ak.cartesian(arrays, axis=1, *, nested=None, parameters=None, with_name=None, highlevel=True, behavior=None, attrs=None)

Computes a Cartesian product (i.e. cross product) of data from a set of arrays. This operation creates records (if arrays is a dict) or tuples (if arrays is another kind of iterable) that hold the combinations of elements, and it can introduce new levels of nesting.

As a simple example with axis=0, the Cartesian product of

    >>> import awkward as ak
    >>> one = ak.Array([1, 2, 3])
    >>> two = ak.Array(["a", "b"])

is

    >>> ak.cartesian([one, two], axis=0).show()
    [(1, 'a'),
     (1, 'b'),
     (2, 'a'),
     (2, 'b'),
     (3, 'a'),
     (3, 'b')]

With nesting, a new level of nested lists is created to group combinations that share the same element from one into the same list.

    >>> ak.cartesian([one, two], axis=0, nested=True).show()
    [[(1, 'a'), (1, 'b')],
     [(2, 'a'), (2, 'b')],
     [(3, 'a'), (3, 'b')]]
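The relationship between the flat and nested results can be illustrated without Awkward Array. The following is a minimal pure-Python sketch using `itertools.product` from the standard library; it mirrors the two outputs above on plain lists and is not how Awkward Array computes them.

```python
from itertools import product

one = [1, 2, 3]
two = ["a", "b"]

# Flat product: every (x, y) pair, in the order the inputs are written.
flat = list(product(one, two))

# Nested product: one inner list per element of `one`.
nested = [[(x, y) for y in two] for x in one]

# Flattening the nested form by one level gives back the flat form.
assert [pair for group in nested for pair in group] == flat

print(flat)
print(nested)
```

The nested form carries the same pairs as the flat form; only the grouping differs.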

The primary purpose of this function, however, is to compute a different Cartesian product for each element of an array: in other words, axis=1. The following arrays each have four elements.

    >>> one = ak.Array([[1, 2, 3], [], [4, 5], [6]])
    >>> two = ak.Array([["a", "b"], ["c"], ["d"], ["e", "f"]])

The default axis=1 produces 6 pairs from the Cartesian product of [1, 2, 3] and ["a", "b"], 0 pairs from [] and ["c"], 2 pairs from [4, 5] and ["d"], and 2 pairs from [6] and ["e", "f"].

    >>> ak.cartesian([one, two]).show()
    [[(1, 'a'), (1, 'b'), (2, 'a'), (2, 'b'), (3, 'a'), (3, 'b')],
     [],
     [(4, 'd'), (5, 'd')],
     [(6, 'e'), (6, 'f')]]
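The per-entry behavior of axis=1 amounts to pairing up sublists at the same position and taking an independent product of each pair. The sketch below shows this on plain Python lists with `itertools.product`; it is an illustration of the semantics under that assumption, not Awkward Array's implementation.

```python
from itertools import product

one = [[1, 2, 3], [], [4, 5], [6]]
two = [["a", "b"], ["c"], ["d"], ["e", "f"]]

# One independent product per pair of sublists at the same position.
per_entry = [list(product(xs, ys)) for xs, ys in zip(one, two)]

# Each entry holds len(xs) * len(ys) pairs, so an empty sublist yields no pairs.
counts = [len(pairs) for pairs in per_entry]

print(per_entry[2])  # [(4, 'd'), (5, 'd')]
print(counts)        # [6, 0, 2, 2]
```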

The nesting depth is the same as the original arrays; with nested=True, the nesting depth is increased by 1 and tuples are grouped by their first element.

    >>> ak.cartesian([one, two], nested=True).show()
    [[[(1, 'a'), (1, 'b')], [(2, 'a'), (2, ...)], [(3, 'a'), (3, 'b')]],
     [],
     [[(4, 'd')], [(5, 'd')]],
     [[(6, 'e'), (6, 'f')]]]

These tuples are ak.contents.RecordArray nodes with unnamed fields. To name the fields, we can pass one and two in a dict, rather than a list.

    >>> ak.cartesian({"x": one, "y": two}).show()
    [[{x: 1, y: 'a'}, {x: 1, y: 'b'}, {...}, ..., {x: 3, y: 'a'}, {x: 3, y: 'b'}],
     [],
     [{x: 4, y: 'd'}, {x: 5, y: 'd'}],
     [{x: 6, y: 'e'}, {x: 6, y: 'f'}]]
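To make the tuple-versus-record distinction concrete, here is a small pure-Python sketch in which a dict of inputs produces dicts keyed by field name. The helper `cartesian_records` is hypothetical, written only for this illustration; it is not part of Awkward Array.

```python
from itertools import product

one = [[1, 2, 3], [], [4, 5], [6]]
two = [["a", "b"], ["c"], ["d"], ["e", "f"]]


def cartesian_records(fields):
    """Hypothetical helper: per-entry product of named lists, as dicts."""
    names = list(fields)
    return [
        # Label each combination with the field names, in input order.
        [dict(zip(names, combo)) for combo in product(*sublists)]
        for sublists in zip(*fields.values())
    ]


records = cartesian_records({"x": one, "y": two})
print(records[2])  # [{'x': 4, 'y': 'd'}, {'x': 5, 'y': 'd'}]
```

The combinations are the same as in the tuple form; only the way each field is addressed (by name rather than by position) changes.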

With more than two elements in the Cartesian product, nested can specify which are grouped and which are not. For example,

    >>> one = ak.Array([1, 2, 3, 4])
    >>> two = ak.Array([1.1, 2.2, 3.3])
    >>> three = ak.Array(["a", "b"])

can be left entirely ungrouped:

    >>> ak.cartesian([one, two, three], axis=0).show()
    [(1, 1.1, 'a'),
     (1, 1.1, 'b'),
     (1, 2.2, 'a'),
     (1, 2.2, 'b'),
     (1, 3.3, 'a'),
     (1, 3.3, 'b'),
     (2, 1.1, 'a'),
     (2, 1.1, 'b'),
     (2, 2.2, 'a'),
     (2, 2.2, 'b'),
     ...,
     (3, 2.2, 'b'),
     (3, 3.3, 'a'),
     (3, 3.3, 'b'),
     (4, 1.1, 'a'),
     (4, 1.1, 'b'),
     (4, 2.2, 'a'),
     (4, 2.2, 'b'),
     (4, 3.3, 'a'),
     (4, 3.3, 'b')]

can be grouped by one (adding 1 more dimension):

    >>> ak.cartesian([one, two, three], axis=0, nested=[0]).show()
    [[(1, 1.1, 'a'), (1, 1.1, 'b'), (1, 2.2, 'a')],
     [(1, 2.2, 'b'), (1, 3.3, 'a'), (1, 3.3, 'b')],
     [(2, 1.1, 'a'), (2, 1.1, 'b'), (2, 2.2, 'a')],
     [(2, 2.2, 'b'), (2, 3.3, 'a'), (2, 3.3, 'b')],
     [(3, 1.1, 'a'), (3, 1.1, 'b'), (3, 2.2, 'a')],
     [(3, 2.2, 'b'), (3, 3.3, 'a'), (3, 3.3, 'b')],
     [(4, 1.1, 'a'), (4, 1.1, 'b'), (4, 2.2, 'a')],
     [(4, 2.2, 'b'), (4, 3.3, 'a'), (4, 3.3, 'b')]]

can be grouped by one and two (adding 2 more dimensions):

    >>> ak.cartesian([one, two, three], axis=0, nested=[0, 1]).show()
    [[[(1, 1.1, 'a'), (1, 1.1, 'b')], [...], [(1, 3.3, 'a'), (1, 3.3, ...)]],
     [[(2, 1.1, 'a'), (2, 1.1, 'b')], [...], [(2, 3.3, 'a'), (2, 3.3, ...)]],
     [[(3, 1.1, 'a'), (3, 1.1, 'b')], [...], [(3, 3.3, 'a'), (3, 3.3, ...)]],
     [[(4, 1.1, 'a'), (4, 1.1, 'b')], [...], [(4, 3.3, 'a'), (4, 3.3, ...)]]]

or grouped by unique one-two pairs (adding 1 more dimension):

    >>> ak.cartesian([one, two, three], axis=0, nested=[1]).show()
    [[(1, 1.1, 'a'), (1, 1.1, 'b')],
     [(1, 2.2, 'a'), (1, 2.2, 'b')],
     [(1, 3.3, 'a'), (1, 3.3, 'b')],
     [(2, 1.1, 'a'), (2, 1.1, 'b')],
     [(2, 2.2, 'a'), (2, 2.2, 'b')],
     [(2, 3.3, 'a'), (2, 3.3, 'b')],
     [(3, 1.1, 'a'), (3, 1.1, 'b')],
     [(3, 2.2, 'a'), (3, 2.2, 'b')],
     [(3, 3.3, 'a'), (3, 3.3, 'b')],
     [(4, 1.1, 'a'), (4, 1.1, 'b')],
     [(4, 2.2, 'a'), (4, 2.2, 'b')],
     [(4, 3.3, 'a'), (4, 3.3, 'b')]]

The order of the output is fixed: it is always lexicographical in the order that the arrays are written.
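The ordering rule can be checked on plain Python lists: `itertools.product` enumerates combinations in the same lexicographical order, with the last input varying fastest and the first slowest. This is a sketch of the ordering only, assuming the plain-list analogue of the three arrays above.

```python
from itertools import product

one = [1, 2, 3, 4]
two = [1.1, 2.2, 3.3]
three = ["a", "b"]

combos = list(product(one, two, three))

# The last input varies fastest, the first slowest.
print(combos[:3])   # [(1, 1.1, 'a'), (1, 1.1, 'b'), (1, 2.2, 'a')]
print(len(combos))  # 24

# Because each input here is already sorted, the product is sorted as well.
assert combos == sorted(combos)
```

Writing the inputs in a different order, for example `product(two, one, three)`, changes which field varies slowest and therefore the order of the output.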

To emulate an SQL or Pandas “group by” operation, put the keys that you wish to group by first and use nested=[0] or nested=[n] to group by unique n-tuples. If necessary, record keys can later be reordered with a list of strings in ak.Array.__getitem__.
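As a rough pure-Python analogue of grouping by unique n-tuples, the sketch below groups a three-way product by its first two fields. It relies on the lexicographical output order: combinations that share a key prefix are adjacent, so `itertools.groupby` can collect them without sorting first. This illustrates the idea on plain lists and is not Awkward Array code.

```python
from itertools import groupby, product

one = [1, 2, 3, 4]
two = [1.1, 2.2, 3.3]
three = ["a", "b"]

flat = list(product(one, two, three))

# Equal (one, two) prefixes are adjacent in lexicographical order,
# so groupby needs no prior sort.
groups = [list(g) for _, g in groupby(flat, key=lambda t: t[:2])]

print(len(groups))  # 12
print(groups[0])    # [(1, 1.1, 'a'), (1, 1.1, 'b')]
```

If the grouping keys were not listed first, equal keys would no longer be adjacent, which is why the keys need to come first in the input.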

To get list index positions in the tuples/records, rather than data from the original arrays, use ak.argcartesian instead of ak.cartesian. The ak.argcartesian form can be particularly useful as nested indexing in ak.Array.__getitem__.
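The difference between value pairs and index pairs can be sketched on plain Python lists: take the product of positions within each sublist, then index back into the originals to recover the values. This illustrates the idea only; it does not call ak.argcartesian or reproduce its implementation.

```python
from itertools import product

one = [[1, 2, 3], [], [4, 5], [6]]
two = [["a", "b"], ["c"], ["d"], ["e", "f"]]

# Index pairs instead of values: positions within each sublist.
index_pairs = [
    list(product(range(len(xs)), range(len(ys))))
    for xs, ys in zip(one, two)
]

# Indexing back into the originals recovers the value pairs.
values = [
    [(xs[i], ys[j]) for i, j in pairs]
    for xs, ys, pairs in zip(one, two, index_pairs)
]

print(index_pairs[2])  # [(0, 0), (1, 0)]
print(values[2])       # [(4, 'd'), (5, 'd')]
```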