ak.combinations — Awkward Array 2.8.2 documentation (original) (raw)

Defined in awkward.operations.ak_combinations on line 21.

ak.combinations(array, n, *, replacement=False, axis=1, fields=None, parameters=None, with_name=None, highlevel=True, behavior=None, attrs=None)#

Parameters:

Computes a Cartesian product (i.e. cross product) of array with itself that is restricted to combinations sampled without replacement. If the normal Cartesian product is thought of as an n dimensional tensor, these represent the “upper triangle” of sets without repetition. Ifreplacement=True, the diagonal of this “upper triangle” is included.

As a simple example with axis=0, consider the following

array = ak.Array(["a", "b", "c", "d", "e"])

The combinations choose 2 are:

ak.combinations(array, 2, axis=0).show() [('a', 'b'), ('a', 'c'), ('a', 'd'), ('a', 'e'), ('b', 'c'), ('b', 'd'), ('b', 'e'), ('c', 'd'), ('c', 'e'), ('d', 'e')]

Including the diagonal allows pairs like ('a', 'a').

ak.combinations(array, 2, axis=0, replacement=True).show() [('a', 'a'), ('a', 'b'), ('a', 'c'), ('a', 'd'), ('a', 'e'), ('b', 'b'), ('b', 'c'), ('b', 'd'), ('b', 'e'), ('c', 'c'), ('c', 'd'), ('c', 'e'), ('d', 'd'), ('d', 'e'), ('e', 'e')]

The combinations choose 3 can’t be easily arranged as a triangle in two dimensions.

ak.combinations(array, 3, axis=0).show() [('a', 'b', 'c'), ('a', 'b', 'd'), ('a', 'b', 'e'), ('a', 'c', 'd'), ('a', 'c', 'e'), ('a', 'd', 'e'), ('b', 'c', 'd'), ('b', 'c', 'e'), ('b', 'd', 'e'), ('c', 'd', 'e')]

Including the (three-dimensional) diagonal allows triples like('a', 'a', 'a'), but also ('a', 'a', 'b'), ('a', 'b', 'b'), etc., but not ('a', 'b', 'a'). All combinations are in the same order as the original array.

ak.combinations(array, 3, axis=0, replacement=True).show() [('a', 'a', 'a'), ('a', 'a', 'b'), ('a', 'a', 'c'), ('a', 'a', 'd'), ('a', 'a', 'e'), ('a', 'b', 'b'), ('a', 'b', 'c'), ('a', 'b', 'd'), ('a', 'b', 'e'), ('a', 'c', 'c'), ..., ('c', 'c', 'd'), ('c', 'c', 'e'), ('c', 'd', 'd'), ('c', 'd', 'e'), ('c', 'e', 'e'), ('d', 'd', 'd'), ('d', 'd', 'e'), ('d', 'e', 'e'), ('e', 'e', 'e')]

The primary purpose of this function, however, is to compute a different set of combinations for each element of an array: in other words, axis=1. The following has a different number of items in each element.

array = ak.Array([[1, 2, 3, 4], [], [5], [6, 7, 8]])

There are 6 ways to choose pairs from 4 elements, 0 ways to choose pairs from 0 elements, 0 ways to choose pairs from 1 element, and 3 ways to choose pairs from 3 elements.

ak.combinations(array, 2).show() [[(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)], [], [], [(6, 7), (6, 8), (7, 8)]]

Note, however, that the combinatorics isn’t determined by equality of the data themselves, but by their placement in the array. For example, even if all elements of an array are equal, the output has the same structure.

same = ak.Array([[7, 7, 7, 7], [], [7], [7, 7, 7]]) ak.combinations(same, 2).show() [[(7, 7), (7, 7), (7, 7), (7, 7), (7, 7), (7, 7)], [], [], [(7, 7), (7, 7), (7, 7)]]

To get records instead of tuples, pass a set of field names to fields.

ak.combinations(array, 2, fields=["x", "y"]).show() [ [{'x': 1, 'y': 2}, {'x': 1, 'y': 3}, {'x': 1, 'y': 4}, {'x': 2, 'y': 3}, {'x': 2, 'y': 4}, {'x': 3, 'y': 4}], [], [], [{'x': 6, 'y': 7}, {'x': 6, 'y': 8}, {'x': 7, 'y': 8}]]

This operation can be constructed from ak.argcartesian and other primitives:

left, right = ak.unzip(ak.argcartesian([array, array])) keep = left < right result = ak.zip([array[left][keep], array[right][keep]]) result.show() [ [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)], [], [], [(6, 7), (6, 8), (7, 8)]]

but it is frequently needed for data analysis, and the logic of which indexes to keep (above) gets increasingly complicated for large n.

To get list index positions in the tuples/records, rather than data from the original array, use ak.argcombinations instead of ak.combinations. The ak.argcombinations form can be particularly useful as nested indexing in ak.Array.__getitem__.