pyspark.RDD.reduce - PySpark 3.5.5 documentation

RDD.reduce(f: Callable[[T, T], T]) → T

Reduces the elements of this RDD using the specified commutative and associative binary operator. Currently reduces partitions locally.

New in version 0.7.0.

Parameters

f : function

the reduce function

Returns

T

the aggregated result
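
Notes

reduce() applies f within each partition first and then merges the per-partition results (see "Currently reduces partitions locally" above), so f must be associative and commutative for the result to be deterministic. A minimal sketch of what can go wrong otherwise; the local master URL and app name here are illustrative assumptions, not part of this API:

    from operator import add, sub

    from pyspark import SparkContext

    sc = SparkContext("local[4]", "reduce-demo")  # hypothetical local context

    # Four partitions: reduce() combines within each partition, then across them.
    nums = sc.parallelize([1, 2, 3, 4, 5], numSlices=4)

    print(nums.reduce(add))  # 15: addition is associative and commutative
    print(nums.reduce(sub))  # partition-dependent; not guaranteed to equal -13

    sc.stop()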

Examples

>>> from operator import add
>>> sc.parallelize([1, 2, 3, 4, 5]).reduce(add)
15
>>> sc.parallelize((2 for _ in range(10))).map(lambda x: 1).cache().reduce(add)
10
>>> sc.parallelize([]).reduce(add)
Traceback (most recent call last):
    ...
ValueError: Can not reduce() empty RDD
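
As the last example shows, reduce() raises ValueError on an empty RDD because there is no element to start from. One common workaround, sketched below rather than taken from this method's documentation, is RDD.fold(), which accepts an explicit neutral value:

    from operator import add

    from pyspark import SparkContext

    sc = SparkContext("local[2]", "fold-demo")  # hypothetical local context

    empty = sc.parallelize([])

    # fold() starts from the supplied zero value, so an empty RDD is fine.
    print(empty.fold(0, add))  # 0, where empty.reduce(add) would raise ValueError

    sc.stop()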