pyspark.RDD.reduce - PySpark 3.5.5 documentation

RDD.reduce(f: Callable[[T, T], T]) → T

Reduces the elements of this RDD using the specified commutative and associative binary operator. Currently reduces partitions locally.

New in version 0.7.0.

Parameters

f : function

the reduce function

Returns

T

the aggregated result
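
Notes

reduce() applies f within each partition first and then merges the per-partition results (see "Currently reduces partitions locally" above), so f must be associative and commutative for the result to be deterministic. A minimal sketch of what can go wrong otherwise; the local master URL and app name here are illustrative assumptions, not part of this API:

    from operator import add, sub

    from pyspark import SparkContext

    sc = SparkContext("local[4]", "reduce-demo")  # hypothetical local context

    # Four partitions: reduce() combines within each partition, then across them.
    nums = sc.parallelize([1, 2, 3, 4, 5], numSlices=4)

    print(nums.reduce(add))  # 15: addition is associative and commutative
    print(nums.reduce(sub))  # partition-dependent; not guaranteed to equal -13

    sc.stop()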

Examples

>>> from operator import add
>>> sc.parallelize([1, 2, 3, 4, 5]).reduce(add)
15
>>> sc.parallelize((2 for _ in range(10))).map(lambda x: 1).cache().reduce(add)
10
>>> sc.parallelize([]).reduce(add)
Traceback (most recent call last):
    ...
ValueError: Can not reduce() empty RDD
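
As the last example shows, reduce() raises ValueError on an empty RDD because there is no element to start from. One common workaround, sketched below rather than taken from this method's documentation, is RDD.fold(), which accepts an explicit neutral value:

    from operator import add

    from pyspark import SparkContext

    sc = SparkContext("local[2]", "fold-demo")  # hypothetical local context

    empty = sc.parallelize([])

    # fold() starts from the supplied zero value, so an empty RDD is fine.
    print(empty.fold(0, add))  # 0, where empty.reduce(add) would raise ValueError

    sc.stop()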