pyspark.RDD.reduce — PySpark 3.5.5 documentation
RDD.reduce(f: Callable[[T, T], T]) → T
Reduces the elements of this RDD using the specified commutative and associative binary operator. Currently reduces partitions locally.
New in version 0.7.0.
Parameters
f : function
    the reduce function
Returns
T
    the aggregated result
Examples
>>> from operator import add
>>> sc.parallelize([1, 2, 3, 4, 5]).reduce(add)
15
>>> sc.parallelize((2 for _ in range(10))).map(lambda x: 1).cache().reduce(add)
10
>>> sc.parallelize([]).reduce(add)
Traceback (most recent call last):
    ...
ValueError: Can not reduce() empty RDD
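The requirement that the operator be commutative and associative follows from how the reduction is executed: each partition is reduced locally, and the per-partition results are then combined on the driver, in no guaranteed order. The sketch below is a plain-Python simulation of that two-phase scheme, not PySpark's actual implementation; `simulated_rdd_reduce` and the partition layout are illustrative assumptions.

```python
from functools import reduce as py_reduce
from operator import add

def simulated_rdd_reduce(partitions, f):
    # Hypothetical helper: mimics RDD.reduce over a list of partitions.
    # Phase 1: reduce each non-empty partition locally.
    local_results = [py_reduce(f, part) for part in partitions if part]
    if not local_results:
        # Mirrors PySpark's behavior on an empty RDD.
        raise ValueError("Can not reduce() empty RDD")
    # Phase 2: combine the per-partition results; their order is not
    # guaranteed in a real cluster, hence the commutativity requirement.
    return py_reduce(f, local_results)

# Same data as the doctest above, split into three partitions.
print(simulated_rdd_reduce([[1, 2], [3], [4, 5]], add))  # 15
```

With a non-commutative or non-associative operator (e.g. subtraction), the result would depend on the partitioning and combine order, which is why such operators are unsafe with `reduce`.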