A generalized parallel algorithm for frequent itemset mining

A parallel algorithm for finding the frequent itemsets in a set of transactions is presented. The frequent individual items are identified by their index. We assume that the number of processors (m) is less than the number of frequent items (n). In the first stage, every processor Pi, i ∈ {1, ..., m - 1}, sequentially computes the frequent itemsets from the interval Ii = [(i - 1) · p + 1, i · p], where p = ⌊n/m⌋. The processor Pm computes the frequent itemsets from the interval Im = [(m - 1) · p + 1, n]. In the second stage, the parallel algorithm is applied. The processor Pi computes, step by step, the sets FIi,Ij of the frequent itemsets with individual items from the intervals Ii,j = Ii ∪ Ii+1 ∪ ... ∪ Ij, j = i+1, ..., m. To compute the set FIi,Ij, the processor Pi uses FIi,Ij-1, obtained in the previous step, and FIi+1,Ij, received from the processor Pi+1. The main advantage of our parallel algorithm is that it uses a communication pattern known before the algorithm starts,...
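
The index partition and the fixed communication pattern described above can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names `partition_intervals` and `second_stage_schedule` are hypothetical. The sketch also assumes a synchronous schedule in which processor i handles j = i + k at step k; the abstract states only which inputs each set needs, not this timing. No itemset mining is performed here, only the bookkeeping of who computes what and who sends to whom.

```python
def partition_intervals(n, m):
    """Split item indices 1..n into m contiguous intervals (first stage).

    Processors 1..m-1 each get p = floor(n / m) indices; processor m
    takes everything that remains, up to n.
    """
    if not 0 < m < n:
        raise ValueError("expected 0 < m < n, as the abstract assumes m < n")
    p = n // m
    intervals = [((i - 1) * p + 1, i * p) for i in range(1, m)]
    intervals.append(((m - 1) * p + 1, n))
    return intervals


def second_stage_schedule(m):
    """Return, for each step k = 1..m-1, the work done by each processor.

    Assumption (not stated in the abstract): at step k, processor i
    computes the set for intervals i..j with j = i + k. Its inputs are
    its own result for i..j-1 and the result for i+1..j received from
    processor i + 1.
    """
    schedule = []
    for k in range(1, m):
        step = []
        for i in range(1, m - k + 1):
            j = i + k
            step.append({
                "processor": i,
                "computes": (i, j),
                "own_previous": (i, j - 1),
                "received": (i + 1, j),
                "sender": i + 1,
            })
        schedule.append(step)
    return schedule


if __name__ == "__main__":
    print(partition_intervals(10, 3))
    for k, step in enumerate(second_stage_schedule(3), start=1):
        for entry in step:
            print(k, entry)
```

With n = 10 and m = 3, the partition is [1, 3], [4, 6], [7, 10]. Under the assumed schedule, the second stage takes m - 1 = 2 steps, and every message travels from processor i + 1 to processor i, so the pattern depends only on m and is known before any data is processed.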