Better speedups using simpler parallel programming for graph connectivity and biconnectivity (original) (raw)
Speedups demonstrated for finding the biconnected components of a graph: 9x to 33x on the Explicit Multi-Threading (XMT) many-core computing platform relative to the best serial algorithm using a relatively modest silicon budget. Further evidence suggests that speedups of 21x to 48x are possible. For graph connectivity, we demonstrate that XMT outperforms two recent NVIDIA GPUs of similar or greater silicon area. Previous studies of parallel biconnectivity algorithms achieved at most a 4x speedup, but we could not find biconnectivity code for GPUs to compare biconnectivity against them. Ease-of-programming: The paper suggests that parallel programming for the XMT platform is considerably simpler than for the SMP and GPU ones. Unlike the quantitative speedup results, the ease-of-programming comparison is more qualitative. Productivity of parallel programming is a central interest of PMAM/PPoPP strongly favoring ease-ofprogramming. We believe that the discussion is on par with the state of the art on this relatively underexplored interest. The results provide new insights into the synergy between algorithms, the practice of parallel programming and architecture: (1) no single biconnectivity algorithm is dominant for all inputs; (2) XMT provides good performance for each algorithm and better speedups relative to other platforms; (3) the textbook (TV) PRAM algorithm was the only one that provided strong speedups on XMT across all inputs considered; and (4) the TV implementation was a direct implementation of a PRAM algorithm, though a nontrivial effort was needed to get a PRAM version with lower constant factors. Overall, it appears that previous low speedups on other platforms were not caused by inefficient algorithms or their programming. Instead, it is because of the better match between the algorithms and the XMT platform. Given the growing interest in adding architectural support for parallel programming to existing multi-cores, our results suggest the following open question: can such added architectural support catch up on speedups and ease-of-programming with Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
Loading Preview
Sorry, preview is currently unavailable. You can download the paper by clicking the button above.