Massimo Torquati | University of Pisa
Papers by Massimo Torquati
The Journal of Supercomputing, Apr 6, 2024
The ability to teach parallel programming principles and techniques is becoming fundamental to preparing a new generation of programmers able to master the pervasive parallelism made available by hardware vendors. Classical parallel programming courses leverage either low-level programming frameworks (e.g., those based on Pthreads) or higher-level frameworks such as OpenMP or MPI. We discuss our teaching experience within the Master in "Computer Science and Networking", where parallel programming is taught leveraging structured parallel programming principles and frameworks. The paper summarizes the results achieved in eight years of experience and shows how the adoption of a structured parallel programming approach improves the efficiency of the teaching process.
Lecture Notes in Computer Science, 2015
FastFlow is an open-source, structured parallel programming framework originally conceived to support highly efficient stream-parallel computation while targeting shared-memory multi-cores. Its efficiency mainly comes from the optimized implementation of the base communication mechanisms and from its layered design. FastFlow provides parallel application programmers with a set of ready-to-use, parametric algorithmic skeletons modeling the most common parallelism exploitation patterns. The algorithmic skeletons provided by FastFlow may be freely nested to model increasingly complex parallelism exploitation patterns. This tutorial describes the "core" FastFlow, that is, the set of skeletons supported since version 1.0, and outlines the recent advances aimed at (i) introducing new, higher-level skeletons and (ii) targeting networked multi-cores, possibly equipped with GPUs, in addition to single multi/many-core processing elements.
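As a concrete illustration of skeleton nesting, the sketch below builds a three-stage pipeline whose middle stage is a farm, using the core API names described in the tutorial (ff_node, ff_pipeline, ff_farm). It is a minimal sketch: constructors and defaults vary across FastFlow versions, so treat the details as illustrative rather than authoritative.

    #include <vector>
    #include <ff/pipeline.hpp>
    #include <ff/farm.hpp>
    using namespace ff;

    struct Source : ff_node {                 // first pipeline stage
        void *svc(void *) {
            for (long i = 1; i <= 1000; ++i)
                ff_send_out((void *)i);       // emit the stream
            return EOS;                       // end-of-stream marker
        }
    };
    struct Worker : ff_node {                 // replicated farm worker
        void *svc(void *task) {
            long t = (long)task;
            return (void *)(t * t);           // per-item computation
        }
    };
    struct Sink : ff_node {                   // last pipeline stage
        void *svc(void *task) {
            (void)task;                       // consume the result
            return GO_ON;                     // keep accepting items
        }
    };

    int main() {
        std::vector<ff_node *> workers;
        for (int i = 0; i < 4; ++i) workers.push_back(new Worker);
        ff_farm<> farm;                       // task-farm skeleton
        farm.add_workers(workers);
        farm.add_collector(NULL);             // default collector feeds next stage
        ff_pipeline pipe;                     // pipeline skeleton
        pipe.add_stage(new Source);
        pipe.add_stage(&farm);                // skeletons nest freely
        pipe.add_stage(new Sink);
        return pipe.run_and_wait_end();
    }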
International Symposium on Performance Evaluation of Computer and Telecommunication Systems, Jul 13, 2009
... for processor management on an NP platform. The library also allows for testing and analyzing the application code on a cluster of standard PCs, thanks to the ASSIST [9] environment. This approach results in a multi-phase ...
We propose a set of building blocks (RISC-pb2l) suitable for building high-level structured parallel programming frameworks. The set is designed following a RISC approach. RISC-pb2l is architecture independent, but the implementation of the different blocks may be specialized to make the best use of the target architecture's peculiarities. A number of optimizations may be designed that transform basic building-block compositions into more efficient ones, such that parallel application efficiency may be derived by construction rather than by debugging.
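The "efficiency by construction" idea can be pictured with two toy combinators; the C++ below is a conceptual sketch only and does not reproduce the actual RISC-pb2l notation or implementation. wrap lifts a sequential function into a block, and pipe connects two blocks.

    #include <functional>
    #include <vector>

    // Toy "wrapper" block: lifts a sequential function onto a collection.
    template <typename In, typename Out>
    std::function<std::vector<Out>(const std::vector<In> &)>
    wrap(std::function<Out(In)> f) {
        return [f](const std::vector<In> &xs) {
            std::vector<Out> ys;
            ys.reserve(xs.size());
            for (const auto &x : xs) ys.push_back(f(x));
            return ys;
        };
    }

    // Toy "pipe" combinator: feeds the output of one block into the next.
    template <typename A, typename B, typename C>
    std::function<std::vector<C>(const std::vector<A> &)>
    pipe(std::function<std::vector<B>(const std::vector<A> &)> s1,
         std::function<std::vector<C>(const std::vector<B> &)> s2) {
        return [s1, s2](const std::vector<A> &xs) { return s2(s1(xs)); };
    }

An optimizer working on such compositions can, for instance, rewrite pipe(wrap(f), wrap(g)) into a single wrapper computing g after f, fusing two traversals into one, which is the sense in which efficiency is derived by construction rather than by debugging.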
Università di Pisa eBooks, Sep 5, 2011
Data flow techniques have been around since the early '70s, when they were used in compilers for sequential languages. Shortly after their introduction they were also considered as a possible model for parallel computing, although the impact here was limited. Recently, however, data flow has been identified as a candidate for the efficient implementation of various programming models on multi-core architectures. In most cases, however, the burden of determining data flow "macro" instructions is left to the programmer, while the compiler/run-time system manages only the efficient scheduling of these instructions. We discuss a structured parallel programming approach supporting automatic compilation of programs to macro data flow and we show experimental results demonstrating the feasibility of the approach and the efficiency of the resulting "object" code on different classes of state-of-the-art multi-core architectures. The experimental results use different base mechanisms to implement the macro data flow run-time support, from plain pthreads with condition variables to more modern and effective lock- and fence-free parallel frameworks. Experimental results comparing the efficiency of the proposed approach with that achieved using other, more classical parallel frameworks are also presented.
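As a hint of what the "plain pthreads with condition variables" base mechanism looks like, here is a minimal sketch of a worker pool that executes fireable macro data-flow instructions, i.e., instructions whose input tokens are all available. The names (MDFPool, submit) are illustrative, and the sketch uses the C++11 std:: equivalents of the pthread primitives.

    #include <condition_variable>
    #include <functional>
    #include <mutex>
    #include <queue>
    #include <thread>
    #include <vector>

    class MDFPool {
        std::queue<std::function<void()>> fireable;  // ready instructions
        std::mutex m;
        std::condition_variable cv;
        bool done = false;
        std::vector<std::thread> workers;
    public:
        explicit MDFPool(unsigned n) {
            for (unsigned i = 0; i < n; ++i)
                workers.emplace_back([this] {
                    for (;;) {
                        std::unique_lock<std::mutex> lk(m);
                        cv.wait(lk, [this] { return done || !fireable.empty(); });
                        if (fireable.empty()) return;        // done and drained
                        auto instr = std::move(fireable.front());
                        fireable.pop();
                        lk.unlock();
                        instr();  // running it may make successors fireable
                    }
                });
        }
        void submit(std::function<void()> instr) {   // enqueue a ready instruction
            { std::lock_guard<std::mutex> lk(m); fireable.push(std::move(instr)); }
            cv.notify_one();
        }
        ~MDFPool() {
            { std::lock_guard<std::mutex> lk(m); done = true; }
            cv.notify_all();
            for (auto &t : workers) t.join();
        }
    };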
arXiv, Sep 16, 2016
International Journal of Parallel Programming, Nov 9, 2020
International Journal of High Performance Computing Applications, Mar 9, 2017
Proceedings of the 20th ACM International Conference on Computing Frontiers
Concurrency and Computation: Practice and Experience, 2017
The rapid progress of multi/many-core architectures means that data-intensive parallel applications not yet fully optimized for them do not deliver the best performance. With the advent of concurrent programming, frameworks offering structured patterns have alleviated the developers' burden in adapting such applications to multithreaded architectures. While some of these patterns are implemented using synchronization primitives, others avoid them by means of lock-free data mechanisms. However, lock-free programming is not straightforward, and ensuring an appropriate use of such interfaces can be challenging, since different memory models plus instruction reordering at the compiler/processor level can interfere in the occurrence of data races. Race detectors are formidable tools in this sense; however, they may emit false positives if they are unaware of the underlying lock-free structure semantics. To mitigate this issue, this paper extends ThreadSanitizer, a race detection tool, with the semantics of lock-free data structures ...
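The paper's extension lives inside the detector itself; the sketch below only illustrates the underlying idea, conveying a lock-free structure's happens-before semantics to ThreadSanitizer, via the dynamic-annotation entry points available when building with -fsanitize=thread. The SPSC ring buffer here is a generic example, not the structure analyzed in the paper.

    // Annotation entry points provided by the ThreadSanitizer runtime
    // (build and link with -fsanitize=thread); declared here for the sketch.
    extern "C" void AnnotateHappensBefore(const char *file, int line,
                                          const volatile void *addr);
    extern "C" void AnnotateHappensAfter(const char *file, int line,
                                         const volatile void *addr);

    #include <atomic>
    #include <cstddef>

    // Single-producer/single-consumer ring buffer: producer and consumer
    // synchronize only through the head/tail indices, a custom protocol a
    // race detector cannot recognize without extra semantics.
    template <typename T, std::size_t N>
    class SPSCQueue {
        T buf[N];
        std::atomic<std::size_t> head{0}, tail{0};
    public:
        bool push(const T &v) {
            std::size_t t = tail.load(std::memory_order_relaxed);
            if (t - head.load(std::memory_order_relaxed) == N) return false; // full
            buf[t % N] = v;
            AnnotateHappensBefore(__FILE__, __LINE__, &tail); // publish edge
            tail.store(t + 1, std::memory_order_release);
            return true;
        }
        bool pop(T &v) {
            std::size_t h = head.load(std::memory_order_relaxed);
            if (h == tail.load(std::memory_order_relaxed)) return false;    // empty
            AnnotateHappensAfter(__FILE__, __LINE__, &tail);  // matching edge
            v = buf[h % N];
            head.store(h + 1, std::memory_order_release);
            return true;
        }
    };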
2016 24th Euromicro International Conference on Parallel, Distributed, and Network-Based Processing (PDP), 2016
Patterns provide a mechanism to express parallelism at a high level of abstraction and to ease the transformation of existing legacy applications to target parallel frameworks. That also opens a path for writing new parallel applications. In this paper we introduce the REPARA approach for expressing parallel patterns and transforming the source code to parallelism frameworks. We take advantage of C++11 attributes as a mechanism to introduce annotations and enrich semantic information on valid source code. We also present a methodology for performing source code transformations that can target multiple parallel programming models. Another contribution is a rule-based mechanism to transform annotated code to those specific programming models. The REPARA approach requires programmer intervention only to perform the initial code annotation while providing speedups comparable to those obtained by manual parallelization.
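A flavor of the annotation style is sketched below. The attribute names follow the rpr namespace used in REPARA publications, but their exact spelling here is an assumption recalled from the literature. Since standard compilers ignore unknown attributes, the annotated file remains a valid sequential C++ program, which is what enables the rule-based source-to-source transformation.

    #include <vector>

    struct Item { int payload; };
    void filter(Item &it)  { it.payload *= 2; }
    void enhance(Item &it) { it.payload += 1; }

    // The REPARA toolchain reads the annotations and rewrites the loop for
    // the chosen parallel framework; a standard compiler just ignores them.
    // Exact attribute spellings below are an assumption.
    void process(std::vector<Item> &items) {
        [[rpr::pipeline, rpr::stream(items)]]
        for (auto &it : items) {
            [[rpr::kernel, rpr::in(it), rpr::out(it)]]
            filter(it);
            [[rpr::kernel, rpr::farm, rpr::in(it), rpr::out(it)]]
            enhance(it);
        }
    }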
Proceedings of the 7th International Workshop on Programming Models and Applications for Multicores and Manycores, 2016
International Journal of Parallel Programming, Dec 2, 2022
Power-aware computing is gaining increasing attention in both academic and industrial settings. The problem of guaranteeing a given QoS requirement (either in terms of performance or power consumption) can be faced by selecting and dynamically adapting the amount of physical and logical resources used by the application. In this study, we considered standard multicore platforms, taking as reference two well-known dynamic reconfiguration techniques for power-aware computing: Concurrency Throttling and Thread Packing. Furthermore, we also studied the impact of using simultaneous multithreading (e.g., Intel's HyperThreading) in both techniques. Leveraging the applications of the PARSEC benchmark suite, we evaluate these techniques by considering performance-power trade-offs, resource efficiency, predictability, and required programming effort. The results show that, according to the comparison criteria, these techniques complement each other.
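Of the two techniques, Thread Packing is the easier to sketch: the application keeps all its threads, but they are confined to a subset of cores via CPU affinity, so the OS time-shares the packed threads. The helper below is a minimal sketch using Linux/glibc-specific calls; pack_threads is an illustrative name, not an API from the paper.

    // g++ defines _GNU_SOURCE by default; other compilers may need it
    // defined before the includes for pthread_setaffinity_np/CPU_SET.
    #include <pthread.h>
    #include <sched.h>
    #include <thread>
    #include <vector>

    // Thread Packing: pin every application thread onto the first `cores`
    // logical CPUs, trading performance for power without changing the
    // application's thread count.
    void pack_threads(std::vector<std::thread> &threads, int cores) {
        cpu_set_t set;
        CPU_ZERO(&set);
        for (int c = 0; c < cores; ++c)
            CPU_SET(c, &set);                 // CPUs 0 .. cores-1
        for (auto &t : threads)
            pthread_setaffinity_np(t.native_handle(), sizeof(set), &set);
    }

Concurrency Throttling would instead change the number of threads itself, which generally requires cooperation from the application or its run-time; with HyperThreading enabled, whether the first cores logical CPUs map to distinct physical cores depends on the machine's CPU numbering.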
Lecture Notes in Computer Science, 2010
ACM Transactions on Architecture and Code Optimization, Dec 2, 2016