Massimo Torquati | University of Pisa

Papers by Massimo Torquati

Analyzing FOSS license usage in publicly available software at scale via the SWH-analytics framework

The Journal of Supercomputing, Apr 6, 2024

Increasing Efficiency in Parallel Programming Teaching

The ability to teach parallel programming principles and techniques is becoming fundamental for preparing a new generation of programmers able to master the pervasive parallelism made available by hardware vendors. Classical parallel programming courses leverage either low-level programming frameworks (e.g. those based on Pthreads) or higher-level frameworks such as OpenMP or MPI. We discuss our teaching experience within the Master in "Computer Science and Networking", where parallel programming is taught leveraging structured parallel programming principles and frameworks. The paper summarizes the results achieved in eight years of experience and shows how the adoption of a structured parallel programming approach improves the efficiency of the teaching process.

Structured Parallel Programming with “core” FastFlow

Lecture Notes in Computer Science, 2015

FastFlow is an open-source, structured parallel programming framework originally conceived to support highly efficient stream parallel computation while targeting shared-memory multi-cores. Its efficiency mainly comes from the optimized implementation of the base communication mechanisms and from its layered design. FastFlow provides parallel application programmers with a set of ready-to-use, parametric algorithmic skeletons modeling the most common parallelism exploitation patterns. The algorithmic skeletons provided by FastFlow may be freely nested to model increasingly complex parallelism exploitation patterns. This tutorial describes the “core” FastFlow, that is, the set of skeletons supported since version 1.0 of FastFlow, and outlines the recent advances aimed at (i) introducing new, higher-level skeletons and (ii) targeting networked multi-cores, possibly equipped with GPUs, in addition to single multi/many-core processing elements.

Structured parallel implementation of Tree Echo State Network model selection

A high level development, modeling and simulation methodology for complex multicore Network Processors

International Symposium on Performance Evaluation of Computer and Telecommunication Systems, Jul 13, 2009

... for processor management on a NP platform. The library also allows for testing and analyzing the application code on a cluster of standard PCs, thanks to the ASSIST [9] environment. This approach results in a multi-phase ...

A RISC Building Block Set for Structured Parallel Programming

We propose a set of building blocks (RISC-pb2l) suitable for building high-level structured parallel programming frameworks. The set is designed following a RISC approach. RISC-pb2l is architecture independent, but the implementation of the different blocks may be specialized to make the best use of the target architecture's peculiarities. A number of optimizations may be designed transforming compositions of basic building blocks into more efficient compositions, such that parallel application efficiency may be achieved by construction rather than by debugging.

Targeting multi cores by structured programming and data flow

Università di Pisa eBooks, Sep 5, 2011

Data flow techniques have been around since the early ’70s when they were used in compilers for sequential languages. Shortly after their introduction they were also considered as a possible model for parallel computing, although the impact here was limited. Recently, however, data flow has been identified as a candidate for efficient implementation of various programming models on multi-core architectures. In most cases, however, the burden of determining data flow “macro” instructions is left to the programmer, while the compiler/run time system manages only the efficient scheduling of these instructions. We discuss a structured parallel programming approach supporting automatic compilation of programs to macro data flow and we show experimental results demonstrating the feasibility of the approach and the efficiency of the resulting “object” code on different classes of state-of-the-art multi-core architectures. The experimental results use different base mechanisms to implement the macro data flow run time support, from plain pthreads with condition variables to more modern and effective lock- and fence-free parallel frameworks. Experimental results comparing the efficiency of the proposed approach with that achieved using other, more classical parallel frameworks are also presented.

State access patterns in embarrassingly parallel computations

arXiv (Cornell University), Sep 16, 2016

Algorithmic Skeletons and Parallel Design Patterns in Mainstream Parallel Programming

International Journal of Parallel Programming, Nov 9, 2020

Finding parallel patterns through static analysis in C++ applications

International Journal of High Performance Computing Applications, Mar 9, 2017

Accelerating Stream Processing Queries with Congestion-aware Scheduling and Real-time Linux Threads

Proceedings of the 20th ACM International Conference on Computing Frontiers

Enabling semantics to improve detection of data races and misuses of lock‐free data structures

Concurrency and Computation: Practice and Experience, 2017

With the rapid progress of multi/many-core architectures, data-intensive parallel applications that are not yet fully optimized fail to deliver the best performance. With the advent of concurrent programming, frameworks offering structured patterns have alleviated developers' burden in adapting such applications to multithreaded architectures. While some of these patterns are implemented using synchronization primitives, others avoid them by means of lock-free data mechanisms. However, lock-free programming is not straightforward, and ensuring an appropriate use of their interfaces can be challenging, since different memory models plus instruction reordering at the compiler/processor level can interfere in the occurrence of data races. The benefits of race detectors are considerable in this sense; however, they may emit false positives if they are unaware of the underlying lock-free structure semantics. To mitigate this issue, this paper extends ThreadSanitizer, a race detection tool, with the seman...

Introducing Parallelism by Using REPARA C++11 Attributes

2016 24th Euromicro International Conference on Parallel, Distributed, and Network-Based Processing (PDP), 2016

Patterns provide a mechanism to express parallelism at a high level of abstraction and to ease the transformation of existing legacy applications to target parallel frameworks. That also opens a path for writing new parallel applications. In this paper we introduce the REPARA approach for expressing parallel patterns and transforming the source code to parallel frameworks. We take advantage of C++11 attributes as a mechanism to introduce annotations and enrich semantic information on valid source code. We also present a methodology for performing source code transformations that allows targeting multiple parallel programming models. Another contribution is a rule-based mechanism to transform annotated code to those specific programming models. The REPARA approach requires programmer intervention only to perform initial code annotation, while providing speedups comparable to those obtained by manual parallelization.

Embedding Semantics of the Single-Producer/Single-Consumer Lock-Free Queue into a Race Detection Tool

Proceedings of the 7th International Workshop on Programming Models and Applications for Multicores and Manycores, 2016

Distributed-Memory FastFlow Building Blocks

International Journal of Parallel Programming, Dec 2, 2022

A Green Perspective on Structured Parallel Programming

Message Passing on InfiniBand RDMA for Parallel Run-Time Supports

Evaluating Concurrency Throttling and Thread Packing on SMT Multicores

Power-aware computing is gaining increasing attention in both academic and industrial settings. The problem of guaranteeing a given QoS requirement (either in terms of performance or power consumption) can be faced by selecting and dynamically adapting the amount of physical and logical resources used by the application. In this study, we considered standard multicore platforms, taking as reference approaches for power-aware computing two well-known dynamic reconfiguration techniques: concurrency throttling and thread packing. Furthermore, we also studied the impact of using simultaneous multithreading (e.g., Intel's Hyper-Threading) in both techniques. Leveraging the applications of the PARSEC benchmark suite, we evaluate these techniques by considering performance-power trade-offs, resource efficiency, predictability, and required programming effort. The results show that, according to the comparison criteria, these techniques complement each other.

Porting Decision Tree Algorithms to Multicore Using FastFlow

Lecture Notes in Computer Science, 2010

A Reconfiguration Algorithm for Power-Aware Parallel Applications

ACM Transactions on Architecture and Code Optimization, Dec 2, 2016
