Samuel Thibault - Academia.edu
Papers by Samuel Thibault
Proceedings of the 18th International Conference on Embedded Computer Systems: Architectures, Modeling, and Simulation
Microprocessors and Microsystems
2020 IEEE 32nd International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD)
Most people don't realize it, but the Hurd system is actually well established. About 75% of official Debian packages build fine, it has mainstream gcc/glibc/llvm support, Go and Rust ports are ongoing, it can be installed with the Debian installer, and GuixSD and Arch ports are ongoing... Yet not much has been happening within the Hurd itself in the past couple of years. We have notably added a PCI arbiter, which allows for both flexible and safe PCI access for end users, and some basic ACPI support is ongoing. But many exciting features could be achieved with a bit of work. This talk will discuss some of these promising features, to give a sort of roadmap of ideas for contributions. Some have implementation sketches that just need to be polished to be more production-ready, such as httpfs, mboxfs, or writing translators in higher-level languages than C. Other features are at an early stage, such as adding sound support through rump, getting completely rid of disk drivers from...
This data set comprises traces and measurements obtained by Luka Stanisic to demonstrate the validity of SimGrid simulations of the StarPU runtime.
Task-level parallelism is usually exploited by a runtime scheduler, after tasks are mapped to processing units by a compiler. In this report, we propose a compilation-centric runtime scheduling strategy. We propose a complete compilation algorithm that splits tasks into three parts, whose properties are intended to help the scheduler make the right decisions. In particular, we show how the polyhedral model can help compute tricky scheduling and parallelism information. Our compiler is available and may be tried online at http://foobar.ens-lyon.fr/kut.
IEEE Transactions on Parallel and Distributed Systems, 2017
A now-classical way of meeting the increasing demand for computing speed by HPC applications is the use of GPUs and/or other accelerators. Such accelerators have their own memory, which is usually quite limited, and are connected to the main memory through a bus with bounded bandwidth. Thus, particular care should be devoted to data locality in order to avoid unnecessary data movements. Task-based runtime schedulers have emerged as a convenient and efficient way to use such heterogeneous platforms. When processing an application, the scheduler knows all tasks available for processing on a GPU, as well as their input data dependencies. Hence, it is able to order tasks and prefetch their input data into the GPU memory (after possibly evicting some previously loaded data), while aiming at minimizing data movements, so as to reduce the total processing time. In this paper, we focus on how to schedule tasks that share some of their input data (but are o...
OpenMP: Portable Multi-Level Parallelism on Modern Systems, 2020
2019 IEEE International Conference on Cluster Computing (CLUSTER), Sep 1, 2019
OpenMP in a New Era …, 2010
Scheduling Dynamic OpenMP Applications over Multicore Architectures. François Broquedis, François Diakhaté, Samuel Thibault, Olivier Aumage, Raymond Namyst, and Pierre-André Wacrenier. INRIA Futurs - LaBRI, Université Bordeaux 1, France. Abstract. ...
A now-classical way of meeting the increasing demand for computing speed by HPC applications is the use of GPUs and/or other accelerators. Such accelerators have their own memory, which is usually quite limited, and are connected to the main memory through a bus with bounded bandwidth. Thus, particular care should be devoted to data locality in order to avoid unnecessary data movements. Task-based runtime schedulers have emerged as a convenient and efficient way to use such heterogeneous platforms. When processing an application, the scheduler knows all tasks available for processing on a GPU, as well as their input data dependencies. Hence, it is able to order tasks and prefetch their input data into the GPU memory (after possibly evicting some previously loaded data), while aiming at minimizing data movements, so as to reduce the total processing time. In this paper, we focus on how to schedule tasks that share some of their input data (but are otherwise independent) ...
Screen readers can drive braille devices to let visually impaired users access computer environments, by providing them with the same information as sighted users. But in some cases, this view is not easy to use on a braille device. In such cases, it would be much more useful to let applications provide their own braille feedback, specially adapted to visually impaired users. Such applications would then need the ability to output braille; however, allowing both screen readers and applications to access a wide range of braille devices is not a trivial task. We present an abstraction layer that applications may use to communicate with braille devices. They do not need to deal with the specificities of each device, but can do so if necessary. We show how several applications can communicate with one braille device concurrently, with BrlAPI making sensible choices about which application eventually gets access to the device. The description of a widely used implementation of BrlAPI i...
Task-based runtime systems are adopted by application developers for their valuable features, including flexibility of execution and optimized resource management. However, the use of such advanced programming models in complex HPC applications often requires significant training time and programming effort. In this work, we share experiences and lessons learned from the use of StarPU in three independent projects of varying complexity. We reach conclusions, with respect to training, programming effort, and existing challenges, that are useful to the communities of application developers, as well as to the developers of runtime systems. Finally, we suggest extensions to the runtime systems that would benefit application developers.
fine grain parallelization framework for multi-core architecture