Samuel Thibault - Academia.edu

Papers by Samuel Thibault

EXA2PRO programming environment

Proceedings of the 18th International Conference on Embedded Computer Systems: Architectures, Modeling, and Simulation

Towards EXtreme scale technologies and accelerators for euROhpc hw/Sw supercomputing applications for exascale: The TEXTAROSSA approach

Microprocessors and Microsystems

MASA-StarPU: Parallel Sequence Comparison with Multiple Scheduling Policies and Pruning

2020 IEEE 32nd International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD)

A roadmap for the Hurd?

Most people don't realize it, but the Hurd system is actually well established. About 75% of Debian official packages build fine, it has mainstream gcc/glibc/llvm support, Go and Rust ports are ongoing, it can be installed with the Debian installer, and GuixSD and Arch ports are ongoing... Yet not much has been happening within the Hurd itself in the past couple of years. We have notably added a PCI arbiter, which allows for both flexible and safe PCI access for end users, and some basic ACPI support is ongoing. But many exciting features could be achieved with a bit of work. This talk will discuss some of these promising features, to give a sort of roadmap of ideas for contributions. Some have implementation sketches which just need to be polished to be more production-ready, such as httpfs, mboxfs, or writing translators in higher-level languages than C. Other features are at an early stage, such as adding sound support through rump, completely getting rid of disk drivers from...

Hurd's PCI arbiter

Software and Platforms - StarPU

Companion of the StarPU+SimGrid article

This data set comprises traces and measurements obtained by Luka Stanisic to demonstrate the validity of SimGrid simulations of the StarPU runtime.

A Compiler Algorithm to Guide Runtime Scheduling

Task-level parallelism is usually exploited by a runtime scheduler, after tasks are mapped to processing units by a compiler. In this report, we propose a compilation-centric runtime scheduling strategy. We propose a complete compilation algorithm to split the tasks into three parts, whose properties are intended to help the scheduler make the right decisions. In particular, we show how the polyhedral model may provide valuable help in computing tricky scheduling and parallelism information. Our compiler is available and may be tried online at http://foobar.ens-lyon.fr/kut.

Différents types de virtualisation avec Linux

Achieving High Performance on Supercomputers with a Sequential Task-based Programming Model

IEEE Transactions on Parallel and Distributed Systems, 2017

Ordonnancement de tâches indépendantes pour support d'exécution utilisant la localité des données

A now-classical way of meeting the increasing demand for computing speed by HPC applications is the use of GPUs and/or other accelerators. Such accelerators have their own memory, which is usually quite limited, and are connected to the main memory through a bus with bounded bandwidth. Thus, particular care should be devoted to data locality in order to avoid unnecessary data movements. Task-based runtime schedulers have emerged as a convenient and efficient way to use such heterogeneous platforms. When processing an application, the scheduler has the knowledge of all tasks available for processing on a GPU, as well as their input data dependencies. Hence, it is able to order tasks and prefetch their input data in the GPU memory (after possibly evicting some previously-loaded data), while aiming at minimizing data movements, so as to reduce the total processing time. In this paper, we focus on how to schedule tasks that share some of their input data (but are o...

Memory-Aware Scheduling of Tasks Sharing Data on Multiple GPUs with Dynamic Runtime Systems

sOMP: Simulating OpenMP Task-Based Applications with NUMA Effects

OpenMP: Portable Multi-Level Parallelism on Modern Systems, 2020

Asynchronous Task-Based Execution of the Reverse Time Migration for the Oil and Gas Industry

2019 IEEE International Conference on Cluster Computing (CLUSTER), Sep 1, 2019

Scheduling dynamic OpenMP applications over multicore architectures

OpenMP in a New Era …, 2010

François Broquedis, François Diakhaté, Samuel Thibault, Olivier Aumage, Raymond Namyst, and Pierre-André Wacrenier. INRIA Futurs - LaBRI, Université Bordeaux 1, France.

Locality-Aware Scheduling of Independent Tasks for Runtime Systems

A now-classical way of meeting the increasing demand for computing speed by HPC applications is the use of GPUs and/or other accelerators. Such accelerators have their own memory, which is usually quite limited, and are connected to the main memory through a bus with bounded bandwidth. Thus, particular care should be devoted to data locality in order to avoid unnecessary data movements. Task-based runtime schedulers have emerged as a convenient and efficient way to use such heterogeneous platforms. When processing an application, the scheduler has the knowledge of all tasks available for processing on a GPU, as well as their input data dependencies. Hence, it is able to order tasks and prefetch their input data in the GPU memory (after possibly evicting some previously-loaded data), while aiming at minimizing data movements, so as to reduce the total processing time. In this paper, we focus on how to schedule tasks that share some of their input data (but are otherwise independent) ...

BrlAPI: Simple, Portable, Concurrent, Application-level Control of Braille Terminals

Screen readers can drive braille devices to allow visually impaired users to access computer environments, by providing them with the same information as sighted users. But in some cases, this view is not easy to use on a braille device. In such cases, it would be much more useful to let applications provide their own braille feedback, specially adapted to visually impaired users. Such applications would then need the ability to output braille; however, allowing both screen readers and applications to access a wide range of braille devices is not a trivial task. We present an abstraction layer that applications may use to communicate with braille devices. They do not need to deal with the specificities of each device, but can do so if necessary. We show how several applications can communicate with one braille device concurrently, with BrlAPI making sensible choices about which application eventually gets access to the device. The description of a widely used implementation of BrlAPI i...

Applying StarPU runtime system to scientific applications: Experiences and lessons learned

Task-based runtime systems are adopted by application developers for their valuable features including flexibility of execution and optimized resource management. However, the use of such advanced programming models in complex HPC applications often requires significant training time and programming effort. In this work, we share experiences and lessons learned from the use of StarPU in three independent projects of various complexity. We reach conclusions, with respect to training, programming effort, and existing challenges, that are useful to the communities of application developers, as well as to the developers of runtime systems. Finally, we suggest extensions to the runtime systems beneficial to application developers.

To cite this version

fine grain parallelization framework for multi-core architecture
