Jean-Philippe Diguet - Academia.edu
Papers by Jean-Philippe Diguet
Proceedings of the 55th Annual Design Automation Conference
Parallel applications are essential for efficiently using the computational power of a Multiprocessor System-on-Chip (MPSoC). Unfortunately, these applications do not scale effortlessly with the number of cores, because synchronization operations take away valuable computation time and restrict the parallelization gains. Synchronization is also a bottleneck because of sequential access to shared memory. We address this issue and introduce "Subutai", a hardware/software (HW/SW) architecture designed to distribute essential synchronization mechanisms over the Network-on-Chip (NoC). It comprises Network Interfaces (NIs), drivers, and a custom library for a NoC-based MPSoC architecture that speed up the essential synchronization primitives of any legacy parallel application. In addition, we provide a fast simulation tool for parallel applications and a HW architecture of the NI. Experimental results on the PARSEC benchmark suite show an average application speedup of 2.05 compared with the same architecture running legacy SW solutions, at a 36% HW architecture overhead.
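For context, the sketch below shows the kind of conventional shared-memory software barrier that legacy PThread-style applications rely on, in which every thread serializes on one shared counter. It is a generic textbook sense-reversing barrier written in Python for illustration only; it is not Subutai's library, driver, or NI design, and the class and function names are hypothetical.

```python
import threading
from typing import List, Tuple


class CentralBarrier:
    """Sense-reversing centralized barrier (generic textbook scheme, not Subutai).

    Every arriving thread updates the same counter under a single lock, so
    arrivals are serialized on one shared-memory location.
    """

    def __init__(self, n_threads: int) -> None:
        self.n = n_threads
        self.count = 0            # arrivals in the current barrier episode
        self.sense = False        # global sense, flipped each time the barrier opens
        self.cond = threading.Condition()

    def wait(self) -> None:
        with self.cond:
            my_sense = not self.sense     # value of `sense` that will mean "barrier open"
            self.count += 1
            if self.count == self.n:
                # Last thread to arrive resets the counter and releases the others.
                self.count = 0
                self.sense = my_sense
                self.cond.notify_all()
            else:
                while self.sense != my_sense:
                    self.cond.wait()


def run_rounds(n_threads: int = 4, rounds: int = 3) -> List[Tuple[str, int, int]]:
    """Run worker threads through several barrier episodes, logging arrivals and departures."""
    barrier = CentralBarrier(n_threads)
    log: List[Tuple[str, int, int]] = []
    log_lock = threading.Lock()

    def worker(tid: int) -> None:
        for r in range(rounds):
            with log_lock:
                log.append(("arrive", r, tid))
            barrier.wait()
            with log_lock:
                log.append(("leave", r, tid))

    threads = [threading.Thread(target=worker, args=(t,)) for t in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log
```

Because every arrival goes through the same lock and counter, arrivals are processed one at a time; distributing such primitives over the NoC is what the abstract proposes instead.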
2017 NASA/ESA Conference on Adaptive Hardware and Systems (AHS), 2017
Implementing self-adaptive embedded systems, such as UV, involves offline provisioning of several implementations of the embedded functionalities, with different resource-usage and performance characteristics, so that the system can adapt itself dynamically under uncertainty. FPGA-based architectures support high flexibility through dynamic reconfiguration features. We propose an autonomic control architecture for self-adaptive and self-reconfigurable FPGA-based embedded systems. The control architecture is structured in three layers: a mission manager, a reconfiguration manager, and a scheduling manager. In this paper, we focus on the design of the reconfiguration manager. We propose a design approach based on automata-based discrete control. It involves reactive programming, which provides formal semantics, and discrete controller synthesis from declarative objectives.
Journal of Signal Processing Systems, 2022
Dataflow is a parallel and generic model of computation that is agnostic of the underlying
Annals of Telecommunications, 2022
The growing demand for wireless devices capable of performing complex communication processes has created an urgent need for high-speed communication systems and advanced network processors. This paper proposes a hardware workflow developed for the Long Term Evolution (LTE) communication system. It studies the multiple-input multiple-output orthogonal frequency-division multiplexing (MIMO-OFDM) LTE system. Specifically, this work focuses on the implementation of the OFDM block, which dominates the execution time in high-speed communication systems. To this end, we propose a NoC-based, low-latency OFDM LTE multicore system that parallelizes the Inverse Fast Fourier Transform (IFFT) computation over a variable number of processing cores. The proposed multicore system is implemented on an FPGA platform using the ProNoC tool, an automated rapid prototyping platform. Our results show that LTE OFDM execution time is drastically reduced by increasing the number of processing cores. However, NoC parameters such as the routing algorithm and topology have a negligible influence on the overall execution time. The implementation results show execution-time reductions of up to 24% and 76% for systems with 2 and 16 processing cores, respectively, compared with a conventional single-core LTE OFDM implementation. We found that a 4×4 mesh NoC with deterministic XY routing, connected to 16 processing tiles computing the IFFT task, is the most efficient configuration for computing LTE OFDM. This configuration is 4.12 times faster than a conventional system running on a single-core processor.
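The reported figures can be cross-checked with the usual relation between a speedup S and the fractional execution-time reduction r, namely r = 1 - 1/S. The small Python helpers below (hypothetical names) confirm that the 4.12× speedup and the 76% reduction for 16 cores describe the same result, and that a 24% reduction for 2 cores corresponds to a speedup of about 1.32×.

```python
def reduction_from_speedup(speedup: float) -> float:
    """Fractional execution-time reduction implied by a speedup S: r = 1 - 1/S."""
    return 1.0 - 1.0 / speedup


def speedup_from_reduction(reduction: float) -> float:
    """Speedup implied by a fractional execution-time reduction r: S = 1 / (1 - r)."""
    return 1.0 / (1.0 - reduction)


def parallel_efficiency(speedup: float, cores: int) -> float:
    """Speedup divided by core count (1.0 would be ideal linear scaling)."""
    return speedup / cores


if __name__ == "__main__":
    print(f"16 cores: {reduction_from_speedup(4.12):.1%} reduction")    # about 75.7%
    print(f" 2 cores: {speedup_from_reduction(0.24):.2f}x speedup")     # about 1.32x
    print(f"16 cores: {parallel_efficiency(4.12, 16):.2f} efficiency")  # about 0.26
```

The parallel efficiency of roughly 0.26 at 16 cores is simply what the two reported figures imply; the abstract does not break down where the remaining time is spent.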
2017 12th International Symposium on Reconfigurable Communication-centric Systems-on-Chip (ReCoSoC), 2017
An ever larger share of FPGAs supports Dynamic and Partial Reconfiguration (DPR). A reconfigurable point-to-point interconnect (ρ-P2P) is a DPR-based communication mechanism that swaps between different precomputed configurations stored in partial bitstreams. ρ-P2P is intended as a lightweight interconnect suited to reconfigurable systems that need only a limited number of configurations. This paper assesses the pros and cons of ρ-P2P in terms of resource usage and performance, depending on the number of input/output signals, their width, and the number of supported configurations. Experimental results, conducted on an Intel Cyclone V FPGA, compare ρ-P2P with a functionally equivalent non-DPR solution called µ-P2P and with a full crossbar. They show that ρ-P2P is indeed lightweight but introduces performance limitations in operating frequency, memory footprint, and reconfiguration time. However, ρ-P2P is in general the least resource-intensive of the tested interconnects, except in the trivial case of low numbers of signals and configurations. In particular, an 18 × 18 full crossbar interconnect requires 75% more resources than an equivalent ρ-P2P. Interestingly, this resource difference between ρ-P2P and a full crossbar grows linearly with the interconnect size.
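A behavioral model makes the trade-off concrete: a full crossbar can realize any input-to-output mapping at run time, whereas ρ-P2P can only switch among mappings that were precomputed, and each switch corresponds to a partial reconfiguration. The Python sketch below uses hypothetical class names and deliberately ignores area, operating frequency, and bitstream size, which are what the paper actually measures.

```python
from typing import Dict, List, Optional, Sequence


class FullCrossbar:
    """Any output may select any input at run time; reconfiguring is a simple select write."""

    def __init__(self, n_in: int, n_out: int) -> None:
        self.n_in = n_in
        self.select = [0] * n_out     # select[o] = index of the input driving output o

    def connect(self, out_port: int, in_port: int) -> None:
        if not 0 <= in_port < self.n_in:
            raise ValueError(f"no input port {in_port}")
        self.select[out_port] = in_port

    def route(self, inputs: Sequence[str]) -> List[str]:
        return [inputs[i] for i in self.select]


class ReconfigurableP2P:
    """Only precomputed mappings exist; switching stands in for loading a partial bitstream."""

    def __init__(self, configs: Dict[str, List[int]]) -> None:
        self.configs = configs
        self.active: Optional[str] = None
        self.swaps = 0                # number of (costly) partial reconfigurations performed

    def load(self, name: str) -> None:
        if name not in self.configs:
            raise KeyError(f"configuration {name!r} was not precomputed")
        if name != self.active:       # reloading the active configuration costs nothing
            self.active = name
            self.swaps += 1

    def route(self, inputs: Sequence[str]) -> List[str]:
        if self.active is None:
            raise RuntimeError("no configuration loaded")
        return [inputs[i] for i in self.configs[self.active]]
```

The crossbar pays for its generality with multiplexers on every output, while the ρ-P2P model only holds the mappings it was given, which mirrors why it suits systems with few configurations.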
2018 International Conference on High Performance Computing & Simulation (HPCS), 2018
in the field of Autonomic Computing. He currently works on the model-based control of adaptive and reconfigurable computing systems, using techniques from Control Theory, in domains ranging from embedded systems to distributed Cloud systems and High-Performance Computing.
2021 Design, Automation & Test in Europe Conference & Exhibition (DATE), 2021
Allocation and scheduling of applications affect the timing response and system performance, particularly for Network-on-Chip (NoC) based multicore systems executing real-time applications. Such systems, with multitasking processors, provide more opportunities for parallel application execution. In dynamic scenarios, runtime task allocation improves system resource utilization and adapts to varying application workloads. In this work, we present an efficient hybrid strategy for unified allocation and scheduling of tasks at runtime. Potential allocation solutions are obtained at design time by considering the multitasking capability of processors, communication cost, and task timing characteristics. These solutions are then adapted at runtime for the dynamic mapping and scheduling of the computation and communication workloads of real-time applications. Simulation results show that the proposed approach achieves average reductions of 34.2% in network latency and 26% in communication cost for the allocated applications. Compared with existing techniques, task deadline satisfaction also improves by 42.1% on average, while the allocation-time overhead is reduced by 32%.
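The hybrid idea, with candidate mappings prepared at design time and one selected at runtime, can be sketched as follows. This Python toy is an assumption-laden illustration, not the paper's algorithm: tasks are mapped onto a 2D mesh, communication cost is data volume times XY hop count, multitasking is modeled as a number of free task slots per core, and task timing and deadlines are ignored. All names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Coord = Tuple[int, int]                 # (x, y) tile on a 2D mesh
Mapping = Dict[str, Coord]              # task name -> tile
Edge = Tuple[str, str, int]             # (producer, consumer, data volume)


def hops(a: Coord, b: Coord) -> int:
    """Hop count between two tiles under XY (dimension-order) routing."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def comm_cost(mapping: Mapping, edges: List[Edge]) -> int:
    """Communication cost: sum of volume x hops over all task-graph edges."""
    return sum(vol * hops(mapping[src], mapping[dst]) for src, dst, vol in edges)


def design_time_rank(candidates: List[Mapping], edges: List[Edge]) -> List[Mapping]:
    """Offline step: order candidate mappings from cheapest to most expensive."""
    return sorted(candidates, key=lambda m: comm_cost(m, edges))


def runtime_select(ranked: List[Mapping], free_slots: Dict[Coord, int]) -> Optional[Mapping]:
    """Online step: return the cheapest candidate whose tiles still have enough free
    task slots (a multitasking core may host several tasks); None if none fits."""
    for mapping in ranked:
        demand: Dict[Coord, int] = {}
        for tile in mapping.values():
            demand[tile] = demand.get(tile, 0) + 1
        if all(free_slots.get(tile, 0) >= n for tile, n in demand.items()):
            return mapping
    return None
```

Ranking offline keeps the runtime step to a cheap feasibility scan, which is the general motivation for hybrid approaches; the paper's own method also accounts for task timing characteristics.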
2018 31st IEEE International System-on-Chip Conference (SOCC), 2018
Efficient synchronization is one of the basic requirements of effective parallel computing. A key operation of the POSIX Threads standard (PThread) is barrier synchronization, where multiple threads block at a user-specified point of execution until all of them have reached it. In conventional architectures, broadcast operations limit the achievable performance, because synchronization is strongly affected by critical-path communications. This increases network latency and degrades performance dramatically. A Wireless Network-on-Chip (WiNoC) offers a promising way to reduce the long-distance, critical-path communication bottlenecks of conventional architectures by augmenting them with single-hop, long-range wireless links. In this paper, we propose a power-aware, broadcast-enabled WiNoC architecture to reduce the cost of broadcast operations for barrier-based applications. On the PARSEC benchmarks, the proposed architecture reduces the network latency of barrier synchronization by up to 43.97%. It also saves up to 80.49% of idle-state power consumption in the wireless interfaces (WIs) of a 64-core system compared with the conventional WiNoC architecture, without incurring significant overhead.
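A first-order hop-count model shows why single-hop wireless links shorten the critical path of a barrier's broadcast phase. Everything in this Python sketch is an assumption for illustration: an XY-routed k × k mesh, one WI per square cluster placed at the cluster's top-left tile, one wireless hop between any two WIs, no contention, and hops rather than cycles. It does not model the paper's power-aware WI management, and the function names are hypothetical.

```python
from typing import Tuple

Coord = Tuple[int, int]


def manhattan(a: Coord, b: Coord) -> int:
    """Wired hop count between two tiles under XY routing."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def mesh_broadcast_hops(k: int, master: Coord) -> int:
    """Critical path (in hops) of a wired broadcast from `master` to every tile of a k x k mesh."""
    return max(manhattan(master, (x, y)) for x in range(k) for y in range(k))


def winoc_broadcast_hops(k: int, cluster: int, master: Coord) -> int:
    """Same broadcast when each cluster x cluster block has one WI at its top-left tile
    and any WI reaches any other WI in one wireless hop. Each destination uses the
    shorter of its wired path and its wireless path."""

    def wi_of(tile: Coord) -> Coord:
        return (tile[0] // cluster * cluster, tile[1] // cluster * cluster)

    worst = 0
    for x in range(k):
        for y in range(k):
            dst = (x, y)
            wired = manhattan(master, dst)
            if wi_of(dst) == wi_of(master):
                best = wired                      # same cluster: no wireless hop needed
            else:
                wireless = manhattan(master, wi_of(master)) + 1 + manhattan(wi_of(dst), dst)
                best = min(wired, wireless)
            worst = max(worst, best)
    return worst
```

For an 8 × 8 mesh with the master in a corner and 4 × 4 clusters, this model gives a broadcast critical path of 14 wired hops versus 7 with wireless links.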
IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 2020
In this paper, we present a novel logic design style, memristor overwrite logic (MOL), together with an original MOL-based computational memory. MOL relies on a fully digital representation of the memristor and can operate with different memristive device technologies. Its integration into memristive crossbar arrays and computational memories allows bit-level and vector-level primitive logic operations to be executed in at most two computational steps. Its promising features and performance are demonstrated through the implementation of N-bit full addition using the proposed MOL-based computational memory.
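The abstract does not detail MOL's gate sequence, so the sketch below only recalls what N-bit full addition computes: the bit-level full-adder equations (sum = a XOR b XOR cin, carry-out = (a AND b) OR (cin AND (a XOR b))) chained as a ripple-carry adder. It is a functional reference written in Python with hypothetical names, not a model of MOL's overwrite steps or of its crossbar mapping.

```python
from typing import List, Tuple

Bits = List[int]   # little-endian bit vector: index 0 is the least significant bit


def full_adder(a: int, b: int, cin: int) -> Tuple[int, int]:
    """One-bit full adder: returns (sum, carry_out)."""
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return s, cout


def ripple_add(a_bits: Bits, b_bits: Bits) -> Bits:
    """N-bit addition by chaining full adders; the result has N + 1 bits."""
    if len(a_bits) != len(b_bits):
        raise ValueError("operands must have the same width")
    carry = 0
    out: Bits = []
    for a, b in zip(a_bits, b_bits):
        s, carry = full_adder(a, b, carry)
        out.append(s)
    out.append(carry)          # final carry becomes the most significant bit
    return out


def to_bits(x: int, n: int) -> Bits:
    return [(x >> i) & 1 for i in range(n)]


def from_bits(bits: Bits) -> int:
    return sum(b << i for i, b in enumerate(bits))
```

For example, adding 13 and 7 as 4-bit vectors yields the 5-bit vector for 20.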
ACM Transactions on Embedded Computing Systems, 2016
Dynamically reconfigurable hardware has been identified as a promising solution for the design of energy-efficient embedded systems. However, its adoption is limited by costly design effort, including verification and validation, which is even more complex than for non-dynamically reconfigurable systems. In this article, we propose a tool-supported formal method to automatically design a correct-by-construction control of the reconfiguration. By representing system behaviors with automata, we exploit automated algorithms to synthesize controllers that safely enforce reconfiguration strategies formulated as properties to be satisfied by the control. We design generic modeling patterns for a class of reconfigurable architectures, taking into account both the hardware architecture and the applications, as well as relevant control objectives. We validate our approach on two case studies implemented on FPGAs.
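The principle of synthesizing a controller from automata can be illustrated with the classic safety-synthesis fixed point: iteratively remove states from which an uncontrollable event can reach a forbidden state, then disable every controllable event that would leave the remaining set. The Python sketch below works on an explicit toy automaton (two tasks competing for one reconfigurable area, with mutual exclusion as the objective). It is a hypothetical illustration of the principle, not the authors' tool chain, which handles symbolic models and richer objectives.

```python
from typing import Dict, Iterable, List, Set, Tuple

Transition = Tuple[str, str, str]   # (source state, event, target state)


def synthesize_safe(
    states: Iterable[str],
    transitions: List[Transition],
    controllable: Set[str],
    bad: Set[str],
) -> Tuple[Set[str], Dict[str, Set[str]]]:
    """Return the largest set of states from which bad states can always be avoided,
    and, for each of those states, the events the controller may allow."""
    safe = set(states) - set(bad)
    changed = True
    while changed:
        changed = False
        for src, ev, dst in transitions:
            # An uncontrollable event that can leave the safe set makes `src` unsafe too.
            if src in safe and ev not in controllable and dst not in safe:
                safe.discard(src)
                changed = True
    allowed: Dict[str, Set[str]] = {}
    for src, ev, dst in transitions:
        # At the fixed point, uncontrollable events from safe states stay safe;
        # controllable events are allowed only if they stay inside the safe set.
        if src in safe and dst in safe:
            allowed.setdefault(src, set()).add(ev)
    return safe, allowed


# Toy model: starting a task is controllable, finishing it is not;
# "both" means two tasks occupy the same reconfigurable area.
STATES = ["idle", "t1", "t2", "both"]
TRANSITIONS: List[Transition] = [
    ("idle", "start1", "t1"), ("idle", "start2", "t2"),
    ("t1", "start2", "both"), ("t2", "start1", "both"),
    ("t1", "end1", "idle"), ("t2", "end2", "idle"),
]
```

In the toy model, the synthesized controller keeps both tasks startable from "idle" but forbids starting the second task while the first one occupies the area.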
ACM Transactions on Embedded Computing Systems, 2016
This article describes TBES, an end-to-end software environment for synthesizing multitask applications on FPGAs. The implementation follows a template-based approach for creating heterogeneous multiprocessor architectures. Heterogeneity stems from the use of general-purpose processors along with custom accelerators. Experimental results demonstrate substantial speedup for several classes of applications. Furthermore, this work reduces development costs and saves development time for the software architect, the domain expert, and the optimization expert. It provides a framework that brings together various existing tools and optimization algorithms. The advantages are manifold: modularity and flexibility, easy customization for best-fit algorithm selection, durability and evolution over time, and legacy preservation, including domain experts' know-how. In addition to the use of architecture templates for the overall system, a second contribution lies in using high...
This paper introduces the μSpider CAD tool for NoC design under latency and bandwidth constraints and describes the different steps of the associated design flow. We show how the tool can be used to automatically generate a NoC IP compliant with the Xilinx EDK tool. We present synthesis results and a real implementation of a video application based on a multiprocessor architecture. Finally, we discuss the research still needed at the application and OS levels, beyond the current work, to achieve a complete and efficient implementation of a multiprocessor embedded system.
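A design flow under latency and bandwidth constraints needs, at minimum, a way to route each flow and check link loads and hop counts. The following Python sketch assumes a 2D mesh with deterministic XY routing, uses hop count as a latency proxy, and expresses bandwidth in arbitrary units; μSpider's actual algorithms are not described in the abstract, so all names here are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Coord = Tuple[int, int]
Link = Tuple[Coord, Coord]                   # directed link between neighboring routers
Flow = Tuple[Coord, Coord, float, int]       # (source, destination, bandwidth, max hops)


def xy_route(src: Coord, dst: Coord) -> List[Link]:
    """Deterministic XY routing: move along X first, then along Y."""
    path: List[Link] = []
    x, y = src
    while x != dst[0]:
        nx = x + (1 if dst[0] > x else -1)
        path.append(((x, y), (nx, y)))
        x = nx
    while y != dst[1]:
        ny = y + (1 if dst[1] > y else -1)
        path.append(((x, y), (x, ny)))
        y = ny
    return path


def check_flows(flows: List[Flow], link_capacity: float):
    """Accumulate per-link bandwidth and flag overloaded links and latency violations."""
    load: Dict[Link, float] = defaultdict(float)
    latency_violations: List[int] = []
    for i, (src, dst, bw, max_hops) in enumerate(flows):
        path = xy_route(src, dst)
        if len(path) > max_hops:
            latency_violations.append(i)     # flow index whose hop budget is exceeded
        for link in path:
            load[link] += bw
    overloaded = {link: bw for link, bw in load.items() if bw > link_capacity}
    return dict(load), overloaded, latency_violations
```

A real tool would then react to violations, for instance by adjusting the mapping, the topology, or the allocated bandwidth; this sketch only reports them.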
EURASIP Journal on Embedded Systems, 2015
This paper addresses the design of embedded systems for outdoor augmented reality (AR) applications integrated into see-through glasses. The set of tasks includes object positioning, graphics computation, and wireless communications, and we consider constraints such as real-time operation, low power, and small footprint. We introduce an original sailor-assistance application as a typical, useful, and complex outdoor AR application, where context-dependent virtual objects must be placed in the user's field of view according to head motions and ambient information. Our study shows that power optimization is worthwhile, since an embedded system based on a standard general-purpose processor (GPP) + graphics processing unit (GPU) consumes more power than high-luminosity see-through glasses. This work then presents three main contributions. The first is the choice and combination of position and attitude algorithms that fit the application context. The second is the architecture of the embedded system, which introduces a fast and simple object processor (OP) optimized for the mobile AR domain. Finally, the OP implements a new pixel rendering method, the incremental pixel shader (IPS), which is implemented in hardware and takes full advantage of the OpenGL ES light model. A complete GP+OP(s) architecture is described and prototyped on a field-programmable gate array (FPGA). It includes hardware/software partitioning based on an analysis of application requirements and ergonomics.
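The abstract does not describe how the IPS works internally. As a generic illustration of incremental shading (not the paper's IPS), the Python sketch below applies forward differencing to a light intensity interpolated linearly across a scanline span: the per-pixel step is computed once, and each pixel then costs a single fixed-point addition, the kind of multiply-free inner loop that suits hardware. The Q16 format and the function names are assumptions.

```python
from typing import List

FRAC_BITS = 16   # hypothetical Q16 fixed-point format for the running value


def shade_direct(i0: int, i1: int, n: int) -> List[int]:
    """Reference: one multiply and one divide per pixel along the span."""
    if n <= 0:
        return []
    if n == 1:
        return [i0]
    return [i0 + (i1 - i0) * k // (n - 1) for k in range(n)]


def shade_incremental(i0: int, i1: int, n: int) -> List[int]:
    """Incremental: compute the per-pixel step once, then a single add per pixel."""
    if n <= 0:
        return []
    if n == 1:
        return [i0]
    step = ((i1 - i0) << FRAC_BITS) // (n - 1)   # fixed-point increment, computed once
    acc = i0 << FRAC_BITS
    out: List[int] = []
    for _ in range(n):
        out.append(acc >> FRAC_BITS)              # integer part is the pixel intensity
        acc += step
    return out
```

Because the step is truncated, the incremental result can differ from the reference by one intensity level on spans where the slope is not exactly representable; with 16 fractional bits, the accumulated error stays below one level for spans shorter than 2^16 pixels.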
IEEE Embedded Systems Letters, 2013
This paper presents an original approach to bandwidth-oriented self-adaptivity in the domain of Network-on-Chip, where reconfiguration is handled by network interfaces offering guaranteed-service traffic. Reconfiguration is first based on multiple FIFOs with variable bounds, implemented in a single dual-port memory with a dedicated controller. Second, it relies on multiple compliant TDMA tables based on a new heuristic for path computation. The combination of both techniques provides significant bandwidth improvement with negligible resource overhead. The proposed solution is demonstrated with cycle-accurate VHDL simulation and FPGA implementation for synthetic and image processing applications.
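One way to picture multiple FIFOs with variable bounds in a single memory is a set of circular buffers carved out of one array, each with its own base and size that a controller can change. The Python model below is a hypothetical sketch: it allows resizing only while all FIFOs are empty, a simplification the abstract does not state, and it omits the dual-port timing and the TDMA tables.

```python
from typing import Any, List


class SharedFifoMemory:
    """Several circular FIFOs sharing one memory array; FIFO q owns the window
    [base[q], base[q] + size[q]) and its bounds can be changed by resize()."""

    def __init__(self, depth: int, sizes: List[int]) -> None:
        if sum(sizes) > depth:
            raise ValueError("FIFO sizes exceed memory depth")
        self.depth = depth
        self.mem: List[Any] = [None] * depth
        self._layout(sizes)

    def _layout(self, sizes: List[int]) -> None:
        self.size = list(sizes)
        self.base: List[int] = []
        offset = 0
        for s in sizes:
            self.base.append(offset)    # FIFOs are packed back to back
            offset += s
        self.head = [0] * len(sizes)    # read offset inside each window
        self.count = [0] * len(sizes)   # number of stored words per FIFO

    def push(self, q: int, word: Any) -> bool:
        if self.count[q] == self.size[q]:
            return False                # FIFO full: caller must retry later
        tail = (self.head[q] + self.count[q]) % self.size[q]
        self.mem[self.base[q] + tail] = word
        self.count[q] += 1
        return True

    def pop(self, q: int) -> Any:
        if self.count[q] == 0:
            raise IndexError(f"FIFO {q} is empty")
        word = self.mem[self.base[q] + self.head[q]]
        self.head[q] = (self.head[q] + 1) % self.size[q]
        self.count[q] -= 1
        return word

    def resize(self, sizes: List[int]) -> None:
        """Change FIFO bounds; this sketch only allows it when every FIFO is empty."""
        if any(self.count):
            raise RuntimeError("resize only when all FIFOs are empty")
        if sum(sizes) > self.depth:
            raise ValueError("FIFO sizes exceed memory depth")
        self._layout(sizes)
```

Reallocating depth from one FIFO to another in this way lets a single memory follow changing bandwidth needs without adding storage.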
2008 International Conference on Field-Programmable Technology, 2008
The constrained operating environments of many FPGA-based embedded systems require flexible security that can be configured to minimize the impact on FPGA area and power consumption. In this paper, we present a security approach for external memory in FPGA-based embedded systems that exploits FPGA configurability. Our FPGA-based security core provides both confidentiality and integrity for data stored outside the FPGA and accessed by a processor on the FPGA chip. The benefits of our security core are demonstrated using four embedded applications implemented on a Stratix II device. Each application requires a collection of tasks with varying memory security requirements. Our security core is used in conjunction with a NIOS II soft processor running the MicroC/OS-II operating system. Average memory and energy savings of about 64% and 16%, respectively, are achieved for the four applications compared with a non-configurable, uniform security approach.
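The key idea, security that differs per memory region so that only the data needing protection pays its cost, can be sketched in Python with the standard `hmac` module. This hypothetical model covers only the region-policy lookup and the integrity check; confidentiality (encryption) is omitted because it would need a cipher outside the standard library, and replay protection (for example, per-block counters) is not modeled. It is not the paper's security core.

```python
import hashlib
import hmac
from typing import Dict, List, Tuple

# Hypothetical per-region policy: [start, end) address range -> protection level.
Policy = List[Tuple[int, int, str]]


class ProtectedMemory:
    """Models external memory in which only some regions carry an integrity tag."""

    def __init__(self, key: bytes, policy: Policy) -> None:
        self.key = key
        self.policy = policy
        self.data: Dict[int, bytes] = {}   # stands in for off-chip memory (attacker-writable)
        self.tags: Dict[int, bytes] = {}   # integrity tags, assumed kept on-chip in this model

    def level(self, addr: int) -> str:
        for start, end, lvl in self.policy:
            if start <= addr < end:
                return lvl
        return "none"

    def _tag(self, addr: int, value: bytes) -> bytes:
        # Binding the address into the MAC prevents moving a valid block elsewhere.
        return hmac.new(self.key, addr.to_bytes(4, "little") + value, hashlib.sha256).digest()

    def write(self, addr: int, value: bytes) -> None:
        self.data[addr] = value
        if self.level(addr) == "integrity":
            self.tags[addr] = self._tag(addr, value)

    def read(self, addr: int) -> bytes:
        value = self.data[addr]
        if self.level(addr) == "integrity":
            if not hmac.compare_digest(self.tags[addr], self._tag(addr, value)):
                raise ValueError(f"integrity check failed at {addr:#x}")
        return value
```

Unprotected regions skip tag computation and storage entirely, which is the kind of saving a configurable, non-uniform policy aims for.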
The emergence of multiple wireless standards is creating the need for flexible platforms that can self-adapt to various environments depending on the application requirements. Our work lies in the domain of self-adaptive heterogeneous multiprocessor architectures. In this paper, we present our ideas on the management of an ASIP-based multi-standard iterative receiver, including support for turbo decoding. In this context, the management of a multi-standard receiver provides the services for the self-adaptation mechanisms, based on the collection and analysis of information, a decision-making process, and a fast reconfiguration of the platform.
The proliferation of wireless communication standards introduces the need for reconfigurable multi-standard receivers. To address this problem and meet the growing throughput demand of applications on mobile terminals, multi-ASIP architectures have been developed in recent years. Moreover, the dynamic evolution of communication parameters and the shrinking delay between two data frames call for optimized reconfiguration mechanisms. In this context, we propose to optimize a multi-ASIP, multi-mode, multi-standard turbo decoder, with the aim of globally optimizing the reconfiguration mechanisms. The results show that the optimizations applied to the ASIP incur a small area overhead (0.004 mm² in a 65 nm CMOS technology) and significantly reduce the amount of data needed to reconfigure the platform. Indeed, for a platform implementing 8 ASIPs, the amount of data that must be broadcast to the ASIPs upon a configuration change is divided by 10, thanks to the proposed optimizations combined with an efficient configuration infrastructure.
This paper targets the autonomic management of dynamically partially reconfigurable hardware architectures based on FPGAs. Such hardware-level autonomic computing has been studied less often than its software-level counterpart. We use control techniques to model the relevant behaviours of the computing system and to derive a controller that enforces the control objectives. Discrete control modelled with Labelled Transition Systems is employed in this paper. Such models are amenable to Discrete Controller Synthesis (DCS) algorithms, which can automatically generate a controller enforcing the correct behaviours of a controlled system. A general modelling framework is proposed for the control of FPGA-based computing systems. We describe system applications as task graphs and the FPGA as a set of reconfigurable areas that can be dynamically and partially reconfigured to execute tasks. We encode the computation of an autonomic manager as a DCS problem with respect to multiple constraints and objectives, e.g., mutual exclusion of resource uses and power cost minimization. We validate our models and manager computations using the BZR language and an experimental demonstrator implemented on a Xilinx FPGA platform.
7th International Workshop on Reconfigurable and Communication-Centric Systems-on-Chip (ReCoSoC), 2012
Partial and dynamic reconfiguration adds a relevant new dimension to the design of efficient parallel embedded systems. However, due to the growing complexity of such systems, ensuring consistency and managing parallelism at runtime remain key challenges. Architecture models and a design methodology are therefore required to allow efficient component reuse and hardware reconfiguration management. This paper presents a distributed persistence management model and its implementation for reconfigurable multiprocessor systems on dynamically reconfigurable circuits. The proposed approach is inspired by the well-known component-based models used in software application development. Our model is based on membranes wrapping the system components. The objective is to improve design productivity and ensure consistency by managing context switching and storage using modular distributed hardware controllers. These membranes are distributed and optimized with the aim of designing self-adaptive systems, by allowing dynamic changes in the degree of parallelism and context migration. Simulation and synthesis results are given to show the performance and effectiveness of our methodology.
Proceedings of the 55th Annual Design Automation Conference
Parallel applications are essential for efficiently using the computational power of a Multiproce... more Parallel applications are essential for efficiently using the computational power of a Multiprocessor System-on-Chip (MPSoC). Unfortunately, these applications do not scale effortlessly with the number of cores because of synchronization operations that take away valuable computational time and restrict the parallelization gains. Moreover, synchronization is also a bottleneck due to sequential access to shared memory. We address this issue and introduce "Subutai", a hardware/software (HW/SW) architecture designed to distribute essential synchronization mechanisms over the Network-on-Chip (NoC). It includes Network Interfaces (NIs), drivers and a custom library of a NoC-based MP-SoC architecture that speeds up the essential synchronization primitives of any legacy parallel application. Besides, we provide a fast simulation tool for parallel applications and a HW architecture of the NI. Experimental results with PARSEC benchmark show an average application speedup of 2.05 compared to the same architecture running legacy SW solutions for 36% overhead of HW architecture.
2017 NASA/ESA Conference on Adaptive Hardware and Systems (AHS), 2017
Implementing self-adaptive embedded systems, such as UV, involves an offline provisioning of the ... more Implementing self-adaptive embedded systems, such as UV, involves an offline provisioning of the several implementations of the embedded functionalities with different characteristics in resource usage and performance in order for the system to dynamically adapt itself under uncertainties. FPGAbased architectures offer for support for high flexibility with dynamic reconfiguration features. We propose an autonomic control architecture for self-adaptive and self-reconfigurable FPGA-based embedded systems. The control architecture is structured in three layers: a mission manager, a reconfiguration manager and a scheduling manager. In this paper we focus on the design of the reconfiguration manager. We propose a design approach using automata-based discrete control. It involves reactive programming that provides formal semantics, and discrete controller synthesis from declarative objectives.
Journal of Signal Processing Systems, 2022
Dataflow is a parallel and generic model of computation that is agnostic of the underlying
Annals of Telecommunications, 2022
The growing demand for wireless devices capable of performing complex communication processes has... more The growing demand for wireless devices capable of performing complex communication processes has imposed an urgent need for high-speed communication systems and advanced network processors. This paper proposes a hardware workflow developed for the Long Term Evolution (LTE) communication system. It studies the Multiple-input, multiple-output orthogonal frequency-division multiplexing (MIMO-OFDM) LTE system. Specifically, this work focuses on the implementation of the OFDM block that dominates the execution time in high-speed communication systems. To achieve this goal, we have proposed an NoC-based low-latency OFDM LTE multicore system that leverages Inverse Fast Fourier Transform (IFFT) parallel computation on a variable number of processing cores. The proposed multicore system is implemented on an FPGA platform using the ProNoC tool, an automated rapid prototyping platform. Our obtained results show that LTE OFDM execution time is drastically reduced by increasing the number of processing cores. Nevertheless, the NoC's parameters, such as routing algorithm and topology, have a negligible influence on the overall execution time. The implementation results show up to 24% and 76% execution time reduction for a system having 2 and 16 processing cores compared to conventional LTE OFDM implemented in a single-core, respectively. We have found that a 4×4 Mesh NoC with XY deterministic routing connected to 16 processing tiles computing IFFT task is the most efficient configuration for computing LTE OFDM. This configuration is 4.12 times faster than a conventional system running on a single-core processor.
2017 12th International Symposium on Reconfigurable Communication-centric Systems-on-Chip (ReCoSoC), 2017
An ever larger share of FPGAs are supporting Dynamic and Partial Reconfiguration (DPR). A reconfi... more An ever larger share of FPGAs are supporting Dynamic and Partial Reconfiguration (DPR). A reconfigurable point-to-point interconnect (ρ-P2P) is a communication mechanism based on DPR that swaps between different precomputed configurations stored in partial bitstreams. ρ-Point-to-Point (P2P) is intended as a lightweight interconnect that suits the reconfigurable systems where a limited number of configurations are desirable. This paper assesses the pros and cons of ρ-P2P in terms of resource and performance depending on the number of input/output signals, their width and the number of supported configurations. Experimental results, conducted on an Intel Cyclone V FPGA, compare ρ-P2P to an equivalently functional non-DPR solution called µ-P2P and to a full crossbar. They show that ρ-P2P is indeed lightweight but introduces performance limitations on operating frequency, memory footprint and reconfiguration time. However, ρ-P2P is in general the least resource intensive of the tested interconnects, except in the trivial case of low numbers of signals and configurations. In particular, an 18 × 18 full crossbar interconnect requires 75% more resources than an equivalent ρ-P2P. Interestingly, this resource difference between ρ-P2P and a full crossbar grows linearly with the interconnect size.
2018 International Conference on High Performance Computing & Simulation (HPCS), 2018
in the field of Autonomic Computing. He currently works on the modelbased control of adaptive and... more in the field of Autonomic Computing. He currently works on the modelbased control of adaptive and reconfigurable computing systems, using techniques from Control Theory, and ranging from embedded systems, to Cloud distributed systems and High-Performance Computing.
2021 Design, Automation & Test in Europe Conference & Exhibition (DATE), 2021
Allocation and scheduling of applications affect the timing response and system performance, part... more Allocation and scheduling of applications affect the timing response and system performance, particularly for Network-on-Chip (NoC) based multicore systems executing realtime applications. These systems with multitasking processors provide improved opportunity for parallel application execution. In dynamic scenarios, runtime task allocation improves the system resource utilization and adapts to varying application workload. In this work, we present an efficient hybrid strategy for unified allocation and scheduling of tasks at runtime. By considering multitasking capability of processors, communication cost and task timing characteristics, potential allocation solutions are obtained at design-time. These are adapted for dynamic mapping and scheduling of computation and communication workloads of real-time applications. Simulation results show that the proposed approach achieves 34.2% and 26% average reduction in network latency and communication cost of the allocated applications. Also, the deadline satisfaction of the tasks improves on average by 42.1% while reducing the allocation-time overhead by 32% when compared with existing techniques.
2018 31st IEEE International System-on-Chip Conference (SOCC), 2018
Efficient synchronization is one of the basic requirements of effective parallel computing. A key... more Efficient synchronization is one of the basic requirements of effective parallel computing. A key operation of the POSIX Thread standard (PThread) is barrier synchronization, where multiple threads block on a userspecified point of execution until all of them have reached it. Conventional architectures for broadcast operations limit the achievable performance benefits as synchronization is significantly affected due to critical path communications. This increases the network latency and degrades the performance dramatically. A Wireless Network-on-Chip (WiNoC) offers a promising solution to reduce the long distance/critical path communication bottlenecks of conventional architectures by augmenting them with single hop, long-range wireless links. In this paper, we propose a power-aware broadcast enabled WiNoC architecture to reduce the cost of broadcast operations for barrier-based applications. The proposed architecture reduces the barrier synchronization cost up to 43.97% regarding network latency under the PARSEC benchmarks. It also saves up to 80.49% idle-state power consumption in WIs for a 64-core system compared with the conventional WiNoC architecture without incurring significant overhead.
IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 2020
In this paper, we present a novel logic design style, namely memristor overwrite logic (MOL), ass... more In this paper, we present a novel logic design style, namely memristor overwrite logic (MOL), associated with an original MOL-based computational memory. MOL relies on a fully digital representation of memristor and can operate with different memristive device technologies. Its integration in memristive crossbar arrays and computational memories allows the execution of bit and vector-level primitive logic operations in two computational steps at most. Promising features and performances are demonstrated through the implementation of N-bit full addition using the proposed MOL-based computational memory.
ACM Transactions on Embedded Computing Systems, 2016
Dynamically reconfigurable hardware has been identified as a promising solution for the design of... more Dynamically reconfigurable hardware has been identified as a promising solution for the design of energy-efficient embedded systems. However, its adoption is limited by costly design effort, including verification and validation, which is even more complex than for nondynamically reconfigurable systems. In this article, we propose a tool-supported formal method to automatically design a correct-by-construction control of the reconfiguration. By representing system behaviors with automata, we exploit automated algorithms to synthesize controllers that safely enforce reconfiguration strategies formulated as properties to be satisfied by control. We design generic modeling patterns for a class of reconfigurable architectures, taking into account both hardware architecture and applications, as well as relevant control objectives. We validate our approach on two case studies implemented on FPGAs.
ACM Transactions on Embedded Computing Systems, 2016
This article describes TBES, an end-to-end software environment for synthesizing multitask applications on FPGAs. The implementation follows a template-based approach for creating heterogeneous multiprocessor architectures, where heterogeneity stems from the use of general-purpose processors together with custom accelerators. Experimental results demonstrate substantial speedup for several classes of applications. Furthermore, this work reduces development costs and saves development time for the software architect, the domain expert, and the optimization expert. It provides a framework that brings together various existing tools and optimization algorithms. The advantages are manifold: modularity and flexibility, easy customization for best-fit algorithm selection, durability and evolution over time, and legacy preservation, including domain experts' know-how. In addition to the use of architecture templates for the overall system, a second contribution lies in using high...
This paper introduces the μSpider CAD tool for NoC design under latency and bandwidth constraints and describes the different steps of the associated design flow. We show how the tool can be used to automatically generate a NoC IP compliant with the Xilinx EDK tool. We present synthesis results and a real implementation of a video application based on a multiprocessor architecture. Finally, we outline the research still needed at the application and OS levels, beyond the current work, to achieve a complete and efficient implementation of a multiprocessor embedded system.
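The abstract does not detail how μSpider checks its constraints. As a hedged illustration of what a NoC design tool typically has to verify, the sketch below routes flows on a 2D mesh with dimension-ordered (XY) routing. It accumulates per-link bandwidth against a link capacity and uses hop count as a stand-in for latency. The flow format, the hop-count latency model and the function names are assumptions, not μSpider's algorithm.

```python
Node = tuple[int, int]
Link = tuple[Node, Node]
Flow = tuple[Node, Node, float, int]  # (src, dst, bandwidth, max_hops)


def xy_route(src: Node, dst: Node) -> list[Link]:
    """Links traversed by dimension-ordered routing: X first, then Y."""
    links: list[Link] = []
    x, y = src
    while x != dst[0]:
        nx = x + (1 if dst[0] > x else -1)
        links.append(((x, y), (nx, y)))
        x = nx
    while y != dst[1]:
        ny = y + (1 if dst[1] > y else -1)
        links.append(((x, y), (x, ny)))
        y = ny
    return links


def check_flows(flows: list[Flow], link_capacity: float) -> tuple[bool, dict[Link, float]]:
    """Return (all constraints met, per-link load) for a set of flows."""
    load: dict[Link, float] = {}
    ok = True
    for src, dst, bandwidth, max_hops in flows:
        path = xy_route(src, dst)
        if len(path) > max_hops:
            ok = False  # latency bound (modelled as hop count) violated
        for link in path:
            load[link] = load.get(link, 0.0) + bandwidth
    if any(value > link_capacity for value in load.values()):
        ok = False  # at least one link is oversubscribed
    return ok, load
```

XY routing is chosen here only because it is deterministic and deadlock-free on a mesh, which keeps the per-link bookkeeping simple. A real tool may explore other routes or topologies.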
EURASIP Journal on Embedded Systems, 2015
This paper addresses the design of embedded systems for outdoor augmented reality (AR) applications integrated into see-through glasses. The set of tasks includes object positioning, graphics computation, and wireless communications, under constraints such as real-time operation, low power, and a small footprint. We introduce an original sailor assistance application as a typical, useful, and complex outdoor AR application, in which context-dependent virtual objects must be placed in the user's field of view according to head motions and ambient information. Our study shows that power optimization is worthwhile, since an embedded system based on a standard general-purpose processor (GPP) + graphics processing unit (GPU) consumes more power than the high-luminosity see-through glasses themselves. This work then presents three main contributions. The first is the choice and combination of position and attitude algorithms that fit the application context. The second is the architecture of the embedded system, which introduces a fast and simple object processor (OP) optimized for the domain of mobile AR. Finally, the OP implements a new pixel rendering method, the incremental pixel shader (IPS), which is implemented in hardware and takes full advantage of the OpenGL ES light model. A complete GP+OP(s) architecture is described and prototyped on a field-programmable gate array (FPGA), including a hardware/software partitioning based on an analysis of application requirements and ergonomics.
IEEE Embedded Systems Letters, 2013
This paper presents an original approach to bandwidth-oriented self-adaptivity in the domain of Networks-on-Chip, where reconfiguration is handled by network interfaces offering traffic with guaranteed service. Reconfiguration is first based on multiple FIFOs with variable bounds, implemented in a single dual-port memory with a dedicated controller. Second, it relies on multiple compliant TDMA tables based on a new path-computation heuristic. The combination of both techniques provides significant bandwidth improvement with negligible resource overhead. The proposed solution is demonstrated with cycle-accurate VHDL simulation and an FPGA implementation for synthetic and image processing applications.
2008 International Conference on Field-Programmable Technology, 2008
The constrained operating environments of many FPGA-based embedded systems require flexible security that can be configured to minimize the impact on FPGA area and power consumption. In this paper, a security approach for external memory in FPGA-based embedded systems that exploits FPGA configurability is presented. Our FPGA-based security core provides both confidentiality and integrity for data that are stored outside the FPGA and accessed by a processor on the FPGA chip. The benefits of our security core are demonstrated using four embedded applications implemented on a Stratix II device. Each application requires a collection of tasks with varying memory security requirements. Our security core is used in conjunction with a NIOS II soft processor running the MicroC/OS II operating system. Average memory and energy savings of about 64% and 16%, respectively, are achieved for the four applications versus a non-configurable, uniform security approach.
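The key idea is to protect each memory region only as much as its tasks require, and it can be sketched as a per-region policy table. Several parts of this illustration are hypothetical choices, not the paper's core: the three protection levels, the 32-byte block and 16-byte tag sizes, and the use of a truncated HMAC-SHA256 for integrity. Encryption is not modelled at all. An address-bound MAC alone also does not prevent replay of stale data.

```python
import hashlib
import hmac
from enum import Enum


class Level(Enum):
    NONE = 0            # no protection
    INTEGRITY = 1       # integrity tag only
    CONF_INTEGRITY = 2  # encryption + integrity tag (encryption not modelled)


BLOCK_BYTES = 32  # assumed protection granularity
TAG_BYTES = 16    # assumed truncated MAC length

Policy = list[tuple[int, int, Level]]  # (start, end, level) with end exclusive


def level_for(addr: int, policy: Policy) -> Level:
    """Return the protection level of the region [lo, hi) containing addr."""
    for lo, hi, level in policy:
        if lo <= addr < hi:
            return level
    return Level.NONE


def tag(key: bytes, addr: int, block: bytes) -> bytes:
    """Integrity tag bound to the block address (detects relocation)."""
    msg = addr.to_bytes(8, "little") + block
    return hmac.new(key, msg, hashlib.sha256).digest()[:TAG_BYTES]


def verify(key: bytes, addr: int, block: bytes, stored_tag: bytes) -> bool:
    """Constant-time comparison of the recomputed and stored tags."""
    return hmac.compare_digest(tag(key, addr, block), stored_tag)


def tag_storage(policy: Policy) -> int:
    """Extra external memory (bytes) needed for tags under this policy."""
    total = 0
    for lo, hi, level in policy:
        if level is not Level.NONE:
            total += ((hi - lo) // BLOCK_BYTES) * TAG_BYTES
    return total
```

Take a mixed policy with 1 KiB unprotected, 1 KiB integrity-only, and 2 KiB with confidentiality and integrity. Under these assumed parameters, `tag_storage` returns 1,536 bytes, versus 2,048 bytes if all 4 KiB were protected uniformly. This only shows the direction of the saving and says nothing about the 64% figure reported above.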
The emergence of multiple wireless standards is creating the need for flexible platforms that are able to self-adapt to various environments depending on application requirements. Our work lies in the domain of self-adaptive heterogeneous multiprocessor architectures. In this paper, we present our ideas about the management of an ASIP-based multi-standard iterative receiver, which includes support for turbo decoding. In this context, the management of a multi-standard receiver provides the services for the self-adaptation mechanisms, based on the collection and analysis of information, a decision-making process, and fast reconfiguration of the platform.
The proliferation of wireless communication standards creates a need for reconfigurable multi-standard receivers. To address this issue, and to meet the growing throughput demands of applications on mobile terminals, multi-ASIP architectures have been developed in recent years. Moreover, the dynamic evolution of communication parameters, together with the shrinking delay between two data frames, calls for optimized reconfiguration mechanisms. In this context, we propose to optimize a multi-ASIP, multi-mode, multi-standard turbo decoder with the aim of globally optimizing the reconfiguration mechanisms. The results show that the optimizations made to the ASIP incur a small area overhead (0.004 mm² in a 65 nm CMOS technology) and significantly reduce the amount of data needed to reconfigure the platform. For a platform implementing 8 ASIPs, the amount of data that must be broadcast to the ASIPs upon a configuration change is divided by 10, thanks to the proposed optimizations coupled with an efficient configuration infrastructure.
This paper targets the autonomic management of dynamically partially reconfigurable hardware architectures based on FPGAs. Such hardware-level autonomic computing has been studied less often than its software-level counterpart. We use control techniques to model the relevant behaviours of the computing system and to derive a controller that enforces the control objectives. Discrete Control modelled with Labelled Transition Systems is employed in this paper. Such models are amenable to Discrete Controller Synthesis (DCS) algorithms, which can automatically generate a controller enforcing the correct behaviours of a controlled system. A general modelling framework is proposed for the control of FPGA-based computing systems. We consider system applications described as task graphs, and the FPGA as a set of reconfigurable areas that can be dynamically partially reconfigured to execute tasks. We encode the computation of an autonomic manager as a DCS problem with respect to multiple constraints and objectives, e.g., mutual exclusion of resource use and power cost minimization. We validate our models and manager computations using the BZR language and an experimental demonstrator implemented on a Xilinx FPGA platform.
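The constraints and the power-cost objective can be illustrated separately from the controller synthesis itself. The following sketch is a brute-force search, not DCS or BZR. It takes the ready tasks of a task graph, the free reconfigurable areas, and a hypothetical per-(task, area) power table. It then enforces mutual exclusion (one task per area) and a power budget, and prefers starting as many tasks as possible at the lowest total power. All names and the cost model are assumptions.

```python
from itertools import combinations, permutations


def choose_configuration(
    ready: list[str],
    areas: list[str],
    power: dict[tuple[str, str], float],
    budget: float,
) -> dict[str, str]:
    """Pick a task-to-area mapping for the ready tasks.

    Constraints: each area hosts at most one task (mutual exclusion), a task
    may only go to an area listed for it in `power`, and the total power
    stays within `budget`. Among admissible mappings, prefer the one that
    starts the most tasks, then the one with the lowest total power.
    Exhaustive enumeration: only practical for a handful of tasks and areas.
    """
    best: dict[str, str] = {}
    best_key: tuple[int, float] = (0, 0.0)  # (-tasks started, total power)
    for k in range(1, min(len(ready), len(areas)) + 1):
        for tasks in combinations(ready, k):
            for slots in permutations(areas, k):
                pairs = list(zip(tasks, slots))  # distinct areas: mutual exclusion
                if any(p not in power for p in pairs):
                    continue  # no implementation of that task for that area
                total = sum(power[p] for p in pairs)
                if total > budget:
                    continue  # power constraint violated
                key = (-k, total)
                if key < best_key:
                    best, best_key = dict(pairs), key
    return best
```

Consider hypothetical tasks `fir` and `fft`, areas `A` and `B`, and powers (fir, A) = 2.0, (fir, B) = 1.0, (fft, A) = 3.0, (fft, B) = 4.0. A budget of 10 yields `{"fir": "B", "fft": "A"}` (total 4.0). A budget of 3.5 admits only a single task and yields `{"fir": "B"}`.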
7th International Workshop on Reconfigurable and Communication-Centric Systems-on-Chip (ReCoSoC), 2012
Partial and dynamic reconfiguration provides a relevant new dimension for designing efficient parallel embedded systems. However, due to the increasing complexity of such systems, ensuring consistency and managing parallelism at runtime remain key challenges. Architecture models and design methodologies are therefore required to allow efficient component reuse and hardware reconfiguration management. This paper presents a distributed persistence management model and its implementation for reconfigurable multiprocessor systems on dynamically reconfigurable circuits. The proposed approach is inspired by the well-known component-based models used in software application development. Our model is based on membranes wrapping the system's components. The objective is to improve design productivity and ensure consistency by managing context switching and storage using modular distributed hardware controllers. These membranes are distributed and optimized with the aim of designing self-adaptive systems, by allowing dynamic changes in the degree of parallelism and context migration. Simulation and synthesis results are given to show the performance and effectiveness of our methodology.
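The role of a membrane can be sketched as a wrapper that owns a component's lifecycle. It suspends the component, saves its context, and restores that context when the component is re-instantiated in another reconfigurable region. This is a single-component behavioural model built on hypothetical pieces: the `Component`/`Membrane` classes, the dictionary-based context, and the region names. The paper's membranes, by contrast, are distributed hardware controllers.

```python
from dataclasses import dataclass, field


@dataclass
class Component:
    """A hardware task modelled as a name plus its internal state."""
    name: str
    state: dict[str, int] = field(default_factory=dict)

    def step(self) -> None:
        self.state["count"] = self.state.get("count", 0) + 1


class Membrane:
    """Wrapper that controls one component's execution and context.

    The membrane, not the component, decides when the component may run,
    saves its context before its region is reconfigured, and restores it
    wherever the component is re-instantiated (context migration).
    """

    def __init__(self, component: Component, region: str) -> None:
        self.component = component
        self.region = region
        self.running = True
        self.saved: dict[str, int] | None = None

    def step(self) -> None:
        if self.running:  # a suspended component does not execute
            self.component.step()

    def suspend(self) -> None:
        self.running = False
        self.saved = dict(self.component.state)  # context save

    def migrate(self, new_region: str) -> None:
        if self.running or self.saved is None:
            raise RuntimeError("suspend the component before migrating it")
        fresh = Component(self.component.name)  # re-instantiated in the new region
        fresh.state = dict(self.saved)          # context restore
        self.component, self.region = fresh, new_region
        self.running = True
        self.saved = None
```

Keeping save and restore in the wrapper means the component itself needs no reconfiguration-specific logic. This is the separation of concerns that component-based models aim for.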