Francois Berry - Academia.edu

Papers by Francois Berry

Automatic CNN Model Partitioning for GPU/FPGA-based Embedded Heterogeneous Accelerators using Geometric Programming

Journal of Signal Processing Systems

A lightweight convolutional neural network as an alternative to DIC to measure in-plane displacement fields

Optics and Lasers in Engineering, Feb 1, 2023

Flydeling: Streamlined Performance Models for Hardware Acceleration of CNNs through System Identification

ACM Transactions on Modeling and Performance Evaluation of Computing Systems

The introduction of deep learning algorithms, such as Convolutional Neural Networks (CNNs), in many near-sensor embedded systems opens new challenges in terms of energy efficiency and hardware performance. An emerging solution to address these challenges is to use tailored heterogeneous hardware accelerators combining processing elements of different architectural natures such as Central Processing Unit (CPU), Graphics Processing Unit (GPU), Field Programmable Gate Array (FPGA), or Application Specific Integrated Circuit (ASIC). To progress towards heterogeneity, a great asset would be an automated design space exploration tool that chooses, for each accelerated partition of a CNN, the most appropriate architecture considering available resources. To feed such a design space exploration process, models are required that provide very fast yet precise evaluations of alternative architectures or alternative forms of CNNs. Quick configuration estimation could be achieved with few parame...

Risk-Taking Behaviors of Adult Bedridden Patients in Neurosurgery: What Could/Should We Do?

Frontiers in Medicine

Risk-taking behaviors of adult bedridden patients in neurosurgery are frequent but little analyzed. We aimed to estimate, from the literature and our clinical experience, the incidence of the different clinical pictures. Risk-taking behaviors seem to be more frequent than reported. They are often minor, but they can lead to death, irrespective of the prescription of physical or chemical constraints. We also aimed to contextualize the risks and to describe the means of reducing the consequences for the patients. Two main conditions were identified: the loss of awareness of risk-taking behaviors by the patient, and uncontrolled body motions. Besides, current experience feedback analyses and new non-exclusive technological solutions could limit the complications, while improving prevention with wearable systems, neighborhood sensors, or room monitoring and service robots. Further research is mandatory to develop efficient and reliable systems avoiding complications and saving lives. E...

Dense Feature Matching Core for FPGA-based Smart Cameras

Proceedings of the 11th International Conference on Distributed Smart Cameras

Why is FPGA-GPU Heterogeneity the Best Option for Embedded Deep Neural Networks?

ArXiv, 2021

Graphics Processing Units (GPUs) are currently the dominant programmable architecture for Deep Learning (DL) accelerators. The adoption of Field Programmable Gate Arrays (FPGAs) in DL accelerators is, however, gaining momentum. In this paper, we demonstrate that Direct Hardware Mapping (DHM) of a Convolutional Neural Network (CNN) on an embedded FPGA substantially outperforms a GPU implementation in terms of energy efficiency and execution time. However, DHM is highly resource intensive and cannot fully substitute the GPU when implementing a state-of-the-art CNN. We thus propose a hybrid FPGA-GPU DL acceleration method and demonstrate that heterogeneous acceleration outperforms GPU acceleration even when communication overheads are included. Experiments are conducted on a heterogeneous multi-platform setup embedding an Nvidia® Jetson TX2 CPU-GPU board and an Intel® Cyclone 10 GX FPGA board. The SqueezeNet, MobileNetv2, and ShuffleNetv2 mobile-oriented CNNs are evaluated. We show th...

LobNet platform: Target tracking with a low resolution camera network

In this paper, we present a new platform for environment monitoring using very low specification cameras. These cameras are distinguished by their visual sensors, which deliver tiny images (30×30 pixels), by completely local processing on a MAX 10 FPGA, and by SmartMesh IP technology, which provides a mesh network. The lack of information extracted by the visual sensors is compensated by intensive communication and data exchange between cameras after each detection. Thus, a re-identification process is applied based on exchanged and extracted data after each target detection.

Alternatives to Bicubic Interpolation Considering FPGA Hardware Resource Consumption

IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 2021

Bicubic interpolation is widely used in real-time image processing systems because of its quality. The real-time implementation of bicubic interpolation requires a lot of hardware resources, especially multipliers, because of its high computational complexity. In this article, a set of algorithms that approximate the bicubic interpolation and reduce the hardware resource consumption are proposed. The proposed algorithms are based on combining linear and cubic interpolations. These algorithms are surveyed and compared in terms of interpolation quality, number of adders, number of multipliers, adaptive logic modules, lookup tables (LUTs), registers, and maximum operating frequency. These algorithms are implemented and tested on an Intel Cyclone V target. This article provides various choices of interpolation algorithms to cater to different application requirements, including accuracy, hardware resource consumption, and throughput performance. The implementation code is available at github.com/DreamIP/Interpolation.

Towards Embedded Heterogeneous FPGA-GPU Smart Camera Architectures for CNN Inference

Proceedings of the 13th International Conference on Distributed Smart Cameras, 2019

The CAPH Language, Ten Years After

Embedded Computer Systems: Architectures, Modeling, and Simulation, 2019

Special issue on advances on smart camera architectures for real-time image processing

Journal of Real-Time Image Processing, 2018

Depth from a Motion Algorithm and a Hardware Architecture for Smart Cameras

Sensors, 2018

Applications such as autonomous navigation, robot vision, and autonomous flying require depth map information of a scene. Depth can be estimated by using a single moving camera (depth from motion). However, traditional depth from motion algorithms have low processing speeds and high hardware requirements that limit their embedded capabilities. In this work, we propose a hardware architecture for depth from motion that consists of a flow/depth transformation and a new optical flow algorithm. Our optical flow formulation is an extension of the stereo matching problem. We propose a pixel-parallel/window-parallel approach in which a correlation function based on the sum of absolute differences (SAD) computes the optical flow. Further, in order to improve the SAD, we propose the curl of the intensity gradient as a preprocessing step. Experimental results demonstrated that it is possible to reach higher accuracy (90% accuracy) compared with previous Field Programmable Gate Arr...

Robust feature extraction algorithm suitable for real-time embedded applications

Journal of Real-Time Image Processing, 2017

A Holistic Approach for Optimizing DSP Block Utilization of a CNN implementation on FPGA

Proceedings of the 10th International Conference on Distributed Smart Camera, 2016

Distributed coordination model for smart sensing applications

Proceedings of the 10th International Conference on Distributed Smart Camera, 2016

Bio-inspired heterogeneous architecture for real-time pedestrian detection applications

Journal of Real-Time Image Processing, 2016

Radar and vision sensors calibration for outdoor 3D reconstruction

2015 IEEE International Conference on Robotics and Automation (ICRA), 2015

In this paper we introduce a new geometric calibration algorithm and a geometric method of 3D reconstruction using a panoramic microwave radar and a camera. These two sensors are complementary, considering the robustness to environmental conditions and depth detection ability of the radar on the one hand, and the high spatial resolution of a vision sensor on the other hand. This makes the approach well adapted for large scale outdoor cartography. Firstly, we address the global calibration problem, which consists of finding the exact transformation between radar and camera coordinate systems. The method is based on the optimization of a non-linear criterion obtained from a set of radar-to-image target correspondences. Unlike existing methods, no special configuration of the 3D points is required; only the knowledge of inter-target distances is needed. This makes the method flexible and easy to use by a non-expert operator. Secondly, we present a 3D reconstruction method based on sensor geometry. Both methods have been validated with synthetic and real data.

High-Level Dataflow Programming for Reconfigurable Computing

2014 International Symposium on Computer Architecture and High Performance Computing Workshop, 2014

In many application domains, FPGAs are now promoted as a way of getting round the restrictions of specific CPU designs on system scalability. However, in the current state of the art, programming FPGAs remains essentially a hardware-oriented activity, relying on dedicated hardware description languages such as VHDL or Verilog. Using these languages requires expertise in digital design, and in practice this limits the applicability of FPGA-based solutions. This is particularly true for stream-processing applications, in which some processing must be carried out "on the fly" on digital data streams. In this context, the dataflow programming model offers a very effective way to reduce the gap between high-level formulations and low-level implementations. To support this claim, the authors have recently introduced CAPH, a domain-specific language offering a fully automated compilation path from high-level dataflow descriptions to FPGA configuration for stream-processing applications. This paper is an introduction to the CAPH language, giving its motivations and main design principles and exposing the basic features of its syntax, semantics and compilation. It also points to experimental results showing that, at least for stream-processing applications, the dataflow model of computation, used jointly as a programming model and an execution model, can offer a very effective way to reconcile abstraction and efficiency when programming FPGAs.

Embedded multi-processor system-on-programmable chip for smart camera pose estimation using nonlinear optimization methods

Journal of Real-Time Image Processing, 2014

The PanoraMOS prototype is a complete localization system targeting Simultaneous Localization and Mapping applications. It is a panoramic camera that uses a single rotating linear sensor to capture cylindrical panoramic images at up to 3 frames per second. A complete localization algorithm has been implemented in the hardware architecture of the system. It has the ability to estimate its 3D pose in an indoor or an outdoor environment. This estimation is performed using a feature extractor and the Levenberg–Marquardt (LM) algorithm with the Random Sample Consensus (RANSAC) algorithm to perform detection. In this paper, we present the whole system, with particular emphasis on the localization algorithm and its implementation on a hardware architecture, which is our main contribution. The implementation was done on a Multi-Processor System-on-Chip architecture. We present both software and hardware implementations with performance results on an ALTERA System-on-Programmable Chip target. The experimental results, including processing times and application speed-up, show that our homogeneous network of processors is efficient for embedding the proposed image processing application.

Harnessing a multi-sensor FPGA-based smart camera: a virtual processor-based approach
