Yves Durand - Profile on Academia.edu

Papers by Yves Durand

Error Analysis of the Square Root Operation for the Purpose of Precision Tuning: A Case Study on K-means

2019 IEEE 30th International Conference on Application-specific Systems, Architectures and Processors (ASAP)

In this paper, we propose an analytical approach to study the impact of floating point (FLP) precision variation on the square root operation, in terms of computational accuracy and performance gain. We estimate the round-off error resulting from reduced precision. We also inspect the Newton-Raphson algorithm used to approximate the square root in order to bound the error caused by algorithmic deviation. Consequently, the implementation of the square root can be optimized by adjusting its number of iterations to fit any given FLP precision specification, without the need for long simulation times. We evaluate our error analysis of the square root operation as part of approximating a classic data clustering algorithm known as K-means, for the purpose of reducing its energy footprint. We compare the resulting inexact K-means to its exact counterpart, in the context of color quantization, in terms of energy gain and quality of the output. The experimental results show that energy savings can be achieved without penalizing the quality of the output (e.g., up to 41.87% of energy gain for an output quality, measured using structural similarity, within a range of [0.95,1]).
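
The trade-off this abstract describes, between the number of Newton-Raphson iterations and the achievable accuracy, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `newton_sqrt` and `relative_error` and the choice of initial guess are assumptions made for illustration, and the code runs in ordinary double precision rather than a reduced FLP format.

```python
import math


def newton_sqrt(a: float, iterations: int) -> float:
    """Approximate sqrt(a) with a fixed number of Newton-Raphson steps.

    Uses the classic update x <- (x + a / x) / 2.
    """
    if a < 0:
        raise ValueError("a must be non-negative")
    if a == 0:
        return 0.0
    # Assumed initial guess: any value >= sqrt(a) converges monotonically.
    x = a if a >= 1.0 else 1.0
    for _ in range(iterations):
        x = 0.5 * (x + a / x)
    return x


def relative_error(a: float, iterations: int) -> float:
    """Relative error of the truncated iteration against math.sqrt."""
    exact = math.sqrt(a)
    return abs(newton_sqrt(a, iterations) - exact) / exact
```

Calling `relative_error(2.0, n)` for increasing `n` shows the error shrinking roughly quadratically with each step, which is why a lower target precision can be met with fewer iterations.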

Evaluation of variable bit-width units in a RISC-V processor for approximate computing

Proceedings of the 16th ACM International Conference on Computing Frontiers, 2019

Among various power reduction methods, variable bit-width arithmetic units have been proposed in the approximate computing literature. In this paper, we add a variable bit-width memory unit to a RISC-V processor. Integrating both computation and memory units with variable bit-width leads to a power reduction: from 7% to 29% for a Sobel filter application and from 13% to 24% for an application that computes the position of a robotic arm (forwardk2j). We also propose a global energy model for a RISC-V processor with variable bit-width units (for computation and memory). This model allows us to evaluate the impact of various parameters in both the software application (e.g., the number of instructions that can be executed with a reduced bit-width) and the hardware architecture (e.g., impact of potential reduction for each unit).

Byte-Aware Floating-point Operations through a UNUM Computing Unit

2019 IFIP/IEEE 27th International Conference on Very Large Scale Integration (VLSI-SoC), 2019

Most floating-point (FP) hardware supports the IEEE 754 format, which defines fixed-size data types from 16 to 128 bits. However, a range of applications benefit from different formats, implementing different tradeoffs. This paper proposes a Variable Precision (VP) computing unit offering a finer granularity of high precision FP operations. The chosen memory format is derived from UNUM type I, where the size of a number is stored within the representation itself. The unit implements a fully pipelined architecture, and it supports up to 512 bits of precision for both interval and scalar computing. The user can configure the storage format up to 8-bit granularity, and the internal computing precision at 64-bit granularity. The system is integrated as a RISC-V coprocessor. Dedicated compiler support exposes the unit through a high level programming abstraction, covering all the operating features of UNUM type I. FPGA-based measurements show that the latency and the computation accuracy of this system scale linearly with the memory format length set by the user. Compared with the MPFR software library, the proposed unit achieves speedups between 3.5x and 18x, with comparable accuracy.

Variable Precision Floating-Point RISC-V Coprocessor Evaluation using Lightweight Software and Compiler Support

The popularity and community-driven development model of RISC-V have opened many areas of investigation to researchers and engineers. To overcome some of the IEEE 754 standard's limitations, one currently emerging avenue for computer architecture and systems research is the area of alternative floating-point computation. The UNUM format, for instance, offers variable precision and a flexibility that is useful in scientific computing and computational geometry. Programmers usually rely on arbitrary precision libraries such as MPFR (itself depending on GMP). However, there is currently no specialized RISC-V support for these libraries, and little support for variable precision arithmetic across the tool chain in general. We propose a framework to explore the potential of variable precision arithmetic in scientific computing applications on RISC-V processors. This work comprises: (i) a floating-point RISC-V coprocessor which improves accuracy using the UNUM format; (ii) an ISA extension ...

Smurf

Proceedings of the Conference for Next Generation Arithmetic 2019, 2019

This paper proposes an innovative Floating Point (FP) architecture for Variable Precision (VP) computation suitable for high precision FP computing, based on a refined version of the UNUM type I format. This architecture supports VP FP intervals where each interval endpoint can have up to 512 bits of mantissa. The proposed hardware architecture is pipelined and has an internal word-size of 64 bits. Computations on longer mantissas are performed iteratively on the existing hardware. The prototype is integrated in a RISC-V environment and is exposed to the user through an instruction set extension. The paper provides an example of software usage. The system has been prototyped on an FPGA (Field-Programmable Gate Array) platform and also synthesized for a 28nm FDSOI process technology. The respective working frequencies of the FPGA and ASIC implementations are 50MHz and 600MHz. The estimated chip area is 1.5 mm² and the estimated power consumption is 95mW. The flops performance of this architecture remains within the range of a regular fixed-precision IEEE FPU while enabling arbitrary precision computation at reasonable cost. CCS CONCEPTS • Hardware → Emerging technologies; Very large scale integration design; Communication hardware, interfaces and storage; Power and energy; • Computer systems organization → Architectures; Embedded and cyber-physical systems; • Computing methodologies → Modeling and simulation;

A methodology for the design of dynamic accuracy operators by runtime back bias

Design, Automation & Test in Europe Conference & Exhibition (DATE), 2017, Mar 1, 2017

Mobile and IoT applications must balance increasing processing demands with limited power and cost budgets. Approximate computing achieves this goal by leveraging the error tolerance features common in many emerging applications to reduce power consumption. In particular, adequate (i.e., energy/quality-configurable) hardware operators are key components in an error tolerant system. Existing implementations of these operators require significant architectural modifications, hence they are often design-specific and tend to have large overheads compared to accurate units. In this paper, we propose a methodology to design adequate datapath operators in an automatic way, which uses threshold voltage scaling as a knob to dynamically control the power/accuracy tradeoff. The method overcomes the limitations of previous solutions based on supply voltage scaling, in that it introduces lower overheads and allows fine-grain regulation of this tradeoff. We demonstrate our approach on a state-of-the-art 28nm FDSOI technology, exploiting the strong effect of back biasing on threshold voltage. Results show a power consumption reduction of as much as 39% compared to solutions based only on supply voltage scaling, at iso-accuracy.

Dynamic Precision Numerics Using a Variable-Precision UNUM Type I HW Coprocessor

A very large internal accumulation register has been proposed to increase the accuracy of scientific code. However, there is a general class of iterative kernels where a vector of high-precision data must be saved from one iteration to the next. Saving the large internal accumulator to memory is impractical in such cases. This work proposes a Variable Precision (VP) Floating Point (FP) arithmetic co-processor architecture based on RISC-V, which 1/ supports legacy IEEE formats for input and output variables, 2/ uses variable length internal registers (up to 512 bits of mantissa) for inner loop multiply-add and 3/ supports loads and stores of intermediate results to cache memory with a dynamically adjustable precision (up to 256 bits of mantissa). It exploits the UNUM type I floating point format, proposing solutions to address some of its pitfalls such as the variable latency of the internal operation, and the variable memory footprint of the intermediate variables. This work is inte...

Channel-bonding CMOS transceiver for 100 Gbps wireless point-to-point links

EURASIP Journal on Wireless Communications and Networking

5G systems and networks are expected to provide unprecedented data-rates to final users and services, in combination with increased coverage and density. The traffic generated at the edges of the network should be hauled through high capacity data-conveyors. Extremely high data-rate links able to provide optical-fiber-like performance in the order of 100 Gbps are required to reduce the cost and increase the flexibility of the network infrastructure deployment. This paper presents a full transceiver architecture based on a channel-bonding radio-frequency front-end operating at millimeter-wave frequencies and digital baseband processing units able to provide such data-rates with a feasible implementation in low-cost CMOS technologies. The baseband section of the receiver includes digital compensation algorithms that make it possible to cope with some of the radio front-end impairments. The main functionalities of the proposed transceiver architecture are validated in hardware.

D8.4 Technology Implementation Plan

Resilient protocol for control of composite services

FAUST, an Asynchronous Network-on-Chip based Architecture for Telecom Applications

We present the FAUST chip (20 NoC nodes and units in a 130nm technology) and the FAUST platform addressing Telecom applications. The demo shows the feasibility of a complex GALS NoC architecture.

Direct Access Memory Controller with Multiple Sources, Corresponding Method and Computer Program

Stream Management in an On-Chip Network

Lightweight service brokering systems

Guaranteed Services of the NoC of a Manycore Processor

Proceedings of the 2014 International Workshop, Dec 13, 2014

SIDRAH: A software infrastructure for a resilient community of

The SIDRAH project proposes a software package to enable spontaneous collaboration between wireless devices. Networked devices temporarily form a group, called a SIDRAH community, where they offer services to each other. We propose a complete solution, from template services down to wireless protocol stacks, to realize this. We face two major issues: 1/ how to create the illusion of a homogeneous "virtual network" between devices supporting heterogeneous technologies and 2/ how to handle hazards in a way that is acceptable for the end-users. The detection of these hazards is the key to the resilience of a SIDRAH community. Our approach is to instrument the network layers of the SIDRAH stack with pseudo-asynchronous "failure detectors". The events that the detectors identify are propagated and handled to compensate for the failures at the most appropriate level using explicit "disconnection policies".

Network on Chip with Quality of Service

Direct memory access controller, corresponding method and computer program

Power Modeling of a NoC Based Design for High Speed Telecommunication Systems

Lecture Notes in Computer Science, 2006

Considering the complexity of future 4G telecommunication systems, power consumption management becomes a major challenge for designers, particularly for base-band modem functionalities. System-level low-power policies, which dynamically optimize the consumption, achieve major power savings compared to low-level optimisations (e.g., gated clock or transistor optimisation). We present an innovative power modeling methodology of a 4G modem which

The Radio Virtual Machine: A solution for SDR portability and platform reconfigurability

2009 IEEE International Symposium on Parallel & Distributed Processing, 2009

... des Arts, 69621 Villeurbanne Cedex, France {riadh.ben-abdallah, tanguy.risset, antoine.fraboulet}@insa-lyon.fr ... 2005. IEEE Computer Society. [6] J. Eker, J. W. Janneck, E. A. Lee, J. Liu, X. Liu, J. Ludvig, S. Neuendorffer, S. Sachs, and Y. Xiong. ...
