Serge Vernalde | IMEC - Academia.edu
Papers by Serge Vernalde
Lecture Notes in Computer Science, 2003
Hardware/software (hw/sw) partitioning largely affects system cost, performance, and power consumption. Most previous hw/sw partitioning approaches focus on optimising either the hw area or the performance, and thus ignore the influence of the partitioning process on energy consumption. However, since the designer still has maximum flexibility during this process, it is clearly the best moment to analyse energy consumption. We have developed a new hw/sw partitioning and scheduling tool that reduces the energy consumption of an embedded system while meeting high performance constraints. We have applied it to two current multimedia applications, saving up to 30% of the system energy without reducing performance.
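To make the energy/performance trade-off concrete, here is a toy sketch of energy-aware hw/sw partitioning. It is not the tool described above: it assumes a hypothetical task model with per-task time and energy figures for each mapping, tasks that execute strictly one after another, and a brute-force search that is only practical for a handful of tasks.

```python
from itertools import product
from typing import NamedTuple, Optional

class Task(NamedTuple):
    # All figures are hypothetical, in arbitrary time and energy units.
    name: str
    sw_time: float
    sw_energy: float
    hw_time: float
    hw_energy: float

def partition(tasks: list[Task], deadline: float) -> Optional[tuple[float, dict[str, str]]]:
    """Exhaustively map each task to "sw" or "hw", keeping the mapping with the
    lowest total energy whose sequential execution time meets the deadline."""
    best = None
    for choice in product(("sw", "hw"), repeat=len(tasks)):
        pairs = list(zip(tasks, choice))
        time = sum(t.hw_time if c == "hw" else t.sw_time for t, c in pairs)
        if time > deadline:
            continue  # violates the performance constraint
        energy = sum(t.hw_energy if c == "hw" else t.sw_energy for t, c in pairs)
        if best is None or energy < best[0]:
            best = (energy, {t.name: c for t, c in pairs})
    return best  # None if no mapping meets the deadline

tasks = [Task("idct", 10, 8, 2, 3), Task("vlc", 4, 3, 3, 4), Task("mc", 8, 6, 3, 2)]
print(partition(tasks, deadline=12))  # (8, {'idct': 'hw', 'vlc': 'sw', 'mc': 'hw'})
```

In this made-up example, moving every task to hardware also meets the deadline but costs more energy (9 units), because the hypothetical `vlc` accelerator is less efficient than its software version. A real tool would also model concurrency, communication, and scheduling, which this sketch ignores.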
Due to the limited processing power of current mobile devices and the complexity of 3D content, interactive 3D applications on mobile devices should ideally be boosted by low-cost 3D graphics hardware acceleration. Unlike traditional hardware-acceleration ASICs, we have developed a run-time reconfigurable mobile platform, consisting of an instruction-set processor and a run-time reconfigurable FPGA, that supports real-time 3D rendering. As a proof of concept, we present the implementation of a Quake-like 3D game on our prototype platform. We show that with the hardware implementation, the game can be played at a high, interactive frame rate.
2003 International Conference on Multimedia and Expo. ICME '03. Proceedings (Cat. No.03TH8698), 2003
Due to the heterogeneous nature of networks and end-systems in distributed multimedia systems, multimedia applications should ideally be designed to counteract fluctuations in network bandwidth and end-system processing capacity, so as to provide end users with a certain degree of Quality of Service (QoS). This requirement can be satisfied with scalable applications. In addition, with the current evolution in run-time reconfigurable computing, run-time reconfigurable multimedia platforms are becoming increasingly viable. In this paper, an end-to-end delivery chain framework for mapping scalable networked multimedia applications onto reconfigurable platforms is presented. The framework is demonstrated by a case study of a 3D game running on a prototype run-time reconfigurable platform.
* Geert Deconinck is a postdoctoral fellow of the Fund for Scientific Research-Flanders-Belgium. Abstract: Due to the heterogeneous nature of networks and end-systems in distributed multimedia systems, multimedia applications should ideally be designed to counteract fluctuations in network bandwidth and end-system processing capacity, so as to provide end users with a certain degree of Quality of Service (QoS). This requirement can be …
2003 Design, Automation and Test in Europe Conference and Exhibition, 2000
The ability to (re)schedule a task in either hardware or software will be an important asset in reconfigurable systems-on-chip. To support this feature, we have developed an infrastructure that, combined with a suitable design environment, permits the implementation and management of hardware/software relocatable tasks. This paper presents the general scope of our research and details the communication scheme, the design environment, and the hardware/software context-switching issues. The infrastructure proved its feasibility by allowing us to design a relocatable video decoder. When implemented on an embedded platform, the decoder performs at 23 frames/s (320x240 pixels, 16 bits per pixel) in reconfigurable hardware and 6 frames/s in software.
Proceedings of the IEEE 2001 Custom Integrated Circuits Conference (Cat. No.01CH37169), 2000
The presented platform-based, object-oriented modeling concept for system design allowed us to create a networked, hardware-reconfigurable camera in a 25 man-month schedule, with concurrent development of the application and the target FPGA platform. The developed TCP/IP layer achieves a throughput of 2 Mb/s/MHz, and the complete application logic consumes 700 mW at 20 MHz.
Proceedings 2002 Design, Automation and Test in Europe Conference and Exhibition, 2000
Complex systems-on-chip present one of the most challenging design problems of today. To meet this challenge, new design languages capable of modeling such heterogeneous, dynamic systems are needed. For the implementation of such a language, an object-oriented C++ class library has proven to be a promising approach, since new classes dealing with design- and platform-specific problems can be added in a conceptual and seamlessly reusable way.
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, Nov 1, 2006
The problem of efficiently implementing multiplications by one or more constants in hardware is encountered in many digital signal-processing areas, such as image processing and digital filter optimization. In a more general form, this is a problem of common subexpression elimination, and as such it also occurs in compiler optimization and many high-level synthesis tasks. An efficient solution to this problem can yield significant improvements in important design parameters such as implementation area or power consumption. In this paper, a new solution to the multiple constant multiplication problem based on the common subexpression elimination technique is presented. The performance of our method is demonstrated primarily on a finite-duration impulse response (FIR) filter design. The idea is to implement a set of constant multiplications as a set of add-shift operations and then optimize these with respect to common subexpressions. We show that the number of add/subtract operations can be reduced significantly this way. The applicability of the presented algorithm to various high-level synthesis tasks is also indicated. Benchmarks demonstrating the algorithm's efficiency are included as well.
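The add-shift idea can be illustrated with a small greedy sketch. It is a simplified, Hartley-style variant and not the algorithm of the paper: constants are written in plain binary, so only additions are used (a signed-digit recoding would also allow subtractions), and the most frequent two-term pattern `a + (b << d)` is repeatedly extracted as a shared subexpression while it occurs at least twice. The function names `to_terms`, `matches`, `adders`, and `mcm_cse` are hypothetical.

```python
def to_terms(c: int) -> set[tuple[int, int]]:
    """Binary digits of a positive constant as (shift, symbol) terms; symbol 0 is x."""
    return {(s, 0) for s in range(c.bit_length()) if (c >> s) & 1}

def adders(term_sets, n_subexpr: int) -> int:
    # One adder per shared subexpression, plus k-1 adders for a constant with k terms.
    return n_subexpr + sum(max(len(t) - 1, 0) for t in term_sets)

def matches(terms, pat):
    """Shifts s of non-overlapping occurrences of pattern (a, b, d), i.e. term
    pairs (s, a) and (s + d, b) inside one constant."""
    a, b, d = pat
    used, found = set(), []
    for s, sym in sorted(terms):
        partner = (s + d, b)
        if sym == a and (s, sym) not in used and partner in terms and partner not in used:
            used.update({(s, sym), partner})
            found.append(s)
    return found

def mcm_cse(constants):
    values = [1]  # values[k]: the multiple of x that symbol k computes
    term_sets = [to_terms(c) for c in constants]
    while True:
        # Candidate two-term patterns; for equal shifts keep one symbol order only.
        patterns = {(a, b, s2 - s1)
                    for terms in term_sets
                    for (s1, a) in terms for (s2, b) in terms
                    if s2 > s1 or (s2 == s1 and a < b)}
        scored = [(sum(len(matches(t, p)) for t in term_sets), p) for p in patterns]
        if not scored or max(scored)[0] < 2:
            break  # no pattern is shared, so extracting it would not save an adder
        _, (a, b, d) = max(scored)
        new = len(values)
        values.append(values[a] + (values[b] << d))  # costs one adder
        for terms in term_sets:
            for s in matches(terms, (a, b, d)):
                terms -= {(s, a), (s + d, b)}
                terms.add((s, new))
    return values, term_sets

consts = [5, 13, 21]
naive = adders([to_terms(c) for c in consts], 0)
values, term_sets = mcm_cse(consts)
print(naive, adders(term_sets, len(values) - 1), values)  # 5 3 [1, 5]
```

For the example constants, computing `5x = x + (x << 2)` once and reusing it gives `13x = 5x + (x << 3)` and `21x = 5x + (x << 4)`, so the adder count drops from 5 to 3. Greedy extraction is not guaranteed to be optimal; it only shows why sharing subexpressions reduces the add/subtract count.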
Title: Synthesis of High Throughput DSP ASICs Using Application Specific Datapaths. Authors: Vernalde, Serge; Schaumont, Patrick; Bolsens, Ivo; De Man, Hugo; Frehel, J. Issue Date: 1994. Host Document: DSP & Multimedia Technology vol:3 pages:13-21.
IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences, Dec 25, 2000
This paper proposes a scalable architecture to enable networked reconfiguration for next-generation communication terminals. A new system architecture that supports networked reconfiguration is defined. It contains two new blocks: the virtual reconfigurable architecture (VRA) and the application-specific resource manager (ASRM). The VRA can be considered a hardware virtual machine; it separates the terminal architecture into application-independent and application-specific parts. The ASRM automatically manages FPGA resources, much as a conventional OS manages memory or CPU resources. By providing both a hardware and a software virtual machine, networked reconfiguration users only need to develop a design description targeted at the virtual platform that exploits the VRA and ASRM.
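The analogy with an operating system's memory manager can be made concrete with a small sketch. The abstract does not describe the ASRM's allocation policy; the code below simply assumes, hypothetically, that the FPGA is a one-dimensional row of equally sized reconfigurable columns and that hardware tasks are placed first-fit into contiguous free columns, the way a classic memory allocator places blocks. `ColumnManager` and its methods are illustrative names, not part of the paper.

```python
class ColumnManager:
    """Hypothetical first-fit manager for a 1-D array of FPGA columns."""

    def __init__(self, columns: int):
        self.free = [(0, columns)]  # sorted, non-adjacent (start, width) holes
        self.placed: dict[str, tuple[int, int]] = {}

    def load(self, task: str, width: int):
        """Place a task in the first hole wide enough; return its start column or None."""
        for i, (start, size) in enumerate(self.free):
            if size >= width:
                self.placed[task] = (start, width)
                if size == width:
                    del self.free[i]
                else:
                    self.free[i] = (start + width, size - width)
                return start
        return None  # no contiguous hole large enough (fragmentation or full device)

    def unload(self, task: str) -> None:
        """Release a task's columns and merge adjacent holes."""
        self.free.append(self.placed.pop(task))
        self.free.sort()
        merged = []
        for start, size in self.free:
            if merged and merged[-1][0] + merged[-1][1] == start:
                merged[-1] = (merged[-1][0], merged[-1][1] + size)
            else:
                merged.append((start, size))
        self.free = merged

fpga = ColumnManager(10)
print(fpga.load("fir", 4), fpga.load("fft", 5), fpga.load("vlc", 3))  # 0 4 None
fpga.unload("fir")
print(fpga.load("vlc", 3), fpga.free)  # 0 [(3, 1), (9, 1)]
```

Returning `None` instead of raising leaves the policy decision to the caller, for example falling back to a software implementation of the task; whether the real ASRM does this is not stated in the abstract.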
Coarse-grained reconfigurable architectures have become increasingly important in recent years, and automatic design and compilation tools are essential to their success. In this paper, we present a retargetable compiler for a family of coarse-grained reconfigurable architectures. Several key issues are addressed. Program analysis and transformation prepare the dataflow for scheduling. Architecture abstraction generates an internal graph representation from a concrete architecture description. A modulo scheduling algorithm is key to exploiting parallelism and achieving high performance. The experimental results show up to 28.7 instructions per cycle (IPC) over the tested kernels.
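Modulo schedulers commonly start from a lower bound on the initiation interval (II), the number of cycles between the starts of successive loop iterations: the larger of a resource bound (ResMII) and a recurrence bound (RecMII). The abstract does not say how this compiler chooses its starting II, so the following is a generic sketch under assumed inputs: operation counts per resource class, unit counts, and dependence edges annotated with latency and iteration distance (all hypothetical). It tests each candidate II by checking for a positive-weight cycle under edge weights `latency - II * distance`, and assumes every recurrence has a distance of at least 1.

```python
from math import ceil

def res_mii(op_counts: dict[str, int], units: dict[str, int]) -> int:
    """Resource bound: each class of operations must share its functional units."""
    return max(ceil(op_counts[k] / units[k]) for k in op_counts)

def has_positive_cycle(n: int, edges, ii: int) -> bool:
    """edges: (src, dst, latency, distance). With weight latency - ii*distance,
    a positive cycle means some recurrence cannot complete within ii cycles."""
    dist = [[float("-inf")] * n for _ in range(n)]
    for u, v, lat, d in edges:
        dist[u][v] = max(dist[u][v], lat - ii * d)
    for k in range(n):  # Floyd-Warshall on longest paths
        for i in range(n):
            for j in range(n):
                if dist[i][k] + dist[k][j] > dist[i][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]
    return any(dist[i][i] > 0 for i in range(n))

def min_ii(n, edges, op_counts, units) -> int:
    """Smallest II satisfying both the resource and the recurrence bound."""
    ii = max(1, res_mii(op_counts, units))
    limit = max(ii, sum(lat for _, _, lat, _ in edges))
    while has_positive_cycle(n, edges, ii):
        ii += 1
        if ii > limit:
            raise ValueError("recurrence with zero iteration distance")
    return ii

# Hypothetical loop body: 0 load, 1 mul, 2 add (accumulates), 3 store.
edges = [(0, 1, 2, 0), (1, 2, 2, 0), (2, 3, 1, 0),
         (2, 2, 1, 1),   # accumulator: add depends on the previous iteration's add
         (2, 1, 1, 1)]   # mul consumes the previous iteration's sum
ops, fus = {"alu": 2, "mem": 2}, {"alu": 4, "mem": 1}
print(res_mii(ops, fus), min_ii(4, edges, ops, fus))  # 2 3
```

Here the single memory port limits II to at least 2, but the mul/add recurrence (3 cycles of latency over a distance of 1) raises it to 3. An actual scheduler may still need a larger II if placement and routing on the array fail.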
Proceedings of the ninth international symposium on Hardware/software codesign - CODES '01, 2001
The implementation of embedded networked appliances requires a mix of processor cores and HW accelerators on a single chip. When designing such complex and heterogeneous SoCs, the HW/SW partitioning decision needs to be made prior to refining the system description. With OCAPI-xl, we developed a methodology in which the partitioning decision can be made anywhere in the design flow, even just before code generation for both HW and SW. This is made possible by a refinable, implementable, architecture-independent system description. The OCAPI-xl model was used to develop a stand-alone networked camera with an onboard GIF engine and network layer.
Proceedings of the conference on Design, automation and test in Europe - DATE '99, 1999
Complex signal-processing algorithms are specified in floating-point precision. When their hardware implementation requires fixed-point precision, type refinement is needed. The paper presents a methodology and design environment for this quantization process.
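Type refinement ultimately means choosing word lengths. As a minimal sketch, and not the paper's environment, the code below assumes a signed two's-complement format with `int_bits` integer bits plus a sign bit, round-to-nearest with saturation, and a hypothetical accuracy requirement expressed as a maximum absolute error over sample values. It searches for the smallest number of fractional bits that meets that requirement; the sample coefficients are made up for illustration.

```python
def quantize(x: float, int_bits: int, frac_bits: int) -> float:
    """Round x to a signed fixed-point grid (sign + int_bits + frac_bits bits),
    saturating at the format's limits. Python's round() breaks ties to even."""
    scale = 1 << frac_bits
    lo = -(1 << (int_bits + frac_bits))
    hi = (1 << (int_bits + frac_bits)) - 1
    q = max(lo, min(hi, round(x * scale)))
    return q / scale

def min_frac_bits(samples, int_bits: int, max_err: float, limit: int = 32):
    """Smallest fractional word length whose worst-case error on the samples is acceptable."""
    for f in range(limit + 1):
        if max(abs(quantize(x, int_bits, f) - x) for x in samples) <= max_err:
            return f
    return None

coeffs = [0.7071, -0.3827, 0.9239, 0.1951]  # hypothetical floating-point filter coefficients
print(min_frac_bits(coeffs, int_bits=0, max_err=1e-3))  # 9
```

With 8 fractional bits the coefficient 0.9239 rounds to 237/256, an error of about 0.0019, so 9 bits are needed here. Measuring error on the actual signals, or on a system-level metric such as SNR, would be a more realistic refinement criterion than this per-value bound.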