Mladen Berekovic | Technische Universität Braunschweig

Papers by Mladen Berekovic

Session details: Microarchitectural techniques for reliability

Design, Automation, and Test in Europe, Mar 18, 2013

Session details: Microarchitecture

Proceedings of the 22nd International Conference on Architecture of Computing Systems

MemOpt: Automated Memory Distribution for Multicore Microcontrollers with Hard Real-Time Requirements

Modern multicore microcontrollers for systems with hard real-time requirements increasingly use a complex memory hierarchy to improve available performance and distribute competing accesses to separate memories. With the additional computing power available, the complexity of the implemented software has also increased. These two factors make an optimized distribution of the existing software across the available memories an increasingly difficult problem when integrating such systems. Therefore, this article presents an algorithm that calculates an optimized memory distribution based on the microcontroller used, a recording of the code execution on the target hardware, and a system description, and that automatically generates the corresponding linker script.

Effects of Concurrent Access to Embedded Multicore Microcontrollers with Hard Real-Time Demands

Due to increased demand on the performance of real-time control units, multicore microcontrollers are increasingly being used in this sector. The manufacturers of such microcontrollers have developed extensive model families whose members must be compatible with each other, sharing the same pin assignment and package form. This policy is intended to enable ECU developers to select the next larger type with more processor cores in the event of performance bottlenecks. It should be noted, though, that the additional processor cores increase the problem of competing accesses and thus also the Worst-Case Execution Time. This article identifies the impact of competing accesses on the performance of individual processor cores. For this purpose, three derivatives of the Infineon AURIX microcontroller family are compared and potential bottlenecks are identified. Finally, recommendations for avoiding such problems are drawn up on the basis of these results.

Echtzeitfähige Ethernet-Kommunikation in automobilen Multicore-Systemen mit hierarchischem Speicherlayout [Real-time-capable Ethernet communication in automotive multicore systems with a hierarchical memory layout]

Springer eBooks, 2022

Hardware-Beschleuniger für automobile Multicore-Mikrocontroller mit einer harten Echtzeitanforderung [Hardware accelerators for automotive multicore microcontrollers with a hard real-time requirement]

Springer eBooks, 2022

Session details: Performance estimation and runtime management of MPSoCs

Design, Automation, and Test in Europe, Mar 8, 2010

Session details: Emerging machine learning applications and models

Design, Automation, and Test in Europe, Mar 9, 2020

IR-drop aware design & technology co-optimization for N5 node with different device and cell height options

International Conference on Computer Aided Design, Nov 13, 2017

In this paper we propose a novel Design-Technology Co-Optimization (DTCO) framework that enables PDK generation and design implementation of sub-10nm technology nodes. The framework makes it possible to study the impact of different technology options at design level and to use effective design Power, Performance and Area (PPA) to decide on the right technology option. The design implementation flow is IR-drop aware, allowing integration of an optimized Power Delivery Network (PDN) for different device/cell options. Using N5-like technology node assumptions (contacted poly and metallization pitch of 42 and 32nm), we generate digital PDKs for different device (finFET, 2 & 3 nanowires) and standard cell options (3, 2 or 1 fins & 7.5 or 6-Tracks cell height). The different PDKs have been used to implement and characterize a wire-dominated circuit. Our study shows that design PDN/IR-drop awareness is fundamental to a complete DTCO approach for sub-10nm nodes. Using our dedicated design methodology we reach the IR-drop target of 2.5% VDD (on the lowest metal layers), while minimizing the area degradation induced by the PDN. Further, we demonstrate that such an optimized PDN is mandatory to enable the 20% area gain when moving from 7.5 to 6-Tracks cell height. Finally, we show that the impact of different device options is in the range of 15% Power, 2X Performance and 20% Area, further validating the need for a fully integrated DTCO.

A Survey of Hardware Technologies for Mixed-Critical Integration Explored in the Project EMC²

Springer eBooks, 2017

A scriptable, standards-compliant reporting and logging extension for SystemC

Still Image Processing on Coarse-Grained Reconfigurable Array Architectures

Journal of Signal Processing Systems, Dec 11, 2008

Real-time smart stereo camera based on FPGA-SoC

2017 IEEE-RAS 17th International Conference on Humanoid Robotics (Humanoids), 2017

Stereo image processing is one of the most demanding tasks in the field of 3D computer vision and robot vision, requiring high-performance computing capabilities within embedded systems. Real-time constraints for autonomous vehicles such as humanoid robots lead to hardware acceleration approaches for high-resolution stereo imaging in human-like vision systems, where FPGA devices are commonly employed to handle very high sensor data rates. This work presents a real-time smart stereo camera system that implements the full stereo processing pipeline in a single FPGA device. We introduce the novel memory-optimized stereo processing algorithm "Sparse Retina Census Correlation" (SRCC), which combines two well-established window-based stereo matching approaches. We have leveraged a Sum of Absolute Differences (SAD) of Sobel-filtered images and a Sum of Hamming Distances (SHD) using a modified Retina-based Census Transform for increased robustness to lighting variations and for high accuracy. A color rectification module has been implemented to cope with the high frame rate of the stereo pipeline, calculating image transformations and rectified pixel coordinates in real time using parameters for camera intrinsics, image rotation, image distortion and image projection. In addition, multiple post-processing algorithms such as texture filtering, uniqueness filtering, speckle removal and disparity-to-depth conversion have been implemented to further enhance the output results. The presented smart camera solution has demonstrated real-time stereo processing of 1280×720 pixel depth images with 256 disparities on a Zynq XC7Z030 FPGA device at 60fps. Due to the universal USB3.0 UVC interface and the onboard depth calculation, it is a replacement for RGBD 3D sensors with improved image quality and outdoor performance. The camera can easily be used in conjunction with ROS-enabled robots and in automotive or industrial applications.
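
As a rough illustration of the two window-based matching costs named in this abstract, the sketch below computes a SAD cost and a census/Hamming cost for a single pixel. It assumes a plain 3x3 census transform rather than the paper's modified Retina-based variant, works on raw intensities instead of Sobel-filtered images, and uses hypothetical function names (`census_3x3`, `hamming`, `sad_3x3`) that do not come from the paper.

```python
def census_3x3(img, y, x):
    """Standard 3x3 census signature (assumption: not the paper's Retina variant).

    Each of the 8 neighbours contributes one bit, set when the neighbour
    is darker than the centre pixel.
    """
    centre = img[y][x]
    sig = 0
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            if dy == 0 and dx == 0:
                continue  # skip the centre pixel itself
            sig = (sig << 1) | (1 if img[y + dy][x + dx] < centre else 0)
    return sig


def hamming(a, b):
    """Number of differing bits between two census signatures."""
    return bin(a ^ b).count("1")


def sad_3x3(left, right, y, x, d):
    """Sum of absolute differences over a 3x3 window at disparity d."""
    return sum(
        abs(left[y + dy][x + dx] - right[y + dy][x + dx - d])
        for dy in (-1, 0, 1)
        for dx in (-1, 0, 1)
    )


# Toy image pair: the right image is the left image shifted by one pixel,
# so the true disparity is 1 everywhere the window fits.
left = [
    [10, 20, 30, 40, 50, 60],
    [15, 80, 25, 90, 35, 70],
    [5, 45, 65, 55, 75, 85],
]
right = [row[1:] + [0] for row in left]

sad_at_true_disparity = sad_3x3(left, right, 1, 2, 1)
sad_at_zero_disparity = sad_3x3(left, right, 1, 2, 0)
census_cost = hamming(census_3x3(left, 1, 2), census_3x3(right, 1, 1))
```

A matcher would evaluate such costs for every candidate disparity and keep the minimum; how SRCC combines and sparsifies the two costs is not described in the abstract and is not modelled here.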

Acharyya, M., see Yang, J.-H., T-CSVT Dec 02 1117-1127 Ahalt, S., see Dapena, A., T-CSVT Feb 02 114-121 Ahmad, MO, see Jinwen Zan, T-CSVT Sep 02 793-802 Al-Shaykh, O., S. Chen, and M. Mattavelli. Introduction to the special issue on multimedia implementation; T-CSVT Aug 02 629-632

Proceedings of the 24th international conference on Architecture of computing systems

Multimedia RISC core for efficient bitstream parsing and VLD

Proceedings of SPIE, Mar 26, 1998

Demand for highly flexible and fast implementations of bitstream parsing and variable-length decoding (VLD) arises if applications are targeted that must support either MPEG-4 or multiple standards such as MPEG-2, H.263 or Dolby AC3. The paper shows that today's multimedia-oriented RISC processors incorporating multiple parallel arithmetic units, in particular, are slowed down by this kind of bit-level operation. Therefore, a new architecture is proposed that adds function-specific blocks to the data path of a RISC processor; these blocks are highly adapted to the processing of variable-length coded bitstream data. The increased functional complexity of basic instructions results in a significant speedup over software implementations on standard RISC processors. Two typical functions that are frequently used in bitstream parsing, ShowBits and GetBits, are executed in a single clock cycle with a 64-bit rotator circuit. Constant input-rate VLD of one, two or four bits per clock cycle can be implemented using internal RAM. Look-up tables can be used for word-parallel decoding and VLC. Optionally, memory entries can be saved by using content-addressable memories in addition to a data RAM. The proposed architecture has been implemented as a functional extension to an existing RISC core with an additional 9k gates of logic, 8k RAM and an interface to a CAM. Synthesis results show an estimated achievable clock frequency of 160 MHz using a 0.35 µm technology. The resulting performance is sufficient for MPEG-2 HDTV or MPEG-4 applications.
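
For readers unfamiliar with the two primitives, the following is a minimal software sketch of what ShowBits and GetBits conventionally do: peek at, or consume, the next n bits of an MSB-first bitstream. The class name `BitReader`, the method names and the zero-padding past the end of the data are assumptions made for illustration; the sketch says nothing about the paper's single-cycle rotator hardware.

```python
class BitReader:
    """MSB-first bit reader over a bytes object (illustrative only)."""

    def __init__(self, data: bytes) -> None:
        self.data = data
        self.pos = 0  # current read position, counted in bits

    def show_bits(self, n: int) -> int:
        """Return the next n bits without consuming them.

        Bits past the end of the data are read as zero (an assumption
        of this sketch, not something stated in the abstract).
        """
        value = 0
        for i in range(n):
            bit_index = self.pos + i
            byte_index = bit_index >> 3
            if byte_index < len(self.data):
                bit = (self.data[byte_index] >> (7 - (bit_index & 7))) & 1
            else:
                bit = 0
            value = (value << 1) | bit
        return value

    def get_bits(self, n: int) -> int:
        """Return the next n bits and advance the read position by n."""
        value = self.show_bits(n)
        self.pos += n
        return value


reader = BitReader(bytes([0b10110010, 0b01000000]))
peeked = reader.show_bits(3)   # look ahead, position unchanged
first = reader.get_bits(3)     # same bits, now consumed
second = reader.get_bits(5)    # remaining bits of the first byte
```

A variable-length decoder typically peeks with ShowBits, looks the pattern up in a code table, and then consumes only the matched code length with GetBits. The bit-by-bit loop here illustrates why such operations are costly in software on a word-oriented processor.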

Session details: Emerging solutions to manage energy/performance trade-offs along the memory hierarchy

Design, Automation, and Test in Europe, Mar 18, 2013

Energy efficient cooperative spectrum sensing in Cognitive Radio Sensor Network Using FPGA: A survey

A Cognitive Radio Sensor Network (CRSN) is a network of deployed wireless sensor nodes integrated with Cognitive Radio (CR) capability. It is a promising technology for addressing spectrum scarcity, coexistence with other networks in the ISM band, and prolonging the lifetime of Wireless Sensor Networks (WSN). One of the major challenges in CRSN is energy consumption, because the nodes inherit the limited energy budget of a traditional WSN. Cooperative Spectrum Sensing (CSS) is used to improve the sensing performance under multipath fading, shadowing and receiver uncertainty. In this paper, we present the basic differences between the conventional WSN and the CRSN, a comprehensive overview of non-cooperative spectrum sensing methods, and state-of-the-art research on energy efficiency (EE) in CRSN. Furthermore, we introduce the most commonly used platforms and the appropriate tools in the CR field.

IR-drop aware Design & technology co-optimization for N5 node with different device and cell height options

2017 IEEE/ACM International Conference on Computer-Aided Design (ICCAD), Nov 1, 2017

In this paper we propose a novel Design-Technology Co-Optimization (DTCO) framework that enables PDK generation and design implementation of sub-10nm technology nodes. The framework makes it possible to study the impact of different technology options at design level and to use effective design Power, Performance and Area (PPA) to decide on the right technology option. The design implementation flow is IR-drop aware, allowing integration of an optimized Power Delivery Network (PDN) for different device/cell options. Using N5-like technology node assumptions (contacted poly and metallization pitch of 42 and 32nm), we generate digital PDKs for different device (finFET, 2 & 3 nanowires) and standard cell options (3, 2 or 1 fins & 7.5 or 6-Tracks cell height). The different PDKs have been used to implement and characterize a wire-dominated circuit. Our study shows that design PDN/IR-drop awareness is fundamental to a complete DTCO approach for sub-10nm nodes. Using our dedicated design methodology we reach the IR-drop target of 2.5% VDD (on the lowest metal layers), while minimizing the area degradation induced by the PDN. Further, we demonstrate that such an optimized PDN is mandatory to enable the 20% area gain when moving from 7.5 to 6-Tracks cell height. Finally, we show that the impact of different device options is in the range of 15% Power, 2X Performance and 20% Area, further validating the need for a fully integrated DTCO.
