Mladen Berekovic | Technische Universität Braunschweig
Papers by Mladen Berekovic
Design, Automation, and Test in Europe, Mar 18, 2013
Modern multicore microcontrollers for systems with hard real-time requirements increasingly use a complex memory hierarchy to improve available performance and distribute competing accesses to separate memories. With the additional computing power available, the complexity of the implemented software has also increased. These two factors make an optimized distribution of the existing software across the available memories an increasingly difficult problem when integrating such systems. This article therefore presents an algorithm that calculates an optimized memory distribution from the microcontroller used, a recording of the code execution on the target hardware, and a system description, and that automatically generates the corresponding linker script.
Due to increased demands on the performance of real-time control units, multicore microcontrollers are increasingly being used in this sector. The manufacturers of such microcontrollers have developed extensive model families whose members are kept compatible with each other through identical pin assignment and package form. This policy is intended to enable ECU developers to select the next larger type with more processor cores in the event of performance bottlenecks. It should be noted, though, that the additional processor cores aggravate the problem of competing accesses and thus also increase the Worst-Case Execution Time. This article identifies the impact of competing accesses on the performance of individual processor cores. For this purpose, three derivatives of the Infineon AURIX microcontroller family are compared and potential bottlenecks are identified. Finally, recommendations for avoiding such problems are drawn up on the basis of these results.
Springer eBooks, 2022
Springer eBooks, 2022
Design, Automation, and Test in Europe, Mar 8, 2010
Design, Automation, and Test in Europe, Mar 9, 2020
International Conference on Computer Aided Design, Nov 13, 2017
In this paper we propose a novel Design-Technology Co-Optimization (DTCO) framework that enables PDK generation and design implementation of sub-10nm technology nodes. The framework makes it possible to study the impact of different technology options at design level and to use effective design Power, Performance and Area (PPA) to decide on the right technology option. The design implementation flow is IR-drop aware, allowing integration of an optimized Power Delivery Network (PDN) for different device/cell options. Using N5-like technology node assumptions (contacted poly and metallization pitches of 42 nm and 32 nm), we generate digital PDKs for different device options (finFET, 2 and 3 nanowires) and standard cell options (3, 2 or 1 fins and 7.5- or 6-track cell height). The different PDKs have been used to implement and characterize a wire-dominated circuit. Our study shows that design PDN/IR-drop awareness is fundamental to a complete DTCO approach for sub-10nm nodes. Using our dedicated design methodology we reach the IR-drop target of 2.5% VDD (on the lowest metal layers) while minimizing the area degradation induced by the PDN. Further, we demonstrate that such an optimized PDN is mandatory to enable the 20% area gain when moving from 7.5- to 6-track cell height. Finally, we show that the impact of different device options is in the range of 15% Power, 2X Performance and 20% Area, further validating the need for a fully integrated DTCO.
Springer eBooks, 2017
Journal of Signal Processing Systems, Dec 11, 2008
2017 IEEE-RAS 17th International Conference on Humanoid Robotics (Humanoids), 2017
Stereo image processing is one of the most demanding tasks in the field of 3D computer vision and robot vision, requiring high-performance computing capabilities within embedded systems. Real-time constraints for autonomous vehicles such as humanoid robots lead to hardware acceleration approaches for high-resolution stereo imaging in human-like vision systems, where FPGA devices are commonly employed to handle very high sensor data rates. This work presents a real-time smart stereo camera system implementation that integrates the full stereo processing pipeline in a single FPGA device. We introduce the novel memory-optimized stereo processing algorithm "Sparse Retina Census Correlation" (SRCC), which combines two well-established window-based stereo matching approaches. We have leveraged a Sum of Absolute Difference (SAD) of Sobel-filtered images and a Sum of Hamming Distance (SHD) using a modified Retina-based Census Transform for increased robustness to lighting variations and for high accuracy. A color rectification module has been implemented to cope with the high frame rate of the stereo pipeline, calculating image transformations and rectified pixel coordinates in real time using parameters for camera intrinsics, image rotation, image distortion and image projection. In addition, multiple post-processing algorithms such as texture filtering, uniqueness filtering, speckle removal and disparity-to-depth conversion have been implemented to further enhance the output results. The presented smart camera solution has demonstrated real-time stereo processing of 1280×720 pixel depth images with 256 disparities on a Zynq XC7Z030 FPGA device at 60 fps. Due to the universal USB 3.0 UVC interface and the onboard depth calculation, it is a replacement for RGBD 3D sensors with improved image quality and outdoor performance. The camera can easily be used in conjunction with ROS-enabled robots and in automotive or industrial applications.
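The two window-based costs named in this abstract can be illustrated with a small software sketch. It assumes the textbook 3×3 census transform and an unweighted sum of SAD and SHD; it is not the paper's modified Retina-based transform, its Sobel pre-filtering, or the SRCC algorithm itself, and all function names are hypothetical.

```python
def census_3x3(img, y, x):
    """Standard 3x3 census code (assumption: not the paper's retina variant).
    One bit per neighbour, set when the neighbour is darker than the centre."""
    centre = img[y][x]
    code = 0
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            if dy == 0 and dx == 0:
                continue
            code = (code << 1) | (1 if img[y + dy][x + dx] < centre else 0)
    return code


def hamming(a, b):
    """Number of differing bits between two census codes."""
    return bin(a ^ b).count("1")


def matching_cost(left, right, y, x, d, r=1):
    """SAD plus SHD over a (2r+1)x(2r+1) window for candidate disparity d.
    The equal weighting of both terms is an illustrative assumption."""
    sad = 0
    shd = 0
    for dy in range(-r, r + 1):
        for dx in range(-r, r + 1):
            ly, lx = y + dy, x + dx
            rx = lx - d  # matching column in the right image
            sad += abs(left[ly][lx] - right[ly][rx])
            shd += hamming(census_3x3(left, ly, lx), census_3x3(right, ly, rx))
    return sad + shd


def best_disparity(left, right, y, x, max_d):
    """Winner-takes-all: the disparity with the lowest combined cost."""
    return min(range(max_d + 1), key=lambda d: matching_cost(left, right, y, x, d))


# Synthetic pair: the right image is the left image shifted by 2 pixels.
W, H, SHIFT = 12, 5, 2
left = [[x * x + 3 * y for x in range(W)] for y in range(H)]
right = [[left[y][x + SHIFT] if x + SHIFT < W else 0 for x in range(W)]
         for y in range(H)]
print(best_disparity(left, right, 2, 8, 4))
```

A hardware pipeline would evaluate all candidate disparities in parallel per pixel; the loop over `d` here only shows what is being compared.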
Proceedings of SPIE, Mar 26, 1998
Demand for highly flexible and fast implementations of bitstream parsing and variable-length decoding (VLD) arises when the targeted applications must support either MPEG-4 or multiple standards such as MPEG-2, H.263 or Dolby AC3. The paper shows that today's multimedia-oriented RISC processors in particular, which incorporate multiple parallel arithmetic units, are slowed down by these kinds of bit-level operations. Therefore, a new architecture is proposed that adds function-specific blocks to the data path of a RISC processor; these blocks are highly adapted to the processing of variable-length coded bitstream data. The increased functional complexity of basic instructions results in a significant speedup over software implementations on standard RISC processors. Two typical functions that are frequently used in bitstream parsing, ShowBits and GetBits, are executed in a single clock cycle with a 64-bit rotator circuit. Constant input-rate VLD of one, two or four bits per clock cycle can be implemented using internal RAM. Look-up tables can be used for word-parallel decoding and VLC. Optionally, memory entries can be saved by using content-addressable memories in addition to a data RAM. The proposed architecture has been implemented as a functional extension to an existing RISC core with an additional 9k gates of logic, 8k RAM and an interface to a CAM. Synthesis results show an estimated achievable clock frequency of 160 MHz using a 0.35 µm technology. The resulting performance is sufficient for MPEG-2 HDTV or MPEG-4 applications.
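For readers unfamiliar with the two primitives, the following is a minimal software model of ShowBits and GetBits plus a table-driven variable-length decode. It is only a sketch of the bit-level work the abstract describes: the class, the MSB-first bit order, and the three-symbol prefix code are illustrative assumptions and do not reproduce the paper's hardware or any MPEG code table.

```python
class BitReader:
    """Software model of ShowBits/GetBits on an MSB-first bitstream."""

    def __init__(self, data: bytes):
        self.data = data
        self.pos = 0  # current position in bits from the start of the stream

    def show_bits(self, n: int) -> int:
        """Return the next n bits without consuming them.
        Bits past the end of the stream read as zero (assumption)."""
        value = 0
        for i in range(n):
            byte_index, offset = divmod(self.pos + i, 8)
            if byte_index < len(self.data):
                bit = (self.data[byte_index] >> (7 - offset)) & 1
            else:
                bit = 0
            value = (value << 1) | bit
        return value

    def get_bits(self, n: int) -> int:
        """Return the next n bits and advance the read position."""
        value = self.show_bits(n)
        self.pos += n
        return value


# Hypothetical prefix code: "0" -> A, "10" -> B, "11" -> C.
# The look-up table is indexed by the next 2 bits and stores (symbol, length).
VLC_TABLE = {0b00: ("A", 1), 0b01: ("A", 1), 0b10: ("B", 2), 0b11: ("C", 2)}


def decode_symbols(reader: BitReader, count: int) -> list:
    """Decode `count` symbols: peek the maximum code length, look up the
    symbol, then consume only as many bits as the matched code uses."""
    symbols = []
    for _ in range(count):
        symbol, length = VLC_TABLE[reader.show_bits(2)]
        reader.get_bits(length)
        symbols.append(symbol)
    return symbols


print(decode_symbols(BitReader(bytes([0b01011000])), 3))
```

The per-bit loop in `show_bits` is exactly the kind of operation that costs many instructions in software, which is the motivation the abstract gives for moving it into a dedicated data-path block.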
Design, Automation, and Test in Europe, Mar 18, 2013
A Cognitive Radio Sensor Network (CRSN) is a network of deployed wireless sensor nodes integrated with Cognitive Radio (CR) capability. It is a promising technology for addressing spectrum scarcity, coexistence with other networks in the ISM band, and prolonging lifetime in Wireless Sensor Networks (WSN). One of the major challenges in CRSN is energy consumption, owing to the limited energy budget inherited from the traditional WSN. Cooperative Spectrum Sensing (CSS) is used to improve sensing performance under multipath fading, shadowing and receiver uncertainty. In this paper, we present the basic differences between the conventional WSN and the CRSN, a comprehensive overview of non-cooperative spectrum sensing methods, and state-of-the-art research on energy efficiency (EE) in CRSN. Furthermore, we introduce the most commonly used platforms and the appropriate tools in the CR field.
2017 IEEE/ACM International Conference on Computer-Aided Design (ICCAD), Nov 1, 2017