Exploring design space of parallel realizations: MPEG2 decoder case study
Related papers
A YAPI system level optimized parallel model of a H.264/AVC video encoder
2009
H.264/AVC (Advanced Video Codec) is a new video coding standard developed jointly by the ITU-T VCEG and ISO/IEC MPEG. This standard provides higher coding efficiency than former standards at the expense of higher computational requirements. Implementing the H.264 video encoder for an embedded System-on-Chip (SoC) is therefore a big challenge. For an efficient implementation, we motivate the use of multiprocessor platforms for the execution of a parallel model of the encoder. In this paper, we propose a high-level, target-architecture-independent parallelization methodology for the development of an optimized parallel model of an H.264/AVC encoder. The methodology does not depend on the architectural details of any target platform. It is based on the simultaneous exploration of task-level and data-level parallelism, and on the use of the Kahn Process Network (KPN) parallel model of computation and the YAPI C++ runtime library. The encoding performance of the resulting parallel model has been evaluated by system-level simulations targeting multiple multiprocessor platforms.
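As a rough illustration of the KPN model of computation named in this abstract, the sketch below wires three processes together with unbounded FIFO channels and blocking reads. It is not the YAPI API and not the authors' encoder model: the process names, the `SENTINEL` end-of-stream marker, and the squaring step that stands in for real encoding work are all assumptions made for this example.

```python
import queue
import threading

SENTINEL = None  # hypothetical end-of-stream marker, not part of KPN itself


def source(out_fifo, n_blocks):
    """Emit placeholder block indices, then signal end of stream."""
    for i in range(n_blocks):
        out_fifo.put(i)  # writes never block on an unbounded channel
    out_fifo.put(SENTINEL)


def transform(in_fifo, out_fifo):
    """Apply a stand-in computation to each token and forward the result."""
    while True:
        token = in_fifo.get()  # blocking read: the process waits for data
        if token is SENTINEL:
            out_fifo.put(SENTINEL)
            return
        out_fifo.put(token * token)


def sink(in_fifo, results):
    """Collect tokens in arrival order until the end-of-stream marker."""
    while True:
        token = in_fifo.get()
        if token is SENTINEL:
            return
        results.append(token)


def run_network(n_blocks=5):
    """Build source -> transform -> sink and run each process in a thread."""
    a, b = queue.Queue(), queue.Queue()  # unbounded FIFO channels
    results = []
    procs = [
        threading.Thread(target=source, args=(a, n_blocks)),
        threading.Thread(target=transform, args=(a, b)),
        threading.Thread(target=sink, args=(b, results)),
    ]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    return results


if __name__ == "__main__":
    print(run_network())
```

Because each process only communicates through FIFO channels and blocks on reads, the collected result does not depend on how the threads are scheduled, which is the property that makes such a model attractive for mapping onto different multiprocessor platforms.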
2003 International Conference on Multimedia and Expo. ICME '03. Proceedings (Cat. No.03TH8698), 2003
Sequential MPEG-4 solutions developed for single processors try to integrate as many functionalities as possible into a single piece of software, and are generally oversized compared with the actual service requirement. Moreover, they are hard to map onto multiprocessor targets, which leads to extra source code and computation as well as a sub-optimal use of the architecture's parallelism. This paper introduces a distributed MPEG-4 application in which the system part is hosted by a standard PC and the video decoder is supported by a multi-DSP board. In particular, we present our AVSynDEx methodology, which allows incremental building, easy updating of the video decoder description, and a quasi-automatic implementation onto a multi-C6x platform. We also define a global scheduler managing the parallel execution of the video and system applications. [Figure: global parallel scheduling between the PC and the DSP board.]
High Level Optimized Parallel Specification of a H.264/AVC Video Encoder
2011
H.264/AVC (Advanced Video Codec) is a new video coding standard developed jointly by the ITU-T VCEG and ISO/IEC MPEG. This standard provides higher coding efficiency than former standards at the expense of higher computational requirements. Implementing the H.264 video encoder for an embedded System-on-Chip (SoC) is thus a big challenge. For an efficient implementation, we motivate the use of multiprocessor platforms for the execution of a parallel model of the encoder. For this purpose, we propose a high-level parallelization approach for the development of an optimized parallel model of an H.264/AVC encoder for embedded SoCs. The approach does not depend on the architectural details of any target platform. It is based on the simultaneous exploration of task-level and data-level parallelism, and on the use of the Kahn process network (KPN) parallel model of computation and the YAPI C++ runtime library. To demonstrate the effectiveness of the ...
Video Encoding Analysis for Parallel Execution on Reconfigurable Architectures
Performance improvement on heterogeneous reconfigurable architectures depends on application analysis for parallel execution. This paper describes a performance analysis methodology for video encoding applications to estimate the expected performance of parallel execution on reconfigurable architectures. We formulate the performance estimation of a video encoding application on a target architecture with an equation. The equation shows the overhead factors that hinder the speed-up of parallel execution.
An MPEG-2 decoder case study as a driver for a system level design methodology
Proceedings of the seventh international workshop on Hardware/software codesign - CODES '99, 1999
... The Thdr process is aware of the high level bitstream organization and distributes the retrieved sequence and picture properties to other processes. ... The parameterization would then allow us to do sensitivity analysis and some design space exploration for this architecture. ...
A Parallel Transaction-Level Model of H.264 Video Decoder
2011
The H.264 video decoder is a computationally demanding application. In a resource-limited embedded environment, it is desirable to exploit parallelism in order to implement an H.264 decoder. After reviewing the relevant technical details of the H.264 standard, we discuss several possibilities for parallelization and develop a TLM model with parallel slice decoders. Extensive experiments are performed to demonstrate the benefit of our model.
A High-Performance Architecture with a Macroblock-Level-Pipeline for MPEG-2 Coding
Real-Time Imaging, 1996
Recently, video codec algorithms have received a lot of attention in the area of multimedia applications for digital image storage and communications. Several standards have been examined by ITU-T, such as JPEG (Joint Photographic Experts Group) for still pictures and MPEG (Moving Picture Experts Group) for motion video. In 1993, the draft of MPEG-2 [3] at MP@ML was fixed after trials with several test models. The MPEG-2 standard was developed mainly to deal with HDTV. The high demands of this video coding standard require special architectural approaches adapted to these schemes [5, 6], since current general-purpose processors and DSPs do not provide sufficient performance for executing the algorithm in real time. Several architectural solutions have been proposed for MPEG-2 video coding. However, today's technology is not able to fit the whole coder into a single VLSI chip. Unlike the Test Models, the ITU-T H.262 standard gives the designer freedom in most coder design tasks, and many current efforts aim to lower the performance requirements in order to keep hardware complexity manageable.
ViPar: High-Level Design Space Exploration for Parallel Video Processing Architectures
International Journal of Reconfigurable Computing
Embedded video applications are now involved in sophisticated transportation systems like autonomous vehicles and driver assistance systems. As silicon capacity increases, the design productivity gap widens for currently available design tools. Hence, high-level synthesis (HLS) tools emerged in order to reduce that gap by shifting the design effort to higher abstraction levels. In this paper, we present ViPar, a tool for exploring different video processing architectures at a higher design level. First, we propose a parametrizable parallel architectural model dedicated to video applications. Second, targeting this architectural model, we developed the ViPar tool with two main features: (1) An empirical model was introduced to estimate the power consumption based on hardware utilization and operating frequency. In addition, we derived the equations for estimating the hardware utilization and execution time for each design point during the space exploration process. (2) By ...
Int'l J. of Communications, Network and System Sciences, 2011
Given the substantially increasing complexity of embedded systems, the use of relatively detailed, clock-cycle-accurate simulators for design-space exploration is impractical in the early design stages. Raising the abstraction level is nowadays widely seen as a solution to bridge the gap between increasing system complexity and low design productivity. To this end, several system-level design tools and methodologies have been introduced to efficiently explore the design space of heterogeneous signal processing systems. In this paper, we demonstrate the effectiveness and flexibility of the Sesame/Artemis system-level modeling and simulation methodology for efficient performance evaluation and rapid architectural exploration of increasingly complex heterogeneous embedded media systems. For this purpose, we have selected a system-level design of a highly complex media application: an H.264/AVC (Advanced Video Codec) video encoder. The encoding performance is evaluated using system-level simulations targeting multiple heterogeneous multiprocessor platforms.
An MPEG-4 performance study for non-SIMD, general purpose architectures
2003 IEEE International Symposium on Performance Analysis of Systems and Software. ISPASS 2003.
MPEG-4 is an important international standard with wide applicability. This paper focuses on MPEG-4's main profile, video, whose approach allows more efficiency in coding and more flexibility in managing heterogeneous media objects than previous MPEG standards. This study presents evidence to support the assertion that for non-SIMD architectures and computational models, most memory-system optimizations will have little effect on MPEG-4 performance. This paper makes two contributions. First, it serves as an independent confirmation that for current, general-purpose architectures, MPEG-4 video is computation bound (just like most other media processing applications). Second, our findings should prove useful to other researchers and practitioners considering how to (or how not to) optimize MPEG-4 performance.