H.264 Research Papers - Academia.edu

Within an Internet of Multimedia Things, the risk of disclosing streamed video content, such as that arising from video surveillance, is of heightened concern. This leads to the encryption of that content. To reduce the overhead and the lack of flexibility arising from full encryption of the content, a good number of selective-encryption algorithms have been proposed in the last decade. Some of them have limitations: significant delay due to computational cost, excess memory utilization, or, despite being energy efficient, an unsatisfactory level of confidentiality due to their simplicity. To address such limitations, this paper presents a lightweight selective encryption scheme in which encoder syntax elements are encrypted with the innovative EXPer (extended permutation with exclusive OR). The selected syntax elements are taken from the final stage of video encoding, namely the entropy coding stage. As a diagnostic tool, the Encryption Space Ratio measures the encoding complexity of the video relative to the level of encryption, so as to judge the success of the encryption process for the entropy coder in use. A detailed comparative analysis of EXPer with other state-of-the-art encryption algorithms confirms that EXPer provides significant confidentiality with a small computational cost and a negligible encryption bitrate overhead. Thus, the results demonstrate that the proposed security scheme is a suitable choice for constrained devices in an Internet of Multimedia Things environment.
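The general idea of combining a permutation with an exclusive OR can be illustrated with a toy example. The sketch below is not the paper's EXPer algorithm, whose details are not given in the abstract; it is a minimal illustration, assuming a byte string stands in for the selected syntax elements and using Python's non-cryptographic `random` module purely for demonstration. The function names are hypothetical.

```python
import random


def _permutation_and_keystream(length: int, key: int):
    # Assumption: a seeded PRNG stands in for a real key schedule.
    # random.Random is NOT cryptographically secure; illustration only.
    rng = random.Random(key)
    perm = list(range(length))
    rng.shuffle(perm)
    keystream = [rng.randrange(256) for _ in range(length)]
    return perm, keystream


def permute_xor_encrypt(data: bytes, key: int) -> bytes:
    """Reorder the bytes with a keyed permutation, then XOR with a keystream."""
    perm, keystream = _permutation_and_keystream(len(data), key)
    return bytes(data[p] ^ k for p, k in zip(perm, keystream))


def permute_xor_decrypt(cipher: bytes, key: int) -> bytes:
    """Undo the XOR, then put each byte back in its original position."""
    perm, keystream = _permutation_and_keystream(len(cipher), key)
    plain = bytearray(len(cipher))
    for i, (p, k) in enumerate(zip(perm, keystream)):
        plain[p] = cipher[i] ^ k
    return bytes(plain)


if __name__ == "__main__":
    elements = b"selected syntax elements"
    protected = permute_xor_encrypt(elements, key=2024)
    assert permute_xor_decrypt(protected, key=2024) == elements
```

In a real selective-encryption scheme only a subset of the bitstream would be processed this way, and the output would normally have to remain decodable by a standard decoder; neither constraint is modelled here.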

Quality of Service (QoS) and Quality of Experience (QoE) are considered key performance indicators for video and audio conferencing in video call streaming services. Since Skype aims to deliver the best possible quality of service across all the environments in which it operates, it is important to examine how Skype has evolved toward this cross-platform goal, which encoding improvements have been implemented over the years, and how effective these changes are, both objectively and subjectively.

Video codecs have undergone dramatic improvements and increased in complexity over the years owing to various commercial products such as mobile phones and tablet PCs. With the emergence of standards such as H.264, which has become the de facto standard for video, uniformity in the delivery of video is observed. With constraints on memory and transmission bandwidth, the focus has been on the effective compression and decompression of video. Multicore architectures have increasingly become available on mobile phones and tablet PCs. As codecs have increased in complexity and become computationally intensive, it is all the more important to leverage such computation over multicore hardware architectures. The OpenCL framework for programming multicore hardware architectures such as CPUs, GPUs and DSPs has grown to a high level of maturity. In this study, an efficient H.264 video codec is developed using OpenCL for multicore architectures, based on the x264 open source H.264 library. The x264 library is profiled using sample videos on a CPU, and performance hotspots are identified for optimisation. These hotspots are optimized by encapsulating them in OpenCL kernel loops where 4 parallel threads are created by OpenMP. Further, compiler optimization flags and assembly instructions within the x264 library are used to improve memory efficiency and execution speed. Programs are developed to identify and use the queried OpenCL CPU device and to analyze the PCI bandwidth between the host and the device. When launched on CPU and GPU platforms with OpenCL APIs and multithreading, improvements in execution time and in the number of system calls made are observed. The x264_pixel_satd_8*4 hotspot showed a gain of 1.2 seconds on a CPU compared with the earlier non-OpenCL optimization, and a gain of 0.4 seconds on a GPU. The lower gain on the GPU platform is due to read and write latencies. However, combining this with compiler optimization flags and assembly instructions across the entire x264 library resulted in a 4.3x improvement on a CPU and a 4.2x improvement on a GPU platform. It can be concluded that, along with multithreading with OpenCL, the traditional approach of compiler-level optimization is important, as it addresses the core performance of the application considered.
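The hotspot named above computes a sum of absolute transformed differences (SATD) over a small pixel block. The following is a minimal, unoptimized sketch of that metric using a Hadamard transform, written in plain Python rather than taken from x264; the function names are hypothetical, no normalization factor is applied, and block dimensions are assumed to be powers of two.

```python
def hadamard(n):
    """Build an n x n Hadamard matrix (n must be a power of two)."""
    h = [[1]]
    while len(h) < n:
        # Sylvester construction: [[H, H], [H, -H]]
        h = [row + row for row in h] + [row + [-v for v in row] for row in h]
    return h


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def satd(block_a, block_b):
    """Sum of absolute Hadamard-transformed differences between two blocks."""
    rows, cols = len(block_a), len(block_a[0])
    diff = [[a - b for a, b in zip(ra, rb)]
            for ra, rb in zip(block_a, block_b)]
    transformed = matmul(matmul(hadamard(rows), diff), hadamard(cols))
    return sum(abs(v) for row in transformed for v in row)


if __name__ == "__main__":
    # An 8-wide, 4-high block compared with a copy offset by 1 everywhere.
    source = [[10 * y + x for x in range(8)] for y in range(4)]
    shifted = [[v + 1 for v in row] for row in source]
    print(satd(source, source), satd(shifted, source))
```

Because every pixel block in a frame is evaluated independently, this kind of kernel is a natural candidate for the data-parallel execution the abstract describes.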

Many military files and medical research reports cannot tolerate any distortion and are, at the same time, confidential to their owners. In order to preserve source information and guarantee data security, lossless video compression with encryption seems to be the best alternative. In this work, cryptography was integrated into lossless compression to ensure high image quality and secure transmission of such data. The insertion of the encryption module in the lossless video compression chain appears to be better, in terms of computing time, than applying compression and encryption separately. Our method achieved a gain of about 30% in total bits and an improvement of around 75% in compression ratio, which are promising results.

Data broadcasting services are required to provide user interactivity by connecting additional content, such as object information, to audio-visual content. H.264/AVC-based metadata authoring tools include functions which identify and track the position and motion of objects. In this work, we propose a method for tracking the target object by using partially decoded texture data and motion vectors extracted directly from the H.264/AVC bitstream. This method achieves low computational complexity and high performance through a dissimilarity energy minimization algorithm which tracks feature points adaptively according to these characteristics. Experiments show that the proposed method achieves high performance with fast processing time.

Presentation from my own course on HEVC/H.265

This paper describes an innovative, pipelined, cache-based architecture for a motion estimation coprocessor based on a predictive/recursive algorithm whose computational complexity is low and independent of the search window. The algorithm and the associated architecture lend themselves very well to low-power, low-cost video capture devices with low processing capabilities, such as mobile phones, PDAs, or handhelds. The synergy between architecture and algorithmic features allows a high quality output, low memory-to-cache bandwidth requirements, and a search-window-independent implementation for H.264/AVC real-time video encoding of up to high definition video (HDTV).

In this paper, we show that we can apply probabilistic spatiotemporal macroblock filtering (PSMF) and partial decoding processes to effectively detect and track multiple objects in real time in H.264|AVC bitstreams with a stationary background. Our contribution is that our method not only achieves fast processing time but also handles multiple moving objects that are articulated, change in size, or have internally monotonous color, even though they contain a chaotic set of non-homogeneous motion vectors. In addition, our partial decoding process for H.264|AVC bitstreams improves the accuracy of object trajectories and overcomes long occlusions by using extracted color information.

In a hybrid video coding context as defined by the H.264/AVC standard, maximum compression efficiency is achieved by optimizing a set of coding parameters. Among the most crucial is the choice of prediction mode. This mode is commonly selected according to a rate-distortion optimization (RDO) criterion during an exhaustive search, which has a high computational complexity, notably because of the motion estimation (ME) step. The solution proposed here adapts a fast decision algorithm based on image analysis so as to limit the complexity introduced by ME. Weighing the spatial and temporal correlations of the macroblock against each other enables a fast and reliable prediction mode decision, avoiding unnecessary and computationally expensive MEs. Based on offline learning, the model proposed in this paper systematically accelerates the real-time x264 encoder, by 11.29% on average and up to 20% depending on the sequence, for a negligible loss in coding efficiency (<1%).
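A rate-distortion criterion of the kind mentioned above is usually written as a Lagrangian cost J = D + λR, where D is the distortion, R the rate in bits and λ a trade-off weight. The sketch below shows the exhaustive selection such a criterion implies; the mode names, distortion values and rate values are invented for illustration and do not come from the paper.

```python
def rd_cost(distortion, rate, lam):
    """Lagrangian rate-distortion cost J = D + lambda * R."""
    return distortion + lam * rate


def best_mode(candidates, lam):
    """Return the name of the candidate mode with the lowest RD cost."""
    return min(
        candidates,
        key=lambda name: rd_cost(candidates[name][0], candidates[name][1], lam),
    )


# Hypothetical (distortion, rate) pairs for three prediction modes.
CANDIDATES = {
    "intra16x16": (120, 10),
    "inter16x16": (80, 30),
    "inter8x8": (60, 60),
}

if __name__ == "__main__":
    for lam in (0, 1, 5):
        print(lam, best_mode(CANDIDATES, lam))
```

Every candidate has to be encoded, or at least estimated, before its cost is known, which is why skipping unlikely modes early saves time.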

Even considering recent advances in microprocessor computing power, high definition multimedia applications still impose very heavy demands on real-time video encoding. In particular, modern video encoders (MPEG/ITU H.26x series) depend on complex and computationally exhaustive motion estimation algorithms to identify and remove temporal redundancy among frames (consecutive or not) within a video sequence, as a strategy to reduce the final compressed bit rate. In fact, block matching can be considered the most critical encoder algorithm in terms of computational demands, since it is responsible for searching, in distinct reference frames, for pixel blocks similar to each of the input image blocks. The number of block comparisons required for high definition videos represents a clear and important restriction for real-time implementations. This paper introduces an improved block matching strategy, optimized for multiprocessing execution and focused mainly on implementation on general-purpose graphics processing unit technologies, such as NVidia CUDA® GPUs. The improved motion estimation solution was implemented in the JSVM reference code (the scalable version of the H.264 video encoder), where a speed-up of more than 350% on average was recorded for 4CIF videos.
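The cost of block matching is easiest to see in its plainest form, an exhaustive search that scores every candidate displacement with a sum of absolute differences (SAD). The sketch below is a generic sequential baseline, not the paper's optimized method; frames are assumed to be lists of rows of luma samples and the function names are hypothetical.

```python
def sad(cur, ref, bx, by, rx, ry, n):
    """Sum of absolute differences between two n x n blocks."""
    return sum(
        abs(cur[by + j][bx + i] - ref[ry + j][rx + i])
        for j in range(n)
        for i in range(n)
    )


def full_search(cur, ref, bx, by, n, search_range):
    """Return (dx, dy, cost) of the best match for the block at (bx, by)."""
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            rx, ry = bx + dx, by + dy
            # Skip candidates that fall outside the reference frame.
            if rx < 0 or ry < 0 or rx + n > width or ry + n > height:
                continue
            cost = sad(cur, ref, bx, by, rx, ry, n)
            if best is None or cost < best[2]:
                best = (dx, dy, cost)
    return best


if __name__ == "__main__":
    size = 10
    ref = [[x * x + 10 * y * y for x in range(size)] for y in range(size)]
    # Current frame: reference content displaced by (2, 1), zero elsewhere.
    cur = [[ref[y + 1][x + 2] if y + 1 < size and x + 2 < size else 0
            for x in range(size)] for y in range(size)]
    print(full_search(cur, ref, bx=2, by=2, n=4, search_range=2))
```

Each block needs (2·range + 1)² SAD evaluations of n² pixels, and the candidates are independent of one another, which is what makes the search attractive for GPU execution.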

This paper presents a transcoding scheme for the conversion of a bitstream coded with the H.264/AVC standard into another compliant with the MPEG-2 standard. This type of transcoding is of great current interest from the point of view of both the broadcaster and the customer. Due to the many differences between these two standards, the overall transcoding process has to be performed in the spatial domain; nevertheless, the proposed transcoding scheme adapts all the motion and coding mode information, avoiding the complete decoding and re-encoding of the incoming bitstream. Experimental results and comparisons with the classical decoder-encoder cascade are provided for common benchmark video sequences.

This paper presents a two-dimensional histogram shifting technique for a reversible data hiding algorithm. In order to avoid the distortion drift caused by hiding data in stereo H.264 video, we choose arbitrary embeddable blocks from the 4×4 quantized discrete cosine transform luminance blocks which will not affect their adjacent blocks. Two coefficients in each embeddable block are chosen as a hiding coefficient pair. The selected coefficient pairs are classified into different sets on the basis of their values. Data are hidden according to the set to which the value of the coefficient pair belongs. When the value of one coefficient may be changed by adding or subtracting 1, two data bits can be hidden using the proposed method, whereas only one data bit can be embedded using conventional histogram shifting. Experiments show that this two-dimensional histogram shifting method can improve the hiding performance.
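For reference, the conventional one-dimensional histogram shifting that the abstract uses as its baseline can be sketched as follows. This is a simplified one-sided variant operating on a flat list of integer coefficients; it is not the paper's two-dimensional method, the function names are hypothetical, and overflow handling is omitted.

```python
def hs_embed(values, bits, peak):
    """Embed bits at coefficients equal to `peak`; shift larger ones by +1."""
    marked = []
    remaining = iter(bits)
    for v in values:
        if v > peak:
            marked.append(v + 1)                   # make room next to the peak
        elif v == peak:
            marked.append(v + next(remaining, 0))  # carry one bit (0 if none left)
        else:
            marked.append(v)                       # below the peak: untouched
    return marked


def hs_extract(marked, peak):
    """Recover the original coefficients and the embedded bits."""
    values, bits = [], []
    for v in marked:
        if v == peak:
            bits.append(0)
            values.append(peak)
        elif v == peak + 1:
            bits.append(1)
            values.append(peak)
        elif v > peak + 1:
            values.append(v - 1)                   # undo the shift
        else:
            values.append(v)
    return values, bits


if __name__ == "__main__":
    coeffs = [0, 1, 0, -1, 0, 2, 0, 3]
    marked = hs_embed(coeffs, [1, 0, 1, 1], peak=0)
    print(marked, hs_extract(marked, peak=0))
```

Each coefficient at the peak carries exactly one bit here, which is the capacity limit the two-dimensional scheme is designed to exceed.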

We present a new preprocessing technique for two-dimensional compression of surface electromyographic (S-EMG) signals, based on correlation sorting. We show that the JPEG2000 coding system (originally designed for compression of still images) and the H.264/AVC encoder (video compression algorithm operating in intraframe mode) can be used for compression of S-EMG signals. We compare the performance of these two off-the-shelf image compression algorithms for S-EMG compression, with and without the proposed preprocessing step. Compression of both isotonic and isometric contraction S-EMG signals is evaluated. The proposed methods were compared with other S-EMG compression algorithms from the literature.

We present a novel arbitrary spatial downsizing method for an H.264 to MPEG4 simple profile transcoder. Using median filtering and scaling of the input H.264 4×4 block-level motion vectors, the downsizing module generates 4×4 block-level motion vectors compatible with the target resolution, which are fed to the MPEG4 encoder to generate the final MPEG4-compatible 16×16 or 8×8 block-level motion vectors. The downsizing and encoding modules are kept separate for better portability. Our approach is approximately 6-8 times faster than a full encode system and achieves almost the same compression as full encoding.
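The median-filter-and-scale step can be illustrated with a short sketch. It assumes that the motion vectors of the input blocks covering one output block have already been gathered into a list, and that the median is taken per component; both are assumptions for illustration, and the function name is hypothetical.

```python
from statistics import median


def downscale_motion_vector(mvs, scale_x, scale_y):
    """Combine input motion vectors into one vector at the target resolution.

    mvs: list of (mvx, mvy) pairs from the input blocks being merged.
    scale_x, scale_y: target/source resolution ratios (e.g. 0.5 for halving).
    """
    mx = median(mv[0] for mv in mvs)   # component-wise median rejects outliers
    my = median(mv[1] for mv in mvs)
    return (round(mx * scale_x), round(my * scale_y))


if __name__ == "__main__":
    # Four neighbouring vectors, one of them an outlier.
    vectors = [(8, 4), (12, 4), (40, -2), (8, 6)]
    print(downscale_motion_vector(vectors, 0.5, 0.5))
```

Independent horizontal and vertical ratios are what allow arbitrary, non-dyadic downsizing factors.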

Peak Signal to Noise Ratio (PSNR) and Structural SIMilarity (SSIM) are metrics initially used to evaluate the visual quality of compressed images or sequences compared to the original ones. By analogy with compressed sequences, researchers use these metrics to evaluate the degradation of encrypted sequences. Video encryption algorithms aim for maximum scrambling so that the content becomes imperceptible to the human visual system. The distortion reflected in PSNR and SSIM values comes from both compression and encryption, so using these metrics to measure the degradation of jointly compressed and encrypted sequences cannot give a precise evaluation of the distortion. For a better evaluation, a Contrast Sensitivity Function (CSF) metric was used. This article aims to provide a perceptual evaluation of the encryption effect for H.264 Advanced Video Coding compressed and encrypted sequences, using the CSF metric. The visual quality of the encrypted video is degraded and is shown to be poor at different viewing distances.
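As a reminder of what PSNR measures, it is derived from the mean squared error between a reference and a test signal: PSNR = 10·log10(MAX² / MSE). A minimal sketch follows, assuming 8-bit samples supplied as flat sequences of equal length; the function name is illustrative.

```python
import math


def psnr(reference, test, max_value=255):
    """Peak signal-to-noise ratio in dB between two equal-length sample lists."""
    if len(reference) != len(test) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((a - b) ** 2 for a, b in zip(reference, test)) / len(reference)
    if mse == 0:
        return math.inf            # identical signals
    return 10 * math.log10(max_value ** 2 / mse)


if __name__ == "__main__":
    original = [52, 55, 61, 66, 70, 61, 64, 73]
    distorted = [54, 55, 60, 66, 69, 63, 64, 70]
    print(round(psnr(original, distorted), 2))
```

The metric depends only on pixel-wise error energy, so it cannot separate distortion caused by compression from distortion caused by encryption.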

Recent MPEG video compression standards are still block-based: blocks of pixels are sequentially coded using spatial or temporal prediction schemes. For each block, a vector of coding parameters has to be selected. In order to limit the complexity of this decision, independence between blocks is assumed, and coding parameters are locally optimized to maximize the coding efficiency. Few studies have investigated the benefits of considering inter-block dependencies using Joint Rate-Distortion Optimization (JRDO), especially in intra coding. To the best of our knowledge, the maximum achievable gains of such approaches have never been demonstrated. In this paper, we propose two JRDO models performing joint optimization of multiple blocks, applied to intra prediction mode decision. The proposed models have been evaluated in both the H.264/AVC and HEVC standards. Compared with the classical RDO model, these two models enable bitrate savings of up to 3.10% in H.264/AVC and 2.31% in HEVC.
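The difference between block-by-block and joint optimization can be shown with a toy example of two consecutive blocks, where the cost of the second block depends on the mode chosen for the first. The cost tables below are invented for illustration and are not the paper's JRDO models.

```python
from itertools import product

MODES = ("A", "B")

# Hypothetical RD costs: block 0 alone, and block 1 given block 0's mode.
FIRST = {"A": 10.0, "B": 11.0}
SECOND = {("A", "A"): 10.0, ("A", "B"): 12.0,
          ("B", "A"): 5.0, ("B", "B"): 6.0}


def total_cost(m0, m1):
    """Total cost of coding block 0 with m0 and then block 1 with m1."""
    return FIRST[m0] + SECOND[(m0, m1)]


def greedy_decision():
    """Classical approach: pick each block's best mode in coding order."""
    m0 = min(MODES, key=lambda m: FIRST[m])
    m1 = min(MODES, key=lambda m: SECOND[(m0, m)])
    return (m0, m1), total_cost(m0, m1)


def joint_decision():
    """Joint approach: evaluate every pair of modes and keep the cheapest."""
    best = min(product(MODES, repeat=2), key=lambda pair: total_cost(*pair))
    return best, total_cost(*best)


if __name__ == "__main__":
    print("greedy:", greedy_decision())
    print("joint: ", joint_decision())
```

With these numbers the greedy choice costs 20.0 while the joint choice costs 16.0, because a slightly worse first block gives the second block a better predictor. The joint search grows exponentially with the number of blocks considered together, which is the complexity the independence assumption avoids.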