State-of-the Art Motion Estimation in the Context of 3D TV (June 18, 2012)

2013

Progress in image sensors and computation power has fueled studies to improve the acquisition, processing, and analysis of 3D streams, along with the reconstruction of 3D scenes and objects. This chapter investigates the role of motion compensation/motion estimation (MCME) in 3D TV from end to end. Motion vectors (MVs) are closely related to the concept of disparities, and they can help improve dynamic scene acquisition, content creation, 2D to 3D conversion, compression coding, decompression/decoding, scene rendering, error concealment, virtual/augmented reality handling, intelligent content retrieval, and displaying. Although there are different 3D shape extraction methods, this chapter focuses mostly on shape-from-motion (SfM) techniques due to their relevance to 3D TV. SfM extraction can restore 3D shape information from the data of a single camera.
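
As a rough illustration of what a motion vector is, the following minimal sketch performs full-search block matching with a sum-of-absolute-differences (SAD) cost. It is not taken from the chapter: the function names, the block size, the search range, and the toy frames are all assumptions chosen for illustration. Applying the same search between a left and a right view instead of between two time instants would yield a disparity-like vector, which is the sense in which MVs and disparities are related.

```python
def make_frame(height, width, top, left, size=4):
    """Hypothetical test frame: black background with one textured square."""
    frame = [[0] * width for _ in range(height)]
    for r in range(size):
        for c in range(size):
            frame[top + r][left + c] = 10 + 4 * r + c  # distinct pixel values
    return frame


def get_block(frame, top, left, size):
    """Cut a size x size block whose upper-left corner is (top, left)."""
    return [row[left:left + size] for row in frame[top:top + size]]


def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(a - b)
               for row_a, row_b in zip(block_a, block_b)
               for a, b in zip(row_a, row_b))


def estimate_motion_vector(ref, cur, top, left, size=4, search=2):
    """Full-search block matching.

    Returns ((dy, dx), cost) where the block at (top + dy, left + dx) in the
    reference frame best matches the block at (top, left) in the current frame.
    """
    height, width = len(ref), len(ref[0])
    target = get_block(cur, top, left, size)
    best = (0, 0)
    best_cost = sad(target, get_block(ref, top, left, size))
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = top + dy, left + dx
            # Skip candidates that fall outside the reference frame.
            if y < 0 or x < 0 or y + size > height or x + size > width:
                continue
            cost = sad(target, get_block(ref, y, x, size))
            if cost < best_cost:
                best_cost, best = cost, (dy, dx)
    return best, best_cost


# Toy usage: the square moves two pixels to the right between the frames,
# so the matching block in the reference lies two pixels to the left.
ref_frame = make_frame(12, 12, top=4, left=3)
cur_frame = make_frame(12, 12, top=4, left=5)
vector, cost = estimate_motion_vector(ref_frame, cur_frame, top=4, left=5)
```

Full search is used here only because it is the simplest scheme to read; practical encoders usually rely on faster search patterns.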

Scene Representation Technologies for 3DTV—A Survey

IEEE Transactions on Circuits and Systems for Video Technology, 2000

3-D scene representation is utilized during the scene extraction, modeling, transmission and display stages of a 3DTV framework. To this end, different representation technologies are proposed to fulfill the requirements of the 3DTV paradigm. Dense point-based methods are appropriate for free-view 3DTV applications, since they can generate novel views easily. As surface representations, polygonal meshes are quite popular due to their generality and current hardware support. Unfortunately, there is no inherent smoothness in their description and the resulting renderings may contain unrealistic artifacts. NURBS surfaces have embedded smoothness and efficient tools for editing and animation, but they are more suitable for synthetic content. Smooth subdivision surfaces, which offer a good compromise between polygonal meshes and NURBS surfaces, require sophisticated geometry modeling tools and are usually difficult to obtain. One recent trend in surface representation is point-based modeling, which can meet most of the requirements of 3DTV; however, the relevant state of the art is not yet mature enough. On the other hand, volumetric representations encapsulate neighborhood information that is useful for the reconstruction of surfaces with their parallel implementations for multiview stereo algorithms. Apart from the representation of 3-D structure by different primitives, texturing of scenes is also essential for a realistic scene rendering. Image-based rendering techniques directly render novel views of a scene from the acquired images, since they do not require any explicit geometry or texture representation. 3-D human face and body modeling facilitates the realistic animation and rendering of human figures, which is crucial for 3DTV applications that might demand real-time animation of human bodies. Physically based modeling and animation techniques produce impressive results, thus have

3DTV: 3D Time-varying Scene Capture Technologies -- A Survey

2008

Advances in image sensors and the evolution of digital computation are a strong stimulus for the development and implementation of sophisticated methods for capturing, processing and analyzing 3D data from dynamic scenes. Research on perspective time-varying 3D scene capture technologies is important for the upcoming 3DTV displays. Methods such as shape-from-texture, shape-from-shading, shape-from-focus and shape-from-motion extraction can restore 3D shape information from the data of a single camera. The existing techniques for 3D extraction from single-camera video sequences are especially useful for converting the vast amount of already available mono-view content to 3DTV systems. Scene-oriented single-camera methods, such as human face reconstruction and facial motion analysis, body modeling, body motion tracking, and motion recognition, efficiently solve a variety of tasks. An intensive area of research is 3D multicamera dynamic acquisition and reconstruction with its hardware specifics as calibration ...

Constrained disparity and motion estimators for 3DTV image sequence coding

Signal Processing: Image Communication, 1991

This paper presents two-dimensional motion estimation methods which take advantage of the intrinsic redundancies inside 3DTV stereoscopic image sequences. Most of the previous studies extract, either disparity vector fields if they are involved in stereovision, or apparent ...

Advances in 3DTV: Theory and Practice

International Journal of Digital Multimedia Broadcasting, 2010

A goal for decades, the extension of television, and visual communications in general, to the third dimension (3DTV) has almost been reached. Currently, 3D motion pictures are shown in theatres and the first 3D broadcasts are planned to start in the next few years. Despite this progress, the state of the art has not yet reached the goal of acquiring a three-dimensional scene in full detail and creating a precise optical duplicate at a remote site in real time. Limitations in reconstruction accuracy and visual quality, as well as in user acceptance of pertinent technologies, have, to date, prevented the creation of infrastructures for the delivery of 3D content to mass markets. Consequently, relevant scientific research is at the center of interest regarding the improvement of current 3DTV technologies.

3D image sequence acquisition for TV & film production

2002

This paper considers techniques for capturing 3D information from image sequences for applications in film and TV production. The potential applications fall into two classes, one requiring 3D data that can be represented as a depth map from a single viewpoint, and the other requiring a full 3D model. Applications for both classes of data are briefly reviewed, and current work on 3D data capture in two EU-funded projects is described. The MetaVision project is considering depth map acquisition, and results based on a three-camera stereo system are presented. The development of a multi-camera system using widely separated cameras in a studio environment is being carried out as a part of the ORIGAMI project.

Converting 2D Video to 3D: An Efficient Path to a 3D Experience

IEEE Multimedia, 2000

Wide-scale deployment of 3D video technologies continues to experience rapid growth in such high-visibility areas as cinema, TV, and mobile devices. Of course, the visualization of 3D videos is actually a 4D experience, because three spatial dimensions are perceived as the video changes over the fourth dimension of time. However, because it's common to describe these videos as simply "3D," we shall do the same, and understand that the time dimension is being ignored. So why is 3D suddenly so popular? For many, watching a 3D video allows for a highly realistic and immersive perception of dynamic scenes, with more deeply engaging experiences, as compared to traditional 2D video. This, coupled with great advances in 3D technologies and the appearance of the most successful movie of all time in vivid 3D (Avatar), has apparently put 3D video production on the map for good.

3D MOTION ESTIMATION FOR 3D VIDEO CODING

H.264/MVC multi-view video coding provides a better compression rate than simulcast coding by using a hierarchical B-picture prediction structure that exploits inter- and intra-view redundancy. However, this technique imposes a random access frame delay and requires a huge amount of computational time. In this paper, a novel technique using 3D motion estimation (3D-ME) is proposed to overcome these problems. In the 3D-ME technique, a 3D frame is formed using the same temporal frames of all views, and ME is carried out for the current 3D frame using the immediately previous 3D frame as a reference frame. As the correlation among the intra-view images is higher than the correlation among the inter-view images, the proposed 3D-ME technique reduces the overall computational time and eliminates the frame delay, with rate-distortion (RD) performance comparable to H.264/MVC. Another technique is also proposed in the paper, where an extra reference 3D frame comprising the dynamic background frames (the most common frame of a scene, i.e., McFIS) of each view is used for 3D-ME. Experimental results reveal that the proposed 3D-ME-McFIS technique outperforms H.264/MVC: it improves RD performance while reducing computational time and eliminating the random access frame delay.
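
The 3D-frame idea described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not say how blocks or motion vectors are defined, so the sketch assumes a cuboid block that spans every view and a single shared (dy, dx) displacement per cuboid. The function names and parameters are hypothetical.

```python
def form_3d_frame(views, t):
    """Stack the frame at time index t from every view into one 3D frame.

    `views` is a list of views; each view is a list of 2D frames over time.
    """
    return [view[t] for view in views]


def cuboid_sad(cur3d, ref3d, top, left, ref_top, ref_left, size):
    """SAD between a cuboid in the current and the reference 3D frame.

    The cuboid covers the same size x size window in every view plane
    (an assumption made for this sketch).
    """
    total = 0
    for cur_plane, ref_plane in zip(cur3d, ref3d):
        for r in range(size):
            for c in range(size):
                total += abs(cur_plane[top + r][left + c]
                             - ref_plane[ref_top + r][ref_left + c])
    return total


def estimate_3d_block_motion(ref3d, cur3d, top, left, size=4, search=2):
    """Full search for the shared (dy, dx) that best aligns the cuboid at
    (top, left) in the current 3D frame with the previous 3D frame."""
    height, width = len(ref3d[0]), len(ref3d[0][0])
    best = (0, 0)
    best_cost = cuboid_sad(cur3d, ref3d, top, left, top, left, size)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = top + dy, left + dx
            # Skip candidates that fall outside the reference planes.
            if y < 0 or x < 0 or y + size > height or x + size > width:
                continue
            cost = cuboid_sad(cur3d, ref3d, top, left, y, x, size)
            if cost < best_cost:
                best_cost, best = cost, (dy, dx)
    return best, best_cost
```

In this arrangement the reference is always the 3D frame at the previous time instant, so no view has to wait for frames of another view at a later time, which is consistent with the abstract's claim about removing the random access frame delay.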

Three-dimensional motion estimation of objects for video coding

IEEE Journal on Selected Areas in Communications, 1998

In this work, three-dimensional (3-D) motion estimation is applied to the problem of motion compensation for video coding. We suppose that the video sequence consists of the perspective projections of a collection of rigid bodies which undergo a rototranslational motion. Motion compensation can be performed on the sequence once the shape of the objects and the motion parameters are determined. We show that the motion equations of a rigid body can be formulated as a nonlinear dynamic system whose state is represented by the motion parameters and by the scaled depths of the object feature points. An extended Kalman filter is used to estimate both the motion and the object shape parameters simultaneously. The inclusion of the shape parameters in the estimation procedure adds a set of constraints to the filter equations that appear to be essential for reliable motion estimation. Our experiments show that the proposed approach gives two advantages. First, the filter can give more reliable estimates in the presence of measurement noise in comparison with other motion estimators that separately compute motion and structure. Second, the filter can efficiently track abrupt motion changes. Moreover, the structure imposed by the model implies that the reconstructed motion is very natural as opposed to more common block-based schemes. Also, the parameterization of the model allows for a very efficient coding of motion information.
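
The measurement model implied by this description (rigid rototranslation followed by perspective projection) can be sketched as below. This is a hedged illustration rather than the paper's formulation: the rotation-vector parameterization, the function names, and the unit focal length are assumptions, and the extended Kalman filter itself is not implemented. The function `predict_measurements` plays the role of the nonlinear observation function that such a filter would linearize at each step.

```python
import math


def rotation_matrix(rx, ry, rz):
    """Rotation matrix from a rotation vector (Rodrigues' formula)."""
    theta = math.sqrt(rx * rx + ry * ry + rz * rz)
    if theta < 1e-12:
        return [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    kx, ky, kz = rx / theta, ry / theta, rz / theta
    s, c = math.sin(theta), math.cos(theta)
    v = 1.0 - c
    return [
        [c + kx * kx * v, kx * ky * v - kz * s, kx * kz * v + ky * s],
        [ky * kx * v + kz * s, c + ky * ky * v, ky * kz * v - kx * s],
        [kz * kx * v - ky * s, kz * ky * v + kx * s, c + kz * kz * v],
    ]


def move_point(point, rotation, translation):
    """Apply the rigid motion X' = R X + T to one 3D point."""
    return tuple(
        sum(rotation[i][j] * point[j] for j in range(3)) + translation[i]
        for i in range(3)
    )


def project(point, focal_length=1.0):
    """Perspective projection of a 3D point onto the image plane."""
    x, y, z = point
    if z <= 0:
        raise ValueError("point must lie in front of the camera")
    return (focal_length * x / z, focal_length * y / z)


def predict_measurements(points, rot_vec, translation, focal_length=1.0):
    """Image positions of the feature points after one rigid motion step."""
    rotation = rotation_matrix(*rot_vec)
    return [project(move_point(p, rotation, translation), focal_length)
            for p in points]
```

Scaling every point and the translation by the same factor leaves the projections unchanged, so a single camera can only recover depth up to a global scale. That is one reason a state vector of this kind would carry scaled depths rather than absolute ones.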