P. Eisert | Humboldt Universität zu Berlin
Papers by P. Eisert
In this paper, augmented reality techniques are used in order to create a Virtual Mirror for the real-time visualization of customized sports shoes. Similar to looking into a mirror when trying on new shoes in a shop, we create the same impression but for virtual shoes that the customer can design individually. For that purpose, we replace the real mirror by a large display that shows the mirrored input of a camera capturing the legs and shoes of a person. 3-D tracking of both feet and exchanging the real shoes by computer graphics models gives the impression of actually wearing the virtual shoes. The 3-D motion tracker presented in this paper exploits mainly silhouette information to achieve robust estimates for both shoes from a single camera view. The use of a hierarchical approach in an image pyramid enables real-time estimation at frame rates of more than 30 frames per second.
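As a rough illustration of the hierarchical, image-pyramid-based estimation mentioned above, the following Python sketch shows a generic coarse-to-fine refinement loop. The per-level step `refine_pose` is a hypothetical placeholder for a model-to-image alignment routine, not the paper's silhouette-based estimator.

```python
# Minimal sketch of coarse-to-fine estimation in an image pyramid.
import cv2


def build_pyramid(image, levels):
    """Return a list of images, finest resolution first."""
    pyramid = [image]
    for _ in range(levels - 1):
        pyramid.append(cv2.pyrDown(pyramid[-1]))
    return pyramid


def coarse_to_fine_tracking(frame, pose, refine_pose, levels=3):
    """Refine a pose estimate from the coarsest to the finest level.

    `refine_pose(image, pose, scale)` is assumed to run a few iterations
    of the chosen alignment at a single resolution; `scale` accounts for
    the downsampling factor of that level.
    """
    pyramid = build_pyramid(frame, levels)
    for level in reversed(range(levels)):      # start at the coarsest level
        scale = 1.0 / (2 ** level)
        pose = refine_pose(pyramid[level], pose, scale)
    return pose
```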
Endoscopic videokymography is a method for visualizing the motion of the plica vocalis (vocal folds) for medical diagnosis. The diagnostic interpretability of a kymogram deteriorates if camera motion interferes with vocal fold motion, which is hard to avoid in practice. We propose an algorithm for compensating strong camera motion for videokymography. The approach is based on an image-based inverse warping scheme that can be stated as an optimization problem. The algorithm is parallelizable and real-time capable on the CPU. We discuss advantages of the image-based approach and address its use for approximate structure visualization of the endoscopic scene.
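The inverse-warping formulation above can be illustrated with a minimal sketch that estimates a global 2-D translation by minimizing the intensity difference between the inversely warped current frame and a reference frame. The translation parameterization and the Powell solver are simplifying assumptions; the paper's warping model and optimizer may differ.

```python
# Minimal sketch: camera-motion compensation stated as an optimization.
import cv2
import numpy as np
from scipy.optimize import minimize


def compensation_cost(params, reference, current):
    tx, ty = params
    warp = np.float32([[1, 0, -tx], [0, 1, -ty]])     # inverse warp matrix
    height, width = current.shape[:2]
    warped = cv2.warpAffine(current, warp, (width, height))
    return float(np.mean((warped.astype(np.float32)
                          - reference.astype(np.float32)) ** 2))


def compensate_motion(reference, current):
    result = minimize(compensation_cost, x0=np.zeros(2),
                      args=(reference, current), method="Powell")
    tx, ty = result.x
    warp = np.float32([[1, 0, -tx], [0, 1, -ty]])
    height, width = current.shape[:2]
    return cv2.warpAffine(current, warp, (width, height)), (tx, ty)
```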
Proceedings of the 12th European Conference on Visual Media Production - CVMP '15, 2015
Generating photorealistic facial animations is still a challenging task in computer graphics, and synthetically generated facial animations often do not meet the visual quality of captured video sequences. Video sequences, on the other hand, need to be captured prior to the animation stage and do not offer the same animation flexibility as computer graphics models. We present a method for video-based facial animation, which combines the photorealism of real videos with the flexibility of CGI-based animation by extracting dynamic texture sequences from existing multi-view footage. To synthesize new facial performances, these texture sequences are concatenated in a motion-graph-like way. In order to ensure realistic appearance, we combine a warp-based optimization scheme with a modified cross dissolve to prevent visual artifacts during the transition between texture sequences. Our approach makes photorealistic facial re-animation from existing video footage possible, which is especially useful in applications like video editing or the animation of digital characters.
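As a simplified illustration of the transition handling described above, the sketch below performs a plain linear cross dissolve between two texture sequences over an overlap window. The paper's warp-based optimization and modified dissolve are omitted here; this only shows the basic blending structure.

```python
# Minimal sketch: linear cross dissolve between two frame sequences.
import numpy as np


def cross_dissolve(seq_a, seq_b, overlap):
    """Blend the last `overlap` frames of seq_a into the first of seq_b."""
    blended = []
    for i in range(overlap):
        alpha = (i + 1) / (overlap + 1)               # ramp from 0 to 1
        frame = ((1.0 - alpha) * seq_a[-overlap + i].astype(np.float32)
                 + alpha * seq_b[i].astype(np.float32))
        blended.append(frame.astype(seq_a[0].dtype))
    return list(seq_a[:-overlap]) + blended + list(seq_b[overlap:])
```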
2008 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, 2008
In this paper, we present a direct method for deformable surface tracking in monocular image sequences. We use the optical flow constraint instead of working with distinct features. The optical flow field is regularized with a 2-dimensional mesh-based deformation model. The formulation of the deformation model contains weighted smoothing constraints defined locally on topological vertex neighborhoods. 2-dimensional deformation estimation in the presence of self-occlusion is a very challenging problem. Naturally, a 2-dimensional mesh folds in the presence of self-occlusion. We address this problem by weighting the smoothness constraints locally according to the occlusion of a region. Thereby, the mesh is forced to shrink instead of fold in occluded regions. Occlusion estimates are established from shrinking regions in the deformation mesh. Finding the best transformation then amounts to minimizing an error function that can be solved efficiently in a linear least squares sense.
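The linear least-squares structure described above can be sketched as follows: data terms pull mesh vertices towards target displacements (standing in for the optical flow constraint) and locally weighted smoothness terms regularize the deformation over vertex neighborhoods. The inputs `targets`, `neighbors` and `weights` are assumptions of this sketch, not the paper's actual data term or occlusion weighting.

```python
# Minimal sketch: mesh deformation as a sparse linear least-squares problem.
import numpy as np
from scipy.sparse import lil_matrix
from scipy.sparse.linalg import lsqr


def solve_deformation(n_vertices, targets, neighbors, weights, lam=1.0):
    """targets: {vertex: desired 2-D displacement},
    neighbors: {vertex: list of adjacent vertices},
    weights: per-vertex smoothness weights (adapted in occluded regions).
    Returns an (n_vertices, 2) array of displacements."""
    rows = len(targets) + len(neighbors)
    A = lil_matrix((rows, n_vertices))
    b = np.zeros((rows, 2))
    r = 0
    for v, d in targets.items():                      # data terms
        A[r, v] = 1.0
        b[r] = d
        r += 1
    for v, nbrs in neighbors.items():                 # smoothness terms
        w = lam * weights[v]
        A[r, v] = w
        for n in nbrs:
            A[r, n] = -w / len(nbrs)
        r += 1
    A = A.tocsr()
    dx = lsqr(A, b[:, 0])[0]
    dy = lsqr(A, b[:, 1])[0]
    return np.stack([dx, dy], axis=1)
```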
2008 15th IEEE International Conference on Image Processing, 2008
In this paper, we present a method for tracking and retexturing of garments that exploits the entire image information using the optical flow constraint instead of working with distinct features. In a hierarchical framework we refine the motion model with every level. The motion model is used to regularize the optical flow field such that finding the best transformation amounts to minimizing an error function that can be solved in a least squares sense. Knowledge about the position and deformation of the garment in 2D allows us to erase the old texture and replace it by a new one with correct deformation and shading properties without 3D reconstruction. Additionally, it provides an estimation of the irradiance such that the new texture can be illuminated realistically.
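A heavily simplified version of the retexturing step can be sketched in Python/OpenCV: warp the new texture with the estimated 2-D deformation and modulate it with an irradiance estimate. Here the irradiance is approximated by strongly blurring the luminance, which is a crude stand-in for the paper's estimate, and `warp_map` is assumed to be a float32 H×W×2 field of source coordinates.

```python
# Minimal sketch: warp a new texture and relight it with a shading estimate.
import cv2
import numpy as np


def retexture(image_bgr, new_texture, warp_map):
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY).astype(np.float32) / 255.0
    shading = cv2.GaussianBlur(gray, (0, 0), sigmaX=15)    # crude irradiance
    map_x = np.ascontiguousarray(warp_map[..., 0])
    map_y = np.ascontiguousarray(warp_map[..., 1])
    warped = cv2.remap(new_texture, map_x, map_y,
                       interpolation=cv2.INTER_LINEAR)
    relit = warped.astype(np.float32) * shading[..., None]
    return np.clip(relit, 0, 255).astype(np.uint8)
```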
This document presents a novel multidimensional scene representation architecture which bridges the gap between classical model-based approaches, such as meshes, and vision-based approaches, such as video plus depth. The architecture is described conceptually and a proposed implementation is presented. The layered architecture and its implementation present a tidy way of conceptualizing the interactions of data up the production chain. Beyond that, this architecture enables innovative computational videography processing of multidimensional material. High-quality storage of computer-generated and captured video data, as well as support for intermediate processing steps and novel content representation and interaction, completes the architecture and provides a means for future developments in enhanced scene visualization.
Journal of Visual Communication and Image Representation, 2014
Content production for stereoscopic 3D-TV displays has become mature in the past years, while huge progress has also been achieved in the improvement of the image quality of glasses-free auto-stereoscopic displays and light-field displays. Concerning the latter two display families, the content production workflow is less elaborate and more complex, as the number of required views not only differs considerably but is also likely to increase in the near future. As a co-existence of all 3D display families can be expected for the next years, one aims to establish an efficient content production workflow which yields high-quality content for all 3D-TV displays. Against this background, we present a real-time capable multi-view video plus depth (MVD) content production workflow based on a four-camera rig with mixed narrow and wide baseline. Results show the suitability of the approach to simultaneously produce high-quality MVD4 and native stereoscopic 3D content.
IEEE Transactions on Circuits and Systems for Video Technology, 2000
A system for the automatic reconstruction of real world objects from multiple uncalibrated camera views is presented. The camera position and orientation for all views, the 3-D shape of the rigid object as well as associated color information are recovered from the image sequence. The system proceeds in four steps. First, the internal camera parameters describing the imaging geometry of the camera are calibrated using a reference object. Second, an initial 3-D description of the object is computed from two views. This model information is then used in a third step to estimate the camera positions for all available views using a novel linear 3-D motion and shape estimation algorithm. The main feature of this third step is the simultaneous estimation of 3-D camera motion parameters and object shape refinement with respect to the initial 3-D model. The initial 3-D shape model exhibits only a few degrees of freedom and the object shape refinement is defined as flexible deformation of the initial shape model. Our formulation of the shape deformation allows the object texture to slide on the surface, which differs from traditional flexible body modeling. This novel combined shape and motion estimation using sliding texture considerably improves the calibration data of the individual views in comparison to fixed-shape model-based camera motion estimation. Since the shape model used for model-based camera motion estimation is approximate only, a volumetric 3-D reconstruction process is initiated in the fourth step that combines the information from all views simultaneously. The recovered object consists of a set of voxels with associated color information that describe even fine structures and details of the object. New views of the object can be rendered from the recovered 3-D model, which has potential applications in virtual reality or multimedia systems and the emerging field of video coding using 3-D scene models.
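The volumetric combination of all views in the fourth step can be illustrated with a basic silhouette-carving test: a voxel survives only if it projects inside the object silhouette in every calibrated view. The 3×4 projection matrices and binary silhouette masks are assumed inputs of this sketch; the actual reconstruction and per-voxel coloring are more involved.

```python
# Minimal sketch: voxel carving from calibrated silhouettes.
import numpy as np


def carve_voxels(voxel_centers, projections, silhouettes):
    """voxel_centers: (N, 3); projections: list of 3x4 matrices;
    silhouettes: list of binary HxW masks. Returns a boolean keep-mask."""
    homog = np.hstack([voxel_centers, np.ones((len(voxel_centers), 1))])
    keep = np.ones(len(voxel_centers), dtype=bool)
    for P, sil in zip(projections, silhouettes):
        proj = homog @ P.T                     # (N, 3) homogeneous pixels
        x = proj[:, 0] / proj[:, 2]
        y = proj[:, 1] / proj[:, 2]
        inside = (x >= 0) & (x < sil.shape[1]) & (y >= 0) & (y < sil.shape[0])
        ix = np.clip(x, 0, sil.shape[1] - 1).astype(int)
        iy = np.clip(y, 0, sil.shape[0] - 1).astype(int)
        keep &= inside & (sil[iy, ix] > 0)
    return keep
```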
IEEE Transactions on Circuits and Systems for Video Technology, 2000
We show that traditional waveform coding and 3-D model-based coding are not competing alternatives but should be combined to support and complement each other. Both approaches are combined such that the generality of waveform coding and the efficiency of 3-D model-based coding are available where needed. The combination is achieved by providing the block-based video coder with a second reference frame for prediction which is synthesized by the model-based coder. The model-based coder uses a parameterized 3-D head model specifying shape and color of a person. We therefore restrict our investigations to typical videotelephony scenarios that show head-and-shoulder scenes. Motion and deformation of the 3-D head model constitute facial expressions which are represented by facial animation parameters (FAPs) based on the MPEG-4 standard. An intensity-gradient-based approach that exploits the 3-D model information is used to estimate the FAPs as well as illumination parameters that describe changes of the brightness in the scene. Model failures and objects that are not known at the decoder are handled by standard block-based motion-compensated prediction, which is not restricted to a special scene content but results in lower coding efficiency. A Lagrangian approach is employed to determine the most efficient prediction for each block from either the synthesized model frame or the previous decoded frame. Experiments on five video sequences show that bit-rate savings of about 35% are achieved at equal average PSNR when comparing the model-aided codec to TMN-10, the state-of-the-art test model of the H.263 standard. This corresponds to a gain of 2-3 dB in PSNR when encoding at the same average bit-rate.
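The Lagrangian block-wise decision described above boils down to comparing the cost J = D + λR for the two prediction candidates: the synthesized model frame and the previously decoded frame. The sketch below uses SSD distortion and externally supplied rate estimates, which are simplifying assumptions; in the codec, D and R come from the encoder's rate-distortion loop.

```python
# Minimal sketch: Lagrangian selection of the prediction reference per block.
import numpy as np


def choose_reference(block, model_pred, decoded_pred,
                     rate_model, rate_decoded, lam):
    d_model = np.sum((block.astype(np.float32) - model_pred) ** 2)
    d_decoded = np.sum((block.astype(np.float32) - decoded_pred) ** 2)
    j_model = d_model + lam * rate_model          # J = D + lambda * R
    j_decoded = d_decoded + lam * rate_decoded
    return "model" if j_model <= j_decoded else "decoded"
```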
Computer Graphics Forum, 2013
Figure 1: Our approach synthesizes images of clothes from a database of images by interpolating image warps as well as intensities in pose space.
We present a video streaming solution to provide fluent remote access to highly interactive 3D applications, such as games. To fulfill the very low delay and low complexity constraints of this class of applications, several optimizations have been developed. Image preprocessing is implemented on the graphics card to make efficient reuse of the rendered output as well as of the GPU's parallel processing capabilities. H.264/AVC video encoding is accelerated by extracting additional information from the rendering context, which allows for direct calculation of motion vectors and partitioning of macroblocks, thereby omitting the demanding search of generic video encoders. Highly optimized client software has been developed to provide very low delay playback of streamed video and audio, using minimum buffering. In experiments, a hardly noticeable delay of less than 40 ms could be achieved.
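The direct derivation of motion vectors from the rendering context can be sketched as a depth-based reprojection: each pixel of the current frame is unprojected with the current view-projection matrix and reprojected with the previous one. The 4×4 matrices, the NDC depth convention and the per-pixel (rather than per-macroblock) output are assumptions of this sketch, not the system's actual interface.

```python
# Minimal sketch: per-pixel motion vectors from depth and camera matrices.
import numpy as np


def motion_vectors(depth_ndc, vp_current, vp_previous, width, height):
    ys, xs = np.mgrid[0:height, 0:width]
    ndc_x = 2.0 * (xs + 0.5) / width - 1.0
    ndc_y = 1.0 - 2.0 * (ys + 0.5) / height
    clip = np.stack([ndc_x, ndc_y, depth_ndc, np.ones_like(depth_ndc)], axis=-1)
    # Unproject to world space with the inverse of the current VP matrix.
    world = clip @ np.linalg.inv(vp_current).T
    world /= world[..., 3:4]
    # Reproject into the previous frame.
    prev = world @ vp_previous.T
    prev_x = (prev[..., 0] / prev[..., 3] + 1.0) * 0.5 * width
    prev_y = (1.0 - prev[..., 1] / prev[..., 3]) * 0.5 * height
    return np.stack([prev_x - xs, prev_y - ys], axis=-1)   # (H, W, 2)
```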
This paper addresses the synthesis of near-regular textures, i.e. textures that consist of a regular global structure plus subtle yet very characteristic stochastic irregularities. Such textures are difficult to synthesize due to the complementary characteristics of these structures. In this paper, we propose a method which we call Random Sampling and Gap Filling (RSGF) to synthesize near-regular textures. The synthesis approach is guided by a lattice of the global structure estimated from a generalized normalized autocorrelation of the sample image. This lattice constrains a random sampling process to maintain the global regular structure while ensuring the characteristic randomness of the irregular structures. Results presented in this paper show that our method produces convincing results not only for regular and near-regular textures but also for irregular textures.
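The lattice estimation can be illustrated in a reduced form: compute the autocorrelation of the sample via the FFT and read off the dominant horizontal and vertical repetition periods from its peaks. The generalized normalized autocorrelation and the full 2-D lattice fit of the paper are not reproduced here.

```python
# Minimal sketch: dominant repetition periods from the FFT autocorrelation.
import numpy as np
from scipy.signal import find_peaks


def dominant_periods(texture_gray):
    img = texture_gray.astype(np.float32)
    img -= img.mean()
    spectrum = np.fft.fft2(img)
    autocorr = np.real(np.fft.ifft2(spectrum * np.conj(spectrum)))
    autocorr /= autocorr[0, 0]                    # normalize by the zero lag
    h, w = autocorr.shape

    def period(profile):
        peaks, _ = find_peaks(profile)            # local maxima at lag > 0
        return int(peaks[np.argmax(profile[peaks])]) if len(peaks) else None

    return period(autocorr[0, :w // 2]), period(autocorr[:h // 2, 0])
```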
Retexturing is the process of realistically replacing the texture of an object or surface in a given image by a new, synthetic one, such that texture distortion as well as lighting conditions of the original image are preserved. The key challenge is to separate the shading information from the actual local texture and to retrieve the texture distortion from an image without any knowledge of the underlying scene. In this paper, we introduce an approach for automatic retexturing that models an image of a deformed regular texture as a combination of its deformed surface albedo, a shading map and additional high frequency details.
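The forward model named in the abstract can be written down directly: a deformed replacement albedo is recombined with the shading map and the high-frequency detail layer. All three layers are assumed to come from the paper's decomposition; only the recomposition is shown in this sketch.

```python
# Minimal sketch: recombine albedo, shading and high-frequency details.
import numpy as np


def recompose(new_albedo, shading, details):
    """new_albedo: (H, W, 3) replacement texture, already deformed to match
    the original; shading: (H, W); details: (H, W, 3) residual layer.
    All values are expected in [0, 1]."""
    composite = new_albedo * shading[..., None] + details
    return np.clip(composite, 0.0, 1.0)
```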
Traditional set-top camera video-conferencing systems still fail to meet the ‘telepresence challenge’ of providing a viable alternative for physical business travel, which is nowadays characterized by unacceptable delays, costs, inconvenience, and an increasingly large ecological footprint. Even recent high-end commercial solutions, while partially removing some of these traditional shortcomings, still do not scale easily, are expensive to implement, do not use life-sized 3D representations of the remote participants, and address eye contact and gesture-based interaction only in very limited ways. The European FP7 project 3DPresence will develop a multi-party, high-end 3D videoconferencing concept that will tackle the problem of transmitting the feeling of physical presence in real time to …
Free viewpoint video provides the possibility to freely navigate within dynamic real-world video scenes by choosing arbitrary viewpoints and view directions. So far, related work has only considered free viewpoint video extraction, representation, and rendering methods. Compression and transmission have not yet been studied in detail and combined with the other components into one complete system. In this paper, we present such a complete system for efficient free viewpoint video extraction, representation, coding, and interactive rendering. Data representation is based on 3D mesh models and view-dependent texture mapping using video textures. The geometry extraction is based on a shape-from-silhouette algorithm. The resulting voxel models are converted into 3D meshes that are coded using MPEG-4 SNHC tools. The corresponding video textures are coded using an H.264/AVC codec. Our algorithms for view-dependent texture mapping have been adopted as an extension of MPEG-4 AFX. The presented re...
We propose a novel face capture system for security gateways which allows for inexpensive, rapid, automated or computer-aided face-based person authentication employing 3D head and face data. By face-based person authentication we refer to the process of comparing the appearance of a person to a visual representation of that person stored on a security document. This comparison can be done either manually or automatically, and the data to be compared may be a standard facial image or a 3D representation of the person's head, depending on the capabilities of the security document. We propose algorithms for 3D reconstruction of persons passing the security gate as well as for head pose estimation and compensation to enable precise alignment of the 3D representation to be compared to the document. Furthermore, we show how eyeglasses affect reconstruction results and propose methods to compensate for these effects as ongoing research.
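Pose compensation of this kind generally reduces to a rigid alignment between corresponding 3-D points (for example facial landmarks on the reconstruction and on the stored representation); a standard Kabsch-style sketch is given below. It illustrates the alignment step only, not the paper's pose estimator, and the landmark correspondences are an assumed input.

```python
# Minimal sketch: rigid alignment of corresponding 3-D point sets (Kabsch).
import numpy as np


def rigid_align(source, target):
    """source, target: (N, 3) corresponding points. Returns R (3x3), t (3,)
    such that R @ source[i] + t approximates target[i]."""
    mu_s, mu_t = source.mean(axis=0), target.mean(axis=0)
    H = (source - mu_s).T @ (target - mu_t)
    U, _, Vt = np.linalg.svd(H)
    # Correct a possible reflection so that R is a proper rotation.
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
    R = Vt.T @ D @ U.T
    t = mu_t - R @ mu_s
    return R, t
```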