Irfan Essa | Georgia Institute of Technology

Papers by Irfan Essa

Facial Expression Recognition Using a Dynamic Model and Motion Energy

Previous efforts at facial expression recognition have been based on the Facial Action Coding System (FACS), a representation developed to allow human psychologists to code expression from static facial "mugshots." In this paper we develop new, more accurate representations for facial expression by building a video database of facial expressions and then probabilistically characterizing the facial muscle activation associated with each expression using a detailed physical model of the skin and muscles. This produces a muscle-based representation of facial motion, which is then used to recognize facial expressions in two different ways. The first method uses the physics-based model directly, by recognizing expressions through comparison of estimated muscle activations. The second method uses the physics-based model to generate spatio-temporal motion-energy templates of the whole face for each different expression. These simple, biologically plausible motion-energy "templates" are then used for recognition. Both methods show substantially greater accuracy at expression recognition than has been previously achieved.
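
As a rough illustration of the second, template-based method only, the sketch below represents a clip by its per-pixel motion energy and classifies it by the nearest stored expression template; the frame-differencing energy, the template names, and the nearest-template rule are simplifying assumptions, not the paper's physics-based pipeline.

```python
# Rough sketch (assumed simplifications, not the paper's physics-based pipeline):
# represent a clip by per-pixel motion energy, classify by nearest template.
import numpy as np

def motion_energy(frames):
    """Sum of absolute inter-frame differences; frames is (T, H, W) grayscale."""
    frames = np.asarray(frames, dtype=np.float32)
    return np.abs(np.diff(frames, axis=0)).sum(axis=0)  # (H, W) energy image

def classify_expression(frames, templates):
    """templates: {expression_name: (H, W) energy template}. Nearest in L2 wins."""
    energy = motion_energy(frames)
    return min(templates, key=lambda name: np.linalg.norm(energy - templates[name]))

# templates = {"smile": ..., "surprise": ...}   # built from training clips
# label = classify_expression(test_clip, templates)
```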

Coding, Analysis, Interpretation, and Recognition of Facial Expressions

IEEE Transactions on Pattern Analysis and Machine Intelligence, 1997

We describe a computer vision system for observing facial motion by using an optimal estimation optical flow method coupled with geometric, physical, and motion-based dynamic models describing the facial structure. Our method produces a reliable parametric representation of the face's independent muscle action groups, as well as an accurate estimate of facial motion.

The Aware Home: A Living Laboratory for Ubiquitous Computing Research

We are building a home, called the Aware Home, to create a living laboratory for research in ubiquitous computing for everyday activities. This paper introduces the Aware Home project and outlines some of our technology- and human-centered research objectives in creating the Aware Home.

Graphcut textures: image and video synthesis using graph cuts

ACM Transactions on Graphics, 2003

This banner was generated by merging the source images using our interactive texture merging technique.

Video textures

This paper introduces a new type of medium, called a video texture, which has qualities somewhere between those of a photograph and a video. A video texture provides a continuous, infinitely varying stream of images. While the individual frames of a video texture may be repeated from time to time, the video sequence as a whole is never repeated exactly. Video textures can be used in place of digital photos to infuse a static image with dynamic qualities and explicit action. We present techniques for analyzing a video clip to extract its structure, and for synthesizing a new, similar-looking video of arbitrary length. We combine video textures with view morphing techniques to obtain 3D video textures. We also introduce video-based animation, in which the synthesis of video textures can be guided by a user through high-level interactive controls. Applications of video textures and their extensions include the display of dynamic scenes on web pages, the creation of dynamic backdrops for special effects and games, and the interactive control of video-based animation.
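
The sketch below captures the core idea only, under simplifying assumptions: estimate how seamlessly frame i can be followed by frame j, then synthesize a new clip as a random walk that prefers low-cost transitions. It omits the paper's preservation of dynamics and future-cost propagation, and the temperature `sigma` is an arbitrary choice.

```python
# Sketch of the core video-texture idea (no dynamics preservation or
# future-cost propagation): prefer jumps between visually similar frames.
import numpy as np

def transition_costs(frames):
    """D[i, j] = L2 distance between frame i+1 and frame j, shape (T-1, T)."""
    F = np.asarray(frames, dtype=np.float32).reshape(len(frames), -1)
    nxt = F[1:]                                          # what actually follows frame i
    d2 = (nxt**2).sum(1)[:, None] - 2.0 * nxt @ F.T + (F**2).sum(1)[None, :]
    return np.sqrt(np.maximum(d2, 0.0))

def synthesize(frames, length, sigma=10.0, rng=np.random.default_rng(0)):
    """Return frame indices for a new clip that prefers low-cost transitions."""
    D = transition_costs(frames)
    i, order = 0, [0]
    for _ in range(length - 1):
        row = D[i] if i < len(D) else D[-1]   # simple boundary handling at the last frame
        p = np.exp(-row / sigma)
        p /= p.sum()
        i = int(rng.choice(len(frames), p=p))
        order.append(i)
    return order
```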

Physically-based modeling for graphics and vision

The elastic properties of materials constrain the motion and dynamics of objects in the real world; hence, modeling and simulating the physical characteristics of these objects is essential to obtain realistic computer modeling for graphics, vision, and animation. This type of modeling is referred to as physically-based modeling and is the main focus of this chapter.
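
As a toy illustration of what "physically-based" means in practice (not taken from the chapter), the sketch below integrates a single damped mass-spring particle with semi-implicit Euler; the stiffness, damping, and step size are arbitrary illustrative values.

```python
# Toy physically-based model: damped mass-spring particle, semi-implicit Euler.
import numpy as np

def simulate_spring(x0, rest=0.0, k=50.0, c=2.0, m=1.0, dt=1e-3, steps=5000):
    """Trajectory of a 1-D particle attached to a damped spring."""
    x, v, xs = float(x0), 0.0, []
    for _ in range(steps):
        f = -k * (x - rest) - c * v     # Hooke's law plus viscous damping
        v += (f / m) * dt               # semi-implicit (symplectic) Euler
        x += v * dt
        xs.append(x)
    return np.array(xs)

# simulate_spring(1.0) oscillates about the rest length and decays over time.
```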

Guest Editors' Introduction to the Special Section on Award-Winning Papers from the IEEE Conference on Computer Vision and Pattern Recognition 2009 (CVPR 2009)

Weakly Supervised Learning of Object Segmentations from Web-Scale Video

We propose to learn pixel-level segmentations of objects from weakly labeled (tagged) internet videos. Specifically, given a large collection of raw YouTube content, along with potentially noisy tags, our goal is to automatically generate spatio-temporal masks for each object, such as "dog", without employing any pre-trained object detectors. We formulate this problem as learning weakly supervised classifiers for a set of independent spatio-temporal segments.

Unsupervised activity discovery and characterization from event-streams

We present a framework to discover and characterize different classes of everyday activities from event-streams. We begin by representing activities as bags of event n-grams. This allows us to analyze the global structural information of activities using their local event statistics. We demonstrate how maximal cliques in an undirected edge-weighted graph of activities can be used for activity-class discovery in an unsupervised manner.
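
A minimal sketch of the bag-of-event-n-grams representation and a similarity score between two activities is shown below; the edge-weighted activity graph and the maximal-clique discovery step are not reproduced, and cosine similarity is only an assumed stand-in for the paper's measure.

```python
# Bag-of-event-n-grams sketch with an assumed cosine similarity between activities.
from collections import Counter

def event_ngrams(events, n=3):
    """Histogram of overlapping n-grams from a discrete event stream."""
    return Counter(tuple(events[i:i + n]) for i in range(len(events) - n + 1))

def cosine_similarity(a, b):
    dot = sum(a[g] * b[g] for g in a.keys() & b.keys())
    norm = (sum(v * v for v in a.values()) * sum(v * v for v in b.values())) ** 0.5
    return dot / norm if norm else 0.0

# kitchen = event_ngrams(["open_fridge", "take_milk", "close_fridge", "pour"])
# office  = event_ngrams(["sit", "type", "type", "stand"])
# print(cosine_similarity(kitchen, office))
```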

Hello, Are You Human?

In this paper, we propose the concept of a humanizer and explore its applications in network security and E-commerce. A humanizer is a novel authentication scheme that asks the question "are you human?" (instead of "who are you?") and, upon the correct answer to this question, can prove a principal to be a human being instead of a computer program. We demonstrate that the humanizer helps solve problems in network security and E-commerce that existing security measures cannot address properly.

Fast multiple camera head pose tracking

This paper presents a multiple camera system to determine the head pose of people in an indoor setting. Our approach extends current eye tracking techniques from a single camera system to a multiple camera system. The head pose of a person is determined by triangulating multiple facial features that are obtained in real time from eye trackers. Our work is unique in that it allows us to observe user head orientation in real time using several cameras over a much larger space than covered by a single camera.
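
A minimal sketch of the triangulation step only, assuming two calibrated cameras with known 3x4 projection matrices `P1` and `P2`; feature detection by the eye trackers and the paper's multi-camera fusion are not shown.

```python
# Linear (DLT) triangulation of one facial feature seen by two calibrated cameras.
import numpy as np

def triangulate(P1, P2, x1, x2):
    """Least-squares 3-D point from pixel observations x1, x2 = (u, v)."""
    A = np.stack([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]
    return X[:3] / X[3]   # homogeneous -> Euclidean

# Head orientation can then be estimated from several triangulated features,
# e.g. the vector between the two reconstructed eye centers.
```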

3D shape context and distance transform for action recognition

We propose the use of 3D (2D + time) shape context to recognize the spatial and temporal details inherent in human actions. We represent an action in a video sequence by a 3D point cloud extracted by sampling 2D silhouettes over time. A non-uniform sampling method is introduced that gives preference to fast-moving body parts using a Euclidean 3D distance transform. Actions are then classified by matching the extracted point clouds.
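
The sketch below samples an (x, y, t) point cloud from a stack of binary silhouettes. Weighting samples by the inverse of the 3-D Euclidean distance transform (so thin, quickly changing regions are sampled more densely) is an illustrative assumption, not necessarily the paper's exact weighting, and the shape-context descriptor and matching are omitted.

```python
# Non-uniform sampling of a spatio-temporal silhouette volume (assumed weighting).
import numpy as np
from scipy.ndimage import distance_transform_edt

def sample_point_cloud(silhouettes, n_points=500, rng=np.random.default_rng(0)):
    vol = np.asarray(silhouettes, dtype=bool)            # (T, H, W) binary masks
    edt = distance_transform_edt(vol)                    # distance to background
    t, y, x = np.nonzero(vol)
    weights = 1.0 / edt[t, y, x]                         # prefer thin structures
    weights /= weights.sum()
    idx = rng.choice(len(t), size=min(n_points, len(t)), replace=False, p=weights)
    return np.stack([x[idx], y[idx], t[idx]], axis=1)    # (N, 3) points in (x, y, t)
```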

Image and video based painterly animation

We present techniques for transforming images and videos into painterly animations depicting different artistic styles. Our techniques rely on image and video analysis to compute appearance and motion properties. We also determine and apply motion information from different (user-specified) sources to static and moving images. These properties, which encode spatio-temporal variations, are then used to render (or paint) effects of selected styles to generate images and videos with a painted look.

Segmental boosting algorithm for time-series feature selection

Discriminative feature selection paradigms, e.g. [8, 9], usually consider observation frames in an isolated manner, neglecting temporal dependency in time series. Such temporal relationships provide important information for recognition. We propose the Segmental Boosting Algorithm (SBA), which applies feature selection only to the "static segments" of the time series. It smoothly fills in the gap between the dynamic nature of the time-series data and the static nature of the feature selection methods.
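
A minimal sketch of one plausible way to locate the quasi-static segments of a multivariate time series before running per-segment feature selection; the windowed-variance criterion and threshold are assumptions, and the boosting step itself is not shown.

```python
# Assumed segmentation heuristic: mark stretches where local variance stays small.
import numpy as np

def static_segments(X, win=5, thresh=0.1):
    """Return (start, end) index pairs where every feature's local variance is small.
    X has shape (T, n_features)."""
    X = np.asarray(X, dtype=np.float32)
    var = np.array([X[max(0, t - win):t + win + 1].var(axis=0).max()
                    for t in range(len(X))])
    quiet, segments, start = var < thresh, [], None
    for t, q in enumerate(quiet):
        if q and start is None:
            start = t
        elif not q and start is not None:
            segments.append((start, t))
            start = None
    if start is not None:
        segments.append((start, len(X)))
    return segments

# AdaBoost-style feature selection would then be run on each segment independently.
```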

NARC: The News Article Revision Comparator

Currency of information in news consumption is an important facet of information quality, one which involves both the journalist providing updated information and the consumer being aware of updates and changes to the news stream. We are addressing information quality and currency in online news articles from the viewpoint of news consumption, with the intent of reducing the consumption effort involved in getting the most up-to-date information on a breaking news story.
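
As a rough illustration of revision comparison (not NARC's actual matching or interface), the sketch below diffs two revisions of an article at sentence granularity with Python's difflib.

```python
# Compare two article revisions; the sentence split is a crude assumption.
import difflib

def revision_diff(old_text, new_text):
    """Unified-diff lines showing sentences added or removed between revisions."""
    old = old_text.split(". ")   # crude sentence split, adequate for a sketch
    new = new_text.split(". ")
    return list(difflib.unified_diff(old, new, fromfile="revision-1",
                                     tofile="revision-2", lineterm=""))

# for line in revision_diff(article_v1, article_v2):
#     print(line)
```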

Audio Puzzler: piecing together time-stamped speech transcripts with a puzzle game

We have developed an audio-based casual puzzle game which produces a time-stamped transcription of spoken audio as a by-product of play. Our evaluation of the game indicates that it is both fun and challenging. The transcripts generated using the game are more accurate than those produced using a standard automatic transcription system, and the time-stamps of words are within several hundred milliseconds of ground truth.

Element-free elastic models for volume fitting and capture

We present a new method of fitting an element-free volumetric model to a sequence of deforming surfaces of a moving object. Given a sequence of visual hulls, we iteratively fit an element-free elastic model to the visual hull in order to extract the optimal pose of the captured volume. The fitting of the volumetric model is achieved by minimizing a combination of elastic potential energy, a surface distance measure, and a self-intersection penalty for each frame.

Feature weighting for segmentation

This paper proposes the use of feature weights to reveal the hierarchical nature of music audio. Feature weighting has been exploited in machine learning, but has not been applied to music audio segmentation. We describe both a global and a local approach to automatic feature weighting. The global approach assigns a single weighting to all features in a song. The local approach uses the local separability directly.
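
A minimal sketch of one possible global weighting, using a Fisher-style separability ratio per feature; the paper's actual global and local schemes and their use for music segmentation are not reproduced here.

```python
# Assumed Fisher-style global feature weighting over coarse segments.
import numpy as np

def global_feature_weights(X, labels):
    """X: (n_frames, n_features) audio features; labels: coarse segment id per frame.
    Weight = between-segment variance / within-segment variance, normalized."""
    X, labels = np.asarray(X, dtype=np.float64), np.asarray(labels)
    overall_mean = X.mean(axis=0)
    between = np.zeros(X.shape[1])
    within = np.zeros(X.shape[1])
    for seg in np.unique(labels):
        Xs = X[labels == seg]
        between += len(Xs) * (Xs.mean(axis=0) - overall_mean) ** 2
        within += ((Xs - Xs.mean(axis=0)) ** 2).sum(axis=0)
    w = between / np.maximum(within, 1e-12)
    return w / w.sum()
```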

An annotation model for making sense of information quality in online video

Making sense of the information quality of online media, including the accuracy and validity of claims and the reliability of sources, is essential for people to be well informed. We are developing Videolyzer to address the challenge of information-quality sense-making by allowing motivated individuals to analyze, collect, share, and respond to criticisms of the information quality of online political videos and their transcripts.

Boosted audio-visual HMM for speech reading

We propose a new approach for combining acoustic and visual measurements to aid in recognizing the lip shapes of a person speaking. Our method relies on computing the maximum likelihoods of (a) an HMM used to model phonemes from the acoustic signal and (b) an HMM used to model visual feature motions from video. One significant addition in this work is the dynamic analysis with features selected by AdaBoost, on the basis of their discriminant ability.
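
A minimal sketch of late fusion of a per-class acoustic HMM and visual HMM, assuming hmmlearn's GaussianHMM; the AdaBoost feature selection and the paper's exact fusion rule are not reproduced, and the weight `alpha` is purely illustrative.

```python
# Train one acoustic and one visual HMM per class, fuse their log-likelihoods.
import numpy as np
from hmmlearn.hmm import GaussianHMM

def train_models(train_data, n_states=3):
    """train_data: {label: (list of acoustic sequences, list of visual sequences)},
    each sequence an array of shape (T_i, n_features)."""
    models = {}
    for label, (audio_seqs, video_seqs) in train_data.items():
        a = GaussianHMM(n_components=n_states).fit(
            np.vstack(audio_seqs), lengths=[len(s) for s in audio_seqs])
        v = GaussianHMM(n_components=n_states).fit(
            np.vstack(video_seqs), lengths=[len(s) for s in video_seqs])
        models[label] = (a, v)
    return models

def classify(models, audio, video, alpha=0.6):
    """Pick the class with the highest weighted sum of HMM log-likelihoods."""
    return max(models, key=lambda lbl: alpha * models[lbl][0].score(audio)
                                       + (1 - alpha) * models[lbl][1].score(video))
```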
