Real-time Facial Expression Recognition “In The Wild” by Disentangling 3D Expression from Identity

3D-CNN for Facial Emotion Recognition in Videos

Advances in Visual Computing

In this paper, we present a video-based emotion recognition neural network operating on three dimensions. We show that 3D convolutional neural networks (3D-CNNs) are well suited to predicting facial emotions expressed over a sequence of frames. We optimize the 3D-CNN architecture through a hyper-parameter search and show that this has a very strong influence on the results, even though architecture tuning of 3D-CNNs has received little attention in the literature. The resulting architecture improves over state-of-the-art techniques when tested on the CK+ and Oulu-CASIA datasets. We compare results across cross-validation methods: the designed 3D-CNN achieves 97.56% accuracy using leave-one-subject-out cross-validation and 100% using 10-fold cross-validation on the CK+ dataset, and 84.17% using 10-fold cross-validation on the Oulu-CASIA dataset.
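
To make the clip-level operation of such a network concrete, below is a minimal 3D-CNN sketch in PyTorch. The filter counts, kernel sizes, and seven-class output are illustrative assumptions, not the configuration found by the paper's hyper-parameter search.

```python
import torch
import torch.nn as nn

# Minimal 3D-CNN sketch for clip-level emotion classification.
# Layer sizes (32/64 filters, 3x3x3 kernels, 7 classes) are assumptions,
# not the architecture found by the paper's hyper-parameter search.
class Emotion3DCNN(nn.Module):
    def __init__(self, num_classes: int = 7):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv3d(3, 32, kernel_size=3, padding=1),   # (B, 3, T, H, W) -> (B, 32, T, H, W)
            nn.ReLU(inplace=True),
            nn.MaxPool3d(kernel_size=2),                  # halve T, H, and W
            nn.Conv3d(32, 64, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool3d(1),                      # global spatio-temporal pooling
        )
        self.classifier = nn.Linear(64, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        f = self.features(x).flatten(1)
        return self.classifier(f)

# Example: a batch of 2 clips, each 16 RGB frames of 112x112.
clips = torch.randn(2, 3, 16, 112, 112)
logits = Emotion3DCNN()(clips)
print(logits.shape)  # torch.Size([2, 7])
```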

Facial Expression Recognition Using Enhanced Deep 3D Convolutional Neural Networks

Deep Neural Networks (DNNs) have been shown to outperform traditional methods in various visual recognition tasks, including Facial Expression Recognition (FER). In spite of efforts made to improve the accuracy of FER systems using DNNs, existing methods still do not generalize well enough for practical applications. This paper proposes a 3D Convolutional Neural Network method for FER in videos. The network architecture consists of 3D Inception-ResNet layers followed by an LSTM unit, which together extract the spatial relations within facial images as well as the temporal relations between different frames in the video. Facial landmark points are also used as inputs to our network, emphasizing facial components over facial regions that may not contribute significantly to generating facial expressions. Our proposed method is evaluated on four publicly available databases in subject-independent and cross-database tasks and outperforms state-of-the-art methods.
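
The pattern described here, 3D convolutional features followed by an LSTM over time, can be sketched as follows. This is a simplified stand-in: the paper's 3D Inception-ResNet blocks and landmark inputs are omitted, and all layer sizes are assumptions.

```python
import torch
import torch.nn as nn

# Sketch of the "3D conv features, then LSTM over time" pattern; the real
# model uses 3D Inception-ResNet blocks and also feeds facial landmarks,
# both omitted here. Sizes are assumptions.
class Conv3DLSTM(nn.Module):
    def __init__(self, num_classes: int = 7, hidden: int = 128):
        super().__init__()
        # Pool spatially but keep the temporal axis intact.
        self.conv = nn.Sequential(
            nn.Conv3d(3, 64, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool3d((None, 1, 1)),  # (B, 64, T, 1, 1)
        )
        self.lstm = nn.LSTM(input_size=64, hidden_size=hidden, batch_first=True)
        self.head = nn.Linear(hidden, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        f = self.conv(x).squeeze(-1).squeeze(-1)   # (B, 64, T)
        f = f.transpose(1, 2)                      # (B, T, 64) sequence for the LSTM
        out, _ = self.lstm(f)
        return self.head(out[:, -1])               # classify from the last time step

logits = Conv3DLSTM()(torch.randn(2, 3, 16, 112, 112))
print(logits.shape)  # torch.Size([2, 7])
```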

3D Facial Expression Recognition Using Multi-channel Deep Learning Framework

Circuits, Systems, and Signal Processing

Facial expression offers an important way of detecting the affective state of a human being. It plays a major role in various fields such as estimating students' attention levels in online education, intelligent transportation systems, and interactive games. This paper proposes a facial expression recognition system in which two channels of featured images are used to represent a 3D facial scan. Features are extracted from the local binary pattern (LBP) and local directional pattern (LDP) images using a fine-tuned pre-trained AlexNet and a shallow convolutional neural network. The feature sets are then fused using canonical correlation analysis (CCA). The fused feature set is fed into a multi-support vector machine (mSVM) classifier to classify the expressions into seven basic categories: anger, disgust, fear, happiness, neutral, sadness, and surprise. Experiments were carried out on the Bosphorus database using tenfold cross-validation with mutually exclusive training and testing samples. The results show an average accuracy of 87.69% using an mSVM classifier with a polynomial kernel and demonstrate that, by characterizing the peculiarities of facial expressions, the system performs better than alternative state-of-the-art approaches.
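
The fusion stage lends itself to a short sketch. The example below stands in random vectors for the two channels' CNN features (the LBP/LDP extraction itself is omitted), projects them with scikit-learn's CCA, and classifies the concatenated projections with a polynomial-kernel SVM; all dimensions and data are placeholders, not the paper's setup.

```python
import numpy as np
from sklearn.cross_decomposition import CCA
from sklearn.svm import SVC

# Stand-ins for the LBP- and LDP-channel CNN features.
rng = np.random.default_rng(0)
n_samples, d1, d2, n_components = 200, 256, 128, 32

X_lbp = rng.normal(size=(n_samples, d1))   # features from the LBP channel
X_ldp = rng.normal(size=(n_samples, d2))   # features from the LDP channel
y = rng.integers(0, 7, size=n_samples)     # seven basic expression labels

# Project both feature sets into a shared correlated space, then concatenate.
cca = CCA(n_components=n_components)
Z_lbp, Z_ldp = cca.fit_transform(X_lbp, X_ldp)
Z = np.hstack([Z_lbp, Z_ldp])              # fused feature set

# Multi-class SVM (scikit-learn uses one-vs-one internally) with a poly kernel.
clf = SVC(kernel="poly", degree=3)
clf.fit(Z, y)
print(clf.predict(Z[:5]))
```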

The Florence 4D Facial Expression Dataset

2023 IEEE 17th International Conference on Automatic Face and Gesture Recognition (FG)

Human facial expressions change dynamically, so their recognition and analysis should account for the temporal evolution of face deformations, in either 2D or 3D. While abundant 2D video data exist, this is not the case in 3D, where few 3D dynamic (4D) datasets have been released for public use. The negative consequence of this scarcity is amplified by current deep learning based methods for facial expression analysis, which require large quantities of varied samples to be trained effectively. With the aim of mitigating these limitations, in this paper we propose a large dataset, named Florence 4D, composed of dynamic sequences of 3D face models, in which a combination of synthetic and real identities exhibits an unprecedented variety of 4D facial expressions, with variations that include the classical neutral-to-apex transition but generalize to expression-to-expression transitions. These characteristics are not exposed by any of the existing 4D datasets and cannot be obtained by combining more than one dataset. We strongly believe that making such a data corpus publicly available to the community will allow the design and experimentation of new applications that were not possible to investigate until now. To illustrate the difficulty of our data in terms of different identities and varying expressions, we also report baseline experiments on the proposed dataset.

Facial Emotion Recognition Using 3D Face Reconstruction

2021

In recent years, autonomous driving systems (ADS) have effectively utilized facial emotion recognition (FER) results for safe driving. In FER, the system identifies user emotions such as happiness, sadness, anger, surprise, disgust, fear, or neutral. These emotions provide helpful information for safe driving and reduce the chances of road accidents. Conventional FER approaches use 2D images as their inputs to classify user emotions. However, the 2D face images in these approaches offer limited features for model training, and the features from the 2D face images alone are not sufficient for accurate emotion classification. To reduce the feature extraction issues in conventional FER approaches, we propose a 3D face image-based FER approach that uses a 3D face reconstruction technique to convert 2D face images into 3D face images. The deep convolutional neural networks (DCNNs) used in the proposed FER approach efficiently use the 3D face images a...
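
The overall pipeline, reconstructing a 3D representation from a 2D face image and feeding it to a CNN, can be sketched as follows. Note that `reconstruct_depth` is a purely hypothetical placeholder for a real reconstruction model; the paper's actual reconstruction technique and network are not reproduced here.

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for a 3D reconstruction step (e.g. a 3DMM fitting
# network); it just returns a dummy depth map so the sketch runs end to end.
def reconstruct_depth(rgb: torch.Tensor) -> torch.Tensor:
    return rgb.mean(dim=1, keepdim=True)  # (B, 1, H, W) stand-in "depth"

class DepthAugmentedCNN(nn.Module):
    def __init__(self, num_classes: int = 7):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(4, 32, kernel_size=3, padding=1),  # RGB + depth channel
            nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
            nn.Linear(32, num_classes),
        )

    def forward(self, rgb: torch.Tensor) -> torch.Tensor:
        depth = reconstruct_depth(rgb)                   # 2D -> "3D" cue
        return self.net(torch.cat([rgb, depth], dim=1))  # fuse as extra channel

print(DepthAugmentedCNN()(torch.randn(2, 3, 112, 112)).shape)  # torch.Size([2, 7])
```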

Facial expression recognition in the wild using rich deep features

2015 IEEE International Conference on Image Processing (ICIP), 2015

Facial Expression Recognition is an active area of research in computer vision with a wide range of applications. Several approaches have been developed to solve this problem on different benchmark datasets. However, facial expression recognition in the wild remains an area where much work is still needed to serve real-world applications. To this end, we present a novel approach to facial expression recognition that fuses rich deep features with domain knowledge through encoding discriminant facial patches. We conduct experiments on two of the most popular benchmark datasets, CK and TFE. Moreover, we present a novel dataset that, unlike its precedents, consists of natural, rather than acted, expression images. Experimental results show that our approach achieves state-of-the-art results over standard benchmarks and our own dataset.

Facial Affect Estimation in the Wild Using Deep Residual and Convolutional Networks

Automated affective computing in the wild is a challenging task in the field of computer vision. This paper presents three neural network-based methods for facial affect estimation, submitted to the First Affect-in-the-Wild challenge. The methods are based on Inception-ResNet modules redesigned specifically for facial affect estimation: Shallow Inception-ResNet, Deep Inception-ResNet, and Inception-ResNet with LSTMs. These networks extract facial features at different scales and simultaneously estimate both valence and arousal in each frame. Root Mean Square Error (RMSE) rates of 0.4 and 0.3 are achieved for valence and arousal, respectively, with corresponding Concordance Correlation Coefficient (CCC) rates of 0.04 and 0.29 using the Deep Inception-ResNet method.
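
The two reported metrics have standard definitions that are easy to state in code; the sketch below implements them independently of the paper's models, with a toy synthetic check.

```python
import numpy as np

# RMSE and Concordance Correlation Coefficient (CCC), the two metrics the
# paper reports for per-frame valence/arousal estimation.
def rmse(pred: np.ndarray, true: np.ndarray) -> float:
    return float(np.sqrt(np.mean((pred - true) ** 2)))

def ccc(pred: np.ndarray, true: np.ndarray) -> float:
    # CCC = 2*cov(p, t) / (var(p) + var(t) + (mean(p) - mean(t))^2)
    mu_p, mu_t = pred.mean(), true.mean()
    var_p, var_t = pred.var(), true.var()
    cov = ((pred - mu_p) * (true - mu_t)).mean()
    return float(2 * cov / (var_p + var_t + (mu_p - mu_t) ** 2))

# Toy check with synthetic valence annotations in [-1, 1].
rng = np.random.default_rng(0)
true = rng.uniform(-1, 1, size=1000)
pred = true + rng.normal(scale=0.3, size=1000)   # noisy predictions
print(f"RMSE={rmse(pred, true):.3f}  CCC={ccc(pred, true):.3f}")
```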

A recursive framework for expression recognition: from web images to deep models to game dataset

Machine Vision and Applications, 2018

In this paper, we propose a recursive framework to recognize facial expressions from images in real scenes. Unlike traditional approaches that typically focus on developing and refining algorithms to improve recognition performance on an existing dataset, we integrate three components in a recursive manner: facial dataset generation, facial expression recognition model building, and interactive interfaces for testing and new data collection. We first create a candid-images-for-facial-expression (CIFE) dataset. We then apply a convolutional neural network (CNN) to CIFE and build a CNN model for web image expression classification. To increase recognition accuracy, we fine-tune the CNN model and thus obtain a better CNN facial expression recognition model. Based on the fine-tuned CNN model, we design a facial expression game engine and collect a new and more balanced dataset, GaMo, whose images are collected from the expressions users make while playing the game. Finally, we...
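
The fine-tuning step follows the standard transfer-learning recipe. The sketch below uses an ImageNet-pretrained ResNet-18 as a stand-in for the paper's CIFE-trained CNN; the backbone, learning rates, and dummy batch are illustrative assumptions.

```python
import torch
import torch.nn as nn
from torchvision import models

# Start from a pretrained CNN, replace the classifier head, and retrain with
# a small learning rate on the newly collected data. ResNet-18 stands in for
# the paper's CIFE-trained model.
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
model.fc = nn.Linear(model.fc.in_features, 7)   # seven expression classes

optimizer = torch.optim.SGD([
    {"params": [p for n, p in model.named_parameters() if not n.startswith("fc")],
     "lr": 1e-4},                                   # gently adapt pretrained layers
    {"params": model.fc.parameters(), "lr": 1e-2},  # train the new head faster
], momentum=0.9)
criterion = nn.CrossEntropyLoss()

# One illustrative step on a dummy batch; real training iterates over new data.
images, labels = torch.randn(8, 3, 224, 224), torch.randint(0, 7, (8,))
loss = criterion(model(images), labels)
optimizer.zero_grad()
loss.backward()
optimizer.step()
print(f"loss={loss.item():.3f}")
```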

Accurate Facial Parts Localization and Deep Learning for 3D Facial Expression Recognition

2018

Meaningful facial parts convey key cues for both facial action unit detection and expression prediction. A textured 3D face scan provides both the detailed 3D geometric shape and the 2D texture appearance of the face, both of which are beneficial for Facial Expression Recognition (FER). However, accurate facial part extraction and fusion are challenging tasks. In this paper, a novel system for 3D FER is designed based on accurate facial part extraction and deep feature fusion of facial parts. Experiments conducted on the BU-3DFE database demonstrate the effectiveness of combining different facial parts with texture and depth cues, reporting state-of-the-art results in comparison with all existing methods under the same setting.
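
One way to picture part-based fusion is the sketch below: fixed-size patches are cropped around assumed landmark centers, encoded with a shared CNN, and fused by concatenation. Everything here (patch size, encoder, landmark layout) is an assumption; the paper's accurate part localization and fusion scheme is more involved.

```python
import torch
import torch.nn as nn

# Sketch of part-based fusion: crop a patch per landmark center, encode each
# with a shared CNN, concatenate, and classify.
class PartFusionFER(nn.Module):
    def __init__(self, num_parts: int = 3, num_classes: int = 7):
        super().__init__()
        self.encoder = nn.Sequential(          # shared patch encoder
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.head = nn.Linear(32 * num_parts, num_classes)

    def forward(self, img: torch.Tensor, centers: torch.Tensor, size: int = 32):
        half = size // 2
        feats = []
        for k in range(centers.shape[1]):      # one crop per landmark center
            xs, ys = centers[0, k]             # assume same layout per batch
            patch = img[:, :, ys - half:ys + half, xs - half:xs + half]
            feats.append(self.encoder(patch))
        return self.head(torch.cat(feats, dim=1))

img = torch.randn(2, 3, 128, 128)
centers = torch.tensor([[[20, 30], [70, 30], [45, 80]]] * 2)  # eyes, mouth (x, y)
print(PartFusionFER()(img, centers).shape)  # torch.Size([2, 7])
```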

Taking facial expression recognition outside the lab and into the wild by using challenging datasets and improved performance metrics

F1000Research

Background: Facial expression recognition is a challenging field, as evidenced by the ineffectiveness of current state-of-the-art techniques for classifying facial expressions. Despite showing high levels of accuracy, these methods perform poorly in real-life implementations because the training sets used are usually simple, limited, and collected in controlled lab environments. Methods: This paper explores newer datasets consisting of images taken in challenging conditions with many variations. Using such datasets improves classification accuracy because it exposes the model to a variety of samples. In addition, we used new performance metrics to reflect the challenging conditions for classification. We reviewed the current best techniques for expression recognition and laid out a method to design an improved deep neural network using AffectNet, a newer and more challenging dataset. The implementation method is an iterative process that trains a convolution...
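
The abstract is truncated before naming its performance metrics, so the sketch below uses macro-averaged F1 and balanced accuracy as assumed examples of metrics that, unlike plain accuracy, reflect performance on the class-imbalanced, in-the-wild data the paper describes.

```python
import numpy as np
from sklearn.metrics import accuracy_score, balanced_accuracy_score, f1_score

# Macro F1 and balanced accuracy are assumed stand-ins for the paper's
# unnamed metrics; on skewed label distributions they penalize a classifier
# that ignores rare classes, where plain accuracy does not.
rng = np.random.default_rng(0)
y_true = rng.choice(7, size=500, p=[0.4, 0.3, 0.1, 0.08, 0.06, 0.04, 0.02])
y_pred = np.where(rng.random(500) < 0.7, y_true, rng.integers(0, 7, 500))

print(f"accuracy          = {accuracy_score(y_true, y_pred):.3f}")
print(f"balanced accuracy = {balanced_accuracy_score(y_true, y_pred):.3f}")
print(f"macro F1          = {f1_score(y_true, y_pred, average='macro'):.3f}")
```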