Kha Gia Quach | Concordia University (Canada)

Papers by Kha Gia Quach

Multi-Camera Multiple 3D Object Tracking on the Move for Autonomous Vehicles

ArXiv, 2022

The development of autonomous vehicles provides an opportunity to have a complete set of camera sensors capturing the environment around the car. Thus, it is important for object detection and tracking to address new challenges, such as achieving consistent results across camera views. To address these challenges, this work presents a new Global Association Graph Model with Link Prediction approach that predicts the locations of existing tracklets and links detections with tracklets via cross-attention motion modeling and appearance re-identification. This approach aims at solving issues caused by inconsistent 3D object detection. Moreover, our model is exploited to improve the detection accuracy of a standard 3D object detector in the nuScenes detection challenge. The experimental results on the nuScenes dataset demonstrate that the proposed method produces state-of-the-art (SOTA) performance on the existing vision-based tracking dataset.

A Multi-task Contextual Atrous Residual Network for Brain Tumor Detection & Segmentation

2020 25th International Conference on Pattern Recognition (ICPR)

In recent years, deep neural networks have achieved state-of-the-art performance in a variety of recognition and segmentation tasks in medical imaging, including brain tumor segmentation. We observe that brain tumor segmentation faces an imbalanced data problem, where the number of pixels belonging to the background class (non-tumor pixels) is much larger than the number of pixels belonging to the foreground class (tumor pixels). To address this problem, we propose a multi-task network formed as a cascaded structure. Our model has two objectives: (i) effectively differentiate the brain tumor regions and (ii) estimate the brain tumor mask. The first objective is performed by our proposed contextual brain tumor detection network, which plays the role of an attention gate and focuses only on the region around the brain tumor while ignoring the distant background, which is less correlated with the tumor. Unlike existing object detection networks, which process every pixel, our contextual brain tumor detection network only processes contextual regions around ground-truth instances, a strategy that aims at producing meaningful region proposals. The second objective is built upon a 3D atrous residual network within an encoder-decoder structure in order to effectively segment both large and small objects (brain tumors). Our 3D atrous residual network is designed with a skip connection that enables the gradient from the deep layers to be propagated directly to the shallow layers; thus, features of different depths are preserved and used to refine each other. In order to incorporate larger contextual information from volumetric MRI data, our network utilizes 3D atrous convolution with various kernel sizes, which enlarges the receptive field of the filters.
Our proposed network has been evaluated on various datasets, including BRATS2015, BRATS2017 and BRATS2018, on both the validation and testing sets. Our performance has been benchmarked with both region-based metrics and surface-based metrics. We have also conducted comparisons against state-of-the-art approaches.
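The claim that atrous (dilated) convolution enlarges the receptive field can be made concrete with a little arithmetic. The sketch below is not taken from the paper: it uses the standard relation that a kernel of size k with dilation d spans k + (k - 1)(d - 1) positions along one axis, and the dilation rates in the example are hypothetical values chosen only for illustration. The function names are also assumptions.

```python
def effective_kernel_size(kernel_size: int, dilation: int) -> int:
    """Span (along one axis) covered by a dilated kernel.

    A dilation of 1 is an ordinary convolution; larger dilations insert
    (dilation - 1) gaps between neighbouring kernel taps.
    """
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def receptive_field(layers) -> int:
    """Receptive field (along one axis) of stacked stride-1 convolutions.

    `layers` is a list of (kernel_size, dilation) pairs. Each stride-1
    layer grows the receptive field by (effective kernel size - 1).
    """
    rf = 1
    for kernel_size, dilation in layers:
        rf += effective_kernel_size(kernel_size, dilation) - 1
    return rf


# Hypothetical comparison: three 3-tap layers, plain versus dilated.
plain = receptive_field([(3, 1), (3, 1), (3, 1)])
atrous = receptive_field([(3, 1), (3, 2), (3, 4)])
print(plain, atrous)
```

Both stacks use the same number of weights per layer, yet the dilated stack sees roughly twice as many voxels along each axis, which is the effect the abstract refers to.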

Lp Norm Relaxation Approach for Large Scale Data Analysis: A Review

Lecture Notes in Computer Science, 2018

Teaching and scientific research are two main, interacting tasks that help university lecturers improve their capacities and abilities in order to integrate with the scientific flow of the country, the region, and the world. Using a data science approach, accurate assessments of the quantity, quality, and relationships among lecturers' scientific publications have been modeled based on the published scientific data of lecturers at the University of Education for the period 2010-2019. Techniques of data preparation, data analysis, and data modeling were initially applied in this case study, since the system of published scientific data has not yet been synchronized. These analytical results can be used as a basis for managers and policy makers, and for the process of developing the scientific and technological capacity of officials and lecturers in the University.

Overcomplete Dictionary and Deep Learning Approaches to Image and Video Analysis

Extracting useful information while ignoring the rest (e.g. noise, occlusion, lighting) is an essential and challenging data analysis step for many computer vision tasks such as facial recognition, scene reconstruction, event detection, image restoration, etc. The data analysis in those tasks can be formulated as a form of matrix decomposition or factorization that separates useful information and/or fills in missing information based on the sparsity and/or low-rankness of the data. There has been an increasing number of non-convex approaches, including conventional matrix norm optimization and emerging deep learning models. However, it is hard to optimize the ideal l0-norm or learn the deep models directly and efficiently. Motivated by this challenge, this thesis proposes two sets of approaches: conventional and deep learning based. For conventional approaches, this thesis proposes a novel online non-convex lp-norm based Robust PCA (OLP-RPCA) approach for matrix decomposition, where 0 < p <...

ShrinkTeaNet: Million-scale Lightweight Face Recognition via Shrinking Teacher-Student Networks

ArXiv, 2019

Large-scale face recognition in the wild has recently achieved mature performance in many real-world applications. However, such systems are built on GPU platforms and mostly deploy heavy deep network architectures. Given a high-performance heavy network as a teacher, this work presents a simple and elegant teacher-student learning paradigm, namely ShrinkTeaNet, to train a portable student network that has significantly fewer parameters and competitive accuracy against the teacher network. Unlike prior teacher-student frameworks, which mainly focus on accuracy and compression ratios in closed-set problems, our proposed teacher-student network is shown to be more robust on the open-set problem, i.e. large-scale face recognition. In addition, this work introduces a novel Angular Distillation Loss for distilling the feature direction and the sample distributions of the teacher's hypersphere to its student. Then the ShrinkTeaNet framework can efficiently guide the student'...

Longitudinal Face Aging in the Wild - Recent Deep Learning Approaches

Face aging has attracted considerable attention and interest from the computer vision community in recent years. Numerous approaches, ranging from pure image processing techniques to deep learning structures, have been proposed in the literature. In this paper, we aim to give a review of recent developments in modern deep learning based approaches, i.e. Deep Generative Models, for the face aging task. Their structures, formulations, learning algorithms, and synthesized results are also presented with systematic discussions. Moreover, the aging databases used by most methods to learn the aging process are also reviewed.

Beyond Disentangled Representations: An Attentive Angular Distillation Approach to Large-scale Lightweight Age-Invariant Face Recognition

ArXiv, 2020

Disentangled representations have been commonly adopted for Age-invariant Face Recognition (AiFR) tasks. However, these methods have reached some limitations: (1) the requirement of large-scale face recognition (FR) training data with age labels, which is limited in practice; (2) heavy deep network architectures for high performance; and (3) evaluations that usually take place on age-related face databases while neglecting the standard large-scale FR databases needed to guarantee robustness. This work presents a novel Attentive Angular Distillation (AAD) approach to Large-scale Lightweight AiFR that overcomes these limitations. Given two high-performance heavy networks as teachers with different specialized knowledge, AAD introduces a learning paradigm to efficiently distill the age-invariant attentive and angular knowledge from those teachers to a lightweight student network, making it more powerful, with higher FR accuracy and robustness against the age factor. Consequently, AAD approa...

DyGLIP: A Dynamic Graph Model with Link Prediction for Accurate Multi-Camera Multiple Object Tracking

2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2021

Multi-Camera Multiple Object Tracking (MC-MOT) is a significant computer vision problem due to its emerging applicability in several real-world applications. Despite a large number of existing works, solving the data association problem in any MC-MOT pipeline is arguably one of the most challenging tasks. Developing a robust MC-MOT system is still highly challenging due to many practical issues such as inconsistent lighting conditions, varying object movement patterns, and the trajectory occlusions of objects between the cameras. To address these problems, this work proposes a new Dynamic Graph Model with Link Prediction (DyGLIP) approach to solve the data association task. Compared to existing methods, our new model offers several advantages, including better feature representations and the ability to recover from lost tracks during camera transitions. Moreover, our model works gracefully regardless of the overlapping ratios between the cameras. Experimental results show that we outperform existing MC-MOT algorithms by a large margin on several practical datasets. Notably, our model works favorably in online settings but can be extended to an incremental approach for large-scale datasets.

Active Contour Model in Deep Learning Era: A Revise and Review

Applications of Hybrid Metaheuristic Algorithms for Image Processing, 2020

Active Contour (AC)-based segmentation has been widely used to solve many image processing problems, especially image segmentation. While these AC-based methods offer object shape constraints, they typically rely on strong edges or statistical modeling for successful segmentation. AC-based approaches lack a way to work with labeled images in a supervised machine learning framework. Furthermore, they are unsupervised approaches and depend strongly on many parameters that are chosen empirically. Recently, Deep Learning (DL) has become the go-to method for solving many problems in various areas. Over the past decade, DL has achieved remarkable success in various artificial intelligence research areas. DL methods are supervised and require a large volume of ground-truth data. This paper first covers the fundamentals of both Active Contour techniques and the Deep Learning framework. We then present the state-of-the-art approaches in which Active Contour techniques are incorporated into the Deep Learning framework.

MobiFace: A Lightweight Deep Learning Face Recognition on Mobile Devices

2019 IEEE 10th International Conference on Biometrics Theory, Applications and Systems (BTAS), 2019

Deep neural networks have been widely used in numerous computer vision applications, particularly in face recognition. Deploying deep neural network face recognition on mobile devices has recently become a trend, but it is still limited because most high-accuracy deep models are costly in both time and GPU resources at the inference stage. Therefore, developing a lightweight deep neural network is one of the most practical solutions for deploying face recognition on mobile devices. Such a lightweight deep neural network requires a memory-efficient representation with a small number of weights and low-cost operators. In this paper, a novel deep neural network named MobiFace, a simple but effective approach, is proposed for productively deploying face recognition on mobile devices. The experimental results show that our lightweight MobiFace is able to achieve high performance, with 99.73% on the LFW database and 91.3% on the large-scale challenging Megaface database. It is also competitive with large-scale deep-network face recognition while significantly reducing computational time and memory consumption.

Automatic Face Aging in Videos via Deep Reinforcement Learning

2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2019

This paper presents a novel approach to automatically synthesize age-progressed facial images in video sequences using Deep Reinforcement Learning. The proposed method models facial structures and the longitudinal face-aging process of given subjects coherently across video frames. The approach is optimized using a long-term reward Reinforcement Learning function with deep feature extraction from a Deep Convolutional Neural Network. Unlike previous age-progression methods, which are only able to synthesize an aged likeness of a face from a single input image, the proposed approach is capable of age-progressing facial likenesses in videos with consistently synthesized facial features across frames. In addition, the deep reinforcement learning method guarantees preservation of the visual identity of input faces after age-progression. Results on videos from our newly collected aging face database, AGFW-v2, demonstrate the advantages of the proposed solution in terms of the quality of age-progressed faces, temporal smoothness, and cross-age face verification.

Vec2Face: Unveil Human Faces From Their Blackbox Features in Face Recognition

2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2020

Unveiling face images of a subject given his/her high-level representations extracted from a blackbox Face Recognition engine is extremely challenging. This is because of the limited information accessible from that engine, including its structure and uninterpretable extracted features. This paper presents a novel generative structure with Bijective Metric Learning, namely Bijective Generative Adversarial Networks in a Distillation framework (DiBiGAN), for synthesizing faces of an identity given that person's features. In order to effectively address this problem, this work first introduces a bijective metric so that the distance measurement and metric learning process can be directly adopted in the image domain for an image reconstruction task. Second, a distillation process is introduced to maximize the information exploited from the blackbox face recognition engine. Then a Feature-Conditional Generator Structure with Exponential Weighting Strategy is presented for a more robust generator that can synthesize realistic faces with ID preservation. Results on several benchmarking datasets, including CelebA, LFW, AgeDB, and CFP-FP, against matching engines demonstrate the effectiveness of DiBiGAN in both image realism and ID preservation.

Temporal Non-volume Preserving Approach to Facial Age-Progression and Age-Invariant Face Recognition

2017 IEEE International Conference on Computer Vision (ICCV), 2017

Modeling the long-term facial aging process is extremely challenging due to the presence of large and non-linear variations during the face development stages. In order to efficiently address the problem, this work first decomposes the aging process into multiple short-term stages. Then, a novel generative probabilistic model, named Temporal Non-Volume Preserving (TNVP) transformation, is presented to model the facial aging process at each stage. Unlike Generative Adversarial Networks (GANs), which require an empirical balance threshold, and Restricted Boltzmann Machines (RBMs), which are intractable models, our proposed TNVP approach guarantees a tractable density function, exact inference, and evaluation for embedding the feature transformations between faces in consecutive stages. Our model shows its advantages not only in capturing the non-linear age-related variance in each stage but also in producing a smooth synthesis in age progression across faces. Our approach can model any face in the wild provided with only four basic landmark points. Moreover, the structure can be transformed into a deep convolutional network while keeping the advantages of probabilistic models with tractable log-likelihood density estimation. Our method is evaluated in terms of both synthesizing age-progressed faces and cross-age face verification, and it consistently shows state-of-the-art results on various face aging databases, i.e. FG-NET, MORPH, AginG Faces in the Wild (AGFW), and Cross-Age Celebrity Dataset (CACD). A large-scale face verification on Megaface challenge 1 is also performed to further show the advantages of our proposed approach.

Recurrent Level Set Networks for Instance Segmentation

Pattern Recognition - Selected Methods and Applications [Working Title], 2019

Level set (LS)-based segmentation has been widely used in the medical imaging domain. However, it has some difficulty when dealing with multi-instance objects in the real world. Furthermore, LS performance is generally quite sensitive to some initial settings and parameters, such as the number of iterations. To address these issues and promote the classic LS methods to a new degree of performance in a trainable deep learning framework, we present a novel approach, contextual recurrent level sets (CRLS), for object instance segmentation. In the proposed networks, the curve deformation process is formulated as a hidden state evolution procedure in gated recurrent units (GRUs) and updated by minimizing an energy functional composed of fitting forces and contour length.

Deep Appearance Models: A Deep Boltzmann Machine Approach for Face Modeling

International Journal of Computer Vision, 2018

The "interpretation through synthesis" approach to analyzing face images, particularly the Active Appearance Models (AAMs) method, has become one of the most successful face modeling approaches over the last two decades. AAMs are able to represent face images through synthesis using a controllable parameterized Principal Component Analysis (PCA) model. However, the accuracy and robustness of the faces synthesized by AAMs are highly dependent on the training sets and, inherently, on the generalizability of PCA subspaces. This paper presents a novel Deep Appearance Models (DAMs) approach, an efficient replacement for AAMs, to accurately capture both the shape and texture of face images under large variations. In this approach, three crucial components represented in hierarchical layers are modeled using Deep Boltzmann Machines (DBMs) to robustly capture the variations of facial shapes and appearances. DAMs are therefore superior to AAMs in inferring a representation for new face images under various challenging conditions. The proposed approach is evaluated in various applications to demonstrate its robustness and capabilities, i.e. facial super-resolution reconstruction, facial off-angle reconstruction or face frontalization, facial occlusion removal, and age estimation, using challenging face databases, i.e. Labeled Face Parts in the Wild (LFPW), Helen and FG-NET. Compared to AAMs and other deep learning based approaches, the proposed DAMs achieve competitive results in those applications, demonstrating their advantages in handling occlusions, facial representation, and reconstruction.

Deep contextual recurrent residual networks for scene labeling

Pattern Recognition, 2018

Designed as extremely deep architectures, deep residual networks, which provide a rich visual representation and offer robust convergence behaviors, have recently achieved exceptional performance in numerous computer vision problems. When directly applied to a scene labeling problem, however, they are limited in capturing long-range contextual dependence, which is a critical aspect. To address this issue, we propose a novel approach, Contextual Recurrent Residual Networks (CRRN), which is able to simultaneously handle rich visual representation learning and long-range context modeling within a fully end-to-end deep network. Furthermore, our proposed end-to-end CRRN is trained completely from scratch, without using any pre-trained models, in contrast to most existing methods, which are usually fine-tuned from state-of-the-art pre-trained models, e.g. VGG-16, ResNet, etc. The experiments are conducted on four challenging scene labeling datasets, i.e. Sift-Flow, CamVid, Stanford background and SUN, and compared against various state-of-the-art scene labeling methods.

Robust Deep Appearance Models

2016 23rd International Conference on Pattern Recognition (ICPR), 2016

This paper presents novel Robust Deep Appearance Models to learn the non-linear correlation between the shape and texture of face images. In this approach, two crucial components of face images, i.e. shape and texture, are represented by Deep Boltzmann Machines and Robust Deep Boltzmann Machines (RDBM), respectively. The RDBM, an alternative form of Robust Boltzmann Machines, can separate corrupted/occluded pixels in the texture modeling to achieve better reconstruction results. The two models are connected by Restricted Boltzmann Machines at the top layer to jointly learn and capture the variations of both facial shapes and appearances. This paper also introduces new fitting algorithms with occlusion awareness through the mask obtained from the RDBM reconstruction. The proposed approach is evaluated in various applications using challenging face datasets, i.e. Labeled Face Parts in the Wild (LFPW), Helen, EURECOM and AR databases, to demonstrate its robustness and capabilities.

Depth-based 3D hand pose tracking

2016 23rd International Conference on Pattern Recognition (ICPR), 2016

In this paper, we propose two new approaches using the Convolutional Neural Network (CNN) and the Recurrent Neural Network (RNN) for tracking 3D hand poses. The first approach is a detection-based algorithm, while the second is a data-driven method. Our first contribution is a new tracking-by-detection strategy that extends the CNN-based single-frame detection method to a multiple-frame tracking approach by taking into account the prediction history using an RNN. Our second contribution is the use of an RNN to simulate the fitting of a 3D model to the input data. It helps to relax the need for a carefully designed fitting function and optimization algorithm. With such strategies, we show that our tracking frameworks can automatically correct failed detections made in previous frames due to occlusions. Our proposed method is evaluated on two public hand datasets, i.e. NYU and ICVL, and compared against other recent hand tracking methods. Experimental results show that our approaches achieve state-of-the-art accuracy and efficiency in the challenging problem of 3D hand pose estimation.

Non-convex online robust PCA: Enhance sparsity via ℓp-norm minimization

Computer Vision and Image Understanding, 2017

Highlights: Two non-convex ℓp-norm (0 < p < 1) relaxation forms of the RPCA problem are proposed.
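To illustrate why an ℓp penalty with 0 < p < 1 is used to enhance sparsity, the sketch below evaluates the generic quantity sum(|x_i|^p) on a small vector. It is a minimal illustration under stated assumptions, not the relaxation forms proposed in the paper (which this listing does not specify); the function name and the sample vector are hypothetical.

```python
def lp_penalty(values, p: float) -> float:
    """Return sum(|v|^p) over the entries, for 0 < p <= 1.

    With p = 1 this is the convex l1-norm. As p shrinks toward 0, every
    non-zero entry contributes a value close to 1, so the sum approaches
    the number of non-zero entries (the l0 "norm").
    """
    if not 0 < p <= 1:
        raise ValueError("p must satisfy 0 < p <= 1")
    return sum(abs(v) ** p for v in values)


# Hypothetical sparse vector with two non-zero entries.
x = [0.0, 3.0, 0.0, -0.5]

for p in (1.0, 0.5, 0.01):
    print(p, lp_penalty(x, p))
```

For this vector the penalty moves from the ℓ1 value of 3.5 toward the non-zero count of 2 as p decreases, which is the sense in which a small p approximates the ideal but hard-to-optimize ℓ0 objective more closely than ℓ1 does.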

Longitudinal Face Modeling via Temporal Deep Restricted Boltzmann Machines

2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016

Modeling the face aging process is a challenging task due to the large and non-linear variations present in different stages of face development. This paper presents a deep model approach for face age progression that can efficiently capture the non-linear aging process and automatically synthesize a series of age-progressed faces in various age ranges. In this approach, we first decompose the long-term age progression into a sequence of short-term changes and model it as a face sequence. The Temporal Deep Restricted Boltzmann Machines based age progression model, together with the prototype faces, is then constructed to learn the aging transformation between faces in the sequence. In addition, to enhance the wrinkles of faces in the later age ranges, wrinkle models are further constructed using Restricted Boltzmann Machines to capture their variations in different facial regions. Geometry constraints are also taken into account in the last step for more consistent age-progressed results. The proposed approach is evaluated using various face aging databases, i.e. FG-NET, Cross-Age Celebrity Dataset (CACD) and MORPH, and our collected large-scale aging database named AginG Faces in the Wild (AGFW). In addition, when the ground-truth age is not available for the input image, our proposed system is able to automatically estimate the age of the input face before the aging process is applied.

Research paper thumbnail of Multi-Camera Multiple 3D Object Tracking on the Move for Autonomous Vehicles

ArXiv, 2022

The development of autonomous vehicles provides an opportunity to have a complete set of camera s... more The development of autonomous vehicles provides an opportunity to have a complete set of camera sensors capturing the environment around the car. Thus, it is important for object detection and tracking to address new challenges, such as achieving consistent results across views of cameras. To address these challenges, this work presents a new Global Association Graph Model with Link Prediction approach to predict existing tracklets location and link detections with tracklets via cross-attention motion modeling and appearance re-identification. This approach aims at solving issues caused by inconsistent 3D object detection. Moreover, our model exploits to improve the detection accuracy of a standard 3D object detector in the nuScenes detection challenge. The experimental results on the nuScenes dataset demonstrate the benefits of the proposed method to produce SOTA performance on the existing vision-based tracking dataset.

Research paper thumbnail of A Multi-task Contextual Atrous Residual Network for Brain Tumor Detection & Segmentation

2020 25th International Conference on Pattern Recognition (ICPR)

In recent years, deep neural networks have achieved state-of-the-art performance in a variety of ... more In recent years, deep neural networks have achieved state-of-the-art performance in a variety of recognition and segmentation tasks in medical imaging including brain tumor segmentation. We investigate that segmenting a brain tumor is facing to the imbalanced data problem where the number of pixels belonging to the background class (non tumor pixel) is much larger than the number of pixels belonging to the foreground class (tumor pixel). To address this problem, we propose a multitask network which is formed as a cascaded structure. Our model consists of two targets, i.e., (i) effectively differentiate the brain tumor regions and (ii) estimate the brain tumor mask. The first objective is performed by our proposed contextual brain tumor detection network, which plays a role of an attention gate and focuses on the region around brain tumor only while ignoring the far neighbor background which is less correlated to the tumor. Different from other existing object detection networks which process every pixel, our contextual brain tumor detection network only processes contextual regions around ground-truth instances and this strategy aims at producing meaningful regions proposals. The second objective is built upon a 3D atrous residual network and under an encode-decode network in order to effectively segment both large and small objects (brain tumor). Our 3D atrous residual network is designed with a skip connection to enables the gradient from the deep layers to be directly propagated to shallow layers, thus, features of different depths are preserved and used for refining each other. In order to incorporate larger contextual information from volume MRI data, our network utilizes the 3D atrous convolution with various kernel sizes, which enlarges the receptive field of filters. 
Our proposed network has been evaluated on various datasets including BRATS2015, BRATS2017 and BRATS2018 datasets with both validation set and testing set. Our performance has been benchmarked by both regionbased metrics and surface-based metrics. We also have conducted comparisons against state-of-the-art approaches. 1

Research paper thumbnail of Lp Norm Relaxation Approach for Large Scale Data Analysis: A Review

Lecture Notes in Computer Science, 2018

Teaching and scientific research are two main tasks that interact which help university lecturers... more Teaching and scientific research are two main tasks that interact which help university lecturers improve their capacities and abilities in order to integrate with the scientific flow of the country, the region as well as the world. By approaching the data science, accurate assessments of the quantity, quality, and relationship between lecturers' scientific publications has been modeled based on published scientific data of the lecturers of University of Education in period 2010-2019. Techniques of data preparation, data analysis and data modeling were initially applied in the case of research as the system of published scientific data which has not been yet synchronized. These analytical results can be used as a basis for management levels, policy makers, and the process of developing scientific and technological capacity of officials and lecturers in the University.

Overcomplete Dictionary and Deep Learning Approaches to Image and Video Analysis

Extracting useful information while ignoring the rest (e.g. noise, occlusion, lighting) is an essential and challenging data analysis step for many computer vision tasks such as facial recognition, scene reconstruction, event detection, image restoration, etc. Data analysis for those tasks can be formulated as a form of matrix decomposition or factorization that separates useful information and/or fills in missing information based on the sparsity and/or low-rankness of the data. There has been an increasing number of non-convex approaches, including conventional matrix norm optimization and emerging deep learning models. However, it is hard to optimize the ideal l0-norm or learn the deep models directly and efficiently. Motivated by this challenge, this thesis proposes two sets of approaches: conventional and deep learning based. For conventional approaches, this thesis proposes a novel online non-convex lp-norm based Robust PCA (OLP-RPCA) approach for matrix decomposition, where 0 < p <...

ShrinkTeaNet: Million-scale Lightweight Face Recognition via Shrinking Teacher-Student Networks

ArXiv, 2019

Large-scale face recognition in the wild has recently achieved mature performance in many real-world applications. However, such systems are built on GPU platforms and mostly deploy heavy deep network architectures. Given a high-performance heavy network as a teacher, this work presents a simple and elegant teacher-student learning paradigm, namely ShrinkTeaNet, to train a portable student network that has significantly fewer parameters and competitive accuracy against the teacher network. Unlike prior teacher-student frameworks, which mainly focus on accuracy and compression ratios in closed-set problems, our proposed teacher-student network is shown to be more robust on open-set problems, i.e. large-scale face recognition. In addition, this work introduces a novel Angular Distillation Loss for distilling the feature direction and the sample distributions of the teacher's hypersphere to its student. Then the ShrinkTeaNet framework can efficiently guide the student'...

Longitudinal Face Aging in the Wild - Recent Deep Learning Approaches

Face aging has attracted considerable attention and interest from the computer vision community in recent years. Numerous approaches, ranging from purely image processing techniques to deep learning structures, have been proposed in the literature. In this paper, we review recent developments in modern deep learning based approaches, i.e. Deep Generative Models, for the face aging task. Their structures, formulations, learning algorithms and synthesized results are presented with systematic discussions. Moreover, the aging databases used by most methods to learn the aging process are also reviewed.

Beyond Disentangled Representations: An Attentive Angular Distillation Approach to Large-scale Lightweight Age-Invariant Face Recognition

ArXiv, 2020

Disentangled representations have been commonly adopted for Age-invariant Face Recognition (AiFR) tasks. However, these methods have several limitations: (1) they require large-scale face recognition (FR) training data with age labels, which is limited in practice; (2) they rely on heavy deep network architectures for high performance; and (3) their evaluations usually take place on age-related face databases while neglecting the standard large-scale FR databases needed to guarantee robustness. This work presents a novel Attentive Angular Distillation (AAD) approach to Large-scale Lightweight AiFR that overcomes these limitations. Given two high-performance heavy networks as teachers with different specialized knowledge, AAD introduces a learning paradigm to efficiently distill the age-invariant attentive and angular knowledge from those teachers to a lightweight student network, making it more powerful, with higher FR accuracy and robustness against the age factor. Consequently, AAD approa...

DyGLIP: A Dynamic Graph Model with Link Prediction for Accurate Multi-Camera Multiple Object Tracking

2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2021

Multi-Camera Multiple Object Tracking (MC-MOT) is a significant computer vision problem due to its emerging applicability in several real-world applications. Despite a large number of existing works, solving the data association problem in any MC-MOT pipeline is arguably one of the most challenging tasks. Developing a robust MC-MOT system remains highly challenging due to many practical issues such as inconsistent lighting conditions, varying object movement patterns, and trajectory occlusions of objects between cameras. To address these problems, this work proposes a new Dynamic Graph Model with Link Prediction (DyGLIP) approach to solve the data association task. Compared to existing methods, our new model offers several advantages, including better feature representations and the ability to recover from lost tracks during camera transitions. Moreover, our model works gracefully regardless of the overlapping ratios between the cameras. Experimental results show that we outperform existing MC-MOT algorithms by a large margin on several practical datasets. Notably, our model works favorably in online settings but can be extended to an incremental approach for large-scale datasets.

Active Contour Model in Deep Learning Era: A Revise and Review

Applications of Hybrid Metaheuristic Algorithms for Image Processing, 2020

Active Contour (AC)-based segmentation has been widely used to solve many image processing problems, especially image segmentation. While AC-based methods offer object shape constraints, they typically rely on strong edges or statistical modeling for successful segmentation. AC-based approaches lack a way to work with labeled images in a supervised machine learning framework. Furthermore, they are unsupervised approaches and depend strongly on many parameters that are chosen empirically. Recently, Deep Learning (DL) has become the go-to method for solving many problems in various areas. Over the past decade, DL has achieved remarkable success in various artificial intelligence research areas. DL methods are supervised and require a large volume of ground-truth data. This paper first provides the fundamentals of both Active Contour techniques and the Deep Learning framework. We then present state-of-the-art approaches that incorporate Active Contour techniques into the Deep Learning framework.

MobiFace: A Lightweight Deep Learning Face Recognition on Mobile Devices

2019 IEEE 10th International Conference on Biometrics Theory, Applications and Systems (BTAS), 2019

Deep neural networks have been widely used in numerous computer vision applications, particularly in face recognition. Deploying deep neural network face recognition on mobile devices has recently become a trend, but it is still limited since most high-accuracy deep models are both time- and GPU-consuming in the inference stage. Therefore, developing a lightweight deep neural network is one of the most practical solutions for deploying face recognition on mobile devices. Such a lightweight deep neural network requires efficient memory use, with a small number of weights and low-cost operators. In this paper, a novel deep neural network named MobiFace, a simple but effective approach, is proposed for productively deploying face recognition on mobile devices. The experimental results show that our lightweight MobiFace achieves high performance, with 99.73% on the LFW database and 91.3% on the large-scale, challenging Megaface database. It is also competitive against large-scale deep network face recognition while significantly reducing computational time and memory consumption.

Automatic Face Aging in Videos via Deep Reinforcement Learning

2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2019

This paper presents a novel approach to automatically synthesize age-progressed facial images in video sequences using Deep Reinforcement Learning. The proposed method models facial structures and the longitudinal face-aging process of given subjects coherently across video frames. The approach is optimized using a long-term reward Reinforcement Learning function with deep feature extraction from a Deep Convolutional Neural Network. Unlike previous age-progression methods, which are only able to synthesize an aged likeness of a face from a single input image, the proposed approach is capable of age-progressing facial likenesses in videos with consistently synthesized facial features across frames. In addition, the deep reinforcement learning method guarantees preservation of the visual identity of input faces after age-progression. Results on videos from our newly collected aging face database, AGFW-v2, demonstrate the advantages of the proposed solution in terms of the quality of age-progressed faces, temporal smoothness, and cross-age face verification.

Vec2Face: Unveil Human Faces From Their Blackbox Features in Face Recognition

2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2020

Unveiling face images of a subject given his/her high-level representations extracted from a blackbox Face Recognition engine is extremely challenging. This is because of the limited information accessible from that engine, including its structure and uninterpretable extracted features. This paper presents a novel generative structure with Bijective Metric Learning, namely Bijective Generative Adversarial Networks in a Distillation framework (DiBiGAN), for synthesizing faces of an identity given that person's features. In order to effectively address this problem, this work first introduces a bijective metric so that the distance measurement and metric learning process can be directly adopted in the image domain for an image reconstruction task. Second, a distillation process is introduced to maximize the information exploited from the blackbox face recognition engine. Then a Feature-Conditional Generator Structure with Exponential Weighting Strategy is presented for a more robust generator that can synthesize realistic faces with ID preservation. Results on several benchmarking datasets, including CelebA, LFW, AgeDB and CFP-FP, against matching engines have demonstrated the effectiveness of DiBiGAN in terms of both image realism and ID preservation.

Temporal Non-volume Preserving Approach to Facial Age-Progression and Age-Invariant Face Recognition

2017 IEEE International Conference on Computer Vision (ICCV), 2017

Modeling the long-term facial aging process is extremely challenging due to the presence of large and non-linear variations during the face development stages. In order to efficiently address the problem, this work first decomposes the aging process into multiple short-term stages. Then, a novel generative probabilistic model, named Temporal Non-Volume Preserving (TNVP) transformation, is presented to model the facial aging process at each stage. Unlike Generative Adversarial Networks (GANs), which require an empirical balance threshold, and Restricted Boltzmann Machines (RBM), an intractable model, our proposed TNVP approach guarantees a tractable density function, exact inference and evaluation for embedding the feature transformations between faces in consecutive stages. Our model shows its advantages not only in capturing the non-linear age-related variance in each stage but also in producing a smooth synthesis in age progression across faces. Our approach can model any face in the wild provided with only four basic landmark points. Moreover, the structure can be transformed into a deep convolutional network while keeping the advantages of probabilistic models with tractable log-likelihood density estimation. Our method is evaluated in terms of both synthesizing age-progressed faces and cross-age face verification, and consistently shows state-of-the-art results on various face aging databases, i.e. FG-NET, MORPH, AginG Faces in the Wild (AGFW), and Cross-Age Celebrity Dataset (CACD). A large-scale face verification on Megaface challenge 1 is also performed to further show the advantages of our proposed approach.
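The tractable density mentioned in this abstract can be related to the change-of-variables rule that underlies non-volume-preserving transformations in general. As a sketch in generic notation (the symbols f, x and z are assumptions made here, not the paper's own), for an invertible mapping z = f(x) with a simple prior p_Z, the exact log-density of x is:

```latex
\log p_X(x) = \log p_Z\bigl(f(x)\bigr) + \log\left|\det\frac{\partial f(x)}{\partial x}\right|
```

When f is constructed so that its Jacobian is triangular, as in coupling-layer designs, the determinant reduces to a product of diagonal entries, which keeps likelihood evaluation exact and inexpensive.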

Recurrent Level Set Networks for Instance Segmentation

Pattern Recognition - Selected Methods and Applications [Working Title], 2019

Level set (LS)-based segmentation has been widely used in the medical imaging domain. However, it has some difficulty dealing with multi-instance objects in the real world. Furthermore, LS performance is generally quite sensitive to initial settings and parameters such as the number of iterations. To address these issues and bring classic LS methods to a new level of performance in a trainable deep learning framework, we present a novel approach, contextual recurrent level sets (CRLS), for object instance segmentation. In the proposed networks, the curve deformation process is formulated as a hidden state evolution procedure in gated recurrent units (GRUs) and updated by minimizing an energy functional composed of fitting forces and contour length.

Deep Appearance Models: A Deep Boltzmann Machine Approach for Face Modeling

International Journal of Computer Vision, 2018

The "interpretation through synthesis" approach to analyzing face images, particularly the Active Appearance Models (AAMs) method, has become one of the most successful face modeling approaches over the last two decades. AAMs are able to represent face images through synthesis using a controllable parameterized Principal Component Analysis (PCA) model. However, the accuracy and robustness of the faces synthesized by AAMs are highly dependent on the training sets and, inherently, on the generalizability of PCA subspaces. This paper presents a novel Deep Appearance Models (DAMs) approach, an efficient replacement for AAMs, to accurately capture both the shape and texture of face images under large variations. In this approach, three crucial components represented in hierarchical layers are modeled using Deep Boltzmann Machines (DBM) to robustly capture the variations of facial shapes and appearances. DAMs are therefore superior to AAMs in inferring a representation for new face images under various challenging conditions. The proposed approach is evaluated in various applications to demonstrate its robustness and capabilities, i.e. facial super-resolution reconstruction, facial off-angle reconstruction or face frontalization, facial occlusion removal and age estimation, using challenging face databases, i.e. Labeled Face Parts in the Wild (LFPW), Helen and FG-NET. Compared to AAMs and other deep learning based approaches, the proposed DAMs achieve competitive results in those applications, demonstrating their advantages in handling occlusions, facial representation, and reconstruction.

Deep contextual recurrent residual networks for scene labeling

Pattern Recognition, 2018

Designed as extremely deep architectures, deep residual networks, which provide a rich visual representation and offer robust convergence behaviors, have recently achieved exceptional performance in numerous computer vision problems. When directly applied to a scene labeling problem, however, they are limited in capturing long-range contextual dependence, which is a critical aspect. To address this issue, we propose a novel approach, Contextual Recurrent Residual Networks (CRRN), which is able to simultaneously handle rich visual representation learning and long-range context modeling within a fully end-to-end deep network. Furthermore, our proposed end-to-end CRRN is trained completely from scratch, without using any pre-trained models, in contrast to most existing methods, which are usually fine-tuned from state-of-the-art pre-trained models, e.g. VGG-16, ResNet, etc. The experiments are conducted on four challenging scene labeling datasets, i.e. Sift-Flow, CamVid, Stanford background and SUN datasets, and compared against various state-of-the-art scene labeling methods.

Robust Deep Appearance Models

2016 23rd International Conference on Pattern Recognition (ICPR), 2016

This paper presents a novel Robust Deep Appearance Models approach to learn the non-linear correlation between the shape and texture of face images. In this approach, two crucial components of face images, i.e. shape and texture, are represented by Deep Boltzmann Machines and Robust Deep Boltzmann Machines (RDBM), respectively. The RDBM, an alternative form of Robust Boltzmann Machines, can separate corrupted/occluded pixels in the texture modeling to achieve better reconstruction results. The two models are connected by Restricted Boltzmann Machines at the top layer to jointly learn and capture the variations of both facial shapes and appearances. This paper also introduces new fitting algorithms with occlusion awareness through the mask obtained from the RDBM reconstruction. The proposed approach is evaluated in various applications using challenging face datasets, i.e. Labeled Face Parts in the Wild (LFPW), Helen, EURECOM and AR databases, to demonstrate its robustness and capabilities.

Depth-based 3D hand pose tracking

2016 23rd International Conference on Pattern Recognition (ICPR), 2016

In this paper, we propose two new approaches using the Convolutional Neural Network (CNN) and the Recurrent Neural Network (RNN) for tracking 3D hand poses. The first approach is a detection-based algorithm, while the second is a data-driven method. Our first contribution is a new tracking-by-detection strategy that extends the CNN-based single-frame detection method to a multiple-frame tracking approach by taking the prediction history into account using an RNN. Our second contribution is the use of an RNN to simulate the fitting of a 3D model to the input data. It helps to relax the need for a carefully designed fitting function and optimization algorithm. With such strategies, we show that our tracking frameworks can automatically correct failed detections made in previous frames due to occlusions. Our proposed method is evaluated on two public hand datasets, i.e. NYU and ICVL, and compared against other recent hand tracking methods. Experimental results show that our approaches achieve state-of-the-art accuracy and efficiency in the challenging problem of 3D hand pose estimation.

Non-convex online robust PCA: Enhance sparsity via ℓp-norm minimization

Computer Vision and Image Understanding, 2017

Highlights: Two non-convex ℓp-norm (0 < p < 1) relaxation forms of the RPCA problem are proposed.
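To illustrate why a penalty with 0 < p < 1 favours sparsity more strongly than the ℓ1 norm, the following sketch compares the sum of |x_i|^p on a sparse and a dense vector of equal ℓ1 norm. The helper name and the example vectors are hypothetical; this is only the penalty term in isolation, not the paper's RPCA algorithm.

```python
def lp_penalty(values, p):
    """Sum of |v|**p over the entries; for 0 < p < 1 this is a non-convex sparsity penalty."""
    if not 0 < p <= 1:
        raise ValueError("p must lie in (0, 1]")
    return sum(abs(v) ** p for v in values)


sparse = [4.0, 0.0, 0.0, 0.0]  # one large entry
dense = [1.0, 1.0, 1.0, 1.0]   # same l1 norm, spread over all entries

# Compare the two vectors under the l1 penalty (p = 1) and a non-convex one (p = 0.5).
for p in (1.0, 0.5):
    print(p, lp_penalty(sparse, p), lp_penalty(dense, p))
```

With p = 1 the two vectors are penalized equally, while with p = 0.5 the sparse vector receives the smaller penalty, so minimizing the penalty prefers concentrating energy in few entries.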

Longitudinal Face Modeling via Temporal Deep Restricted Boltzmann Machines

2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016

Modeling the face aging process is a challenging task due to large and non-linear variations present in different stages of face development. This paper presents a deep model approach for face age progression that can efficiently capture the non-linear aging process and automatically synthesize a series of age-progressed faces in various age ranges. In this approach, we first decompose the long-term age progression into a sequence of short-term changes and model it as a face sequence. The Temporal Deep Restricted Boltzmann Machines based age progression model, together with the prototype faces, is then constructed to learn the aging transformation between faces in the sequence. In addition, to enhance the wrinkles of faces in the later age ranges, wrinkle models are further constructed using Restricted Boltzmann Machines to capture their variations in different facial regions. Geometry constraints are also taken into account in the last step for more consistent age-progressed results. The proposed approach is evaluated using various face aging databases, i.e. FG-NET, Cross-Age Celebrity Dataset (CACD) and MORPH, and our collected large-scale aging database named AginG Faces in the Wild (AGFW). In addition, when the ground-truth age is not available for the input image, our proposed system is able to automatically estimate the age of the input face before the aging process is applied.