Dimitris Metaxas | Rutgers, The State University of New Jersey
Papers by Dimitris Metaxas
Lecture Notes in Computer Science, Dec 31, 2022
Springer eBooks, 2011
In this paper, we propose a method to segment multiple rodent brain structures simultaneously. This method combines deformable models and hierarchical shape priors within one framework. The deformation module employs both gradient and appearance information to generate image forces to deform the shape. The shape prior module uses Principal Component Analysis (PCA) to hierarchically model the multiple structures at both global and local levels. At the global level, the statistics of relative positions among different structures are modeled. At the local level, the shape statistics within each structure are learned from training samples. Our segmentation method adaptively employs both priors to constrain the intermediate deformation result. This prior constraint improves the robustness of the model and benefits segmentation accuracy. Another merit of our prior module is that the training data can be small, because the shape prior module models each structure individually and combines them using global statistics. This scheme preserves shape details better than directly applying PCA to all structures. We use this method to segment rodent brain structures such as the cerebellum, the left and right striatum, and the left and right hippocampus. The experiments show that our method works effectively and that the hierarchical prior improves segmentation performance.
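The local shape prior described above can be sketched with a standard statistical shape model: learn principal modes from training shapes, then project an intermediate deformation onto that subspace and clamp the mode coefficients. This is a minimal illustrative sketch, not the paper's implementation; the function names, the two-mode subspace, and the 3-sigma clamp are assumptions.

```python
import numpy as np

def fit_shape_prior(training_shapes, n_modes=2):
    """Learn the mean shape and principal modes from stacked shape vectors."""
    mean = training_shapes.mean(axis=0)
    centered = training_shapes - mean
    # SVD of the centered data gives the PCA modes; singular values give spread.
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    modes = vt[:n_modes]                                  # (n_modes, dim)
    stddev = s[:n_modes] / np.sqrt(len(training_shapes))  # per-mode std dev
    return mean, modes, stddev

def constrain(shape, mean, modes, stddev, limit=3.0):
    """Project a shape onto the prior subspace and clamp each coefficient."""
    b = modes @ (shape - mean)                  # mode coefficients
    b = np.clip(b, -limit * stddev, limit * stddev)
    return mean + modes.T @ b                   # reconstruct a plausible shape

rng = np.random.default_rng(0)
train = rng.normal(size=(20, 8))                # toy stand-in for training shapes
mean, modes, stddev = fit_shape_prior(train)
outlier = mean + 100.0 * modes[0]               # a wildly implausible deformation
constrained = constrain(outlier, mean, modes, stddev)
# The constrained shape is pulled back near the statistically plausible region.
print(np.linalg.norm(constrained - mean) < np.linalg.norm(outlier - mean))
```

In the hierarchical scheme, the same machinery would apply twice: once per structure on local shape vectors, and once on the relative-position statistics across structures.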
In this paper, we propose a data privacy-preserving and communication-efficient distributed GAN learning framework named Distributed Asynchronized Discriminator GAN (AsynDGAN). Our framework trains a central generator that learns from distributed discriminators, and uses the generated synthetic images solely to train the segmentation model. We validate the proposed framework on a learning problem across health entities, which is known to be privacy-sensitive. Our experiments show that our approach: 1) can learn the distribution of real images from multiple datasets without sharing patients' raw data; 2) is more efficient and requires lower bandwidth than other distributed deep learning methods; 3) achieves higher performance than a model trained on any single real dataset, and nearly the same performance as a model trained on all real datasets; 4) has provable guarantees that the generator can learn the distributed distribution and is thus unbiased. We release our AsynDGAN source code at: https://github.com/tommyqichang/AsynDGAN (* equal contribution)
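The key communication pattern above is that raw data stays local while only discriminator feedback reaches the central generator. The toy loop below illustrates that pattern only: the "generator" is a single scalar parameter and the local "discriminator" feedback is replaced by a simple moment-matching gradient, a deliberate simplification of the paper's adversarial objective. All names and numbers here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(42)
# Three data centers, each holding private data it never transmits.
centers = [rng.normal(loc=mu, size=200) for mu in (2.0, 4.0, 6.0)]

theta = 0.0                      # central "generator" parameter
lr = 0.1
for step in range(200):
    for data in centers:
        synthetic = theta + rng.normal(size=50)      # generator's samples
        # Local feedback computed at the center: only this scalar is sent
        # back, standing in for discriminator gradients; raw data stays put.
        feedback = data.mean() - synthetic.mean()
        theta += lr * feedback                        # central update

print(round(theta))              # theta settles near the pooled mean of 4
```

The point of the sketch is the message flow: each center computes feedback against its own data, and the central parameter converges toward the pooled distribution without any raw sample ever leaving a center.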
Nuclei segmentation is a fundamental task in histopathological image analysis. Typically, such segmentation tasks require significant effort to manually generate pixel-wise annotations for fully supervised training. To alleviate the manual effort, in this paper we propose a novel approach using points-only annotation. Two types of coarse labels with complementary information are derived from the points annotation and are then utilized to train a deep neural network. The fully-connected conditional random field loss is utilized to further refine the model without introducing extra computational complexity during inference. Experimental results on two nuclei segmentation datasets reveal that the proposed method is able to achieve competitive performance compared to the fully supervised counterpart and the state-of-the-art methods while requiring significantly less annotation effort. Our code is publicly available.
Springer eBooks, 2020
In this work, we propose a method for training a distributed GAN with sequential temporary discriminators. Our method tackles the challenge of training a GAN in a federated learning setting: how to update the generator with a flow of temporary discriminators? We apply the proposed method to learn a self-adaptive generator with a series of local discriminators from multiple data centers. We show that our design of the loss function indeed learns the correct distribution, with provable guarantees. The empirical experiments show that our approach is capable of generating synthetic data that is practical for real-world applications such as training a segmentation model. Our TDGAN code is publicly available.
Medical Image Analysis, Dec 1, 2000
Right ventricular (RV) dysfunction can serve as an indicator of heart and lung disease and can adversely affect the left ventricle (LV). However, normal RV function must be characterized before abnormal states can be detected. We describe a method for reconstructing the 3D motion of the RV by fitting a deformable model to tag and contour data extracted from multiview tagged magnetic resonance images (MRI). The deformable model is a biventricular finite element mesh built directly from the contours. Our approach accommodates the geometrically complex RV by using the entire lengths of the tags, localized degrees of freedom (DOFs), and finite elements for geometric modeling. We convert the results of the reconstruction into potentially useful motion variables, such as strains and displacements. The fitting technique is applied to synthetic data, two normal hearts, and a heart with right ventricular hypertrophy (RVH). The results in this paper are limited to the RV free wall and septum. We find noticeable differences between the motion variables calculated for the normal volunteers and the RVH patient.
arXiv (Cornell University), Aug 25, 2020
Accurate estimation of shape thickness from medical images is crucial in clinical applications. For example, the thickness of the myocardium is one of the keys to cardiac disease diagnosis. While mathematical models are available to obtain accurate dense thickness estimation, they suffer from heavy computational overhead due to iterative solvers. To this end, we propose novel methods for dense thickness estimation, including a fast solver that estimates thickness from binary annular shapes and an end-to-end network that estimates thickness directly from raw cardiac images. We test the proposed models on three cardiac datasets and one synthetic dataset, achieving impressive results and generalizability on all. Thickness estimation is performed without iterative solvers or manual correction, which is 100× faster than the mathematical model. We also analyze thickness patterns on different cardiac pathologies with a standard clinical model, and the results demonstrate the potential clinical value of our method for thickness-based cardiac disease diagnosis.
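The iterative solvers the abstract contrasts against can be illustrated with the classical PDE-based approach to dense thickness: solve Laplace's equation over an annular mask with the inner boundary held at 0 and the outer at 1, then measure thickness along the potential's gradient lines. The Jacobi relaxation below is a minimal sketch of that baseline; the geometry, grid size, and iteration count are illustrative assumptions, not the paper's method.

```python
import numpy as np

h = w = 41
yy, xx = np.mgrid[0:h, 0:w]
r = np.hypot(yy - 20, xx - 20)
annulus = (r >= 5) & (r <= 15)          # myocardium-like ring
u = np.where(r > 15, 1.0, 0.0)          # Dirichlet conditions: 0 inside, 1 outside

for _ in range(500):                    # the "heavy" iterative part
    avg = 0.25 * (np.roll(u, 1, 0) + np.roll(u, -1, 0)
                  + np.roll(u, 1, 1) + np.roll(u, -1, 1))
    u = np.where(annulus, avg, u)       # relax ring interior, hold boundaries

# The potential rises monotonically from inner to outer boundary, so
# integrating along its gradient would yield a dense thickness map.
print(u[20, 26] < u[20, 31] < u[20, 34])
```

Hundreds of sweeps over every voxel is exactly the overhead that motivates replacing the solver with a fast approximation or a learned network.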
IEEE Transactions on Pattern Analysis and Machine Intelligence, Oct 1, 2009
Lecture Notes in Computer Science, Jun 14, 2022
Joint 2D cardiac segmentation and 3D volume reconstruction are fundamental in building statistical cardiac anatomy models and understanding functional mechanisms from motion patterns. However, due to the low through-plane resolution of cine MR and high inter-subject variance, accurately segmenting cardiac images and reconstructing the 3D volume are challenging. In this study, we propose an end-to-end latent-space-based framework, DeepRecon, that generates multiple clinically essential outcomes, including accurate image segmentation, a synthetic high-resolution 3D image, and a 3D reconstructed volume. Our method identifies the optimal latent representation of the cine image that contains accurate semantic information for cardiac structures. In particular, our model jointly generates synthetic images with accurate semantic information and segmentation of the cardiac structures using the optimal latent representation. We further explore downstream applications of 3D shape reconstruction and 4D motion pattern adaptation using different latent-space manipulation strategies. The simultaneously generated high-resolution images are highly interpretable for assessing cardiac shape and motion. Experimental results demonstrate the effectiveness of our approach on multiple fronts, including 2D segmentation, 3D reconstruction, and downstream 4D motion pattern adaptation.
From 11.06.06 to 16.06.06, the Dagstuhl Seminar 06241 "Human Motion - Understanding, Modeling, Capture and Animation. 13th Workshop 'Theoretical Foundations of Computer Vision'" was held in the International Conference and Research Center (IBFI), Schloss Dagstuhl. During the seminar, several participants presented their current research, and ongoing work and open problems were discussed. Abstracts of the presentations given during the seminar, as well as abstracts of seminar results and ideas, are put together in this paper. The first section describes the seminar topics and goals in general.
arXiv (Cornell University), Mar 5, 2022
The top-down instance segmentation framework has shown superiority in object detection compared to the bottom-up framework. While efficient at addressing over-segmentation, top-down instance segmentation suffers from the over-crop problem. A complete segmentation mask, however, is crucial for biological image analysis, as it delivers important morphological properties such as shapes and volumes. In this paper, we propose a region proposal rectification (RPR) module to address this challenging incomplete-segmentation problem. In particular, we introduce a progressive ROIAlign module that gradually brings neighbor information into a series of ROIs. The ROI features are fed into an attentive feed-forward network (FFN) for proposal box regression. With the additional neighbor information, the proposed RPR module shows significant improvement in correcting region proposal locations and thereby exhibits favorable instance segmentation performance on three biological image datasets compared to state-of-the-art baseline methods. Experimental results demonstrate that the proposed RPR module is effective in both anchor-based and anchor-free top-down instance segmentation approaches, suggesting that the proposed method can be applied to general top-down instance segmentation of biological images.
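The over-crop problem above is easy to picture: a proposal box clips part of the object, so the mask inside it can never be complete. The sketch below mimics the *goal* of rectification (not the learned RPR mechanism) with a simple rule: grow the box while object pixels still touch its border. The function name and growth rule are illustrative assumptions.

```python
import numpy as np

def rectify_box(mask, box, max_iters=20):
    """Grow (y0, y1, x0, x1) while object pixels touch the box border."""
    y0, y1, x0, x1 = box
    for _ in range(max_iters):
        crop = mask[y0:y1, x0:x1]
        touches = crop[0, :].any() or crop[-1, :].any() or \
                  crop[:, 0].any() or crop[:, -1].any()
        if not touches:          # object fully inside: stop growing
            break
        y0 = max(y0 - 1, 0); x0 = max(x0 - 1, 0)
        y1 = min(y1 + 1, mask.shape[0]); x1 = min(x1 + 1, mask.shape[1])
    return y0, y1, x0, x1

mask = np.zeros((20, 20), bool)
mask[5:15, 5:15] = True                  # a 10x10 object
over_crop = (7, 12, 7, 12)               # proposal that cuts into the object
y0, y1, x0, x1 = rectify_box(mask, over_crop)
print(mask[y0:y1, x0:x1].sum() == mask.sum())   # full object recovered
```

The learned module replaces this hand-written rule with progressive ROIAlign features and a regression head, but the before/after effect on the box is the same.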
arXiv (Cornell University), Mar 21, 2022
Combining information from multi-view images is crucial to improving the performance and robustness of automated methods for disease diagnosis. However, due to the non-aligned nature of multi-view images, building correlation and data fusion across views largely remains an open problem. In this study, we present TransFusion, a Transformer-based architecture that merges divergent multi-view imaging information using convolutional layers and powerful attention mechanisms. In particular, the Divergent Fusion Attention (DiFA) module is proposed for rich cross-view context modeling and semantic dependency mining, addressing the critical issue of capturing long-range correlations between unaligned data from different image views. We further propose Multi-Scale Attention (MSA) to collect global correspondences of multi-scale feature representations. We evaluate TransFusion on the Multi-Disease, Multi-View & Multi-Center Right Ventricular Segmentation in Cardiac MRI (M&Ms-2) challenge cohort. TransFusion demonstrates leading performance against state-of-the-art methods and opens up new perspectives for multi-view imaging integration towards robust medical image segmentation.
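The core operation behind cross-view fusion of this kind is cross-attention: tokens from one view query tokens from another, so correspondences between unaligned views are learned rather than assumed from spatial alignment. The single-head scaled dot-product sketch below illustrates that idea only; the dimensions, view names, and single-head form are simplifying assumptions, not the DiFA module itself.

```python
import numpy as np

def cross_attention(q_feats, kv_feats):
    """Queries from view A, keys/values from view B; scaled dot-product."""
    d = q_feats.shape[-1]
    scores = q_feats @ kv_feats.T / np.sqrt(d)          # (n_a, n_b) affinities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)      # row-wise softmax
    return weights @ kv_feats    # each view-A token enriched with view-B context

rng = np.random.default_rng(0)
short_axis = rng.normal(size=(6, 16))    # tokens from one cardiac view
long_axis = rng.normal(size=(9, 16))     # tokens from another, unaligned view
fused = cross_attention(short_axis, long_axis)
print(fused.shape)                       # (6, 16): context without alignment
```

Because the affinity matrix is dense, every short-axis token can attend to every long-axis token, which is what allows long-range correlations between unaligned views to be captured.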
2022 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)
StyleGANs have shown impressive results on data generation and manipulation in recent years, thanks to their disentangled style latent space. Much effort has been made to invert a pretrained generator, where an encoder is trained ad hoc after the generator, in a two-stage fashion. In this paper, we focus on style-based generators and ask a scientific question: does forcing such a generator to reconstruct real data lead to a more disentangled latent space and make the inversion process from image to latent space easier? We describe a new methodology to train a style-based autoencoder where the encoder and generator are optimized end-to-end. We show that our proposed model consistently outperforms baselines in terms of image inversion and generation quality. Supplementary material, code, and pretrained models are available on the project website.
This dissertation explores original techniques for the construction of hypergraph models for computer vision applications. A hypergraph is a generalization of a pairwise simple graph, in which an edge can connect any number of vertices. The expressive power of hypergraph models places a special emphasis on relationships among three or more objects, which has made hypergraphs the models of choice for many problems. This is in sharp contrast with the more conventional graph representation of visual patterns, where only pairwise connectivity between objects is described. The contribution of this thesis is fourfold: (i) For the first time, the advantage of the hypergraph neighborhood structure is analyzed. We argue that the summarized local grouping information contained in hypergraphs causes an 'averaging' effect that benefits clustering problems, just as local image smoothing may benefit the image segmentation task. (ii) We discuss how to build hypergraph...
Today, sparsity techniques are widely used to address practical problems in medical imaging, machine learning, computer vision, data mining, compressive sensing, image processing, video analysis, and multimedia. We briefly introduce the related sparsity techniques and their successful applications in compressive sensing, sparse learning, computer vision, and medical imaging. Then, we propose a new concept called strong group sparsity to develop a theory for the group Lasso, which shows that the group Lasso is superior to the standard Lasso for strongly group-sparse data. It provides a convincing theoretical justification for using group sparsity regularization when the underlying group structure is consistent with the data. Moreover, the theory also predicts the limitations of group Lasso formulations. To address those limitations, we further build a new framework called structured sparsity, which is a natural extension of the standard sparsity concept in statistical ...
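The group-sparsity idea above has a compact computational core: the proximal operator of the group-Lasso penalty shrinks each group's coefficient vector as a whole, zeroing entire groups at once rather than individual entries. The sketch below shows that operator on a toy vector; the group partition and threshold are illustrative assumptions.

```python
import numpy as np

def group_soft_threshold(w, groups, lam):
    """Proximal operator of the group-Lasso penalty: shrink per group norm."""
    out = np.zeros_like(w)
    for g in groups:
        norm = np.linalg.norm(w[g])
        if norm > lam:
            out[g] = w[g] * (1 - lam / norm)   # shrink the whole group
        # else: the entire group is zeroed together
    return out

w = np.array([3.0, 4.0, 0.1, -0.1])
groups = [[0, 1], [2, 3]]
sparse = group_soft_threshold(w, groups, lam=1.0)
print(sparse)   # strong group kept (uniformly shrunk), weak group zeroed
```

This is why group Lasso wins on strongly group-sparse data: when the true support aligns with the groups, whole irrelevant groups vanish in one step, while standard Lasso must eliminate their coordinates one by one.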
Face tracking has numerous applications in the field of Human Computer Interaction and behavior understanding in general. Yet, face tracking is a difficult problem because the tracker must generalize to new faces, adapt to changing illumination, keep up with fast motions and pose changes, and tolerate target occlusion. We first present our efforts to develop a system for probabilistic face tracking using anthropometric and appearance constraints. We then move on to the focus of our work, which is the application of the face tracker to two interesting recognition problems. First, given that sign language is used as a primary means of communication by deaf individuals and as augmentative communication by hearing individuals with a variety of disabilities, the development of robust, real-time sign language recognition technologies would be a major step forward in making computers equally accessible to everyone. However, most research in the field of sign language recognition has focu...
ArXiv, 2020
In this work, we propose a method for training a distributed GAN with sequential temporary discriminators. Our method tackles the challenge of training a GAN in a federated learning setting: how to update the generator with a flow of temporary discriminators? We apply the proposed method to learn a self-adaptive generator with a series of local discriminators from multiple data centers. We show that our design of the loss function indeed learns the correct distribution, with provable guarantees. The empirical experiments show that our approach is capable of generating synthetic data that is practical for real-world applications such as training a segmentation model.
We study the problem of transferring facial expressions from one face to another. Directly copying and blending face components using existing methods results in semantically unnatural composites, since expression is a global effect and a local face component in one expression is often incompatible with the shape and other components of another expression. To solve this problem, we present the expression flow method, a 2D flow field that can warp the target face globally. We develop a shape-fitting algorithm that jointly constructs 3D face shapes from the input images with the same identity but different expressions. The expression flow is computed by projecting the difference between the two 3D shapes onto the 2D image plane. We apply our algorithm in several applications, including face compositing, face morphing, video stitching, and facial expression exaggeration. Our system is able to generate faces with much higher fidelity than existing methods.
2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2020
The ability to reliably perceive environmental states, particularly the existence of objects and their motion behavior, is crucial for autonomous driving. In this work, we propose an efficient deep model, called MotionNet, to jointly perform perception and motion prediction from 3D point clouds. MotionNet takes a sequence of LiDAR sweeps as input and outputs a bird's eye view (BEV) map, which encodes the object category and motion information in each grid cell. The backbone of MotionNet is a novel spatiotemporal pyramid network, which extracts deep spatial and temporal features in a hierarchical fashion. To enforce the smoothness of predictions over both space and time, the training of MotionNet is further regularized with novel spatial and temporal consistency losses. Extensive experiments show that the proposed method overall outperforms state-of-the-art methods, including the latest scene-flow- and 3D-object-detection-based methods. This indicates the potential value of the proposed method serving as a backup to bounding-box-based systems and providing complementary information to the motion planner in autonomous driving.
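A BEV map of the kind described above starts from a simple rasterization step: bin each LiDAR point's (x, y) coordinates into a 2-D grid whose cells can then carry per-cell category and motion predictions. The sketch below shows only that binning step; the grid extent, resolution, and occupancy-count encoding are illustrative assumptions, not MotionNet's actual input encoding.

```python
import numpy as np

def to_bev(points, extent=32.0, cell=1.0):
    """Bin (x, y, z) points into a square BEV grid of per-cell point counts."""
    n = int(2 * extent / cell)
    grid = np.zeros((n, n), dtype=np.int32)
    ix = ((points[:, 0] + extent) / cell).astype(int)
    iy = ((points[:, 1] + extent) / cell).astype(int)
    keep = (ix >= 0) & (ix < n) & (iy >= 0) & (iy < n)  # drop out-of-range points
    np.add.at(grid, (iy[keep], ix[keep]), 1)            # accumulate counts
    return grid

pts = np.array([[0.5, 0.5, 1.0], [0.6, 0.4, 1.5], [-10.0, 3.0, 0.2]])
bev = to_bev(pts)
print(bev.sum(), bev[32, 32])   # 3 points binned; two share the center cell
```

Stacking such grids from consecutive sweeps yields the spatiotemporal input from which per-cell motion can be predicted.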
IEEE Transactions on Medical Imaging, 2020
Nuclei segmentation is a fundamental task in histopathology image analysis. Typically, such segmentation tasks require significant effort to manually generate accurate pixel-wise annotations for fully supervised training. To alleviate such tedious manual effort, in this paper we propose a novel weakly supervised segmentation framework based on partial points annotation, i.e., only a small portion of nuclei locations in each image are labeled. The framework consists of two learning stages. In the first stage, we design a semi-supervised strategy to learn a detection model from partially labeled nuclei locations. Specifically, an extended Gaussian mask is designed to train an initial model with partially labeled data. Then, self-training with background propagation is proposed to make use of the unlabeled regions to boost nuclei detection and suppress false positives. In the second stage, a segmentation model is trained from the detected nuclei locations in a weakly supervised fashion. Two types of coarse labels with complementary information are derived from the detected points and are then utilized to train a deep neural network. The fully-connected conditional random field loss is utilized in training to further refine the model without introducing extra computational complexity during inference. The proposed method is extensively evaluated on two nuclei segmentation datasets. The experimental results demonstrate that our method can achieve competitive performance compared to the fully supervised counterpart and the state-of-the-art methods while requiring significantly less annotation effort.
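Training targets built from point annotations, loosely in the spirit of the extended Gaussian mask above, can be sketched as follows: each annotated nucleus center contributes a Gaussian peak, and pixels in an uncertain band around unlabeled regions are excluded from the loss. The radii, sigma, and thresholds below are illustrative assumptions, not the paper's values.

```python
import numpy as np

def gaussian_mask(shape, points, sigma=2.0, ignore_radius=6.0):
    """Build a Gaussian detection target and a supervision-weight map."""
    h, w = shape
    yy, xx = np.mgrid[0:h, 0:w]
    target = np.zeros(shape)
    dist = np.full(shape, np.inf)
    for (py, px) in points:
        d2 = (yy - py) ** 2 + (xx - px) ** 2
        target = np.maximum(target, np.exp(-d2 / (2 * sigma ** 2)))
        dist = np.minimum(dist, np.sqrt(d2))
    # weight 1 = supervise (near a point, or clearly background far from all
    # points); weight 0 = ignore the uncertain band in between.
    weight = np.where((target > 0.05) | (dist > ignore_radius), 1.0, 0.0)
    return target, weight

target, weight = gaussian_mask((32, 32), [(8, 8), (20, 24)])
print(target.max(), target[8, 8])   # peak of 1.0 exactly at an annotated point
```

With partial annotation, the ignore band matters: unlabeled nuclei fall into it rather than being wrongly supervised as background, which is what self-training with background propagation then progressively resolves.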
Lecture Notes in Computer Science, Dec 31, 2022
Springer eBooks, 2011
In this paper, we propose a method to segment multiple rodent brain structures simultaneously. Th... more In this paper, we propose a method to segment multiple rodent brain structures simultaneously. This method combines deformable models and hierarchical shape priors within one framework. The deformation module employs both gradient and appearance information to generate image forces to deform the shape. The shape prior module uses Principal Component Analysis to hierarchically model the multiple structures at both global and local levels. At the global level, the statistics of relative positions among different structures are modeled. At the local level, the shape statistics within each structure is learned from training samples. Our segmentation method adaptively employs both priors to constrain the intermediate deformation result. This prior constraint improves the robustness of the model and benefits the segmentation accuracy. Another merit of our prior module is that the size of the training data can be small, because the shape prior module models each structure individually and combines them using global statistics. This scheme can preserve shape details better than directly applying PCA on all structures. We use this method to segment rodent brain structures, such as the cerebellum, the left and right striatum, and the left and right hippocampus. The experiments show that our method works effectively and this hierarchical prior improves the segmentation performance.
In this paper, we propose a data privacy-preserving and communication efficient distributed GAN l... more In this paper, we propose a data privacy-preserving and communication efficient distributed GAN learning framework named Distributed Asynchronized Discriminator GAN (AsynDGAN). Our proposed framework aims to train a central generator learns from distributed discriminator, and use the generated synthetic image solely to train the segmentation model. We validate the proposed framework on the application of health entities learning problem which is known to be privacy sensitive. Our experiments show that our approach: 1) could learn the real image's distribution from multiple datasets without sharing the patient's raw data. 2) is more efficient and requires lower bandwidth than other distributed deep learning methods. 3) achieves higher performance compared to the model trained by one real dataset, and almost the same performance compared to the model trained by all real datasets. 4) has provable guarantees that the generator could learn the distributed distribution in an all important fashion thus is unbiased.We release our AsynDGAN source code at: https://github.com/tommyqichang/AsynDGAN * equal contribution
Nuclei segmentation is a fundamental task in histopathological image analysis. Typically, such se... more Nuclei segmentation is a fundamental task in histopathological image analysis. Typically, such segmentation tasks require significant effort to manually generate pixel-wise annotations for fully supervised training. To alleviate the manual effort, in this paper we propose a novel approach using points only annotation. Two types of coarse labels with complementary information are derived from the points annotation, and are then utilized to train a deep neural network. The fullyconnected conditional random field loss is utilized to further refine the model without introducing extra computational complexity during inference. Experimental results on two nuclei segmentation datasets reveal that the proposed method is able to achieve competitive performance compared to the fully supervised counterpart and the state-of-the-art methods while requiring significantly less annotation effort. Our code is publicly available 1 .
Springer eBooks, 2020
In this work, we propose a method for training distributed GAN with sequential temporary discrimi... more In this work, we propose a method for training distributed GAN with sequential temporary discriminators. Our proposed method tackles the challenge of training GAN in the federated learning manner: How to update the generator with a flow of temporary discriminators? We apply our proposed method to learn a self-adaptive generator with a series of local discriminators from multiple data centers. We show our design of loss function indeed learns the correct distribution with provable guarantees. The empirical experiments show that our approach is capable of generating synthetic data which is practical for real-world applications such as training a segmentation model. Our TDGAN Code is
Medical Image Analysis, Dec 1, 2000
Right ventricular (RV) dysfunction can serve as an indicator of heart and lung disease and can ad... more Right ventricular (RV) dysfunction can serve as an indicator of heart and lung disease and can adversely affect the left ventricle (LV). However, normal RV function must be characterized before abnormal states can be detected. We can describe a method for reconstructing the 3D motion of the RV images by fitting of a deformable model to extracted tag and contour data from multiview tagged magnetic resonance images(MRI). The deformable model is a biventricular finite element mesh built directly from the contours. Our approach accommodates the geometrically complex RV by using the entire lengths of the tags, localized degrees of freedom (DOFs), and finite elements for geometric modeling. We convert the results of the reconstruction into potentially useful motion variables, such as strains and displacements. The fitting technique is applied to synthetic data, two normal hearts, and a heart with right ventricular hypertrophy (RVH). The results in this paper are limited to the RV free wall and septum. We find noticeable differences between the motion variables calculated for the normal volunteers and the RVH patient.
arXiv (Cornell University), Aug 25, 2020
Accurate estimation of shape thickness from medical images is crucial in clinical applications. F... more Accurate estimation of shape thickness from medical images is crucial in clinical applications. For example, the thickness of myocardium is one of the key to cardiac disease diagnosis. While mathematical models are available to obtain accurate dense thickness estimation, they suffer from heavy computational overhead due to iterative solvers. To this end, we propose novel methods for dense thickness estimation, including a fast solver that estimates thickness from binary annular shapes and an end-to-end network that estimates thickness directly from raw cardiac images.We test the proposed models on three cardiac datasets and one synthetic dataset, achieving impressive results and generalizability on all. Thickness estimation is performed without iterative solvers or manual correction, which is 100× faster than the mathematical model. We also analyze thickness patterns on different cardiac pathologies with a standard clinical model and the results demonstrate the potential clinical value of our method for thickness based cardiac disease diagnosis.
IEEE Transactions on Pattern Analysis and Machine Intelligence, Oct 1, 2009
Lecture Notes in Computer Science, Jun 14, 2022
Joint 2D cardiac segmentation and 3D volume reconstruction are fundamental in building statistica... more Joint 2D cardiac segmentation and 3D volume reconstruction are fundamental in building statistical cardiac anatomy models and understanding functional mechanisms from motion patterns. However, due to the low through-plane resolution of cine MR and high intersubject variance, accurately segmenting cardiac images and reconstructing the 3D volume are challenging. In this study, we propose an endto-end latent-space-based framework, DeepRecon, that generates multiple clinically essential outcomes, including accurate image segmentation, synthetic high-resolution 3D image, and 3D reconstructed volume. Our method identifies the optimal latent representation of the cine image that contains accurate semantic information for cardiac structures. In particular, our model jointly generates synthetic images with accurate semantic information and segmentation of the cardiac structures using the optimal latent representation. We further explore downstream applications of 3D shape reconstruction and 4D motion pattern adaptation by the different latent-space manipulation strategies. The simultaneously generated high-resolution images present a high interpretable value to assess the cardiac shape and motion. Experimental results demonstrate the effectiveness of our approach on multiple fronts including 2D segmentation, 3D reconstruction, downstream 4D motion pattern adaption performance.
From 11.06.06 to 16.06.06, the Dagstuhl Seminar 06241 ``Human Motion - Understanding, Modeling, C... more From 11.06.06 to 16.06.06, the Dagstuhl Seminar 06241 ``Human Motion - Understanding, Modeling, Capture and Animation. 13th Workshop "Theoretical Foundations of Computer Vision"'' was held in the International Conference and Research Center (IBFI), Schloss Dagstuhl. During the seminar, several participants presented their current research, and ongoing work and open problems were discussed. Abstracts of the presentations given during the seminar as well as abstracts of seminar results and ideas are put together in this paper. The first section describes the seminar topics and goals in general.
arXiv (Cornell University), Mar 5, 2022
Top-down instance segmentation framework has shown its superiority in object detection compared t... more Top-down instance segmentation framework has shown its superiority in object detection compared to the bottom-up framework. While it is efficient in addressing over-segmentation, top-down instance segmentation suffers from over-crop problem. However, a complete segmentation mask is crucial for biological image analysis as it delivers important morphological properties such as shapes and volumes. In this paper, we propose a region proposal rectification (RPR) module to address this challenging incomplete segmentation problem. In particular, we offer a progressive ROIAlign module to introduce neighbor information into a series of ROIs gradually. The ROI features are fed into an attentive feed-forward network (FFN) for proposal box regression. With additional neighbor information, the proposed RPR module shows significant improvement in correction of region proposal locations and thereby exhibits favorable instance segmentation performances on three biological image datasets compared to state-of-the-art baseline methods. Experimental results demonstrate that the proposed RPR module is effective in both anchor-based and anchor-free top-down instance segmentation approaches, suggesting the proposed method can be applied to general top-down instance segmentation of biological images.
arXiv (Cornell University), Mar 21, 2022
Combining information from multi-view images is crucial for improving the performance and robustness of automated disease-diagnosis methods. However, because multi-view images are not aligned, building correlations and fusing data across views largely remains an open problem. In this study, we present TransFusion, a Transformer-based architecture that merges divergent multi-view imaging information using convolutional layers and powerful attention mechanisms. In particular, we propose the Divergent Fusion Attention (DiFA) module for rich cross-view context modeling and semantic dependency mining, addressing the critical issue of capturing long-range correlations between unaligned data from different image views. We further propose Multi-Scale Attention (MSA) to collect global correspondences of multi-scale feature representations. We evaluate TransFusion on the Multi-Disease, Multi-View & Multi-Center Right Ventricular Segmentation in Cardiac MRI (M&Ms-2) challenge cohort. TransFusion demonstrates leading performance against state-of-the-art methods and opens up new perspectives for multi-view imaging integration toward robust medical image segmentation.
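The DiFA module's internals are not spelled out in the abstract; as a minimal sketch of the underlying idea, cross-view attention lets each feature location in one (unaligned) view aggregate context from all positions of another view. The function name and toy feature shapes below are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def cross_view_attention(queries, keys, values):
    """Scaled dot-product attention from one view's features (queries)
    onto another view's features (keys/values)."""
    d = queries.shape[-1]
    scores = queries @ keys.T / np.sqrt(d)          # (Nq, Nk) similarity
    scores -= scores.max(axis=-1, keepdims=True)    # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over key positions
    return weights @ values, weights

rng = np.random.default_rng(0)
view_a = rng.normal(size=(5, 8))   # e.g. short-axis slice features
view_b = rng.normal(size=(7, 8))   # e.g. long-axis slice features
fused, w = cross_view_attention(view_a, view_b, view_b)
print(fused.shape)  # (5, 8): each view-A location aggregates view-B context
```

Because the attention is dense over the other view's positions, no spatial alignment between the two views is required, which is exactly the difficulty the abstract highlights.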
2022 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)
StyleGANs have shown impressive results on data generation and manipulation in recent years, thanks to their disentangled style latent space. Much effort has been devoted to inverting a pretrained generator, where an encoder is trained ad hoc after the generator in a two-stage fashion. In this paper, we focus on style-based generators and ask a scientific question: does forcing such a generator to reconstruct real data lead to a more disentangled latent space and make inversion from image to latent space easier? We describe a new methodology for training a style-based autoencoder in which the encoder and generator are optimized end-to-end. We show that our proposed model consistently outperforms baselines in both image inversion and generation quality. Supplementary material, code, and pretrained models are available on the project website.
This dissertation explores original techniques for constructing hypergraph models for computer vision applications. A hypergraph is a generalization of a pairwise simple graph in which an edge can connect any number of vertices. The expressive power of hypergraph models places special emphasis on relationships among three or more objects, which has made hypergraphs the models of choice for many problems. This is in sharp contrast with the more conventional graph representation of visual patterns, where only pairwise connectivity between objects is described. The contribution of this thesis is fourfold: (i) for the first time, the advantage of the hypergraph neighborhood structure is analyzed. We argue that the summarized local grouping information contained in hypergraphs causes an 'averaging' effect that benefits clustering problems, just as local image smoothing may benefit the image segmentation task. (ii) We discuss how to build hypergraph...
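One standard way to see the 'averaging' effect is the clique-expansion construction, which reduces a hypergraph to a weighted pairwise graph by spreading each hyperedge's contribution over its size. The incidence matrix and weighting below are a generic textbook construction for illustration, not necessarily the dissertation's exact formulation:

```python
import numpy as np

# Hypergraph with 4 vertices and 2 hyperedges:
# e0 = {0, 1, 2}, e1 = {2, 3}
H = np.array([[1, 0],
              [1, 0],
              [1, 1],
              [0, 1]], dtype=float)   # incidence matrix (vertices x hyperedges)

edge_sizes = H.sum(axis=0)            # |e| for each hyperedge
# Clique-expansion adjacency: two vertices are connected if they share a
# hyperedge, with each hyperedge's weight averaged over its size.
A = H @ np.diag(1.0 / edge_sizes) @ H.T
np.fill_diagonal(A, 0.0)
print(A)  # A[0,1] = 1/3 (shared 3-edge), A[2,3] = 1/2 (shared 2-edge)
```

Note how a vertex pair inside a large hyperedge receives a smaller pairwise weight than one inside a small hyperedge: the group membership is averaged rather than copied, which is the smoothing behavior the thesis argues benefits clustering.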
Today, sparsity techniques are widely used to address practical problems in medical imaging, machine learning, computer vision, data mining, compressive sensing, image processing, video analysis, and multimedia. We briefly introduce the related sparsity techniques and their successful applications in compressive sensing, sparse learning, computer vision, and medical imaging. We then propose a new concept called strong group sparsity to develop a theory for the group Lasso, which shows that the group Lasso is superior to the standard Lasso for strongly group-sparse data. This provides a convincing theoretical justification for using group-sparsity regularization when the underlying group structure is consistent with the data. Moreover, the theory also predicts the limitations of group Lasso formulations. To address those limitations, we further build a new framework called structured sparsity, which is a natural extension of the standard sparsity concept in statistical ...
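A concrete way to see how the group Lasso treats coefficients as blocks is its proximal operator, which shrinks or zeroes entire groups at once rather than individual entries. The sketch below is the standard block soft-thresholding construction, assuming the penalty lam * sum_g ||w_g||_2; it is an illustration, not code from the thesis:

```python
import numpy as np

def group_soft_threshold(w, groups, lam):
    """Proximal operator of the group Lasso penalty lam * sum_g ||w_g||_2:
    shrink each group's coefficient vector toward zero as a single block."""
    out = np.zeros_like(w, dtype=float)
    for g in groups:
        norm = np.linalg.norm(w[g])
        if norm > lam:
            out[g] = (1.0 - lam / norm) * w[g]  # shrink the whole group
        # else: the entire group is set exactly to zero
    return out

w = np.array([3.0, 4.0, 0.1, -0.1])
groups = [np.array([0, 1]), np.array([2, 3])]
print(group_soft_threshold(w, groups, lam=1.0))  # [2.4, 3.2, 0.0, 0.0]
```

The second group has small norm and is eliminated entirely, while the first group survives with a uniform shrinkage; the standard Lasso, by contrast, would threshold each of the four coefficients independently.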
Face tracking has numerous applications in Human-Computer Interaction and behavior understanding in general. Yet face tracking is a difficult problem because the tracker must generalize to new faces, adapt to changing illumination, keep up with fast motions and pose changes, and tolerate target occlusion. We first present our efforts to develop a system for probabilistic face tracking using anthropometric and appearance constraints. We then move on to the focus of our work: applying the face tracker to two interesting recognition problems. First, given that sign language is used as a primary means of communication by deaf individuals and as augmentative communication by hearing individuals with a variety of disabilities, developing robust, real-time sign language recognition technologies would be a major step toward making computers equally accessible to everyone. However, most research in the field of sign language recognition has focu...
ArXiv, 2020
In this work, we propose a method for training a distributed GAN with sequential temporary discriminators. Our method tackles the challenge of training a GAN in the federated learning setting: how to update the generator with a flow of temporary discriminators? We apply the proposed method to learn a self-adaptive generator with a series of local discriminators from multiple data centers. We show that our loss function design indeed learns the correct distribution, with provable guarantees. Empirical experiments show that our approach is capable of generating synthetic data that is practical for real-world applications such as training a segmentation model.
We study the problem of transferring facial expressions from one face to another. Directly copying and blending face components with existing methods yields semantically unnatural composites, since expression is a global effect and a local face component in one expression is often incompatible with the shape and other components in another expression. To solve this problem we present the expression flow method, a 2D flow field that warps the target face globally. We develop a shape-fitting algorithm that jointly fits 3D face shapes to the input images with the same identity but different expressions. The expression flow is computed by projecting the difference between the two 3D shapes onto the 2D image plane. We apply our algorithms to several applications, including face compositing, face morphing, video stitching, and facial expression exaggeration. Our system is able to generate faces with much higher fidelity than existing methods.
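The projection step can be sketched as follows, assuming an orthographic camera along the z-axis and two already-fitted, vertex-aligned 3D shapes; the paper's actual camera model and fitting pipeline are more involved, and the function name is an assumption:

```python
import numpy as np

def expression_flow_2d(shape_neutral, shape_expr, scale=1.0):
    """Project the per-vertex difference of two aligned 3D face shapes
    onto the image plane (orthographic camera along z) to get a 2D flow."""
    delta = shape_expr - shape_neutral          # (N, 3) vertex displacements
    return scale * delta[:, :2]                 # drop depth -> (N, 2) flow

neutral = np.array([[0.0, 0.0, 1.0],
                    [1.0, 0.0, 1.2]])
smiling = np.array([[0.2, -0.1, 1.0],
                    [1.1,  0.1, 1.3]])
flow = expression_flow_2d(neutral, smiling)
print(flow)  # per-vertex 2D displacements: [[0.2, -0.1], [0.1, 0.1]]
```

In practice the sparse per-vertex flow would then be densified (e.g. interpolated over the face region) before warping the target image.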
2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2020
The ability to reliably perceive environmental states, particularly the existence of objects and their motion behavior, is crucial for autonomous driving. In this work, we propose an efficient deep model, called MotionNet, that jointly performs perception and motion prediction from 3D point clouds. MotionNet takes a sequence of LiDAR sweeps as input and outputs a bird's-eye-view (BEV) map that encodes the object category and motion information in each grid cell. The backbone of MotionNet is a novel spatiotemporal pyramid network, which extracts deep spatial and temporal features in a hierarchical fashion. To enforce the smoothness of predictions over both space and time, the training of MotionNet is further regularized with novel spatial and temporal consistency losses. Extensive experiments show that the proposed method overall outperforms the state of the art, including the latest scene-flow- and 3D-object-detection-based methods. This indicates the potential value of the proposed method as a backup to bounding-box-based systems and as a source of complementary information for the motion planner in autonomous driving.
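The abstract does not specify MotionNet's input encoding in detail; a minimal, hypothetical sketch of the first step (rasterizing one LiDAR sweep into a BEV grid of per-cell point counts) might look like the following. The ranges, cell size, and function name are illustrative assumptions:

```python
import numpy as np

def points_to_bev(points, x_range=(-2.0, 2.0), y_range=(-2.0, 2.0), cell=1.0):
    """Discretize (x, y) point coordinates into a bird's-eye-view count grid."""
    nx = int((x_range[1] - x_range[0]) / cell)
    ny = int((y_range[1] - y_range[0]) / cell)
    bev = np.zeros((nx, ny), dtype=int)
    ix = ((points[:, 0] - x_range[0]) / cell).astype(int)
    iy = ((points[:, 1] - y_range[0]) / cell).astype(int)
    keep = (ix >= 0) & (ix < nx) & (iy >= 0) & (iy < ny)
    np.add.at(bev, (ix[keep], iy[keep]), 1)   # count points per grid cell
    return bev

pts = np.array([[0.5, 0.5], [0.6, 0.7], [-1.5, 1.2], [5.0, 5.0]])  # last is out of range
bev = points_to_bev(pts)
print(bev.sum())  # 3 points fall inside the grid
```

Stacking such grids from several consecutive sweeps gives the spatiotemporal input volume that a pyramid network can then process hierarchically, predicting a category and motion vector per cell.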
IEEE Transactions on Medical Imaging, 2020
Nuclei segmentation is a fundamental task in histopathology image analysis. Typically, such segmentation tasks require significant effort to manually generate accurate pixel-wise annotations for fully supervised training. To alleviate this tedious manual effort, in this paper we propose a novel weakly supervised segmentation framework based on partial points annotation, i.e., only a small portion of the nuclei locations in each image are labeled. The framework consists of two learning stages. In the first stage, we design a semi-supervised strategy to learn a detection model from partially labeled nuclei locations. Specifically, an extended Gaussian mask is designed to train an initial model with the partially labeled data. Then, self-training with background propagation is proposed to exploit the unlabeled regions to boost nuclei detection and suppress false positives. In the second stage, a segmentation model is trained from the detected nuclei locations in a weakly supervised fashion. Two types of coarse labels with complementary information are derived from the detected points and are then used to train a deep neural network. A fully connected conditional random field loss is used during training to further refine the model without introducing extra computational complexity at inference. The proposed method is extensively evaluated on two nuclei segmentation datasets. The experimental results demonstrate that our method achieves competitive performance compared to the fully supervised counterpart and state-of-the-art methods while requiring significantly less annotation effort.
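A minimal sketch of the extended-Gaussian-mask idea: each labeled point is expanded into a soft Gaussian target for detector training instead of a single hot pixel. The exact mask definition, sigma, and blob-combination rule below are assumptions for illustration, not the paper's published formulation:

```python
import numpy as np

def gaussian_point_mask(shape, points, sigma=2.0):
    """Build a soft training target from sparse point labels: each annotated
    nucleus location becomes a Gaussian blob (peak 1, decaying outward)."""
    h, w = shape
    ys, xs = np.mgrid[0:h, 0:w]
    mask = np.zeros(shape, dtype=float)
    for (py, px) in points:
        g = np.exp(-((ys - py) ** 2 + (xs - px) ** 2) / (2.0 * sigma ** 2))
        mask = np.maximum(mask, g)   # overlapping blobs keep the stronger response
    return mask

mask = gaussian_point_mask((16, 16), [(4, 4), (10, 12)], sigma=2.0)
print(mask[4, 4], mask[10, 12])  # both annotated peaks equal 1.0
```

The soft target tolerates small localization errors in the point annotations, and regions far from any labeled point can be treated as unknown rather than background, which is what the self-training stage then resolves.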