Longxiu Huang - Academia.edu
Papers by Longxiu Huang
Journal of Mathematical Analysis and Applications, Oct 1, 2019
We investigate systems of the form {A^t g : g ∈ G, t ∈ [0, L]} where A ∈ B(H) is a normal operator in a separable Hilbert space H, G ⊂ H is a countable set, and L is a positive real number. Although the main goal of this work is to study the frame properties of {A^t g : g ∈ G, t ∈ [0, L]}, as intermediate steps, we explore the completeness and Bessel properties of such systems from a theoretical perspective, which are of interest by themselves. Besides the theoretical appeal of investigating such systems, their connections to dynamical and mobile sampling make them fundamental for understanding and solving several major problems in engineering and science. 2010 Mathematics Subject Classification. 46N99, 42C15, 94O20. Key words and phrases. continuous frames, semi-continuous frames, dynamical sampling, discretization of continuous frames, frames induced by continuous powers of operators. The authors were supported in part by NSF Grant DMS-1322099.
arXiv (Cornell University), Aug 21, 2019
The CUR decomposition is a factorization of a low-rank matrix obtained by selecting certain column and row submatrices of it. We perform a thorough investigation of what happens to such decompositions in the presence of noise. Since CUR decompositions are non-uniquely formed, we investigate several variants and give perturbation estimates for each in terms of the magnitude of the noise matrix in a broad class of norms which includes all Schatten p-norms. The estimates given here are qualitative and illustrate how the choice of columns and rows affects the quality of the approximation, and additionally we obtain new state-of-the-art bounds for some variants of CUR approximations.
arXiv (Cornell University), Sep 7, 2020
A dataset of COVID-19-related scientific literature is compiled, combining the articles from several online libraries and selecting those with open access and full text available. Then, hierarchical nonnegative matrix factorization is used to organize literature related to the novel coronavirus into a tree structure that allows researchers to search for relevant literature based on detected topics. We discover eight major latent topics and 52 granular subtopics in the body of literature, related to vaccines, genetic structure and modeling of the disease and patient studies, as well as related diseases and virology. In order that our tool may help current researchers, an interactive website (http://covid-19-literature-clustering.net/) is created that organizes available literature using this hierarchical structure.
arXiv (Cornell University), Jan 31, 2022
Classification and topic modeling are popular techniques in machine learning that extract information from large-scale datasets. By incorporating a priori information such as labels or important features, methods have been developed to perform classification and topic modeling tasks; however, most methods that can perform both do not allow for guidance of the topics or features. In this paper, we propose a method, namely Guided Semi-Supervised Non-negative Matrix Factorization (GSSNMF), that performs both classification and topic modeling by incorporating supervision from both pre-assigned document class labels and user-designed seed words. We test the performance of this method through its application to legal documents provided by the California Innocence Project, a nonprofit that works to free innocent convicted persons and reform the justice system. The results show that our proposed method improves both classification accuracy and topic coherence in comparison to past methods like Semi-Supervised Non-negative Matrix Factorization (SSNMF) and Guided Non-negative Matrix Factorization (Guided NMF).
arXiv (Cornell University), Jul 29, 2019
This note discusses an interesting matrix factorization called the CUR Decomposition. We illustrate various viewpoints of this method by comparing and contrasting them in different situations. Additionally, we offer a new characterization of CUR decompositions which synergizes these viewpoints and shows that they are indeed the same in the exact decomposition case.
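As a rough illustration of the exact decomposition case mentioned in this abstract, the sketch below forms C, U, and R from chosen column and row indices of a synthetic low-rank matrix and checks the standard identity A = C U⁺ R, which holds when rank(U) = rank(A). The helper name `cur_decomposition`, the test matrix, and the index choices are illustrative assumptions and are not taken from the paper.

```python
import numpy as np

def cur_decomposition(A, rows, cols):
    """Hypothetical helper: return C = A[:, cols], U = A[rows, cols], R = A[rows, :]."""
    C = A[:, cols]
    R = A[rows, :]
    U = A[np.ix_(rows, cols)]  # intersection of the chosen rows and columns
    return C, U, R

rng = np.random.default_rng(0)
# Synthetic 8 x 10 matrix of rank 3, built from random Gaussian factors.
A = rng.standard_normal((8, 3)) @ rng.standard_normal((3, 10))

rows = [0, 2, 4, 6]     # illustrative choice of row indices
cols = [1, 3, 5, 7, 9]  # illustrative choice of column indices
C, U, R = cur_decomposition(A, rows, cols)

# An explicit cutoff keeps round-off-level singular values of U out of the pseudoinverse.
A_cur = C @ np.linalg.pinv(U, rcond=1e-10) @ R

print("rank of U:", np.linalg.matrix_rank(U, tol=1e-8))
print("max reconstruction error:", np.abs(A - A_cur).max())
```

When U has lower rank than A, the product C U⁺ R is only an approximation, which is the regime the perturbation and sampling papers in this list are concerned with.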
arXiv (Cornell University), May 6, 2023
We study the tensor robust principal component analysis (TRPCA) problem, a tensorial extension of matrix robust principal component analysis (RPCA), that aims to split the given tensor into an underlying low-rank component and a sparse outlier component. This work proposes a fast algorithm, called Robust Tensor CUR Decompositions (RTCUR), for large-scale non-convex TRPCA problems under the Tucker rank setting. RTCUR is developed within a framework of alternating projections that projects between the set of low-rank tensors and the set of sparse tensors. We utilize the recently developed tensor CUR decomposition to substantially reduce the computational complexity in each projection. In addition, we develop four variants of RTCUR for different application settings. We demonstrate the effectiveness and computational advantages of RTCUR against state-of-the-art methods on both synthetic and real-world datasets.
IEEE Transactions on Pattern Analysis and Machine Intelligence, Aug 1, 2023
While uniform sampling has been widely studied in the matrix completion literature, CUR sampling approximates a low-rank matrix via row and column samples. Unfortunately, both sampling models lack flexibility for various circumstances in real-world applications. In this work, we propose a novel and easy-to-implement sampling strategy, coined Cross-Concentrated Sampling (CCS). By bridging uniform sampling and CUR sampling, CCS provides extra flexibility that can potentially save sampling costs in applications. In addition, we also provide a sufficient condition for CCS-based matrix completion. Moreover, we propose a highly efficient non-convex algorithm, termed Iterative CUR Completion (ICURC), for the proposed CCS model. Numerical experiments verify the empirical advantages of CCS and ICURC against uniform sampling and its baseline algorithms, on both synthetic and real-world datasets.
arXiv (Cornell University), Jan 30, 2018
We investigate systems of the form {A^t g : g ∈ G, t ∈ [0, L]} where A ∈ B(H) is a normal operator in a separable Hilbert space H, G ⊂ H is a countable set, and L is a positive real number. Although the main goal of this work is to study the frame properties of {A^t g : g ∈ G, t ∈ [0, L]}, as intermediate steps, we explore the completeness and Bessel properties of such systems from a theoretical perspective, which are of interest by themselves. Besides the theoretical appeal of investigating such systems, their connections to dynamical and mobile sampling make them fundamental for understanding and solving several major problems in engineering and science. 2010 Mathematics Subject Classification. 46N99, 42C15, 94O20. Key words and phrases. continuous frames, semi-continuous frames, dynamical sampling, discretization of continuous frames, frames induced by continuous powers of operators. The authors were supported in part by NSF Grant DMS-1322099.
arXiv (Cornell University), Jan 8, 2020
This article studies how to form CUR decompositions of low-rank matrices via primarily random sampling, though deterministic methods due to previous works are illustrated as well. The primary problem is to determine when a column submatrix of a rank k matrix also has rank k. For random column sampling schemes, there is typically a tradeoff between the number of columns needed to be chosen and the complexity of determining the sampling probabilities. We discuss several sampling methods and their complexities as well as stability of the method under perturbations of both the probabilities and the underlying matrix. As an application, we give a high probability guarantee of the exact solution of the Subspace Clustering Problem via CUR decompositions when columns are sampled according to their Euclidean lengths.
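To make the idea of sampling columns according to their Euclidean lengths concrete, here is a minimal sketch. It assumes the common convention that column j is drawn with probability proportional to its squared Euclidean norm, with replacement; the abstract does not spell out the exact scheme, and the function name `sample_columns_by_length` and the test matrix are hypothetical.

```python
import numpy as np

def sample_columns_by_length(A, num_cols, rng):
    """Draw column indices with probability proportional to squared column norms.

    Assumed convention (not confirmed by the abstract): p_j = ||A[:, j]||^2 / ||A||_F^2,
    sampled with replacement.
    """
    sq_norms = np.sum(A**2, axis=0)
    probs = sq_norms / sq_norms.sum()
    idx = rng.choice(A.shape[1], size=num_cols, replace=True, p=probs)
    return idx, probs

rng = np.random.default_rng(0)
# Synthetic rank-2 matrix with 20 columns; column 7 is zeroed out on purpose.
A = rng.standard_normal((6, 2)) @ rng.standard_normal((2, 20))
A[:, 7] = 0.0

idx, probs = sample_columns_by_length(A, num_cols=12, rng=rng)
C = A[:, idx]  # sampled column submatrix

print("sampled column indices:", idx)
print("rank of column submatrix:", np.linalg.matrix_rank(C, tol=1e-8))
```

A zero column has probability zero and is never selected, which is one reason length-based probabilities are a cheap alternative to more expensive schemes; whether the sampled submatrix attains the full rank k depends on the draw.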
arXiv (Cornell University), Mar 17, 2023
arXiv (Cornell University), Apr 28, 2020
We analyze the problem of reconstruction of a bandlimited function f from the space-time samples of its states f_t = φ_t ∗ f resulting from the convolution with a kernel φ_t. It is well-known that, in natural phenomena, uniform space-time samples of f are not sufficient to reconstruct f in a stable way. To enable stable reconstruction, a space-time sampling with periodic nonuniformly spaced samples must be used as was shown by Lu and Vetterli. We show that the stability of reconstruction, as measured by a condition number, controls the maximal gap between the spatial samples. We provide a quantitative statement of this result. In addition, instead of irregular space-time samples, we show that uniform dynamical samples at sub-Nyquist spatial rate allow one to stably reconstruct the function f̂ away from certain, explicitly described blind spots. We also consider several classes of finite dimensional subsets of bandlimited functions in which the stable reconstruction is possible, even inside the blind spots. We obtain quantitative estimates for it using Remez-Turán type inequalities. En route, we obtain a Remez-Turán inequality for prolate spheroidal wave functions. To illustrate our results, we present some numerics and explicit estimates for the heat flow problem.
ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Tensor completion is an important problem in modern data analysis. In this work, we investigate a specific sampling strategy, referred to as tubal sampling. We propose two novel non-convex tensor completion frameworks that are easy to implement, named tensor L1-L2 (TL12) and tensor completion via CUR (TCCUR). We test the efficiency of both methods on synthetic data and a color image inpainting problem. Empirical results reveal a trade-off between the accuracy and time efficiency of these two methods at low sampling ratios. Each of them outperforms some classical completion methods in at least one aspect.
ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
We consider the space-time sampling and reconstruction of sparse bandlimited graph signals driven by a heat diffusion process. In this paper, we develop a sampling framework consisting of selecting a small subset of space-time nodes at random according to some probability distribution, generalizing the classical variable density sampling to the heat diffusion field. We show that the number of space-time samples required to ensure stable recovery depends on an incoherence parameter determined by the interplay between graph topology, temporal dynamics, and sampling probability distributions. In optimal scenarios, as few as O(s log k) space-time samples are sufficient to ensure accurate recovery of all k-bandlimited graph signals that are additionally s-sparse. Our proposed sampling method requires far fewer spatial samples than the static case by leveraging temporal information. Finally, we test our sampling techniques on a wide variety of graphs. The numerical results on synthetic and real climate data sets support our theoretical findings and demonstrate the practical applicability.
arXiv (Cornell University), Aug 23, 2021
We study the problem of tensor robust principal component analysis (TRPCA), which aims to separate an underlying low-multilinear-rank tensor and a sparse outlier tensor from their sum. In this work, we propose a fast non-convex algorithm, coined Robust Tensor CUR (RTCUR), for large-scale TRPCA problems. RTCUR considers a framework of alternating projections and utilizes the recently developed tensor Fiber CUR decomposition to dramatically lower the computational complexity. The performance advantage of RTCUR is empirically verified against state-of-the-art methods on synthetic datasets and is further demonstrated on real-world applications such as color video background subtraction.
Journal of Mathematical Analysis and Applications
Dynamical sampling is a new area in sampling theory that deals with signals that evolve over time under the action of a linear operator. There are many studies on various aspects of the dynamical sampling problem. However, they all focus on uniform discrete time-sets T ⊂ N. In our first paper, we concentrate on the case T = [0, L]. The goal of the present work is to study the frame property of the systems {A^t g : g ∈ G, t ∈ [0, L]}. To this end, we also characterize the completeness and Besselness properties of these systems. In our second paper, we consider dynamical sampling when the samples are corrupted by additive noise. The purpose of the second paper is to analyze the performance of the basic dynamical sampling algorithms in the finite dimensional case and study the impact of additive noise. The algorithms are implemented and tested on synthetic and real data sets, and denoising techniques are integrated to mitigate the effect of the noise. We also develop theoretical and...
While uniform sampling has been widely studied in the matrix completion literature, CUR sampling approximates a low-rank matrix via row and column samples. Unfortunately, both sampling models lack flexibility for various circumstances in real-world applications. In this work, we propose a novel and easy-to-implement sampling strategy, coined Cross-Concentrated Sampling (CCS). By bridging uniform sampling and CUR sampling, CCS provides extra flexibility that can potentially save sampling costs in applications. In addition, we also provide a sufficient condition for CCS-based matrix completion. Moreover, we propose a highly efficient non-convex algorithm, termed Iterative CUR Completion (ICURC), for the proposed CCS model. Numerical experiments verify the empirical advantages of CCS and ICURC against uniform sampling and its baseline algorithms, on both synthetic and real-world datasets.
Algorithms
Classification and topic modeling are popular techniques in machine learning that extract information from large-scale datasets. By incorporating a priori information such as labels or important features, methods have been developed to perform classification and topic modeling tasks; however, most methods that can perform both do not allow for guidance of the topics or features. In this paper, we propose a novel method, namely Guided Semi-Supervised Non-negative Matrix Factorization (GSSNMF), that performs both classification and topic modeling by incorporating supervision from both pre-assigned document class labels and user-designed seed words. We test the performance of this method on legal documents provided by the California Innocence Project and the 20 Newsgroups dataset. Our results show that the proposed method improves both classification accuracy and topic coherence in comparison to past methods such as Semi-Supervised Non-negative Matrix Factorization (SSNMF), Guided Non-...
Developing large-scale distributed methods that are robust to the presence of adversarial or corrupted workers is an important part of making such methods practical for real-world problems. Here, we propose an iterative approach that is adversary-tolerant for least-squares problems. The algorithm utilizes simple statistics to guarantee convergence and is capable of learning the adversarial distributions. Additionally, the efficiency of the proposed method is shown in simulations in the presence of adversaries. The results demonstrate the great capability of such methods to tolerate different levels of adversary rates and to identify the erroneous workers with high accuracy.
Heat diffusion processes have found wide applications in modelling dynamical systems over graphs. In this paper, we consider the recovery of a k-bandlimited graph signal that is an initial signal of a heat diffusion process from its space-time samples. We propose three random space-time sampling regimes, termed dynamical sampling techniques, that consist of selecting a small subset of space-time nodes at random according to some probability distribution. We show that the number of space-time samples required to ensure stable recovery for each regime depends on a parameter called the spectral graph weighted coherence, which depends on the interplay between the dynamics over the graphs and sampling probability distributions. In optimal scenarios, no more than 𝒪(k log(k)) space-time samples are sufficient to ensure accurate and stable recovery of all k-bandlimited signals. In any case, dynamical sampling typically requires far fewer spatial samples than the static case by leveraging th...
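The recovery problem described in this abstract can be sketched on a toy example. The code below is an illustrative assumption throughout: it uses a path graph, uniform random node selection at three time instants (rather than the variable-density regimes of the paper), noiseless samples, and plain least squares. It only shows how space-time samples of the heat flow x_t = exp(-t L) x_0 constrain the k spectral coefficients of a k-bandlimited initial signal.

```python
import numpy as np

n, k = 20, 3  # number of nodes and bandwidth (illustrative values)

# Combinatorial Laplacian of a path graph on n nodes.
Adj = np.zeros((n, n))
i = np.arange(n - 1)
Adj[i, i + 1] = 1.0
Adj[i + 1, i] = 1.0
Lap = np.diag(Adj.sum(axis=1)) - Adj

# Graph Fourier basis: eigenvalues ascending, first k eigenvectors span the band.
eigvals, eigvecs = np.linalg.eigh(Lap)
U_k, lam_k = eigvecs[:, :k], eigvals[:k]

rng = np.random.default_rng(1)
x0 = U_k @ rng.standard_normal(k)  # k-bandlimited initial signal

times = [0.0, 0.5, 1.0]
blocks, samples = [], []
for t in times:
    # Heat flow state x_t = exp(-t Lap) x0, computed through the eigendecomposition.
    x_t = eigvecs @ (np.exp(-t * eigvals) * (eigvecs.T @ x0))
    nodes = rng.choice(n, size=4, replace=False)  # uniform sampling (assumption)
    samples.append(x_t[nodes])
    # Rows of the measurement matrix acting on the k spectral coefficients.
    blocks.append(U_k[nodes, :] * np.exp(-t * lam_k))

M = np.vstack(blocks)
y = np.concatenate(samples)
coeffs_hat, *_ = np.linalg.lstsq(M, y, rcond=None)
x0_hat = U_k @ coeffs_hat

print("number of space-time samples:", y.size)
print("max recovery error:", np.abs(x0_hat - x0).max())
```

Here 12 space-time samples are taken from only 4 nodes per time instant; the sample-complexity statements in the abstract concern how few such samples suffice when the sampling distribution is chosen well.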
Journal of Mathematical Analysis and Applications, Oct 1, 2019
We investigate systems of the form {A t g : g ∈ G, t ∈ [0, L]} where A ∈ B(H) is a normal operato... more We investigate systems of the form {A t g : g ∈ G, t ∈ [0, L]} where A ∈ B(H) is a normal operator in a separable Hilbert space H, G ⊂ H is a countable set, and L is a positive real number. Although the main goal of this work is to study the frame properties of {A t g : g ∈ G, t ∈ [0, L]}, as intermediate steps, we explore the completeness and Bessel properties of such systems from a theoretical perspective, which are of interest by themselves. Beside the theoretical appeal of investigating such systems, their connections to dynamical and mobile sampling make them fundamental for understanding and solving several major problems in engineering and science. 2010 Mathematics Subject Classification. 46N99, 42C15, 94O20. Key words and phrases. continuous frames, semi-continuous frames, dynamical sampling, discretization of continuous frames, frames induced by continuous powers of operators. The authors were supported in part by NSF Grant DMS-1322099.
arXiv (Cornell University), Aug 21, 2019
The CUR decomposition is a factorization of a low-rank matrix obtained by selecting certain colum... more The CUR decomposition is a factorization of a low-rank matrix obtained by selecting certain column and row submatrices of it. We perform a thorough investigation of what happens to such decompositions in the presence of noise. Since CUR decompositions are non-uniquely formed, we investigate several variants and give perturbation estimates for each in terms of the magnitude of the noise matrix in a broad class of norms which includes all Schatten p-norms. The estimates given here are qualitative and illustrate how the choice of columns and rows affects the quality of the approximation, and additionally we obtain new state-of-the-art bounds for some variants of CUR approximations.
arXiv (Cornell University), Sep 7, 2020
A dataset of COVID-19-related scientific literature is compiled, combining the articles from seve... more A dataset of COVID-19-related scientific literature is compiled, combining the articles from several online libraries and selecting those with open access and full text available. Then, hierarchical nonnegative matrix factorization is used to organize literature related to the novel coronavirus into a tree structure that allows researchers to search for relevant literature based on detected topics. We discover eight major latent topics and 52 granular subtopics in the body of literature, related to vaccines,genetic structure and modeling of the disease and patient studies, as well as related diseases and virology. In order that our tool may help current researchers, an interactive website is created that organizes available literature using this hierarchical structure. 1 http://covid-19-literature-clustering.net/
arXiv (Cornell University), Jan 31, 2022
Classification and topic modeling are popular techniques in machine learning that extract informa... more Classification and topic modeling are popular techniques in machine learning that extract information from large-scale datasets. By incorporating a priori information such as labels or important features, methods have been developed to perform classification and topic modeling tasks; however, most methods that can perform both do not allow for guidance of the topics or features. In this paper, we propose a method, namely Guided Semi-Supervised Non-negative Matrix Factorization (GSSNMF), that performs both classification and topic modeling by incorporating supervision from both pre-assigned document class labels and user-designed seed words. We test the performance of this method through its application to legal documents provided by the California Innocence Project, a nonprofit that works to free innocent convicted persons and reform the justice system. The results show that our proposed method improves both classification accuracy and topic coherence in comparison to past methods like Semi-Supervised Non-negative Matrix Factorization (SSNMF) and Guided Non-negative Matrix Factorization (Guided NMF).
arXiv (Cornell University), Jul 29, 2019
This note discusses an interesting matrix factorization called the CUR Decomposition. We illustra... more This note discusses an interesting matrix factorization called the CUR Decomposition. We illustrate various viewpoints of this method by comparing and contrasting them in different situations. Additionally, we offer a new characterization of CUR decompositions which synergizes these viewpoints and shows that they are indeed the same in the exact decomposition case.
arXiv (Cornell University), May 6, 2023
We study the tensor robust principal component analysis (TRPCA) problem, a tensorial extension of... more We study the tensor robust principal component analysis (TRPCA) problem, a tensorial extension of matrix robust principal component analysis (RPCA), that aims to split the given tensor into an underlying low-rank component and a sparse outlier component. This work proposes a fast algorithm, called Robust Tensor CUR Decompositions (RTCUR), for large-scale non-convex TRPCA problems under the Tucker rank setting. RTCUR is developed within a framework of alternating projections that projects between the set of low-rank tensors and the set of sparse tensors. We utilize the recently developed tensor CUR decomposition to substantially reduce the computational complexity in each projection. In addition, we develop four variants of RTCUR for different application settings. We demonstrate the effectiveness and computational advantages of RTCUR against state-of-the-art methods on both synthetic and real-world datasets.
IEEE Transactions on Pattern Analysis and Machine Intelligence, Aug 1, 2023
While uniform sampling has been widely studied in the matrix completion literature, CUR sampling ... more While uniform sampling has been widely studied in the matrix completion literature, CUR sampling approximates a low-rank matrix via row and column samples. Unfortunately, both sampling models lack flexibility for various circumstances in real-world applications. In this work, we propose a novel and easy-to-implement sampling strategy, coined Cross-Concentrated Sampling (CCS). By bridging uniform sampling and CUR sampling, CCS provides extra flexibility that can potentially save sampling costs in applications. In addition, we also provide a sufficient condition for CCS-based matrix completion. Moreover, we propose a highly efficient non-convex algorithm, termed Iterative CUR Completion (ICURC), for the proposed CCS model. Numerical experiments verify the empirical advantages of CCS and ICURC against uniform sampling and its baseline algorithms, on both synthetic and real-world datasets.
arXiv (Cornell University), Jan 30, 2018
We investigate systems of the form {A t g : g ∈ G, t ∈ [0, L]} where A ∈ B(H) is a normal operato... more We investigate systems of the form {A t g : g ∈ G, t ∈ [0, L]} where A ∈ B(H) is a normal operator in a separable Hilbert space H, G ⊂ H is a countable set, and L is a positive real number. Although the main goal of this work is to study the frame properties of {A t g : g ∈ G, t ∈ [0, L]}, as intermediate steps, we explore the completeness and Bessel properties of such systems from a theoretical perspective, which are of interest by themselves. Beside the theoretical appeal of investigating such systems, their connections to dynamical and mobile sampling make them fundamental for understanding and solving several major problems in engineering and science. 2010 Mathematics Subject Classification. 46N99, 42C15, 94O20. Key words and phrases. continuous frames, semi-continuous frames, dynamical sampling, discretization of continuous frames, frames induced by continuous powers of operators. The authors were supported in part by NSF Grant DMS-1322099.
arXiv (Cornell University), Jan 8, 2020
This article studies how to form CUR decompositions of low-rank matrices via primarily random sam... more This article studies how to form CUR decompositions of low-rank matrices via primarily random sampling, though deterministic methods due to previous works are illustrated as well. The primary problem is to determine when a column submatrix of a rank k matrix also has rank k. For random column sampling schemes, there is typically a tradeoff between the number of columns needed to be chosen and the complexity of determining the sampling probabilities. We discuss several sampling methods and their complexities as well as stability of the method under perturbations of both the probabilities and the underlying matrix. As an application, we give a high probability guarantee of the exact solution of the Subspace Clustering Problem via CUR decompositions when columns are sampled according to their Euclidean lengths.
arXiv (Cornell University), Mar 17, 2023
arXiv (Cornell University), Apr 28, 2020
We analyze the problem of reconstruction of a bandlimited function f from the space-time samples ... more We analyze the problem of reconstruction of a bandlimited function f from the space-time samples of its states f t " φ t˚f resulting from the convolution with a kernel φ t. It is well-known that, in natural phenomena, uniform space-time samples of f are not sufficient to reconstruct f in a stable way. To enable stable reconstruction, a space-time sampling with periodic nonuniformly spaced samples must be used as was shown by Lu and Vetterli. We show that the stability of reconstruction, as measured by a condition number, controls the maximal gap between the spacial samples. We provide a quantitative statement of this result. In addition, instead of irregular space-time samples, we show that uniform dynamical samples at sub-Nyquist spatial rate allow one to stably reconstruct the function p f away from certain, explicitly described blind spots. We also consider several classes of finite dimensional subsets of bandlimited functions in which the stable reconstruction is possible, even inside the blind spots. We obtain quantitative estimates for it using Remez-Turán type inequalities. En route, we obtain a Remez-Turán inequality for prolate spheroidal wave functions. To illustrate our results, we present some numerics and explicit estimates for the heat flow problem.
ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Tensor completion is an important problem in modern data analysis. In this work, we investigate a... more Tensor completion is an important problem in modern data analysis. In this work, we investigate a specific sampling strategy, referred to as tubal sampling. We propose two novel non-convex tensor completion frameworks that are easy to implement, named tensor L 1-L 2 (TL12) and tensor completion via CUR (TCCUR). We test the efficiency of both methods on synthetic data and a color image inpainting problem. Empirical results reveal a trade-off between the accuracy and time efficiency of these two methods in a low sampling ratio. Each of them outperforms some classical completion methods in at least one aspect.
ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
We consider the space-time sampling and reconstruction of sparse bandlimited graph signals driven... more We consider the space-time sampling and reconstruction of sparse bandlimited graph signals driven by a heat diffusion process. In this paper, we develop a sampling framework consisting of selecting a small subset of space-time nodes at random according to some probability distribution, generalizing the classical variable density sampling to the heat diffusion field. We show that the number of space-time samples required to ensure stable recovery depends on an incoherence parameter determined by the interplay between graph topology, temporal dynamics, and sampling probability distributions. In optimal scenarios, as few as O(s log k) space-time samples are sufficient to ensure accurate recovery of all kbandlimited graph signals that are additionally s-sparse. Our proposed sampling method requires much fewer spatial samples than the static case by leveraging temporal information. Finally, we test our sampling techniques on a wide variety of graphs. The numerical results on synthetic and real climate data sets support our theoretical findings and demonstrate the practical applicability.
arXiv (Cornell University), Aug 23, 2021
We study the problem of tensor robust principal component analysis (TRPCA), which aims to separat... more We study the problem of tensor robust principal component analysis (TRPCA), which aims to separate an underlying low-multilinear-rank tensor and a sparse outlier tensor from their sum. In this work, we propose a fast non-convex algorithm, coined Robust Tensor CUR (RTCUR), for largescale TRPCA problems. RTCUR considers a framework of alternating projections and utilizes the recently developed tensor Fiber CUR decomposition to dramatically lower the computational complexity. The performance advantage of RTCUR is empirically verified against the state-of-the-arts on the synthetic datasets and is further demonstrated on the real-world application such as color video background subtraction.
Journal of Mathematical Analysis and Applications
Dynamical sampling is a new area in sampling theory that deals with signals that evolve over time under the action of a linear operator. Many studies address various aspects of the dynamical sampling problem. However, they all focus on uniform discrete time-sets T ⊂ N. In our first paper, we concentrate on the case T = [0, L]. The goal of the present work is to study the frame property of the systems {A^t g : g ∈ G, t ∈ [0, L]}. To this end, we also characterize the completeness and Bessel properties of these systems. In our second paper, we consider dynamical sampling when the samples are corrupted by additive noise. The purpose of the second paper is to analyze the performance of the basic dynamical sampling algorithms in the finite-dimensional case and study the impact of additive noise. The algorithms are implemented and tested on synthetic and real data sets, and denoising techniques are integrated to mitigate the effect of the noise. We also develop theoretical and...
While uniform sampling has been widely studied in the matrix completion literature, CUR sampling approximates a low-rank matrix via row and column samples. Unfortunately, both sampling models lack flexibility for various circumstances in real-world applications. In this work, we propose a novel and easy-to-implement sampling strategy, coined Cross-Concentrated Sampling (CCS). By bridging uniform sampling and CUR sampling, CCS provides extra flexibility that can potentially save sampling costs in applications. We also provide a sufficient condition for CCS-based matrix completion. Moreover, we propose a highly efficient non-convex algorithm, termed Iterative CUR Completion (ICURC), for the proposed CCS model. Numerical experiments verify the empirical advantages of CCS and ICURC against uniform sampling and its baseline algorithms, on both synthetic and real-world datasets.
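For readers unfamiliar with the CUR sampling mentioned in this abstract, the following is a minimal sketch of a plain CUR reconstruction, not of the CCS model or the ICURC algorithm. It assumes an exactly low-rank, noise-free matrix and uniformly sampled row and column indices; the function name `cur_reconstruct`, the matrix sizes, and the pseudoinverse cutoff are illustrative choices, not taken from the paper.

```python
import numpy as np


def cur_reconstruct(A, rows, cols, rcond=1e-10):
    """Rebuild A from selected columns C, selected rows R, and their intersection U."""
    C = A[:, cols]               # selected columns
    R = A[rows, :]               # selected rows
    U = A[np.ix_(rows, cols)]    # intersection submatrix
    # If rank(U) == rank(A), then A = C @ pinv(U) @ R holds exactly.
    # rcond discards singular values of U that are numerically zero.
    return C @ np.linalg.pinv(U, rcond=rcond) @ R


rng = np.random.default_rng(0)

# Exactly rank-r test matrix (sizes are arbitrary, for illustration only).
m, n, r = 60, 50, 3
A = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))

# Uniform sampling of row and column indices, oversampled relative to the rank.
rows = rng.choice(m, size=4 * r, replace=False)
cols = rng.choice(n, size=4 * r, replace=False)

A_cur = cur_reconstruct(A, rows, cols)
rel_err = np.linalg.norm(A - A_cur) / np.linalg.norm(A)
print(f"relative reconstruction error: {rel_err:.2e}")
```

Only the sampled rows and columns of A are read inside `cur_reconstruct`, which is what makes this style of sampling attractive when observing entries is costly.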
Algorithms
Classification and topic modeling are popular techniques in machine learning that extract information from large-scale datasets. By incorporating a priori information such as labels or important features, methods have been developed to perform classification and topic modeling tasks; however, most methods that can perform both do not allow for guidance of the topics or features. In this paper, we propose a novel method, namely Guided Semi-Supervised Non-negative Matrix Factorization (GSSNMF), that performs both classification and topic modeling by incorporating supervision from both pre-assigned document class labels and user-designed seed words. We test the performance of this method on legal documents provided by the California Innocence Project and the 20 Newsgroups dataset. Our results show that the proposed method improves both classification accuracy and topic coherence in comparison to past methods such as Semi-Supervised Non-negative Matrix Factorization (SSNMF), Guided Non-...
Developing large-scale distributed methods that are robust to the presence of adversarial or corrupted workers is an important part of making such methods practical for real-world problems. Here, we propose an iterative approach that is adversary-tolerant for least-squares problems. The algorithm utilizes simple statistics to guarantee convergence and is capable of learning the adversarial distributions. Additionally, the efficiency of the proposed method is shown in simulations in the presence of adversaries. The results demonstrate the great capability of such methods to tolerate different levels of adversary rates and to identify the erroneous workers with high accuracy.
Heat diffusion processes have found wide applications in modelling dynamical systems over graphs. In this paper, we consider the recovery of a k-bandlimited graph signal that is an initial signal of a heat diffusion process from its space-time samples. We propose three random space-time sampling regimes, termed dynamical sampling techniques, that consist in selecting a small subset of space-time nodes at random according to some probability distribution. We show that the number of space-time samples required to ensure stable recovery for each regime depends on a parameter called the spectral graph weighted coherence, which depends on the interplay between the dynamics over the graphs and sampling probability distributions. In optimal scenarios, no more than 𝒪(k log(k)) space-time samples are sufficient to ensure accurate and stable recovery of all k-bandlimited signals. In any case, dynamical sampling typically requires much fewer spatial samples than the static case by leveraging th...
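The recovery problem described in this abstract can be illustrated with a small noise-free sketch. It is not one of the paper's three sampling regimes: it assumes a path graph, the combinatorial Laplacian, "k-bandlimited" meaning a signal in the span of the k lowest-frequency Laplacian eigenvectors, uniform sampling of space-time nodes, and plain least-squares recovery. All sizes and variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k, M = 30, 5, 40                       # nodes, bandwidth, number of space-time samples
times = np.array([0.0, 1.0, 2.0, 3.0])    # observation times (assumed)

# Combinatorial Laplacian of a path graph and its eigendecomposition.
Adj = np.diag(np.ones(n - 1), 1) + np.diag(np.ones(n - 1), -1)
Lap = np.diag(Adj.sum(axis=1)) - Adj
lam, V = np.linalg.eigh(Lap)              # eigenvalues in ascending order

# k-bandlimited initial signal: a combination of the first k eigenvectors.
c_true = rng.standard_normal(k)
f0 = V[:, :k] @ c_true

# Heat diffusion f_t = exp(-t * Lap) f0, evaluated at every node and time (n x T).
coeffs = V.T @ f0
F = V @ (np.exp(-np.outer(lam, times)) * coeffs[:, None])

# Draw M space-time nodes (node, time) uniformly at random and read the samples.
node_idx = rng.integers(0, n, size=M)
time_idx = rng.integers(0, len(times), size=M)
y = F[node_idx, time_idx]

# Each sample is linear in the k spectral coefficients:
# y_m = sum_j V[node_m, j] * exp(-t_m * lam_j) * c_j for j < k.
Phi = V[node_idx, :k] * np.exp(-np.outer(times[time_idx], lam[:k]))
c_hat = np.linalg.lstsq(Phi, y, rcond=None)[0]
f_hat = V[:, :k] @ c_hat

rel_err = np.linalg.norm(f_hat - f0) / np.linalg.norm(f0)
print(f"relative recovery error: {rel_err:.2e}")
```

Here the M samples are spread over several time instants, so repeated observations of the same node at different times still contribute information about the initial signal.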