Yiming Ying | University of Exeter

Papers by Yiming Ying

Convergence analysis of online algorithms

Advances in Computational Mathematics, Nov 25, 2006

Rademacher Chaos Complexities for Learning the Kernel Problem

Neural Computation, Nov 1, 2010

Large Margin Local Metric Learning

Springer eBooks, 2014

Learning Rates of Least-Square Regularized Regression

Foundations of Computational Mathematics, Sep 23, 2005

Fairness-aware Differentially Private Collaborative Filtering

Companion Proceedings of the ACM Web Conference 2023

Unmixing Biological Fluorescence Image Data with Sparse and Low-Rank Poisson Regression

Multispectral biological fluorescence microscopy has enabled the identification of multiple targets in complex samples. The accuracy of the unmixing result degrades (1) as the number of fluorophores used in an experiment increases and (2) as the signal-to-noise ratio in the recorded images decreases. Further, prior knowledge of the expected spatial distributions of fluorophores in images of labeled cells provides an opportunity to improve the accuracy of fluorophore identification and abundance estimation. We propose a regularized sparse and low-rank Poisson unmixing approach (SL-PRU) to deconvolve spectral images labeled with highly overlapping fluorophores and recorded in low signal-to-noise regimes. First, SL-PRU uses multi-penalty terms to pursue sparseness and spatial correlation of the resulting abundances in small neighborhoods simultaneously. Second, SL-PRU uses Poisson regression for unmixing instead of least-squares regression…
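
To make the Poisson-regression unmixing idea concrete, here is a minimal per-pixel sketch: minimize an L1-penalized Poisson likelihood via multiplicative updates. This is an illustrative stand-in, not the authors' SL-PRU implementation; the solver, the penalty form, and all variable names are assumptions, and the low-rank/spatial-neighborhood penalties of SL-PRU are omitted.

```python
import numpy as np

def poisson_unmix(y, A, lam=0.1, n_iter=200, eps=1e-12):
    """Per-pixel spectral unmixing under a Poisson noise model (sketch).

    y   : observed counts per spectral channel, shape (n_channels,)
    A   : known fluorophore spectra (endmembers), shape (n_channels, n_fluor)
    lam : L1 penalty weight encouraging sparse abundances (assumed form)
    """
    x = np.full(A.shape[1], y.mean() / A.shape[1] + eps)  # non-negative init
    ones = np.ones_like(y)
    for _ in range(n_iter):
        ratio = y / (A @ x + eps)
        # multiplicative update for the L1-penalized Poisson likelihood;
        # keeps abundances non-negative throughout
        x *= (A.T @ ratio) / (A.T @ ones + lam + eps)
    return x
```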

Stability and Generalization Analysis of Gradient Methods for Shallow Neural Networks

arXiv (Cornell University), Sep 19, 2022

Minimax AUC Fairness: Efficient Algorithm with Provable Convergence

arXiv (Cornell University), Aug 22, 2022

Stability and Generalization for Markov Chain Stochastic Gradient Methods

arXiv (Cornell University), Sep 16, 2022

Co-Regularized PLSA for Multi-Modal Learning

Proceedings of the AAAI Conference on Artificial Intelligence

Many learning problems in real-world applications involve rich datasets comprising multiple information modalities. In this work, we study co-regularized PLSA (coPLSA) as an efficient solution to probabilistic topic analysis of multi-modal data. In coPLSA, similarities between the topic compositions of a data entity across different data modalities are measured with divergences between discrete probabilities, which are incorporated as a co-regularizer to augment individual PLSA models over each data modality. We derive efficient iterative learning algorithms for coPLSA with symmetric KL, L2 and L1 divergences as co-regularizers; in each case the essential optimization problem admits simple numerical solutions that entail only matrix arithmetic and the numerical solution of 1D nonlinear equations. We evaluate the coPLSA algorithms on text/image cross-modal retrieval tasks, on which they show competitive performance with state-of-the-art methods.
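
The symmetric-KL co-regularizer can be sketched as follows: penalize disagreement between the topic distributions a document receives under two modality-specific PLSA models. This is only an illustration of the coupling term; the function and array names are assumptions, and the PLSA likelihood terms and the derived iterative solvers are not reproduced.

```python
import numpy as np

def symmetric_kl_coreg(p_text, p_image, lam=1.0, eps=1e-12):
    """Co-regularizer sketch for two modalities.

    p_text, p_image : per-document topic distributions, shape (n_docs, n_topics),
                      rows summing to one (names are illustrative).
    Returns lam * sum_d [ KL(p_text_d || p_image_d) + KL(p_image_d || p_text_d) ],
    which would be added to the two modality-wise PLSA objectives.
    """
    p, q = p_text + eps, p_image + eps
    kl_pq = np.sum(p * np.log(p / q), axis=1)
    kl_qp = np.sum(q * np.log(q / p), axis=1)
    return lam * float(np.sum(kl_pq + kl_qp))
```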

A Univariate Bound of Area Under ROC

arXiv (Cornell University), Apr 16, 2018

P.J.Whitelock Shake-and-Bake translation

Gaussian kernels with flexible variances provide a rich family of Mercer kernels for learning algorithms. We show that the union of the unit balls of the reproducing kernel Hilbert spaces generated by Gaussian kernels with flexible variances is a uniform Glivenko-Cantelli (uGC) class. This result confirms a conjecture concerning the learnability of Gaussian kernels and verifies the uniform convergence of many learning algorithms involving Gaussians with changing variances. Rademacher averages and empirical covering numbers are used to estimate sample errors of multi-kernel regularization schemes associated with general loss functions. It is then shown that the regularization error associated with the least-square loss and Gaussian kernels can be greatly improved when flexible variances are allowed. Finally, for regularization schemes generated by Gaussian kernels with flexible variances, we present explicit learning rates for regression with the least-square loss and classification with the hinge loss.
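
A minimal sketch of what "flexible variances" means in practice: fit a regularized least-squares model for each candidate Gaussian bandwidth on a grid. The grid, the ridge solver, and the function names are assumptions for illustration only; they are not the regularization scheme analyzed in the paper, and model selection across variances is omitted.

```python
import numpy as np

def gaussian_gram(X, sigma):
    """Gram matrix of the Gaussian kernel k(x, y) = exp(-||x - y||^2 / (2 sigma^2))."""
    sq = np.sum(X**2, axis=1)
    dists = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-dists / (2.0 * sigma**2))

def multi_sigma_krr(X, y, sigmas=(0.1, 1.0, 10.0), lam=1e-2):
    """Fit kernel ridge regression once per candidate variance and report the
    in-sample residual for each, illustrating a grid over flexible variances."""
    results = {}
    for s in sigmas:
        K = gaussian_gram(X, s)
        alpha = np.linalg.solve(K + lam * len(y) * np.eye(len(y)), y)
        results[s] = float(np.mean((K @ alpha - y) ** 2))
    return results
```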

Fast Convergence of Online Pairwise Learning Algorithms

Pairwise learning usually refers to a learning task that involves a loss function depending on pairs of examples; notable instances include bipartite ranking, metric learning and AUC maximization. In this paper, we focus on online learning algorithms for pairwise learning problems without strong convexity, for which all previously known algorithms achieve a convergence rate of O(1/√T) after T iterations. In particular, we study an online learning algorithm for pairwise learning with a least-square loss function in an unconstrained setting. We prove that its last iterate converges to the desired minimizer at a rate arbitrarily close to O(1/T) up to a logarithmic factor. The rates for this algorithm are established in high probability under the assumption of polynomially decaying step sizes.
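
A sketch of the kind of algorithm analyzed here: an online learner that takes gradient steps on a least-square pairwise loss l(w; z, z') = ((y - y') - <w, x - x'>)^2 with polynomially decaying step sizes eta_t = eta0 / t^theta. Pairing each new example with only one buffered past example, and the constants used, are simplifying assumptions; the loss and step-size schedule match the abstract.

```python
import numpy as np

def online_pairwise_sgd(stream, dim, eta0=1.0, theta=0.75):
    """Online pairwise learning with a least-square pairwise loss (sketch).

    stream : iterable of (x, y) pairs with x an array of length `dim`
    """
    w = np.zeros(dim)
    prev = None
    for t, (x, y) in enumerate(stream, start=1):
        if prev is not None:
            xp, yp = prev
            diff_x, diff_y = x - xp, y - yp
            grad = -2.0 * (diff_y - w @ diff_x) * diff_x   # gradient of the pairwise loss
            w -= (eta0 / t**theta) * grad                  # polynomially decaying step
        prev = (x, y)
    return w
```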

Stochastic Iterative Hard Thresholding for Graph-structured Sparsity Optimization

Stochastic optimization algorithms update models sequentially with cheap per-iteration costs, which makes them amenable to large-scale data analysis. Such algorithms have been widely studied for structured sparse models where the sparsity information is very specific, e.g., convex sparsity-inducing norms or the ℓ0-norm. However, these norms cannot be directly applied to complex (non-convex) graph-structured sparsity models, which have important applications in areas such as disease-outbreak detection and social networks. In this paper, we propose a stochastic gradient-based method for solving graph-structured sparsity-constrained problems that is not restricted to the least-square loss. We prove that our algorithm enjoys linear convergence up to a constant error, which is competitive with its counterparts in the batch learning setting. We conduct extensive experiments to show the efficiency and effectiveness of the proposed algorithms.
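
The basic iteration can be sketched as a mini-batch gradient step followed by a projection onto the sparsity constraint. The plain top-k hard threshold below stands in for the (approximate) graph-structured projection used in the paper, and the least-square loss, batch size, and step size are illustrative assumptions.

```python
import numpy as np

def hard_threshold(w, k):
    """Keep the k largest-magnitude coordinates (stand-in for the graph projection)."""
    out = np.zeros_like(w)
    idx = np.argsort(np.abs(w))[-k:]
    out[idx] = w[idx]
    return out

def stochastic_iht(X, y, k, eta=0.01, n_iter=1000, batch=32, seed=0):
    """Stochastic iterative hard thresholding on a least-square loss (sketch)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(n_iter):
        i = rng.choice(n, size=batch, replace=False)
        grad = 2.0 * X[i].T @ (X[i] @ w - y[i]) / batch   # mini-batch gradient
        w = hard_threshold(w - eta * grad, k)             # project onto the sparsity set
    return w
```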

Distributionally Robust Optimization for Deep Kernel Multiple Instance Learning

Multiple Instance Learning (MIL) provides a promising solution to many real-world problems where labels are only available at the bag level but missing for instances due to a high labeling cost. As a powerful Bayesian non-parametric model, Gaussian Processes (GPs) have been extended from classical supervised learning to MIL settings, aiming to identify the most likely positive (or least negative) instance from a positive (or negative) bag using only the bag-level labels. However, focusing solely on a single instance in a bag makes the model less robust to outliers or multi-modal scenarios, where a single bag contains a diverse set of positive instances. We propose a general GP mixture framework that simultaneously considers multiple instances through a latent mixture model. By adding a top-k constraint, the framework is equivalent to choosing the top-k most positive instances, making it more robust to outliers and multi-modal scenarios. We further introduce a Distributionally Robust…
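
The top-k idea can be illustrated independently of the GP mixture: score a bag by averaging its k most positive instance scores rather than taking only the single maximum. The GP mixture and distributionally robust components are not reproduced here; the function below is a hypothetical illustration.

```python
import numpy as np

def topk_bag_score(instance_scores, k=3):
    """Average of the k largest instance scores in a bag (top-k aggregation)."""
    s = np.sort(np.asarray(instance_scores, dtype=float))[::-1]
    return float(np.mean(s[:k]))

# Usage: averaging the top 3 scores damps the effect of one extreme instance
# relative to a max-based bag score.
print(topk_bag_score([0.2, 0.3, 0.9, 5.0], k=3))
```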

Differentially Private Stochastic Gradient Descent with Low-Noise

arXiv (Cornell University), Sep 9, 2022

AUC Maximization in the Era of Big Data and AI: A Survey

ACM Computing Surveys

The area under the ROC curve (AUC) is a measure of choice for assessing the performance of a classifier on imbalanced data. AUC maximization refers to a learning paradigm that learns a predictive model by directly maximizing its AUC score. It has been studied for more than two decades, dating back to the late 1990s, and a huge amount of work has been devoted to it since then. Recently, stochastic AUC maximization for big data and deep AUC maximization (DAM) for deep learning have received increasing attention and have had dramatic impact on solving real-world problems. However, to the best of our knowledge there is no comprehensive survey of related work on AUC maximization. This paper aims to address the gap by reviewing the literature of the past two decades. We not only give a holistic view of the literature but also present detailed explanations and comparisons of different papers, from formulations to algorithms and theoretical guarantees. We also identify and discuss r…
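
The learning paradigm surveyed here replaces the 0-1 pairwise ranking indicator in the AUC with a convex surrogate over all positive/negative pairs. The squared surrogate below is one common choice used for illustration, not a recommendation drawn from the survey.

```python
import numpy as np

def pairwise_auc_surrogate(w, X, y):
    """Empirical AUC surrogate: mean of (1 - (score_pos - score_neg))^2 over all
    positive/negative pairs, for a linear scorer <w, x> and labels y in {+1, -1}."""
    scores = X @ w
    pos, neg = scores[y == 1], scores[y == -1]
    margins = pos[:, None] - neg[None, :]        # all positive-negative pairs
    return float(np.mean((1.0 - margins) ** 2))
```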

Stability and Differential Privacy of Stochastic Gradient Descent for Pairwise Learning with Non-Smooth Loss

Pairwise learning has recently received increasing attention since it subsumes many important machine learning tasks (e.g., AUC maximization and metric learning) into a unifying framework. In this paper, we give the first known stability and generalization analysis of stochastic gradient descent (SGD) for pairwise learning with non-smooth loss functions, which are widely used (e.g., ranking SVM with the hinge loss). We introduce a novel decomposition in the stability analysis to decouple the pairwise-dependent random variables, and derive generalization bounds that are consistent with the setting of pointwise learning. Furthermore, we apply our stability analysis to develop differentially private SGD for pairwise learning, for which our utility bounds match the state-of-the-art output-perturbation method (Huai et al., 2020) with smooth losses. Finally, we illustrate the results using specific examples of AUC maximization and similarity metric learning. As a byproduct, we pr…
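
Two ingredients mentioned in the abstract can be sketched briefly: a subgradient step on a non-smooth pairwise hinge loss, and output perturbation of the trained iterate. The noise calibration required for an (epsilon, delta) guarantee, which the paper and Huai et al. (2020) derive from a sensitivity analysis, is deliberately left as an input; function names are assumptions.

```python
import numpy as np

def pairwise_hinge_sgd_step(w, x_pos, x_neg, eta):
    """One SGD step on the non-smooth pairwise hinge loss
    max(0, 1 - <w, x_pos - x_neg>), as in ranking SVM / AUC maximization."""
    diff = x_pos - x_neg
    if 1.0 - w @ diff > 0:          # subgradient is nonzero only inside the margin
        w = w + eta * diff
    return w

def dp_release(w, noise_scale, rng=None):
    """Output perturbation: add Gaussian noise to the trained iterate before release.
    Calibrating noise_scale to a privacy budget is not reproduced here."""
    rng = np.random.default_rng() if rng is None else rng
    return w + rng.normal(scale=noise_scale, size=w.shape)
```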

Faster convergence of a randomized coordinate descent method for linearly constrained optimization problems

Analysis and Applications, 2018

The problem of minimizing a separable convex function under linearly coupled constraints arises in various application domains such as economic systems, distributed control, and network flow. The main challenge in solving this problem is that the data size is very large, which makes the usual gradient-based methods infeasible. Recently, Necoara, Nesterov and Glineur [Random block coordinate descent methods for linearly constrained optimization over networks, J. Optim. Theory Appl. 173(1) (2017) 227–254] proposed an efficient randomized coordinate descent method for this type of optimization problem and presented an appealing convergence analysis. In this paper, we develop new techniques for analyzing the convergence of such algorithms, which greatly improve the results presented there. This refined result is achieved by extending Nesterov's second technique [Efficiency of coordinate descent methods on huge-scale optimization problems, SIAM J. Optim. 22 (2012)…
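
For intuition, the randomized 2-coordinate update for a single coupling constraint sum_i x_i = b moves along the direction e_i - e_j, which keeps the constraint satisfied, with a step driven by the gradient difference. Uniform sampling of the pair and coordinate-wise smoothness constants are simplifying assumptions; the convergence refinements of the paper are not reflected here.

```python
import numpy as np

def two_coordinate_step(x, grad, lipschitz, rng):
    """One randomized 2-coordinate descent step for min f(x) s.t. sum(x) = b.

    grad      : gradient of f at x
    lipschitz : coordinate-wise smoothness constants L_i
    """
    n = x.size
    i, j = rng.choice(n, size=2, replace=False)
    step = (grad[i] - grad[j]) / (lipschitz[i] + lipschitz[j])
    x = x.copy()
    x[i] -= step        # transfer "mass" between the two coordinates,
    x[j] += step        # so the coupling constraint is preserved exactly
    return x
```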

Generalization analysis of multi-modal metric learning

Analysis and Applications, 2016

Multi-modal metric learning has recently received considerable attention since many real-world applications involve multi-modal data. However, there is relatively little work on the generalization analysis of the associated learning algorithms. In this paper, we bridge this theoretical gap by deriving generalization bounds using Rademacher complexities. In particular, we establish a general Rademacher complexity result by systematically analyzing the behavior of the resulting models with various regularizers, e.g., a [Formula: see text]-regularizer on the modality level with either a mixed [Formula: see text]-norm or a Schatten norm on each modality. Our results and the ensuing discussion help to understand how prior knowledge can be exploited by selecting an appropriate regularizer.
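
A rough sketch of the modality-level regularizer structure described above: take a matrix norm of each modality's metric and combine the per-modality values with an outer vector norm. The exact norms are elided in the abstract ("[Formula: see text]"), so the Frobenius/nuclear and l_p choices below are purely illustrative assumptions.

```python
import numpy as np

def modality_mixed_norm(metrics, inner="fro", p=1.0):
    """Combine per-modality matrix norms with an outer l_p norm (sketch).

    metrics : list of PSD metric matrices, one per modality
    inner   : "fro" (Frobenius) or "nuclear" (Schatten-1) as example inner norms
    """
    per_modality = []
    for M in metrics:
        ord_ = "nuc" if inner == "nuclear" else "fro"
        per_modality.append(np.linalg.norm(M, ord=ord_))
    return float(np.linalg.norm(np.asarray(per_modality), ord=p))
```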
