Andi Han - Academia.edu
Papers by Andi Han
In this paper, we study min-max optimization problems on Riemannian manifolds. We introduce a Riemannian Hamiltonian function whose minimization serves as a proxy for solving the original min-max problem. Under the Riemannian Polyak–Łojasiewicz (PL) condition on the Hamiltonian function, its minimizer corresponds to the desired min-max saddle point. We also provide cases in which this condition is satisfied. To minimize the Hamiltonian function, we propose Riemannian Hamiltonian methods (RHM) and analyze their convergence. We further extend RHM to include consensus regularization and to the stochastic setting. We illustrate the efficacy of the proposed RHM in applications such as the subspace robust Wasserstein distance, robust training of neural networks, and generative adversarial networks.
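To make the Hamiltonian idea concrete, here is a minimal sketch in the flat Euclidean case (ℝ × ℝ, where the Riemannian gradient is the ordinary gradient and the retraction is plain addition). It assumes the Hamiltonian H(x, y) = ½‖∇f(x, y)‖², a common choice for Hamiltonian gradient methods; the bilinear toy game f(x, y) = x·y, the step size, and all function names are hypothetical illustrations, not the paper's RHM.

```python
def grad_f(x, y):
    """Gradient of the bilinear toy game f(x, y) = x * y (saddle point at the origin)."""
    return y, x


def grad_hamiltonian(x, y):
    # H(x, y) = 1/2 * ||grad f(x, y)||^2.  grad H = J^T grad f, where J is the
    # Jacobian of grad f; here J = [[0, 1], [1, 0]], so J^T (gx, gy) = (gy, gx).
    gx, gy = grad_f(x, y)
    return gy, gx


def hamiltonian_descent(x, y, step=0.1, iters=200):
    """Plain gradient descent on H (a Euclidean stand-in for a Hamiltonian method)."""
    for _ in range(iters):
        hx, hy = grad_hamiltonian(x, y)
        x, y = x - step * hx, y - step * hy
    return x, y


def gradient_descent_ascent(x, y, step=0.1, iters=200):
    """Simultaneous descent in x and ascent in y on f itself, for comparison."""
    for _ in range(iters):
        gx, gy = grad_f(x, y)
        x, y = x - step * gx, y + step * gy
    return x, y
```

Starting from (1, 1), the Hamiltonian iteration shrinks the iterate by a factor of 0.9 per step toward the saddle point (0, 0), whereas simultaneous gradient descent-ascent spirals outward (its squared norm grows by a factor of 1.01 per step). This is the usual motivation for minimizing a Hamiltonian instead of running descent-ascent directly.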
Optimal transport (OT) has become popular in a wide range of applications. We start by observing that the OT problem can be viewed as an instance of a general symmetric positive definite (SPD) matrix-valued OT problem, in which the cost, the marginals, and the coupling are represented as block matrices and each component block is an SPD matrix. The sums of the row blocks and of the column blocks of the coupling matrix are constrained by the given block-SPD marginals. We endow the set of such block-coupling matrices with a novel Riemannian manifold structure. This allows us to exploit the versatile Riemannian optimization framework to solve generic SPD matrix-valued OT problems. We illustrate the usefulness of the proposed approach in several applications.
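As a rough illustration of the block constraints described above, the sketch below checks whether a grid of 2×2 blocks is a feasible block coupling: every block SPD, each row of blocks summing to its row marginal, and each column of blocks summing to its column marginal. The helper names, the 2×2 block size, and the product-form example Γ_ij = p_i q_j S (feasible only in the special case where every marginal is a scalar multiple of one SPD matrix S) are our own assumptions; this is a feasibility check only, not the paper's Riemannian solver.

```python
def mat_add(a, b):
    """Entrywise sum of two 2x2 matrices stored as nested lists."""
    return [[a[i][j] + b[i][j] for j in range(2)] for i in range(2)]


def scale(c, a):
    """Multiply a 2x2 matrix by a scalar."""
    return [[c * a[i][j] for j in range(2)] for i in range(2)]


def is_spd_2x2(a, tol=1e-12):
    """Symmetric with positive leading minors (Sylvester's criterion for 2x2)."""
    symmetric = abs(a[0][1] - a[1][0]) <= tol
    return symmetric and a[0][0] > tol and a[0][0] * a[1][1] - a[0][1] * a[1][0] > tol


def close(a, b, tol=1e-9):
    return all(abs(a[i][j] - b[i][j]) <= tol for i in range(2) for j in range(2))


def block_sum(blocks):
    total = [[0.0, 0.0], [0.0, 0.0]]
    for blk in blocks:
        total = mat_add(total, blk)
    return total


def is_block_coupling(gamma, row_marginals, col_marginals):
    """gamma[i][j] is the (i, j) 2x2 block. Checks SPD blocks, row-block sums
    against row_marginals[i], and column-block sums against col_marginals[j]."""
    m, n = len(row_marginals), len(col_marginals)
    if any(not is_spd_2x2(gamma[i][j]) for i in range(m) for j in range(n)):
        return False
    rows_ok = all(close(block_sum(gamma[i]), row_marginals[i]) for i in range(m))
    cols_ok = all(close(block_sum([gamma[i][j] for i in range(m)]), col_marginals[j])
                  for j in range(n))
    return rows_ok and cols_ok


# Hypothetical feasible point: P_i = p_i * S, Q_j = q_j * S with sum(p) == sum(q) == 1,
# so the product blocks Gamma_ij = p_i * q_j * S satisfy both marginal constraints.
S = [[2.0, 0.5], [0.5, 1.0]]
p, q = [0.3, 0.7], [0.5, 0.5]
P = [scale(pi, S) for pi in p]
Q = [scale(qj, S) for qj in q]
gamma = [[scale(pi * qj, S) for qj in q] for pi in p]
```

With 1×1 blocks the same three checks reduce to the classical OT constraints: non-negative (here, positive) entries whose row and column sums match the two marginal vectors.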
In this paper, we propose a variant of the Riemannian stochastic recursive gradient method that achieves a second-order convergence guarantee using a simple perturbation. The idea is to perturb the iterates when the gradient is small and to carry out stochastic recursive gradient updates on the tangent space. This avoids the complications of exploiting Riemannian geometry. We show that, in the finite-sum setting, our algorithm requires $\mathcal{O}\big(\frac{\sqrt{n}}{\epsilon^2} + \frac{\sqrt{n}}{\delta^4} + \frac{n}{\delta^3}\big)$ stochastic gradient queries to find an $(\epsilon, \delta)$-second-order critical point. This strictly improves on the complexity of perturbed Riemannian gradient descent and is superior to perturbed Riemannian accelerated gradient descent in large-sample settings. We also provide a complexity of $\mathcal{O}\big(\frac{1}{\epsilon^3} + \frac{1}{\delta^3 \epsilon^2} + \frac{1}{\delta^4 \epsilon}\big)$ for online optimization, which is novel on Riemannian manifolds in terms of second-order convergence using only first-order information.
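The perturbation idea alone can be sketched as follows. This is a simplified, hypothetical illustration: it works in Euclidean ℝ² (identity in place of a retraction), uses exact gradients instead of stochastic recursive estimates, and the toy function f(x, y) = x²/2 − y²/2 + y⁴/4 (strict saddle at the origin, minimizers at (0, ±1)) and all thresholds are our own choices, not the paper's algorithm or constants.

```python
import math
import random


def grad(p):
    """Gradient of f(x, y) = x^2/2 - y^2/2 + y^4/4."""
    x, y = p
    return (x, -y + y ** 3)


def perturbed_gd(p, step=0.1, g_thresh=1e-3, radius=1e-2, wait=50, iters=500, seed=0):
    """Gradient descent that adds a small random perturbation whenever the
    gradient is small and no perturbation happened in the last `wait` steps."""
    rng = random.Random(seed)
    last_perturb = -wait  # allow a perturbation at the very first step
    for t in range(iters):
        g = grad(p)
        if math.hypot(*g) < g_thresh and t - last_perturb >= wait:
            # Draw a point uniformly from a disc of the given radius.
            theta = rng.uniform(0.0, 2.0 * math.pi)
            r = radius * math.sqrt(rng.random())
            p = (p[0] + r * math.cos(theta), p[1] + r * math.sin(theta))
            last_perturb = t
            g = grad(p)
        p = (p[0] - step * g[0], p[1] - step * g[1])
    return p
```

Started exactly at the saddle (0, 0), unperturbed gradient descent never moves; with `radius=0.0` this function reproduces that behaviour, while the default small perturbation lets the iterate escape along the negative-curvature direction y and settle near one of the minimizers (0, ±1).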
ArXiv, 2021
In this paper, we comparatively analyze the Bures-Wasserstein (BW) geometry and the popular Affine-Invariant (AI) geometry for Riemannian optimization on the manifold of symmetric positive definite (SPD) matrices. Our study begins with the observation that the BW metric depends linearly on SPD matrices, in contrast to the quadratic dependence of the AI metric. Building on this, we show that the BW metric is a more suitable and robust choice for several Riemannian optimization problems over ill-conditioned SPD matrices. We show that the BW geometry has non-negative curvature, which further improves the convergence rates of algorithms relative to the non-positively curved AI geometry. Finally, we verify that several popular cost functions that are known to be geodesically convex under the AI geometry are also geodesically convex under the BW geometry. Extensive experiments on various applications support our findings.
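One way to see the linear-versus-quadratic contrast is to evaluate both metrics at a diagonal SPD matrix X = diag(x). The sketch assumes the standard forms g_AI(U, U) = tr(X⁻¹UX⁻¹U) and g_BW(U, U) = ½ tr(L_X[U] U), where L_X[U] solves the Lyapunov equation XL + LX = U. For diagonal X this gives L_ij = U_ij / (x_i + x_j). The function names are hypothetical, and the ½ normalization is the common convention, which we take as an assumption.

```python
def ai_metric_diag(x, u):
    """Affine-invariant metric tr(X^{-1} U X^{-1} U) for X = diag(x), symmetric U."""
    n = len(x)
    return sum(u[i][j] ** 2 / (x[i] * x[j]) for i in range(n) for j in range(n))


def bw_metric_diag(x, u):
    """Bures-Wasserstein metric 1/2 tr(L U), with L solving X L + L X = U.
    For X = diag(x), the Lyapunov solution is L_ij = U_ij / (x_i + x_j)."""
    n = len(x)
    return 0.5 * sum(u[i][j] ** 2 / (x[i] + x[j]) for i in range(n) for j in range(n))


u = [[1.0, 0.0], [0.0, 1.0]]          # tangent direction U = I
x_mild = [1.0, 1e-3]                   # condition number 1e3
x_harsh = [1.0, 1e-4]                  # condition number 1e4
ai_ratio = ai_metric_diag(x_harsh, u) / ai_metric_diag(x_mild, u)
bw_ratio = bw_metric_diag(x_harsh, u) / bw_metric_diag(x_mild, u)
```

Shrinking the smallest eigenvalue by a factor of 10 multiplies the AI value by roughly 100 (`ai_ratio` ≈ 100) but the BW value only by roughly 10 (`bw_ratio` ≈ 10). This is the 1/x² versus 1/x behaviour behind the claim that BW is better suited to ill-conditioned matrices.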
ArXiv, 2021
Dimensionality reduction (DR) and manifold learning (ManL) have been applied extensively in many machine learning tasks, including signal processing, speech recognition, and neuroinformatics. However, it remains unclear whether DR and ManL models generate valid learning results. In this work, we investigate the validity of the learning results of some widely used DR and ManL methods through the chart mapping function of a manifold. We identify a fundamental problem with these methods: the mapping functions they induce violate the basic settings of manifolds, and hence they do not learn a manifold in the mathematical sense. To address this problem, we provide a provably correct algorithm called fixed points Laplacian mapping (FPLM), which has the geometric guarantee of finding a valid manifold representation (up to a homeomorphism). With one additional condition (orientation preservation), we discuss a sufficient condition for an algorithm to be bijective...
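A chart must at least be injective, so one simple symptom of the problem described above is a learned map that sends distinct, well-separated inputs to the same output. The sketch below is a hypothetical necessary-condition check on finite samples. It is not FPLM, and passing it does not prove that a map is a homeomorphism. The thresholds and the circle example are our own choices.

```python
import math


def non_injective_pairs(inputs, outputs, min_in_dist=0.1, max_out_dist=1e-6):
    """Return index pairs (i, j) whose inputs are far apart but whose images
    (nearly) coincide; any such pair shows the sampled map is not injective."""
    pairs = []
    for i in range(len(inputs)):
        for j in range(i + 1, len(inputs)):
            if (math.dist(inputs[i], inputs[j]) > min_in_dist
                    and math.dist(outputs[i], outputs[j]) < max_out_dist):
                pairs.append((i, j))
    return pairs


# Eight points on the unit circle.
angles = [2.0 * math.pi * k / 8 for k in range(8)]
circle = [(math.cos(t), math.sin(t)) for t in angles]
x_projection = [(px,) for px, _ in circle]  # "embedding" that drops the y-coordinate
angle_map = [(t,) for t in angles]          # injective on these samples
```

Projecting onto the x-axis collapses mirror-image points, so `non_injective_pairs(circle, x_projection)` reports the three pairs (1, 7), (2, 6), (3, 5). The angle map passes this particular check even though, on the full circle, it is still not a valid chart, because its inverse is discontinuous at the wrap-around point.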
ArXiv, 2020
Variance reduction techniques are popular for accelerating gradient descent and stochastic gradient descent on optimization problems defined both in Euclidean space and on Riemannian manifolds. In this paper, we further improve existing variance reduction methods for non-convex Riemannian optimization, including R-SVRG and R-SRG/R-SPIDER, through batch size adaptation. We show that this strategy achieves lower total complexities for optimizing both general non-convex and gradient-dominated functions in both the finite-sum and online settings. As a result, we also provide a simpler convergence analysis for R-SVRG and improved complexity bounds for R-SRG in the finite-sum setting. Specifically, we prove that R-SRG achieves the same near-optimal complexity as R-SPIDER without requiring a small step size. Empirical experiments on a variety of tasks demonstrate the effectiveness of the proposed adaptive batch size scheme.
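The recursive gradient estimator with a growing outer batch can be sketched on a toy finite-sum problem. This is a Euclidean simplification (retraction is addition, no vector transport). The objective f(x) = (1/n) Σᵢ ½(x − aᵢ)², the doubling batch schedule, and all names and constants are hypothetical choices for illustration, not the batch-size rule analyzed in the paper.

```python
import random


def srg_adaptive_batch(a, x0=0.0, step=0.5, epochs=12, inner=5, b0=2, seed=0):
    """SARAH/SPIDER-style stochastic recursive gradient on
    f(x) = (1/n) * sum_i 0.5 * (x - a_i)^2, doubling the outer batch each epoch."""
    rng = random.Random(seed)
    n = len(a)
    x = x0
    for k in range(epochs):
        b = min(n, b0 * 2 ** k)               # adaptive (growing) outer batch size
        batch = rng.sample(range(n), b)
        v = sum(x - a[i] for i in batch) / b  # batch gradient at the anchor point
        for _ in range(inner):
            x_prev, x = x, x - step * v
            i = rng.randrange(n)
            # Recursive update: v <- grad f_i(x) - grad f_i(x_prev) + v
            v = (x - a[i]) - (x_prev - a[i]) + v
    return x
```

Because every f_i here has the same Hessian, the recursive correction is exact, and all remaining error comes from the anchor batch. Once the batch covers all n terms (from the fifth epoch on for n = 20), the iterate converges to the minimizer, the mean of the a_i. For example, `srg_adaptive_batch([float(i) for i in range(20)])` approaches 9.5.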
This paper proposes a generalized Bures-Wasserstein (BW) Riemannian geometry for the manifold of symmetric positive definite matrices. We explore the generalization of the BW geometry in three different ways: 1) by generalizing the Lyapunov operator in the metric, 2) by generalizing the orthogonal Procrustes distance, and 3) by generalizing the Wasserstein distance between Gaussians. We show that all three lead to the same geometry. The proposed generalization is parameterized by a symmetric positive definite matrix M such that, when M = I, we recover the BW geometry. We derive expressions for the distance, geodesic, exponential/logarithm maps, Levi-Civita connection, and sectional curvature under the generalized BW geometry. We also present applications and experiments that illustrate the efficacy of the proposed geometry.
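The abstract does not state the generalized distance, so the sketch below rests on an explicit assumption. We take the generalized distance to be the ordinary BW distance between M^{-1/2} X M^{-1/2} and M^{-1/2} Y M^{-1/2}, which gives d² = tr(M⁻¹X) + tr(M⁻¹Y) − 2 Σₖ √λₖ(M⁻¹XM⁻¹Y). This form has the stated property of reducing to BW when M = I. For 2×2 matrices whose eigenvalues are positive, Σₖ √λₖ(C) = √(tr C + 2√det C), which avoids matrix square roots. All helper names are hypothetical.

```python
import math


def mat_mul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def det2(a):
    return a[0][0] * a[1][1] - a[0][1] * a[1][0]


def inv2(a):
    d = det2(a)
    return [[a[1][1] / d, -a[0][1] / d], [-a[1][0] / d, a[0][0] / d]]


def trace(a):
    return a[0][0] + a[1][1]


def gbw_dist_sq(x, y, m):
    """ASSUMED generalized BW squared distance for 2x2 SPD x, y and SPD parameter m:
    d^2 = tr(M^-1 X) + tr(M^-1 Y) - 2 * sum_k sqrt(lambda_k(M^-1 X M^-1 Y))."""
    mi = inv2(m)
    c = mat_mul(mat_mul(mi, x), mat_mul(mi, y))  # similar to an SPD product: positive eigenvalues
    cross = math.sqrt(trace(c) + 2.0 * math.sqrt(det2(c)))
    d2 = trace(mat_mul(mi, x)) + trace(mat_mul(mi, y)) - 2.0 * cross
    return max(0.0, d2)  # clip tiny negative round-off
```

With M = I and X = diag(1, 4), Y = diag(4, 9), this returns the BW value (1 − 2)² + (2 − 3)² = 2. Under this assumed form, taking M = 2I halves the squared distance.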
In this paper, we propose a variant of the Riemannian stochastic recursive gradient method that achieves a second-order convergence guarantee and escapes saddle points using a simple perturbation. The idea is to perturb the iterates when the gradient is small and to carry out stochastic recursive gradient updates on the tangent space. This avoids the complications of exploiting Riemannian geometry. We show that, in the finite-sum setting, our algorithm requires $\widetilde{\mathcal{O}}\big(\frac{\sqrt{n}}{\epsilon^2} + \frac{\sqrt{n}}{\delta^4} + \frac{n}{\delta^3}\big)$ stochastic gradient queries to find an $(\epsilon, \delta)$-second-order critical point. This strictly improves on the complexity of perturbed Riemannian gradient descent and is superior to perturbed Riemannian accelerated gradient descent in large-sample settings. We also provide a complexity of $\widetilde{\mathcal{O}}\big(\frac{1}{\epsilon^3} + \frac{1}{\delta^3 \epsilon^2} + \frac{1}{\delta^4 \epsilon}\big)$ for online optim...
We propose a stochastic recursive momentum method for Riemannian non-convex optimization that achieves a near-optimal complexity of $\tilde{\mathcal{O}}(\epsilon^{-3})$ to find an $\epsilon$-approximate solution with one sample. That is, our method requires $\mathcal{O}(1)$ gradient evaluations per iteration and does not require restarting with a large-batch gradient, which is commonly used to obtain the faster rate. Extensive experimental results demonstrate the superiority of our proposed algorithm.
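The single-sample recursive momentum estimator can be sketched in the Euclidean case as d_t = ∇f(x_t; ξ_t) + (1 − a)(d_{t−1} − ∇f(x_{t−1}; ξ_t)), with a fresh ξ_t used at both points. On a manifold, the previous-iterate terms would additionally need a vector transport, which is omitted here. The toy objective F(x) = E[½(x − ξ)²] with ξ ~ N(μ, 1), the constant step size, the constant momentum parameter a, and all names are hypothetical illustrations, not the schedule analyzed in the paper.

```python
import random


def recursive_momentum(mu=3.0, x0=0.0, step=0.02, a=0.05, iters=5000, seed=0):
    """STORM-style recursive momentum on F(x) = E[0.5 * (x - xi)^2], xi ~ N(mu, 1).
    One fresh sample per iteration; no large-batch restarts."""
    rng = random.Random(seed)
    x_prev = x0
    xi = rng.gauss(mu, 1.0)
    d = x_prev - xi                 # single-sample initial gradient estimate
    x = x_prev - step * d
    for _ in range(iters):
        xi = rng.gauss(mu, 1.0)     # one sample, evaluated at both x and x_prev
        d = (x - xi) + (1.0 - a) * (d - (x_prev - xi))
        x_prev, x = x, x - step * d
    return x
```

Each iteration uses exactly one sample (two gradient evaluations of the same f(·; ξ_t)), and the iterate settles near the minimizer μ = 3 with a small residual fluctuation set by `step` and `a`.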
IEEE Transactions on Pattern Analysis and Machine Intelligence