Yinyu Ye - Academia.edu
Papers by Yinyu Ye
We consider a retailer selling a single product with limited on-hand inventory over a finite selling season. Customer demand arrives according to a Poisson process, the rate of which is influenced by a single action taken by the retailer (such as a price adjustment, sales commission, or advertisement intensity). The relationship between the action and the demand rate is not known in advance. However, the retailer is able to learn the optimal action "on the fly" as she maximizes her total expected revenue based on the observed demand reactions. Using the pricing problem as an example, we propose a dynamic "learning-while-doing" algorithm that involves only function value estimation to achieve near-optimal performance. Our algorithm employs a series of shrinking price intervals and iteratively tests prices within each interval using a set of carefully chosen parameters. We prove that the convergence rate of our algorithm is among the fastest of all possible algorithms...
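As a rough illustration of the shrinking-interval idea described above, here is a small Python simulation sketch (assumed, not the paper's algorithm or its parameter choices): each round tests a grid of prices inside the current interval against simulated Poisson demand and re-centers a smaller interval around the empirically best price. The demand curve, grid size, round count, and shrinkage factor are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def demand_rate(price):
    """Unknown true demand-rate curve; assumed here only for simulation."""
    return max(0.0, 10.0 - 2.0 * price)

def shrinking_interval_pricing(lo=1.0, hi=5.0, rounds=6, grid=5, t_per_price=200.0):
    """Illustrative learning-while-doing loop: each round tests a price grid
    inside the current interval and shrinks the interval around the winner."""
    for _ in range(rounds):
        prices = np.linspace(lo, hi, grid)
        revenue_est = []
        for p in prices:
            # Observe Poisson demand at price p for t_per_price time units.
            sales = rng.poisson(demand_rate(p) * t_per_price)
            revenue_est.append(p * sales / t_per_price)   # revenue-rate estimate
        best = prices[int(np.argmax(revenue_est))]
        width = (hi - lo) / 2.0                           # assumed shrinkage factor
        lo, hi = max(lo, best - width / 2), min(hi, best + width / 2)
    return (lo + hi) / 2.0

print("near-optimal price estimate:", shrinking_interval_pricing())
```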
We propose a number of techniques for obtaining a global ranking from data that may be incomplete and imbalanced -- characteristics almost universal to modern datasets coming from e-commerce and Internet applications. We are primarily interested in score or rating-based cardinal data. From raw ranking data, we construct pairwise rankings, represented as edge flows on an appropriate graph. Our statistical ranking method uses the graph Helmholtzian, the graph-theoretic analogue of the Helmholtz operator or vector Laplacian, in much the same way the graph Laplacian is an analogue of the Laplace operator or scalar Laplacian. We study the graph Helmholtzian using combinatorial Hodge theory: we show that every edge flow representing a pairwise ranking can be resolved into two orthogonal components, a gradient flow that represents the L2-optimal global ranking and a divergence-free (cyclic) flow that measures the validity of the global ranking obtained -- if this is large, then the data does...
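A small numerical sketch (assumed, not the authors' code) of the gradient-flow component of the decomposition: given pairwise score differences on the edges of a comparison graph, the L2-optimal global ranking solves a graph least-squares problem, and the residual is the cyclic part that the global ranking cannot explain. The edge list and flow values are toy data.

```python
import numpy as np

# Edges (i, j) with observed pairwise flow Y_ij ~ "score of j minus score of i".
edges = [(0, 1), (1, 2), (0, 2), (2, 3)]
flows = np.array([1.0, 0.5, 2.0, -0.3])   # toy data, assumed

n = 4
# Incidence (gradient) operator: (grad s)_(i,j) = s_j - s_i.
G = np.zeros((len(edges), n))
for k, (i, j) in enumerate(edges):
    G[k, i], G[k, j] = -1.0, 1.0

# L2-optimal global scores s* = argmin ||G s - flows||_2 (defined up to a constant).
s, *_ = np.linalg.lstsq(G, flows, rcond=None)
s -= s.mean()                              # fix the additive constant

residual = flows - G @ s                   # cyclic (and harmonic) part
print("global scores:", np.round(s, 3))
print("inconsistency (norm of residual):", round(float(np.linalg.norm(residual)), 3))
```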
In this paper, we propose ℓ_p-norm regularized models to seek near-optimal sparse portfolios. These sparse solutions reduce the complexity of portfolio implementation and management. Theoretical results are established to guarantee the sparsity of the second-order KKT points of the ℓ_p-norm regularized models. More interestingly, we present a theory that relates sparsity of the KKT points to the projected correlation and projected Sharpe ratio. We also design an interior-point algorithm to obtain an approximate second-order KKT solution of the ℓ_p-norm models in polynomial time with a fixed error tolerance, and then test our ℓ_p-norm models on S&P 500 (2008-2012) data and international market data. The computational results illustrate that the ℓ_p-norm regularized models can generate portfolios of any desired sparsity with portfolio variance and portfolio return comparable to those of the unregularized Markowitz model with a cardinality constraint. Our analysis of a combined model leads us...
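The paper's interior-point algorithm is not reproduced here; the sketch below only illustrates the modeling idea of an ℓ_p-penalized mean-variance objective on synthetic data, solved with a generic smoothed-penalty formulation and scipy's SLSQP. The penalty weight, smoothing constant, solver choice, and data are all assumptions.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)

# Toy market data (assumed): 8 assets with covariance Sigma and expected returns mu.
n = 8
A = rng.standard_normal((n, n))
Sigma = A @ A.T / n + 0.1 * np.eye(n)
mu = rng.uniform(0.02, 0.10, n)

p, lam, eps = 0.5, 0.05, 1e-8   # assumed regularization parameters

def objective(x):
    # Mean-variance objective plus a smoothed l_p penalty promoting sparsity.
    return x @ Sigma @ x - mu @ x + lam * np.sum((np.abs(x) + eps) ** p)

cons = ({"type": "eq", "fun": lambda x: np.sum(x) - 1.0},)   # fully invested
bounds = [(0.0, 1.0)] * n                                     # long-only

x0 = np.full(n, 1.0 / n)
res = minimize(objective, x0, method="SLSQP", bounds=bounds, constraints=cons)

weights = np.where(res.x < 1e-4, 0.0, res.x)                  # threshold tiny weights
print("weights:", np.round(weights, 3))
print("number of holdings:", int(np.count_nonzero(weights)))
```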
The Alternating Direction Method of Multipliers (ADMM) has nowadays gained tremendous attention for solving large-scale machine learning and signal processing problems due to its relative simplicity. However, the two-block structure of the classical ADMM still limits the size of the real problems being solved. When one forces a more-than-two-block structure by variable splitting, the convergence speed slows down greatly, as observed in practice. Recently, a randomly assembled cyclic multi-block ADMM (RAC-MBADMM) was developed by the authors for solving general convex and nonconvex quadratic optimization problems, where the number of blocks can be greater than two so that each sub-problem has a smaller size and can be solved much more efficiently. In this paper, we apply this method to a few selected machine learning problems related to convex quadratic optimization, such as linear regression, LASSO, elastic net, and SVM. We prove that the algorithm converges in expectation...
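For context, here is the classical two-block ADMM baseline for LASSO (a standard derivation, not the paper's RAC-MBADMM, which randomly assembles more than two blocks). The data, regularization weight lam, and penalty parameter rho are assumed toy values.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy LASSO instance: min 1/2 ||Ax - b||^2 + lam * ||x||_1 (data assumed).
m, n = 200, 50
A = rng.standard_normal((m, n))
x_true = np.zeros(n); x_true[:5] = rng.standard_normal(5)
b = A @ x_true + 0.01 * rng.standard_normal(m)
lam, rho = 0.1, 1.0                            # assumed regularization / penalty parameters

# Classical two-block split: x carries the quadratic loss, z carries the l1 term.
x = z = u = np.zeros(n)
AtA_rhoI_inv = np.linalg.inv(A.T @ A + rho * np.eye(n))
Atb = A.T @ b
soft = lambda v, t: np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

for _ in range(200):
    x = AtA_rhoI_inv @ (Atb + rho * (z - u))   # x-update: ridge-type solve
    z = soft(x + u, lam / rho)                 # z-update: soft-thresholding
    u = u + x - z                              # scaled dual update

print("nonzeros recovered:", int(np.count_nonzero(np.abs(z) > 1e-6)))
print("residual ||Az - b||:", round(float(np.linalg.norm(A @ z - b)), 4))
```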
IEEE Transactions on Automatic Control, 1999
We show that the direct extension of the alternating direction method of multipliers (ADMM) to three blocks is not necessarily convergent, even for solving a square system of linear equations, although its convergence was established 40 years ago for the one- and two-block cases. However, we prove that if, in each iteration, one randomly and independently permutes the updating order of the variable blocks, followed by the regular multiplier update, then ADMM converges in expectation when solving any square system of linear equations with any number of blocks. We also discuss its extension to general convex optimization problems, in particular linear and quadratic programs.
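Below is a toy sketch of the randomly permuted scheme on a small, well-conditioned linear system, treating each coordinate as one block. The penalty parameter and problem data are assumptions; this illustrates the update rule rather than the paper's analysis or its divergence counterexample.

```python
import numpy as np

rng = np.random.default_rng(0)

# Well-conditioned square system A x = b; each coordinate is one ADMM "block".
n = 10
M = rng.standard_normal((n, n))
A = M.T @ M / n + np.eye(n)
x_true = rng.standard_normal(n)
b = A @ x_true

beta = 1.0                      # assumed penalty parameter
x = np.zeros(n)
lam = np.zeros(n)               # multiplier for the constraint A x = b

for it in range(500):
    order = rng.permutation(n)  # fresh random update order every sweep
    for i in order:
        r = A @ x - A[:, i] * x[i] - b          # residual without block i's contribution
        x[i] = -(A[:, i] @ lam + beta * (A[:, i] @ r)) / (beta * (A[:, i] @ A[:, i]))
    lam += beta * (A @ x - b)   # standard multiplier update after the sweep

print("final residual ||Ax - b||:", float(np.linalg.norm(A @ x - b)))
print("error vs. true solution:  ", float(np.linalg.norm(x - x_true)))
```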
We focus on a permutation betting market under a parimutuel call auction model, where traders bet on the final ranking of n candidates. We present a Proportional Betting mechanism for this market. Our mechanism allows the traders to bet on any subset of the n x n 'candidate-rank' pairs and rewards them proportionally to the number of pairs that appear in the final outcome. We show that the market organizer's decision problem for this mechanism can be formulated as a convex program of polynomial size. More importantly, the formulation yields a set of n x n unique marginal prices that are sufficient to price the bets in this mechanism and are computable in polynomial time. The marginal prices reflect the traders' beliefs about the marginal distributions over outcomes. We also propose techniques to compute the joint distribution over n! permutations from these marginal distributions. We show that using a maximum entropy criterion, we can obtain a concise parametric form (with...
Let (G,P) be a bar framework of n vertices in general position in R^d, d <= n-1, where G is a (d+1)-lateration graph. In this paper, we present a constructive proof that (G,P) admits a positive semidefinite stress matrix with rank n-d-1. We also prove a similar result for a sensor network whose graph contains m (>= d+1) anchors.
A Semidefinite Programming (SDP) relaxation is an effective computational method for solving the Sensor Network Localization problem, which attempts to determine the locations of a group of sensors given the distances between some of them [11]. In this paper, we analyze and determine new sufficient conditions and formulations that guarantee that the SDP relaxation is exact, i.e., gives the correct solution. These conditions can be useful for designing sensor networks and managing connectivities in practice. Our main contribution is twofold. First, we present the first non-asymptotic bound on the connectivity or radio range requirement of the sensors needed to ensure the network is uniquely localizable. Determining this range is a key component in the design of sensor networks, and we provide a result that leads to a correct localization of each sensor, for any number of sensors. Second, we introduce a new class of graphs that can always be correctly localized by an SDP relaxation. Specifically,...
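A minimal sketch of the standard SDP relaxation for sensor network localization (the Z = [[I, X], [X^T, Y]] ⪰ 0 formulation), written with cvxpy as an assumed modeling tool on a toy instance with exact distances; with enough anchors and measured pairs the relaxation is exact, which is the regime the paper characterizes.

```python
import numpy as np
import cvxpy as cp   # generic SDP modeling tool, assumed available

rng = np.random.default_rng(0)
d = 2                                        # ambient dimension

# Toy instance (assumed): 3 anchors at known positions, 4 sensors to localize,
# with all sensor-sensor and anchor-sensor distances measured exactly.
anchors = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
true_pos = rng.uniform(0.2, 0.8, size=(4, d))
m = true_pos.shape[0]

# SDP relaxation: Z = [[I_d, X], [X^T, Y]] >= 0, with X the sensor coordinates
# and Y relaxing X^T X.
Z = cp.Variable((d + m, d + m), PSD=True)
cons = [Z[:d, :d] == np.eye(d)]

for i in range(m):                           # sensor-sensor distance constraints
    for j in range(i + 1, m):
        dij2 = float(np.sum((true_pos[i] - true_pos[j]) ** 2))
        cons.append(Z[d + i, d + i] + Z[d + j, d + j] - 2 * Z[d + i, d + j] == dij2)

for a in anchors:                            # anchor-sensor distance constraints
    for j in range(m):
        dkj2 = float(np.sum((a - true_pos[j]) ** 2))
        cons.append(a @ a - 2 * (a @ Z[:d, d + j]) + Z[d + j, d + j] == dkj2)

cp.Problem(cp.Minimize(0), cons).solve()
X = Z.value[:d, d:]                          # recovered sensor coordinates
print("recovered positions:\n", np.round(X.T, 3))
print("true positions:\n", np.round(true_pos, 3))
```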
We establish that mass-conserving, single-terminal-linkage networks of chemical reactions admit positive steady states regardless of network deficiency and the choice of reaction rate constants. This result holds for closed systems without material exchange across the boundary, as well as for open systems with material exchange at rates that satisfy a simple necessary and sufficient condition. Our proof uses a fixed point of a novel convex optimization formulation to find the steady-state behavior of chemical reaction networks that satisfy the law of mass-action kinetics. A fixed-point iteration can be used to compute these steady states, and we show that it converges for weakly reversible homogeneous systems. We report the results of our algorithm on numerical experiments.
Let P be a polytope such that each row of its defining matrix has at most one positive entry. Determining whether there is an integer point in P is known to be an NP-complete problem. By introducing an integer labeling rule on an augmented set and applying a triangulation of the Euclidean space, we develop in this paper a variable-dimension method for computing an integer point in P. The method starts from an arbitrary integer point and follows a finite simplicial path that either leads to an integer point in P or proves that no such point exists.
Naval Research Logistics (NRL), 2021
Lecture Notes in Control and Information Sciences
We present computational experience with an interior-point algorithm for large-scale quadratic programming problems with box constraints. The algorithm requires a total of O(√n L) iterations, where L is the size of the input data of the problem, and O(n^3) arithmetic operations per iteration. The algorithm has been implemented using vectorization and tested on an IBM 3090-600S computer with vector facilities. The computational results suggest that the efficiency of the algorithm depends on an appropriate choice of ...
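The paper's O(√n L) interior-point method is not reconstructed here; the following is a plain log-barrier Newton sketch for a box-constrained QP, with an assumed barrier schedule and toy data, just to make the problem class concrete.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy box-constrained QP: min 1/2 x'Qx + c'x  s.t.  -1 <= x_i <= 1 (data assumed).
n = 20
M = rng.standard_normal((n, n))
Q = M.T @ M / n + np.eye(n)              # positive-definite Hessian
c = rng.standard_normal(n)
l, u = -np.ones(n), np.ones(n)

def phi(y, t):
    """Barrier objective: t * f(y) plus log-barrier terms for the box."""
    return t * (0.5 * y @ Q @ y + c @ y) - np.sum(np.log(y - l)) - np.sum(np.log(u - y))

x = np.zeros(n)                          # strictly feasible starting point
t = 1.0                                  # barrier weight (assumed schedule below)
for _ in range(10):
    for _ in range(50):                  # damped Newton on the barrier problem
        g = t * (Q @ x + c) - 1.0 / (x - l) + 1.0 / (u - x)
        if np.linalg.norm(g) < 1e-6 * max(t, 1.0):
            break
        H = t * Q + np.diag(1.0 / (x - l) ** 2) + np.diag(1.0 / (u - x) ** 2)
        dx = np.linalg.solve(H, -g)
        step = 1.0
        # Backtrack to stay strictly inside the box and to ensure descent (Armijo).
        while (np.any(x + step * dx <= l) or np.any(x + step * dx >= u)
               or phi(x + step * dx, t) > phi(x, t) + 1e-4 * step * (g @ dx)):
            step *= 0.5
        x = x + step * dx
    t *= 10.0

print("objective value:", float(0.5 * x @ Q @ x + c @ x))
print("strictly inside the box:", bool(np.all(x > l) and np.all(x < u)))
```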
Nonconvex Optimization and Its Applications
IEEE Signal Processing Magazine, 2010
IEEE Transactions on Signal Processing, 2010
Diagonal preconditioning has been a staple technique in optimization and machine learning. It often reduces the condition number of the design or Hessian matrix it is applied to, thereby speeding up convergence. However, rigorous analyses of how well various diagonal preconditioning procedures improve the condition number of the preconditioned matrix, and how that translates into improvements in optimization, are rare. In this paper, we first provide an analysis of a popular diagonal preconditioning technique based on the column standard deviation and its effect on the condition number, using random matrix theory. Then we identify a class of design matrices whose condition numbers can be reduced significantly by this procedure. We then study the problem of optimal diagonal preconditioning to improve the condition number of any full-rank matrix and provide a bisection algorithm and a potential-reduction algorithm with O(log(1/ϵ)) iteration complexity, where each iteration consists of an SDP...
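A quick numerical illustration (assumed toy data) of the column-standard-deviation preconditioner discussed above: rescale each column of a badly scaled design matrix by its standard deviation and compare condition numbers before and after.

```python
import numpy as np

rng = np.random.default_rng(0)

# Design matrix with wildly different column scales (assumed toy data).
n, p = 500, 20
X = rng.standard_normal((n, p)) * (10.0 ** rng.uniform(-2, 2, size=p))

# Diagonal preconditioning by column standard deviation: X_pre = X D^{-1},
# where D = diag(column std). This rescales every column to roughly unit scale.
D_inv = np.diag(1.0 / X.std(axis=0))
X_pre = X @ D_inv

print("condition number before:", np.linalg.cond(X))
print("condition number after: ", np.linalg.cond(X_pre))
```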
This paper concerns the worst-case complexity of cyclic coordinate descent (C-CD) for minimizing a convex quadratic function, which is equivalent to the Gauss-Seidel method and can be transformed into the Kaczmarz method and projection onto convex sets (POCS). We observe that the known provable complexity of C-CD can be O(n^2) times slower than that of randomized coordinate descent (R-CD), but no example had been rigorously proven to exhibit such a large gap. In this paper we show that the gap indeed exists. We prove that there exists an example for which C-CD takes at least O(n^4 κ_CD (1/ϵ)) operations, where κ_CD is related to Demmel's condition number and determines the convergence rate of R-CD. This implies that in the worst case C-CD can indeed be O(n^2) times slower than R-CD, which has complexity O(n^2 κ_CD (1/ϵ)). Note that for this example, the gap exists for any fixed update order, not just a particular order. Based on the example, we establish several almost tight complexity bounds of C-CD for ...
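To make the two update rules concrete, here is a small sketch comparing cyclic and randomized coordinate descent with exact per-coordinate minimization on an equal-correlation quadratic, the kind of ill-conditioned instance on which cyclic orderings are known to be slow. The matrix, correlation level, and sweep count are assumptions, not the paper's worst-case construction.

```python
import numpy as np

rng = np.random.default_rng(0)

# Equal-correlation quadratic f(x) = 1/2 x'Ax - b'x with A = (1 - c) I + c * 11^T (toy instance).
n, c = 100, 0.99
A = (1 - c) * np.eye(n) + c * np.ones((n, n))
b = rng.standard_normal(n)
x_star = np.linalg.solve(A, b)

def run_cd(pick, sweeps=100):
    """Coordinate descent with exact per-coordinate minimization of f."""
    x = np.zeros(n)
    for _ in range(sweeps):
        for i in pick():
            # Exact minimization over coordinate i: x_i <- x_i - (Ax - b)_i / A_ii.
            x[i] -= (A[i] @ x - b[i]) / A[i, i]
    return float(np.linalg.norm(x - x_star))

err_cyclic = run_cd(lambda: range(n))                       # C-CD: fixed cyclic order
err_random = run_cd(lambda: rng.integers(0, n, size=n))     # R-CD: i.i.d. coordinate sampling

print("C-CD error after 100 sweeps:", err_cyclic)
print("R-CD error after 100 sweeps:", err_random)
```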
Linear and Nonlinear Programming, 2016