Monotone Algorithms
Related papers
SQUAREM: An R Package for Off-the-Shelf Acceleration of EM, MM and Other EM-Like Monotone Algorithms
Journal of Statistical Software
We discuss the R package SQUAREM for accelerating iterative algorithms which exhibit slow, monotone convergence. These include the well-known expectation-maximization (EM) algorithm, majorize-minimize (MM) algorithms, and other EM-like algorithms such as expectation conditional maximization and generalized EM. We demonstrate the simplicity, generality, and power of SQUAREM through a wide array of EM/MM applications, including binary Poisson mixture, factor analysis, interval censoring, genetics admixture, and logistic regression maximum likelihood estimation (an MM problem). We show that SQUAREM is easy to apply and can accelerate any smooth, contractive fixed-point mapping with a linear convergence rate. The squared iterative scheme (Squarem) provides a significant speed-up of EM-like algorithms. The advantage of Squarem is especially large for high-dimensional problems or when an EM step is relatively time-consuming to evaluate. Squarem can be used off-the-shelf, since the user does not need to tune any control parameters to optimize performance. Given its remarkable ease of use, Squarem may be considered a default accelerator for slowly converging EM-like algorithms. All comparisons of CPU computing time in the paper are made on a quad-core 2.3 GHz Intel Core i7 Mac computer.
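To make the setting concrete, here is a minimal Python sketch of the kind of fixed-point EM map one would hand to such an accelerator: one EM step for a two-component Poisson mixture. This is an illustration only, not the package's R interface; the data, starting values, and the helper name em_update are invented.

```python
# Illustrative sketch (not the SQUAREM R API): one EM step for a
# two-component Poisson mixture, written as a fixed-point map theta -> F(theta).
import numpy as np
from scipy.stats import poisson

def em_update(theta, x):
    """One EM step; theta = (pi, lam1, lam2) for a two-component Poisson mixture."""
    pi, lam1, lam2 = theta
    # E-step: posterior probability that each observation came from component 1
    p1 = pi * poisson.pmf(x, lam1)
    p2 = (1.0 - pi) * poisson.pmf(x, lam2)
    gamma = p1 / (p1 + p2)
    # M-step: closed-form updates of the mixing weight and the two rates
    pi_new = gamma.mean()
    lam1_new = np.sum(gamma * x) / np.sum(gamma)
    lam2_new = np.sum((1.0 - gamma) * x) / np.sum(1.0 - gamma)
    return np.array([pi_new, lam1_new, lam2_new])

# Plain EM: iterate the map to (slow, linear) convergence on invented data.
rng = np.random.default_rng(0)
x = np.concatenate([rng.poisson(1.0, 300), rng.poisson(5.0, 700)])
theta = np.array([0.5, 0.5, 4.0])
for _ in range(500):
    theta = em_update(theta, x)
print(theta)   # roughly (0.3, 1.0, 5.0), up to label switching
```

An off-the-shelf accelerator of the kind the paper describes would wrap exactly this map, replacing the plain loop with extrapolated steps.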
Simple and Globally Convergent Methods for Accelerating the Convergence of Any EM Algorithm
Scandinavian Journal of Statistics, 2008
The expectation-maximization (EM) algorithm is a popular approach for obtaining maximum likelihood estimates in incomplete data problems because of its simplicity and stability (e.g. monotonic increase of likelihood). However, in many applications the stability of EM is attained at the expense of slow, linear convergence. We have developed a new class of iterative schemes, called squared iterative methods (SQUAREM), to accelerate EM, without compromising on simplicity and stability. SQUAREM generally achieves superlinear convergence in problems with a large fraction of missing information. Globally convergent schemes are easily obtained by viewing SQUAREM as a continuation of EM. SQUAREM is especially attractive in high-dimensional problems, and in problems where model-specific analytic insights are not available. SQUAREM can be readily implemented as an 'off-the-shelf' accelerator of any EM-type algorithm, as it only requires the EM parameter updating. We present four examples to demonstrate the effectiveness of SQUAREM. A general-purpose implementation (written in R) is available.
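The core of the scheme can be sketched as follows; this is a minimal Python rendering of the squared-extrapolation idea under the assumption of an SqS3-style step length, omitting the safeguards and globalization that an actual implementation adds.

```python
# Minimal sketch of a squared-extrapolation (SQUAREM-style) acceleration loop.
# The step-length rule below is one SqS3-type choice; real implementations add
# monotonicity checks and step-length bounds that are omitted here.
import numpy as np

def squarem_iterate(fixptfn, theta0, tol=1e-8, maxiter=500):
    """Accelerate a slowly converging fixed-point map theta -> fixptfn(theta)."""
    theta = np.asarray(theta0, dtype=float)
    for _ in range(maxiter):
        theta1 = fixptfn(theta)          # first base-map step
        theta2 = fixptfn(theta1)         # second base-map step
        r = theta1 - theta               # change after one step
        v = (theta2 - theta1) - r        # "curvature" of the iteration
        if np.linalg.norm(v) < tol:
            return theta2                # base map has essentially converged
        alpha = -np.linalg.norm(r) / np.linalg.norm(v)   # SqS3-style step length
        theta_new = theta - 2.0 * alpha * r + alpha**2 * v
        theta_new = fixptfn(theta_new)   # stabilizing base-map step
        if np.linalg.norm(theta_new - theta) < tol:
            return theta_new
        theta = theta_new
    return theta

# Toy usage: accelerate the linearly convergent map theta -> cos(theta).
print(squarem_iterate(np.cos, np.array([0.5])))   # fixed point near 0.739
```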
An Expanded Theoretical Treatment of Iteration-Dependent Majorize-Minimize Algorithms
IEEE Transactions on Image Processing, 2007
The majorize-minimize (MM) optimization technique has received considerable attention in signal and image processing applications, as well as in the statistics literature. At each iteration of an MM algorithm, one constructs a tangent majorant function that majorizes the given cost function and is equal to it at the current iterate. The next iterate is obtained by minimizing this tangent majorant function, resulting in a sequence of iterates that reduces the cost function monotonically. Expectation-maximization algorithms are a well-known special case of MM methods. In this paper, we expand on previous analyses of MM, due to Fessler and Hero, that allowed the tangent majorants to be constructed in iteration-dependent ways. This paper also overcomes an error in one of those earlier analyses. There are three main aspects in which our analysis builds upon previous work. First, our treatment relaxes many assumptions related to the structure of the cost function, feasible set, and tangent majorants. For example, the cost function can be nonconvex and the feasible set for the problem can be any convex set. Second, we propose convergence conditions, based on upper curvature bounds, that can be easier to verify than more standard continuity conditions. Furthermore, these conditions allow for considerable design freedom in the iteration-dependent behavior of the algorithm. Finally, we give an original characterization of the local region of convergence of MM algorithms based on connected (e.g., convex) tangent majorants. For such algorithms, cost function minimizers will locally attract the iterates over larger neighborhoods than is typically guaranteed with other methods. This expanded treatment widens the scope of the MM algorithm designs that can be considered for signal and image processing applications, allows us to verify the convergent behavior of previously published algorithms, and gives a fuller understanding overall of how these algorithms behave.
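As a small worked illustration of the majorize-minimize pattern described above, the following Python sketch minimizes the L1 location cost (the sample-median problem) by repeatedly minimizing a quadratic tangent majorant constructed at the current iterate; the toy problem and names are ours, not drawn from the paper.

```python
# Minimal MM illustration: a quadratic tangent majorant of sum_i |x_i - theta|,
# tangent at the current iterate, is minimized in closed form at each step.
import numpy as np

def mm_median(x, theta0, iters=100, eps=1e-12):
    """Minimize sum_i |x_i - theta| (the sample median) by majorize-minimize."""
    theta = float(theta0)
    for _ in range(iters):
        w = 1.0 / np.maximum(np.abs(x - theta), eps)  # weights from the majorant
        # |x_i - theta| <= w_i*(x_i - theta)^2/2 + 1/(2*w_i), with equality at the
        # current iterate, so minimizing the weighted quadratic surrogate drives
        # the original cost down monotonically.
        theta = np.sum(w * x) / np.sum(w)
    return theta

x = np.array([1.0, 2.0, 3.0, 10.0, 11.0])
print(mm_median(x, theta0=np.mean(x)))   # converges toward the median, 3.0
```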
A fast EM algorithm for quadratic optimization subject to convex constraints
Statistica Sinica, 2007
Convex constraints (CCs) such as box constraints and linear inequality constraints appear frequently in statistical inference and in applications. The problems of quadratic optimization (QO) subject to CCs occur in isotonic regression, shape-restricted non-parametric regression, variable selection (via the lasso algorithm and bridge regression), limited dependent variables models, image reconstruction, and so on. Existing packages for QO are not generally applicable to CCs. Although EM-type algorithms may be applied to such problems (Tian, Ng and Tan (2005)), the convergence rate/speed of these algorithms is painfully slow, especially for high-dimensional data. This paper develops a fast EM algorithm for QO with CCs. We construct a class of data augmentation schemes indexed by a 'working parameter' r (r ∈ R), and then optimize r over R under several convergence criteria. In addition, we use Cholesky decomposition to reduce both the number of latent variables and the dimension, leading to further acceleration of the EM. Standard errors of the restricted estimators are calculated using a non-parametric bootstrapping procedure. Simulation and comparison are performed and a complex multinomial dataset is analyzed to illustrate the proposed methods.
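The paper's data-augmentation EM is specific to its working-parameter construction, so the sketch below does not reproduce it; it only makes the problem class concrete with a plain projected-gradient baseline for a box-constrained quadratic objective, using invented problem data.

```python
# Baseline sketch (not the paper's fast EM): projected gradient descent for
# quadratic optimization under box constraints, the problem class discussed above.
import numpy as np

def box_qp_projected_gradient(A, b, lower, upper, theta0, iters=2000):
    """Minimize 0.5*theta'A theta - b'theta subject to lower <= theta <= upper."""
    L = np.linalg.eigvalsh(A)[-1]        # Lipschitz constant of the gradient
    theta = np.clip(theta0, lower, upper)
    for _ in range(iters):
        grad = A @ theta - b
        theta = np.clip(theta - grad / L, lower, upper)   # gradient step + projection
    return theta

# Invented 2-dimensional example with a positive-definite A and the unit box.
A = np.array([[2.0, 0.5], [0.5, 1.0]])
b = np.array([1.0, -2.0])
print(box_qp_projected_gradient(A, b, lower=0.0, upper=1.0, theta0=np.zeros(2)))
```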
On Aitken’s method and other approaches for accelerating convergence of the EM algorithm
1995
The EM algorithm is a broadly applicable approach that has been widely applied to the iterative computation of maximum likelihood estimates in a variety of incomplete-data problems. A criticism that has been levelled at the EM algorithm is that its convergence can be quite slow. Unfortunately, methods to accelerate the EM algorithm tend to sacrifice the simplicity it usually enjoys. In this paper, we review some available methods for accelerating convergence of the EM algorithm. In particular, we consider the use of the multivariate version of Aitken's method for EM acceleration. Other methods considered include the conjugate gradient approach of Jamshidian and Jennrich (1993) and the quasi-Newton approach of Lange (1995b). We also consider the recently proposed ECME algorithm.
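For intuition, here is a minimal scalar sketch of Aitken's delta-squared acceleration applied to a linearly convergent fixed-point iteration; the paper works with the multivariate version for EM, and the toy map used here is purely illustrative.

```python
# Scalar sketch of Aitken's delta-squared acceleration for a fixed-point map.
# (The EM setting uses a multivariate generalization; this toy map is ours.)
import math

def aitken_accelerate(g, x0, iters=10):
    """Accelerate the linearly convergent iteration x -> g(x) with Aitken's delta^2."""
    x = x0
    for _ in range(iters):
        x1 = g(x)
        x2 = g(x1)
        denom = x2 - 2.0 * x1 + x
        if abs(denom) < 1e-15:
            return x2                     # iteration has essentially converged
        x = x - (x1 - x) ** 2 / denom     # Aitken extrapolation
    return x

g = math.cos                              # linearly convergent fixed point near 0.739
print(aitken_accelerate(g, x0=0.5))
```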
Annals of the Institute of Statistical Mathematics, 2011
The EM algorithm is a widely used methodology for penalized likelihood estimation. Provable monotonicity and convergence are the hallmarks of the EM algorithm and these properties are well established for smooth likelihood and smooth penalty functions. However, many relaxed versions of variable selection penalties are not smooth. In this paper we introduce a new class of Space Alternating Penalized Kullback Proximal extensions of the EM algorithm for nonsmooth likelihood inference. We show that the cluster points of the new method are stationary points even when they lie on the boundary of the parameter set. We illustrate the new class of algorithms for the problems of model selection for finite mixtures of regression and of sparse image reconstruction.
WIREs Data Mining and Knowledge Discovery, 2017
MM (majorization-minimization) algorithms are an increasingly popular tool for solving optimization problems in machine learning and statistical estimation. This article introduces the MM algorithm framework in general and via three commonly considered example applications: Gaussian mixture regressions, multinomial logistic regressions, and support vector machines. Specific algorithms for these three examples are derived and numerical demonstrations are presented. Theoretical and practical aspects of MM algorithm design are discussed.
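As a small example of the MM pattern for one of the listed problem types, the sketch below fits a binary logistic regression by iteratively minimizing the standard uniform quadratic majorant of the negative log-likelihood (curvature bounded by X'X/4); the data and function names are invented and the sketch is not taken from the article.

```python
# Minimal MM sketch for binary logistic regression: the negative log-likelihood
# is majorized by a quadratic with fixed curvature X'X/4, so each MM step is a
# single closed-form solve.  All data below are invented for illustration.
import numpy as np

def mm_logistic(X, y, iters=200):
    """Maximize the binary logistic log-likelihood by majorize-minimize."""
    n, p = X.shape
    beta = np.zeros(p)
    B_inv = np.linalg.inv(X.T @ X / 4.0)      # inverse of the fixed majorant curvature
    for _ in range(iters):
        prob = 1.0 / (1.0 + np.exp(-X @ beta))
        grad = X.T @ (prob - y)               # gradient of the negative log-likelihood
        beta = beta - B_inv @ grad            # closed-form minimizer of the majorant
    return beta

rng = np.random.default_rng(1)
X = np.column_stack([np.ones(200), rng.normal(size=(200, 2))])
true_beta = np.array([-0.5, 1.0, 2.0])
y = (rng.random(200) < 1.0 / (1.0 + np.exp(-X @ true_beta))).astype(float)
print(mm_logistic(X, y))   # roughly recovers true_beta
```

A practical attraction of this particular majorant is that its curvature does not depend on the iterate, so the matrix inverse is computed once.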
Proximal Algorithms in Statistics and Machine Learning
Statistical Science, 2015
In this paper we develop proximal methods for statistical learning. Proximal point algorithms are useful in machine learning and statistics for obtaining solutions to optimization problems with composite objective functions. Our approach exploits a generalised Moreau envelope and closed-form solutions of proximal operators to develop novel proximal algorithms. We illustrate our methodology with regularized logistic and Poisson regression and provide solutions for non-convex bridge penalties and fused lasso norms. We also provide a survey of convergence of non-descent algorithms with acceleration. Finally, we provide directions for future research.
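A minimal sketch of the proximal idea, assuming the canonical lasso example rather than the paper's specific algorithms: the l1 proximal operator has a closed-form soft-thresholding solution, which a proximal-gradient (ISTA-type) loop applies after each gradient step on the smooth part of the composite objective.

```python
# Proximal-gradient (ISTA-type) sketch for the lasso; the l1 proximal operator
# is the closed-form soft-thresholding map.  Toy data are invented.
import numpy as np

def soft_threshold(z, t):
    """Proximal operator of t * ||.||_1."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def ista_lasso(X, y, lam, iters=500):
    """Minimize 0.5*||y - X beta||^2 + lam*||beta||_1 by proximal gradient."""
    L = np.linalg.eigvalsh(X.T @ X)[-1]          # Lipschitz constant of the smooth part
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        grad = X.T @ (X @ beta - y)
        beta = soft_threshold(beta - grad / L, lam / L)
    return beta

rng = np.random.default_rng(2)
X = rng.normal(size=(100, 10))
y = X[:, 0] - 2.0 * X[:, 3] + 0.1 * rng.normal(size=100)
print(ista_lasso(X, y, lam=5.0))   # sparse estimate concentrated on columns 0 and 3
```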
Inverse Problems, 2010
In many numerical applications, for instance in image deconvolution, the nonnegativity of the computed solution is required. When a problem of deconvolution is formulated in a statistical frame, the recorded image is seen as the realization of a random process, where the nature of the noise is taken into account. This formulation leads to the maximization of a likelihood function which depends on the statistical property assumed for the noise. In this paper we revisit, under this unifying statistical approach, some iterative methods coupled with suitable strategies for enforcing nonnegativity, as well as others that naturally embed nonnegativity. For all these methods we carry out a comparative study taking into account several performance indicators. The reconstruction efficiency, the computational cost, the consistency with the discrepancy principle (a common technique for guessing the best regularization parameter) and the sensitivity to this choice are compared in a simulated context, by means of an extensive experimentation on both 1D and 2D problems.
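One widely used iteration that naturally embeds nonnegativity in Poisson-noise deconvolution is the Richardson-Lucy/EM update; the 1D toy sketch below illustrates it and is not the paper's benchmark setup.

```python
# Richardson-Lucy / EM update for deconvolution under Poisson noise: the
# multiplicative form keeps the iterates nonnegative by construction.
# This 1D toy example (signal, PSF, noise level) is invented for illustration.
import numpy as np

def richardson_lucy_1d(y, psf, iters=100):
    """Nonnegative deconvolution of y ~ Poisson(psf * x), 1D toy version."""
    psf = psf / psf.sum()                 # normalized point-spread function
    x = np.full_like(y, y.mean())         # flat, strictly positive start
    for _ in range(iters):
        blurred = np.convolve(x, psf, mode="same")
        ratio = y / np.maximum(blurred, 1e-12)
        x = x * np.convolve(ratio, psf[::-1], mode="same")   # multiplicative update keeps x >= 0
    return x

rng = np.random.default_rng(3)
truth = np.zeros(64); truth[20] = 50.0; truth[40] = 30.0
psf = np.array([1.0, 4.0, 6.0, 4.0, 1.0])
y = rng.poisson(np.convolve(truth, psf / psf.sum(), mode="same")).astype(float)
print(richardson_lucy_1d(y, psf).round(1))   # spikes recovered near positions 20 and 40
```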
A quasi-Newton acceleration for high-dimensional optimization algorithms
Statistics and Computing, 2011
In many statistical problems, maximum likelihood estimation by an EM or MM algorithm suffers from excruciatingly slow convergence. This tendency limits the application of these algorithms to modern high-dimensional problems in data mining, genomics, and imaging. Unfortunately, most existing acceleration techniques are ill-suited to complicated models involving large numbers of parameters. The squared iterative methods (SQUAREM) recently pro-