Hypertension and Diabetic Retinopathy

Pathwise coordinate optimization

2007

We consider "one-at-a-time" coordinate-wise descent algorithms for a class of convex optimization problems. An algorithm of this kind has been proposed for the L1-penalized regression (lasso) in the literature, but it seems to have been largely ignored. Indeed, it seems that coordinate-wise algorithms are not often used in convex optimization. We show that this algorithm is very competitive with the well-known LARS (or homotopy) procedure in large lasso problems, and that it can be applied to related methods such as the garotte and elastic net. It turns out that coordinate-wise descent does not work in the "fused lasso," however, so we derive a generalized algorithm that yields the solution in much less time that a standard convex optimizer. Finally, we generalize the procedure to the two-dimensional fused lasso, and demonstrate its performance on some image smoothing problems.
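
The coordinate-wise lasso update the abstract refers to reduces to soft-thresholding a partial residual correlation. Below is a minimal sketch under the usual setup (standardized columns, squared-error loss); the function names are illustrative, not the paper's code.

```python
import numpy as np

def soft_threshold(z, gamma):
    """Soft-thresholding operator: sign(z) * max(|z| - gamma, 0)."""
    return np.sign(z) * np.maximum(np.abs(z) - gamma, 0.0)

def lasso_coordinate_descent(X, y, lam, n_iter=100):
    """Cyclic coordinate descent for (1/2n)||y - X b||^2 + lam * ||b||_1.
    Assumes the columns of X are standardized; a sketch, not the paper's code."""
    n, p = X.shape
    beta = np.zeros(p)
    for _ in range(n_iter):
        for j in range(p):
            # Partial residual with coordinate j removed from the fit
            r = y - X @ beta + X[:, j] * beta[j]
            rho = X[:, j] @ r / n
            beta[j] = soft_threshold(rho, lam) / (X[:, j] @ X[:, j] / n)
    return beta
```

For a pathwise solution, one would run this over a decreasing grid of lam values, warm-starting each fit from the previous solution, which is what makes the approach competitive with LARS on large problems.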

Fast optimization of non-convex Machine Learning objectives

MSc Thesis, University of Edinburgh, 2012

In this project we examined the problem of non-convex optimization in the context of Machine Learning, drawing inspiration from the increasing popularity of methods such as Deep Belief Networks, which involve non-convex objectives. We focused on the task of training the Neural Autoregressive Distribution Estimator, a recently proposed variant of the Restricted Boltzmann Machine, in applications to density estimation. The aim of the project was to explore the various stages involved in implementing optimization methods and choosing the appropriate one for a given task. We examined a number of optimization methods, ranging from derivative-free to second-order and from batch to stochastic. We experimented with variations of these methods, presenting along the way all the major steps and decisions involved. The challenges of the problem included the relatively large parameter space, the non-convexity of the objective function, the large size of some of the datasets we used, the multitude of hyperparameters and decisions involved in each method, and the ever-present danger of overfitting the data. Our results show that second-order quasi-Newton batch methods like L-BFGS and variants of stochastic first-order methods like Averaged Stochastic Gradient Descent outshine the rest of the methods we examined.
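
Since the abstract singles out Averaged Stochastic Gradient Descent, here is a minimal sketch of Polyak-Ruppert averaging on a generic stochastic gradient. The gradient function and step-size schedule are placeholder assumptions, not the thesis' specific setup.

```python
import numpy as np

def averaged_sgd(grad, theta0, data, n_epochs=10, eta0=0.1):
    """Polyak-Ruppert averaged SGD: run plain SGD, but return the running
    average of the iterates, which smooths out gradient noise.
    `grad(theta, example)` is a caller-supplied stochastic gradient (assumption)."""
    theta = theta0.copy()
    theta_bar = theta0.copy()
    t = 0
    for _ in range(n_epochs):
        for example in data:
            t += 1
            eta = eta0 / (1.0 + eta0 * t) ** 0.75  # a common decaying schedule
            theta -= eta * grad(theta, example)
            theta_bar += (theta - theta_bar) / t   # incremental mean of iterates
    return theta_bar
```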

Dual subgradient algorithms for large-scale nonsmooth learning problems

Mathematical Programming, 2013

"Classical" First Order (FO) algorithms of convex optimization, such as Mirror Descent algorithm or Nesterov's optimal algorithm of smooth convex optimization, are well known to have optimal (theoretical) complexity estimates which do not depend on the problem dimension. However, to attain the optimality, the domain of the problem should admit a "good proximal setup". The latter essentially means that 1) the problem domain should satisfy certain geometric conditions of "favorable geometry", and 2) the practical use of these methods is conditioned by our ability to compute at a moderate cost proximal transformation at each iteration. More often than not these two conditions are satisfied in optimization problems arising in computational learning, what explains why proximal type FO methods recently became methods of choice when solving various learning problems. Yet, they meet their limits in several important problems such as multi-task learning with large number of tasks, where the problem domain does not exhibit favorable geometry, and learning and matrix completion problems with nuclear norm constraint, when the numerical cost of computing proximal transformation becomes prohibitive in large-scale problems. We propose a novel approach to solving nonsmooth optimization problems arising in learning applications where Fenchel-type representation of the objective function is available. The approach is based on applying FO algorithms to the dual problem and using the accuracy certificates supplied by the method to recover the primal solution. While suboptimal in terms of accuracy guaranties, the proposed approach does not rely upon "good proximal setup" for the primal problem but requires the problem domain to admit a Linear Optimization oracle-the ability to efficiently maximize a linear form on the domain of the primal problem.

Learning Coordinate Gradients with Multi-Task Kernels

2008

Coordinate gradient learning is motivated by the problem of variable selection and determining variable covariation. In this paper we propose a novel unifying framework for coordinate gradient learning from the perspective of multi-task learning, which we term multi-task gradient learning (MGL). Our approach relies on multi-task kernels to model the structure of gradient learning. This has several appealing properties. Firstly, it allows us to introduce a novel algorithm which appropriately captures the inherent structure of coordinate gradient learning. Secondly, this approach gives rise to a clear algorithmic process: a computational optimization algorithm which is memory and time efficient. Finally, a statistical error analysis ensures convergence of the estimated function and its gradient to the true function and true gradient. We report some preliminary experiments to validate MGL for variable selection as well as determining variable covariation.
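
A standard way to build a multi-task kernel of the kind the abstract relies on is a separable product of an input kernel and a positive semidefinite task-coupling matrix. The sketch below uses that common construction as an illustration; the paper's specific kernel may differ.

```python
import numpy as np

def rbf(x, z, gamma=1.0):
    """Scalar Gaussian kernel on inputs."""
    return np.exp(-gamma * np.sum((x - z) ** 2))

def multitask_kernel(x, s, z, t, B, gamma=1.0):
    """Separable multi-task kernel K((x, s), (z, t)) = k(x, z) * B[s, t],
    where B is a positive semidefinite matrix coupling tasks s and t."""
    return rbf(x, z, gamma) * B[s, t]

# Example: two tasks sharing half their structure.
B = np.array([[1.0, 0.5],
              [0.5, 1.0]])
x, z = np.zeros(3), np.ones(3)
print(multitask_kernel(x, 0, z, 1, B))
```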

An improved multi-task learning approach with applications in medical diagnosis

2008

We propose a family of multi-task learning algorithms for collaborative computer-aided diagnosis, which aims to diagnose multiple clinically related abnormal structures from medical images. Our formulations eliminate features irrelevant to all tasks and identify discriminative features for each of the tasks. A probabilistic model is derived to justify the proposed learning formulations. Through an equivalence proof, some existing regularization-based methods can also be interpreted within our probabilistic model as imposing a Wishart hyperprior. Convergence analysis highlights the conditions under which the formulations achieve convexity and global convergence. Two real-world medical problems, lung cancer prognosis and heart wall motion analysis, are used to validate the proposed algorithms.
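
To make the Wishart connection concrete: in the probabilistic view, a hyperprior over a coupling matrix induces a trace-style penalty on the stacked task weights. The sketch below evaluates such a penalized joint objective; the squared loss, the feature-level coupling, and all names are illustrative assumptions rather than the paper's exact formulation.

```python
import numpy as np

def multitask_objective(W, Xs, ys, Omega, lam):
    """Joint multi-task objective: sum of per-task squared losses plus a
    coupling penalty tr(W Omega W^T). W has one row per task; Omega is a
    PSD matrix which, in the probabilistic interpretation, arises from a
    Wishart hyperprior (here it couples features, as one illustration)."""
    loss = sum(np.sum((X @ w - y) ** 2) for X, y, w in zip(Xs, ys, W))
    penalty = np.trace(W @ Omega @ W.T)  # = sum_t w_t^T Omega w_t
    return loss + lam * penalty
```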

Multi-Task Learning for Diabetic Retinopathy Grading and Lesion Segmentation

Proceedings of the AAAI Conference on Artificial Intelligence, 2020

Although deep learning for Diabetic Retinopathy (DR) screening has shown great success in achieving clinically acceptable accuracy for referable versus non-referable DR, there remains a need to provide more fine-grained grading of DR severity as well as automated segmentation of lesions (if any) in retina images. We observe that the DR severity level of an image depends on the types of lesions present and their prevalence. In this work, we adopt a multi-task learning approach to perform the DR grading and lesion segmentation tasks. In light of the lack of ground-truth lesion segmentation masks, we further propose a semi-supervised learning process to obtain segmentation masks for the various datasets. Experimental results on publicly available datasets and a real-world dataset obtained from population screening demonstrate the effectiveness of the multi-task solution over state-of-the-art networks.
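
A common way to realize such a setup is one shared encoder feeding a grading (classification) head and a lesion segmentation head, so the two tasks share features. The sketch below shows that generic pattern in PyTorch; the layer sizes, grade count, and lesion-type count are placeholders, not the paper's architecture.

```python
import torch
import torch.nn as nn

class MultiTaskDRNet(nn.Module):
    """Shared convolutional encoder with two task heads:
    a DR severity classifier and a per-pixel lesion segmenter."""
    def __init__(self, n_grades=5, n_lesion_types=4):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
        )
        self.grade_head = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, n_grades),
        )
        self.seg_head = nn.Conv2d(64, n_lesion_types, 1)  # 1x1 conv, per-pixel logits

    def forward(self, x):
        h = self.encoder(x)
        return self.grade_head(h), self.seg_head(h)

# Joint training would combine both losses, e.g.
# loss = ce(grade_logits, grades) + lam * bce(seg_logits, masks)
```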

SLAW: Scaled Loss Approximate Weighting for Efficient Multi-Task Learning

arXiv, 2021

Multi-task learning (MTL) is a subfield of machine learning with important applications, but the multi-objective nature of optimization in MTL leads to difficulties in balancing training between tasks. The best MTL optimization methods require individually computing the gradient of each task's loss function, which impedes scalability to a large number of tasks. In this paper, we propose Scaled Loss Approximate Weighting (SLAW), a method for multi-task optimization that matches the performance of the best existing methods while being much more efficient. SLAW balances learning between tasks by estimating the magnitude of each task's gradient without performing any extra backward passes. We provide theoretical and empirical justification for SLAW's estimation of gradient magnitudes. Experimental results on non-linear regression, multi-task computer vision, and virtual screening for drug discovery demonstrate that SLAW is significantly more efficient than strong baselines without sacrificing performance.
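
The key idea is to weight task losses using cheap statistics of the losses themselves rather than per-task gradients, so a single backward pass on the weighted sum suffices. The sketch below scales each loss by the inverse of a running estimate of its magnitude; it conveys the flavor of that balancing, not necessarily SLAW's exact estimator.

```python
import numpy as np

class LossScaleBalancer:
    """Balance multi-task training by tracking an exponential moving average
    of each task's loss scale and weighting losses inversely to it, avoiding
    extra backward passes (a simplified stand-in for SLAW's estimator)."""
    def __init__(self, n_tasks, beta=0.99, eps=1e-8):
        self.ema = np.ones(n_tasks)
        self.beta = beta
        self.eps = eps

    def weights(self, losses):
        losses = np.asarray(losses, dtype=float)
        self.ema = self.beta * self.ema + (1 - self.beta) * losses
        w = 1.0 / (self.ema + self.eps)
        return w * len(losses) / w.sum()  # normalize so weights sum to n_tasks

# total_loss = sum(w_i * L_i for w_i, L_i in zip(balancer.weights(losses), losses))
```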

A multi-task learning formulation for predicting disease progression

Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '11), 2011

Alzheimer's Disease (AD), the most common type of dementia, is a severe neurodegenerative disorder. Identifying markers that can track the progress of the disease has recently received increasing attention in AD research. A definitive diagnosis of AD requires autopsy confirmation, so many clinical/cognitive measures, including the Mini Mental State Examination (MMSE) and the Alzheimer's Disease Assessment Scale cognitive subscale (ADAS-Cog), have been designed to evaluate the cognitive status of patients and are used as important criteria for clinical diagnosis of probable AD. In this paper, we propose a multi-task learning formulation for predicting disease progression measured by these cognitive scores and for selecting markers predictive of the progression. Specifically, we formulate the prediction problem as a multi-task regression problem by treating the prediction at each time point as a task. We capture the intrinsic relatedness among the different tasks with a temporal group Lasso regularizer. The regularizer consists of two components: an ℓ2,1-norm penalty on the regression weight vectors, which ensures that a small subset of features is selected for the regression models at all time points, and a temporal smoothness term, which ensures a small deviation between the regression models at successive time points. We performed extensive evaluations using various types of baseline data from the Alzheimer's Disease Neuroimaging Initiative (ADNI) database for predicting future MMSE and ADAS-Cog scores. Our experimental studies demonstrate the effectiveness of the proposed algorithm for capturing the progression trend and the cross-sectional group differences in AD severity. Results also show that most markers selected by the proposed algorithm are consistent with findings from existing cross-sectional studies.
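
Writing the described regularizer out as code makes its two pieces explicit. Here W stacks one weight vector per time point (rows), the ℓ2,1 term groups each feature across time points, and the smoothness term penalizes change between successive models; lam1 and lam2 are hypothetical trade-off weights.

```python
import numpy as np

def temporal_group_lasso_penalty(W, lam1, lam2):
    """W has shape (n_timepoints, n_features): row t is the regression model
    at time point t.
    - l2,1 term: sum over features of the l2 norm across time points, so a
      feature is either used at all time points or dropped everywhere.
    - smoothness term: penalizes deviation between successive models."""
    l21 = np.sum(np.linalg.norm(W, axis=0))   # ||W||_{2,1} over feature columns
    smooth = np.sum((W[1:] - W[:-1]) ** 2)    # sum_t ||w_{t+1} - w_t||^2
    return lam1 * l21 + lam2 * smooth
```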

A Projected Subgradient Method for Scalable Multi-Task Learning

2008

Recent approaches to multi-task learning have investigated the use of a variety of matrix norm regularization schemes for promoting feature sharing across tasks. In essence, these approaches aim at extending the ℓ1 framework for sparse single-task approximation to the multi-task setting. In this paper we focus on the computational complexity of training a jointly regularized model and propose an optimization algorithm whose complexity is linear in the number of training examples and O(n log n) in n, the number of parameters of the joint model. Our algorithm is based on casting jointly regularized loss minimization as a convex constrained optimization problem, for which we develop an efficient projected gradient algorithm. The main contribution of this paper is the derivation of a gradient projection method with ℓ1,∞ constraints that can be performed efficiently and which has a convergence rate of O(1/ε²) for any convex Lipschitz loss function.
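
A generic projected subgradient loop makes the structure of the method clear. The ℓ1,∞ projection itself is the paper's technical contribution, so it appears below only as a caller-supplied placeholder; the step-size schedule shown is the standard one achieving the quoted rate, and all names are illustrative.

```python
import numpy as np

def projected_subgradient(subgrad, project, W0, n_iter=1000, c=1.0):
    """Generic projected subgradient method:
        W <- project(W - eta_t * g),  with eta_t = c / sqrt(t),
    which attains the O(1/eps^2) rate quoted above for convex Lipschitz
    losses. `subgrad` and `project` are caller-supplied (assumptions); in
    the paper's setting, `project` would be the efficient projection onto
    the l1,inf ball."""
    W = W0.copy()
    for t in range(1, n_iter + 1):
        W = project(W - (c / np.sqrt(t)) * subgrad(W))
    return W
```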