Efficient and Scalable Multi-Task Regression on Massive Number of Tasks

Many Task Learning with Task Routing

arXiv (Cornell University), 2019

Typical multi-task learning (MTL) methods rely on architectural adjustments and a large trainable parameter set to jointly optimize over several tasks. However, as the number of tasks increases, so do the complexity of those architectural adjustments and the resource requirements. In this paper, we introduce a method that applies a conditional feature-wise transformation over the convolutional activations, enabling a model to successfully perform a large number of tasks. To distinguish this setting from regular MTL, we introduce Many Task Learning (MaTL) as a special case of MTL in which more than 20 tasks are performed by a single model. Our method, dubbed Task Routing (TR), is encapsulated in a layer we call the Task Routing Layer (TRL), which, applied in an MaTL scenario, successfully fits hundreds of classification tasks in one model. We evaluate our method on 5 datasets against strong baselines and state-of-the-art approaches.
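
A minimal sketch of how such a routing layer might look, assuming (as one reading of the abstract) that routing is realized as a fixed per-task channel mask applied to convolutional activations; the mask initialization, the keep ratio, and the way the active task is selected are illustrative assumptions, not the paper's exact design.

```python
# Hypothetical task-routing layer: one fixed binary channel mask per task.
import torch
import torch.nn as nn

class TaskRoutingLayer(nn.Module):
    def __init__(self, num_tasks: int, channels: int, keep_ratio: float = 0.5):
        super().__init__()
        # One fixed binary mask per task, each keeping a random subset of channels.
        masks = (torch.rand(num_tasks, channels) < keep_ratio).float()
        self.register_buffer("masks", masks)
        self.active_task = 0  # set externally before each forward pass

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, H, W); zero out channels not routed to the active task.
        mask = self.masks[self.active_task].view(1, -1, 1, 1)
        return x * mask
```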

Multi-stage multi-task feature learning

Journal of Machine Learning Research, 2013

Multi-task sparse feature learning aims to improve the generalization performance by exploiting the shared features among tasks. It has been successfully applied to many applications including computer vision and biomedical informatics. Most of the existing multi-task sparse feature learning algorithms are formulated as a convex sparse regularization problem, which is usually suboptimal due to its looseness in approximating an ℓ0-type regularizer. In this paper, we propose a non-convex formulation for multi-task sparse feature learning based on a novel non-convex regularizer. To solve the non-convex optimization problem, we propose a Multi-Stage Multi-Task Feature Learning (MSMTFL) algorithm; we also provide intuitive interpretations as well as a detailed convergence and reproducibility analysis for the proposed algorithm. Moreover, we present a detailed theoretical analysis showing that MSMTFL achieves a better parameter estimation error bound than the convex formulation. Empirical studies on both synthetic and real-world data sets demonstrate the effectiveness of MSMTFL in comparison with state-of-the-art multi-task sparse feature learning algorithms.
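
The multi-stage idea can be sketched as repeatedly solving a re-weighted ℓ1 problem in which feature rows whose ℓ1-norm already exceeds a threshold theta stop being penalized, in the spirit of the capped-ℓ1 regularizer this line of work builds on; the least-squares loss, step size, and inner proximal-gradient solver below are illustrative choices, not the paper's exact algorithm.

```python
# Multi-stage re-weighted l1 sketch (MSMTFL-style), assuming a least-squares loss.
import numpy as np

def weighted_l1_prox(W, thresh):
    # Soft-thresholding with a per-row (per-feature) threshold; columns are tasks.
    return np.sign(W) * np.maximum(np.abs(W) - thresh[:, None], 0.0)

def msmtfl(Xs, ys, lam=0.1, theta=1.0, stages=5, inner_iters=200, lr=None):
    d, T = Xs[0].shape[1], len(Xs)
    W = np.zeros((d, T))
    lam_j = np.full(d, lam)                     # stage-0 weights: plain l1 penalty
    if lr is None:
        lr = 1.0 / max(np.linalg.norm(X, 2) ** 2 for X in Xs)
    for _ in range(stages):
        for _ in range(inner_iters):            # proximal gradient on the weighted problem
            G = np.column_stack([X.T @ (X @ W[:, t] - y)
                                 for t, (X, y) in enumerate(zip(Xs, ys))])
            W = weighted_l1_prox(W - lr * G, lr * lam_j)
        # Re-weight: rows whose l1-norm already exceeds theta are no longer penalized.
        lam_j = lam * (np.abs(W).sum(axis=1) < theta).astype(float)
    return W
```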

Large Dimensional Analysis and Improvement of Multi Task Learning

Cornell University - arXiv, 2020

Multi-Task Learning (MTL) efficiently leverages useful information contained in multiple related tasks to help improve the generalization performance of all tasks. This article conducts a large-dimensional analysis of a Least Squares Support Vector Machine (LSSVM) version of MTL that is simple but, as we shall see, extremely powerful when carefully tuned, in the regime where the dimension p of the data and their number n grow large at the same rate. Under mild assumptions on the input data, the theoretical analysis of the MTL-LSSVM algorithm first reveals the "sufficient statistics" exploited by the algorithm and their interaction at work. These results demonstrate, as a striking consequence, that the standard approach to MTL-LSSVM is largely suboptimal and can lead to severe negative transfer, but that these impairments are easily corrected. These corrections are turned into an improved MTL-LSSVM algorithm which can only benefit from additional data, and whose theoretical performance is also analyzed. As evidenced and theoretically supported by numerous recent works, these large-dimensional results are robust to broad ranges of data distributions, which our present experiments corroborate. Specifically, the article reports a systematically close match between theoretical and empirical performances on popular datasets, which is strongly suggestive of the applicability of the proposed carefully tuned MTL-LSSVM method to real data. This fine-tuning is fully based on the theoretical analysis and, in particular, does not require any cross-validation procedure. Moreover, the reported performances on real datasets almost systematically outperform much more elaborate and less intuitive state-of-the-art multi-task and transfer learning methods.
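
For context, a generic (untuned) MTL-LSSVM baseline can be sketched as a single linear system built from a shared-plus-specific kernel coupling, assuming the common construction K((x_i, t_i), (x_j, t_j)) = (1/mu + [t_i == t_j]) x_i·x_j and the standard LSSVM regression system; this is NOT the paper's improved variant, whose corrections come from the large-dimensional analysis.

```python
# Generic MTL-LSSVM baseline sketch (linear kernel, regression-style system).
import numpy as np

def mtl_lssvm_fit(X, y, tasks, mu=1.0, gamma=1.0):
    n = X.shape[0]
    same_task = (tasks[:, None] == tasks[None, :]).astype(float)
    Omega = (1.0 / mu + same_task) * (X @ X.T)       # task-coupled linear kernel
    # Solve the LSSVM KKT system: [[0, 1^T], [1, Omega + I/gamma]] [b; alpha] = [0; y]
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = 1.0
    A[1:, 0] = 1.0
    A[1:, 1:] = Omega + np.eye(n) / gamma
    sol = np.linalg.solve(A, np.concatenate(([0.0], y)))
    return sol[1:], sol[0]                           # alpha, bias

def mtl_lssvm_predict(Xtr, tasks_tr, alpha, b, Xte, task_te, mu=1.0):
    same = (tasks_tr == task_te).astype(float)
    k = (1.0 / mu + same)[:, None] * (Xtr @ Xte.T)   # cross-kernel to test points
    return k.T @ alpha + b
```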

Efficient multi-task feature learning with calibration

Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining - KDD '14, 2014

Multi-task feature learning has been proposed to improve the generalization performance by learning the shared features among multiple related tasks, and it has been successfully applied to many real-world problems in machine learning, data mining, computer vision and bioinformatics. Most existing multi-task feature learning models simply assume a common noise level for all tasks, which may not be the case in real applications. Recently, a Calibrated Multivariate Regression (CMR) model has been proposed, which calibrates different tasks with respect to their noise levels and achieves superior prediction performance over the non-calibrated one. A major challenge is how to solve the CMR model efficiently, as it is formulated as a composite optimization problem consisting of two non-smooth terms. In this paper, we propose a variant of the calibrated multi-task feature learning formulation by including a squared-norm regularizer. We show that the dual problem of the proposed formulation is a smooth optimization problem with a piecewise sphere constraint. The simplicity of the dual problem enables us to develop fast dual optimization algorithms with low per-iteration cost. We also provide a detailed convergence analysis for the proposed dual optimization algorithm. Empirical studies demonstrate that the dual optimization algorithm converges quickly and is much more efficient than the primal optimization algorithm. Moreover, the calibrated multi-task feature learning algorithms with and without the squared-norm regularizer achieve similar prediction performance, and both outperform the non-calibrated ones. Thus, the proposed variant not only enables us to develop fast optimization algorithms, but also keeps the superior prediction performance of calibrated multi-task feature learning over the non-calibrated one.
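
A short sketch of the kind of objective the abstract describes, assuming a CMR-style calibrated loss (a non-squared ℓ2 residual per task, which implicitly weights each task by its noise level), an ℓ2,1 shared-feature regularizer, and the added squared-norm term; the paper's exact formulation and its dual algorithm are not reproduced here.

```python
# Calibrated multi-task feature learning objective sketch (assumed form).
import numpy as np

def calibrated_mtfl_objective(W, Xs, ys, lam=0.1, rho=0.1):
    # Non-squared residual norms calibrate tasks with different noise levels.
    loss = sum(np.linalg.norm(X @ W[:, t] - y)
               for t, (X, y) in enumerate(zip(Xs, ys)))
    l21 = np.linalg.norm(W, axis=1).sum()      # row-wise l2 norms, summed: shared features
    sq = 0.5 * rho * np.linalg.norm(W) ** 2    # extra squared-norm regularizer (the variant)
    return loss + lam * l21 + sq
```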

SLAW: Scaled Loss Approximate Weighting for Efficient Multi-Task Learning

ArXiv, 2021

Multi-task learning (MTL) is a subfield of machine learning with important applications, but the multi-objective nature of optimization in MTL leads to difficulties in balancing training between tasks. The best MTL optimization methods require individually computing the gradient of each task's loss function, which impedes scalability to a large number of tasks. In this paper, we propose Scaled Loss Approximate Weighting (SLAW), a method for multi-task optimization that matches the performance of the best existing methods while being much more efficient. SLAW balances learning between tasks by estimating the magnitude of each task's gradient without performing any extra backward passes. We provide theoretical and empirical justification for SLAW's estimation of gradient magnitudes. Experimental results on non-linear regression, multi-task computer vision, and virtual screening for drug discovery demonstrate that SLAW is significantly more efficient than strong baselines without sacrif...
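
A hedged sketch of loss-statistics-based task weighting in the spirit of SLAW, assuming gradient magnitudes are approximated by running statistics of each task's loss (one plausible reading of the abstract); the paper's exact estimator and normalization may differ. Only one backward pass on the weighted sum of losses is needed.

```python
# Assumed loss-statistics weighter: weights tasks inversely to a gradient-magnitude proxy.
import numpy as np

class LossStatWeighter:
    def __init__(self, num_tasks: int, beta: float = 0.99, eps: float = 1e-8):
        self.beta, self.eps = beta, eps
        self.m1 = np.zeros(num_tasks)   # EMA of each task's loss
        self.m2 = np.zeros(num_tasks)   # EMA of each task's squared loss

    def weights(self, losses: np.ndarray) -> np.ndarray:
        self.m1 = self.beta * self.m1 + (1 - self.beta) * losses
        self.m2 = self.beta * self.m2 + (1 - self.beta) * losses ** 2
        std = np.sqrt(np.maximum(self.m2 - self.m1 ** 2, 0.0)) + self.eps
        w = 1.0 / std                     # down-weight tasks with large gradient proxies
        return w * len(w) / w.sum()       # normalize so weights sum to the task count

# Usage: total_loss = (weighter.weights(per_task_losses) * per_task_losses).sum()
```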

A Divide-and-Conquer Method for Scalable Robust Multitask Learning

IEEE Transactions on Neural Networks and Learning Systems, 2015

Multitask learning (MTL) aims at improving the generalization performance of multiple tasks by exploiting the shared factors among them. An important line of research in MTL is the robust MTL (RMTL) methods, which use trace-norm regularization to capture task relatedness via a low-rank structure. The existing algorithms for the RMTL optimization problems rely on the accelerated proximal gradient (APG) scheme, which needs repeated full singular value decomposition (SVD) operations. However, the time complexity of a full SVD is O(min(md², m²d)) for an RMTL problem with m tasks and d features, which becomes unaffordable in real-world MTL applications that often have a large number of tasks and high-dimensional features. In this paper, we propose a scalable solution for large-scale RMTL, with either the least-squares loss or the squared hinge loss, by a divide-and-conquer method. The proposed method divides the original RMTL problem into several size-reduced subproblems, solves these cheaper subproblems in parallel by any base algorithm (e.g., APG) for RMTL, and then combines the results to obtain the final solution. Our theoretical analysis indicates that, with high probability, the recovery errors of the proposed divide-and-conquer algorithm are bounded by those of the base algorithm. Furthermore, in order to solve the subproblems with the least-squares loss or the squared hinge loss, we propose two efficient base algorithms based on the linearized alternating direction method (LADM). Experimental results demonstrate that, with little loss of accuracy, our method is substantially faster than the state-of-the-art APG algorithms for RMTL.
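
The divide-and-conquer structure itself is simple to sketch: split the tasks (columns of the weight matrix) into groups, solve each group with any base RMTL solver in parallel, and reassemble the columns. In the sketch below the base solver is a stand-in ridge regression, not the paper's APG or LADM algorithms, and the random grouping and thread-based parallelism are illustrative choices.

```python
# Divide-and-conquer wrapper sketch: partition tasks, solve groups in parallel, recombine.
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def ridge_base_solver(Xs, ys, lam=1.0):
    # Stand-in for a real RMTL base algorithm; fits one ridge regression per task.
    return np.column_stack([
        np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)
        for X, y in zip(Xs, ys)
    ])

def divide_and_conquer_rmtl(Xs, ys, num_groups=4, base_solver=ridge_base_solver, seed=0):
    m = len(Xs)
    order = np.random.RandomState(seed).permutation(m)
    groups = np.array_split(order, num_groups)
    W = np.zeros((Xs[0].shape[1], m))
    with ThreadPoolExecutor() as pool:                 # subproblems solved in parallel
        futures = [pool.submit(base_solver, [Xs[i] for i in g], [ys[i] for i in g])
                   for g in groups]
        for g, fut in zip(groups, futures):
            W[:, g] = fut.result()                     # place each group's columns back
    return W
```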

Clustered Multi-Task Learning Via Alternating Structure Optimization

Advances in neural information processing systems, 2011

Multi-task learning (MTL) learns multiple related tasks simultaneously to improve generalization performance. Alternating structure optimization (ASO) is a popular MTL method that learns a shared low-dimensional predictive structure on hypothesis spaces from multiple related tasks. It has been applied successfully in many real-world applications. As an alternative MTL approach, clustered multi-task learning (CMTL) assumes that multiple tasks follow a clustered structure, i.e., tasks are partitioned into a set of groups where tasks in the same group are similar to each other, and that such a clustered structure is unknown a priori. The objectives of ASO and CMTL differ in how the multiple tasks are related. Interestingly, we show in this paper an equivalence relationship between ASO and CMTL, providing significant new insights into ASO and CMTL as well as their inherent relationship. The CMTL formulation is non-convex, and we adopt a convex relaxation of the CMTL formulation. We further...
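
A minimal sketch of the non-convex clustered-MTL idea described in the abstract, assuming an alternation between (a) k-means clustering of the task weight vectors and (b) re-fitting each task with a ridge-style pull toward its cluster mean; the paper's actual contributions (the ASO equivalence and the convex relaxation) are not shown here.

```python
# Alternating clustered-MTL sketch: cluster tasks, then refit toward cluster means.
import numpy as np
from sklearn.cluster import KMeans

def clustered_mtl(Xs, ys, k=3, rho=1.0, iters=10, seed=0):
    d = Xs[0].shape[1]
    # Initialize each task's weights with an independent least-squares fit.
    W = np.column_stack([np.linalg.lstsq(X, y, rcond=None)[0] for X, y in zip(Xs, ys)])
    for _ in range(iters):
        km = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(W.T)
        labels, centers = km.labels_, km.cluster_centers_.T
        for t, (X, y) in enumerate(zip(Xs, ys)):
            mu = centers[:, labels[t]]
            # Solve (X^T X + rho I) w = X^T y + rho * cluster_mean for task t.
            W[:, t] = np.linalg.solve(X.T @ X + rho * np.eye(d), X.T @ y + rho * mu)
    return W, labels
```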

Multi-task Learning by Maximizing Statistical Dependence

2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2018

We present a new multi-task learning (MTL) approach that can be applied to multiple heterogeneous task estimators. Our motivation is that the best task estimator can change depending on the task itself. For example, we may have a deep neural network for the first task and a Gaussian process for the second task. Classical MTL approaches cannot handle this case, as they require the same model or even the same parameter types for all tasks. We tackle this by considering the task-specific estimators as random variables. The task relationships are then discovered by measuring the statistical dependence between each pair of random variables. By doing so, our model is independent of the parametric nature of each task, and is even agnostic to the existence of such a parametric formulation. We compare our algorithm with existing MTL approaches on challenging real-world ranking and regression datasets, and show that our approach achieves comparable or better performance without knowing the parametric form.
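
A small sketch of measuring statistical dependence between the outputs of two heterogeneous task estimators, assuming the Hilbert-Schmidt Independence Criterion (HSIC) with RBF kernels as the dependence measure (a common choice for this purpose); the paper's exact measure, and how it enters training, are described there.

```python
# Empirical HSIC between two sets of task-estimator outputs on shared inputs.
import numpy as np

def rbf_kernel(Z, gamma=1.0):
    sq = np.sum(Z ** 2, axis=1, keepdims=True)
    return np.exp(-gamma * (sq + sq.T - 2 * Z @ Z.T))

def hsic(pred_a, pred_b, gamma=1.0):
    # pred_a, pred_b: (n, d) outputs of two task-specific estimators on the same n inputs.
    n = pred_a.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n            # centering matrix
    K, L = rbf_kernel(pred_a, gamma), rbf_kernel(pred_b, gamma)
    return np.trace(K @ H @ L @ H) / (n - 1) ** 2  # empirical HSIC estimate
```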

Computationally Efficient Multi-task Learning with Least-squares Probabilistic Classifiers

IPSJ Transactions on Computer Vision and Applications, 2011

Probabilistic classification and multi-task learning are two important branches of machine learning research. Probabilistic classification is useful when the 'confidence' of a decision is necessary. On the other hand, the idea of multi-task learning is beneficial if multiple related learning tasks exist. So far, kernelized logistic regression has been a vital probabilistic classifier for use in multi-task learning scenarios. However, its training tends to be computationally expensive, which has prevented its use in large-scale problems. To overcome this limitation, we propose to employ a recently proposed probabilistic classifier called the least-squares probabilistic classifier in multi-task learning scenarios. Through image classification experiments, we show that our method achieves comparable classification performance to the existing method, with much less training time.
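
A minimal sketch of a least-squares probabilistic classifier (LSPC)-style model, assuming Gaussian kernel basis functions centered at the training points and a per-class regularized least-squares fit to class indicators, followed by clipping and normalization to obtain posterior probabilities; the multi-task extension proposed in the paper adds sharing across tasks and is not shown here.

```python
# LSPC-style classifier sketch: analytic per-class least-squares fit, then normalize.
import numpy as np

def gauss_kernel(A, B, sigma=1.0):
    d2 = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
    return np.exp(-d2 / (2 * sigma**2))

def lspc_fit(X, y, num_classes, lam=0.1, sigma=1.0):
    K = gauss_kernel(X, X, sigma)
    Y = np.eye(num_classes)[y]                               # one-hot class indicators
    # Analytic per-class solution of the regularized least-squares problem.
    return np.linalg.solve(K.T @ K + lam * np.eye(K.shape[0]), K.T @ Y)

def lspc_predict_proba(Xtr, Alpha, Xte, sigma=1.0):
    P = np.maximum(gauss_kernel(Xte, Xtr, sigma) @ Alpha, 0.0)  # clip negatives to zero
    return P / np.clip(P.sum(axis=1, keepdims=True), 1e-12, None)
```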