A comparison of loss weighting strategies for multi-task learning in deep neural networks

Adaptive weight assignment scheme for multi-task learning

IAES International Journal of Artificial Intelligence (IJ-AI), 2022

Deep learning based models are now used regularly across many applications. Generally we train a single model on a single task, but under multi-task learning (MTL) settings we can train multiple tasks on a single model. This brings several benefits: less training time, a single model serving multiple tasks, reduced overfitting, and improved performance. To train a model in a multi-task learning setting we need to sum the loss values from the different tasks. In vanilla multi-task learning settings we assign equal weights, but since not all tasks are equally difficult, more weight should be allocated to the harder tasks; improper weight assignment also reduces the performance of the model. In this paper we propose a simple weight assignment scheme that improves the performance of the model and puts more emphasis on difficult tasks. We tested our method's performance on both image and textual data and also compared performance against two popular weight ...
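
The core mechanism this abstract describes, summing per-task losses under task-specific weights instead of uniform ones, can be illustrated with a short PyTorch sketch. The dynamic rule below (weights proportional to each task's recent loss, so harder tasks get more emphasis) is an illustrative stand-in, not the exact scheme proposed in the paper.

```python
import torch

def weighted_mtl_loss(task_losses, weights=None):
    """Combine per-task losses into a single scalar for backprop.

    task_losses: list of scalar tensors, one per task.
    weights:     optional list of floats; defaults to uniform weights.
    """
    if weights is None:
        weights = [1.0 / len(task_losses)] * len(task_losses)
    return sum(w * l for w, l in zip(weights, task_losses))

def difficulty_weights(prev_losses):
    """Illustrative rule: give larger weight to tasks with larger recent loss.

    Hypothetical stand-in for the paper's scheme; it simply normalizes the
    previous epoch's average losses so the weights sum to 1.
    """
    total = sum(prev_losses)
    return [l / total for l in prev_losses]

# Example: three tasks, the second one currently hardest.
losses = [torch.tensor(0.4), torch.tensor(1.2), torch.tensor(0.6)]
w = difficulty_weights([0.4, 1.2, 0.6])
combined = weighted_mtl_loss(losses, w)   # scalar; call combined.backward() in training
```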

SLAW: Scaled Loss Approximate Weighting for Efficient Multi-Task Learning

ArXiv, 2021

Multi-task learning (MTL) is a subfield of machine learning with important applications, but the multiobjective nature of optimization in MTL leads to difficulties in balancing training between tasks. The best MTL optimization methods require individually computing the gradient of each task’s loss function, which impedes scalability to a large number of tasks. In this paper, we propose Scaled Loss Approximate Weighting (SLAW), a method for multitask optimization that matches the performance of the best existing methods while being much more efficient. SLAW balances learning between tasks by estimating the magnitudes of each task’s gradient without performing any extra backward passes. We provide theoretical and empirical justification for SLAW’s estimation of gradient magnitudes. Experimental results on non-linear regression, multi-task computer vision, and virtual screening for drug discovery demonstrate that SLAW is significantly more efficient than strong baselines without sacrif...
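
SLAW's key idea is to balance tasks using estimates of gradient magnitude obtained without per-task backward passes. The sketch below uses a running estimate of each task's loss scale as a cheap proxy and scales the losses inversely; it illustrates the general single-backward-pass pattern only and is not the authors' exact estimator.

```python
import torch

class LossScaleBalancer:
    """Scale each task's loss by an inverse running magnitude estimate.

    Hypothetical illustration of single-backward-pass balancing; SLAW's
    actual gradient-magnitude estimator differs in detail.
    """
    def __init__(self, num_tasks, beta=0.99, eps=1e-8):
        self.mean = torch.zeros(num_tasks)
        self.beta = beta
        self.eps = eps

    def __call__(self, task_losses):
        with torch.no_grad():
            current = torch.stack([l.detach() for l in task_losses])
            self.mean = self.beta * self.mean + (1 - self.beta) * current
        weights = 1.0 / (self.mean + self.eps)                 # down-weight large-scale tasks
        weights = weights * len(task_losses) / weights.sum()   # keep weights summing to T
        return sum(w * l for w, l in zip(weights, task_losses))

balancer = LossScaleBalancer(num_tasks=3)
losses = [torch.tensor(2.0, requires_grad=True),
          torch.tensor(0.2, requires_grad=True),
          torch.tensor(0.7, requires_grad=True)]
total = balancer(losses)   # a single backward pass on `total` covers all tasks
```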

Design Perspectives of Multitask Deep Learning Models and Applications

arXiv (Cornell University), 2022

In recent years, multi-task learning has turned out to be of great success in various applications. Though single-model training has delivered strong results over the years, it ignores valuable information that might help us estimate a metric better. On learning-related tasks, multi-task learning has been able to generalize the models even better. We try to enhance the feature mapping of the multi-tasking models by sharing features among related tasks and through inductive transfer learning. We are also interested in learning the relationships among various tasks so as to acquire greater benefits from multi-task learning. In this chapter, our objective is to survey the existing multi-tasking models, compare their performances and the methods used to evaluate them, discuss the problems faced during their design and implementation in various domains, and highlight the advantages and milestones achieved by them.
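
The feature sharing and inductive transfer this survey discusses is most commonly realized as hard parameter sharing: one shared encoder plus lightweight task-specific heads. A minimal PyTorch sketch with generic layer sizes, not tied to any particular model in the chapter:

```python
import torch
import torch.nn as nn

class HardSharingMTL(nn.Module):
    """Shared encoder with one small head per task (hard parameter sharing)."""
    def __init__(self, in_dim, hidden_dim, task_out_dims):
        super().__init__()
        self.encoder = nn.Sequential(            # parameters shared by all tasks
            nn.Linear(in_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
        )
        self.heads = nn.ModuleList(              # task-specific parameters
            [nn.Linear(hidden_dim, d) for d in task_out_dims]
        )

    def forward(self, x):
        z = self.encoder(x)
        return [head(z) for head in self.heads]

model = HardSharingMTL(in_dim=32, hidden_dim=64, task_out_dims=[10, 3, 1])
outputs = model(torch.randn(8, 32))   # one output tensor per task
```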

Navigating the Trade-Off between Multi-Task Learning and Learning to Multitask in Deep Neural Networks

ArXiv, 2020

The terms multi-task learning and multitasking are easily confused. Multi-task learning refers to a paradigm in machine learning in which a network is trained on various related tasks to facilitate the acquisition of tasks. In contrast, multitasking is used to indicate, especially in the cognitive science literature, the ability to execute multiple tasks simultaneously. While multi-task learning exploits the discovery of common structure between tasks in the form of shared representations, multitasking is promoted by separating representations between tasks to avoid processing interference. Here, we build on previous work involving shallow networks and simple task settings suggesting that there is a trade-off between multi-task learning and multitasking, mediated by the use of shared versus separated representations. We show that the same tension arises in deep networks and discuss a meta-learning algorithm for an agent to manage this trade-off in an unfamiliar environment. We displ...
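
The trade-off the authors study hinges on whether tasks read from a shared representation (better transfer, more interference) or keep separate pathways (better concurrent execution, less sharing). A hedged sketch contrasting the two wirings; the gating and meta-learning mechanisms of the paper are not reproduced here.

```python
import torch
import torch.nn as nn

class SharedRepNet(nn.Module):
    """All tasks read from one shared hidden layer (favors multi-task learning)."""
    def __init__(self, in_dim, hidden, n_tasks, out_dim):
        super().__init__()
        self.hidden = nn.Linear(in_dim, hidden)
        self.heads = nn.ModuleList([nn.Linear(hidden, out_dim) for _ in range(n_tasks)])

    def forward(self, x):
        h = torch.relu(self.hidden(x))
        return [head(h) for head in self.heads]

class SeparateRepNet(nn.Module):
    """Each task has its own hidden pathway (favors concurrent multitasking)."""
    def __init__(self, in_dim, hidden, n_tasks, out_dim):
        super().__init__()
        self.paths = nn.ModuleList([
            nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU(), nn.Linear(hidden, out_dim))
            for _ in range(n_tasks)
        ])

    def forward(self, x):
        return [path(x) for path in self.paths]
```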

Deep multi-task learning with low level tasks supervised at lower layers

Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), 2016

In all previous work on deep multi-task learning that we are aware of, all task supervision is on the same (outermost) layer. We present a multi-task learning architecture with deep bi-directional RNNs, where supervision for different tasks can happen at different layers. We present experiments in syntactic chunking and CCG supertagging, coupled with the additional task of POS-tagging. We show that it is consistently better to have POS supervision at the innermost rather than the outermost layer. We argue that this is because "low-level" tasks are better kept at the lower layers, enabling the higher-level tasks to make use of the shared representation of the lower-level tasks. Finally, we also show how this architecture can be used for domain adaptation.
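
The architecture described, supervising a low-level task (POS-tagging) at an inner recurrent layer while the higher-level task is supervised at the outer layer, can be sketched as a two-layer bidirectional LSTM with a classifier attached to each layer. Hidden sizes and tag-set sizes below are placeholders, not the paper's settings.

```python
import torch
import torch.nn as nn

class LayeredSupervisionTagger(nn.Module):
    """Bi-LSTM stack: POS supervised at layer 1, chunking at layer 2."""
    def __init__(self, vocab_size, emb_dim=64, hidden=128, n_pos=17, n_chunk=23):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.rnn1 = nn.LSTM(emb_dim, hidden, bidirectional=True, batch_first=True)
        self.rnn2 = nn.LSTM(2 * hidden, hidden, bidirectional=True, batch_first=True)
        self.pos_head = nn.Linear(2 * hidden, n_pos)      # low-level task, inner layer
        self.chunk_head = nn.Linear(2 * hidden, n_chunk)  # high-level task, outer layer

    def forward(self, token_ids):
        h1, _ = self.rnn1(self.emb(token_ids))
        h2, _ = self.rnn2(h1)
        return self.pos_head(h1), self.chunk_head(h2)

model = LayeredSupervisionTagger(vocab_size=10000)
pos_logits, chunk_logits = model(torch.randint(0, 10000, (4, 12)))  # 4 sentences of 12 tokens
```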

Regularizing Deep Multi-Task Networks using Orthogonal Gradients

ArXiv, 2019

Deep neural networks are a promising approach towards multi-task learning because of their capability to leverage knowledge across domains and learn general purpose representations. Nevertheless, they can fail to live up to these promises as tasks often compete for a model's limited resources, potentially leading to lower overall performance. In this work we tackle the issue of interfering tasks through a comprehensive analysis of their training, derived from looking at the interaction between gradients within their shared parameters. Our empirical results show that well-performing models have low variance in the angles between task gradients and that popular regularization methods implicitly reduce this measure. Based on this observation, we propose a novel gradient regularization term that minimizes task interference by enforcing near orthogonal gradients. Updating the shared parameters using this property encourages task specific decoders to optimize different parts of the fe...
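
The regularizer penalizes non-orthogonality between task gradients in the shared parameters. A minimal sketch of that general recipe: compute each task's gradient with respect to the shared encoder, penalize the squared cosine similarity between gradient pairs, and add the penalty to the total loss. This follows the overall idea rather than the paper's exact formulation.

```python
import torch

def grad_vector(loss, shared_params):
    """Flattened gradient of `loss` w.r.t. the shared parameters."""
    grads = torch.autograd.grad(loss, shared_params, retain_graph=True, create_graph=True)
    return torch.cat([g.reshape(-1) for g in grads])

def orthogonality_penalty(task_losses, shared_params):
    """Sum of squared cosine similarities between pairs of task gradients."""
    gs = [grad_vector(l, shared_params) for l in task_losses]
    penalty = 0.0
    for i in range(len(gs)):
        for j in range(i + 1, len(gs)):
            cos = torch.nn.functional.cosine_similarity(gs[i], gs[j], dim=0)
            penalty = penalty + cos ** 2
    return penalty

# Hypothetical usage inside a training step (lambda_orth is a tuning knob):
# total = sum(task_losses) + lambda_orth * orthogonality_penalty(task_losses,
#                                                                list(encoder.parameters()))
# total.backward()
```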

LogSE: An Uncertainty-Based Multi-Task Loss Function for Learning Two Regression Tasks

JUCS - Journal of Universal Computer Science

Multi-task learning (MTL) is a popular method in machine learning which utilizes related information from multiple tasks to learn a task more efficiently and accurately. Naively, one can benefit from MTL by using a weighted linear sum of the different tasks' loss functions. Manual specification of appropriate weights is difficult and typically does not improve performance, so it is critical to find an automatic weighting strategy for MTL. Also, there are three types of uncertainty that are captured in deep learning. Epistemic uncertainty is related to the lack of data. Heteroscedastic aleatoric uncertainty depends on the input data and differs from one input to another. In this paper, we focus on the third type, homoscedastic aleatoric uncertainty, which is constant for different inputs and is task-dependent. There are some methods for learning uncertainty-based weights as the parameters of a model. But in this paper, we introduce a novel multi-task loss function to capture homosced...
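
The proposed LogSE loss itself is not reproduced here (the abstract is truncated), but the family of methods it builds on, learning a homoscedastic uncertainty term per task and weighting each regression loss by it in the spirit of Kendall et al., can be sketched with the log-variances learned as model parameters. The constants below follow a commonly used simplification, not the paper's variant.

```python
import torch
import torch.nn as nn

class UncertaintyWeightedLoss(nn.Module):
    """Homoscedastic-uncertainty weighting for two regression tasks.

    Each task's loss is scaled by exp(-s_i) and a penalty s_i is added,
    where s_i = log(sigma_i^2) is a learned parameter. Simplified classic
    formulation, not the paper's LogSE variant.
    """
    def __init__(self, num_tasks=2):
        super().__init__()
        self.log_vars = nn.Parameter(torch.zeros(num_tasks))

    def forward(self, task_losses):
        total = 0.0
        for i, loss in enumerate(task_losses):
            precision = torch.exp(-self.log_vars[i])
            total = total + precision * loss + self.log_vars[i]
        return total

criterion = UncertaintyWeightedLoss(num_tasks=2)
mse1 = torch.tensor(0.8, requires_grad=True)
mse2 = torch.tensor(2.5, requires_grad=True)
combined = criterion([mse1, mse2])   # optimize model and criterion parameters jointly
```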

Empirical evaluation of multi-task learning in deep neural networks for natural language processing

Neural Computing and Applications, 2020

Multi-Task Learning (MTL) aims at boosting the overall performance of each individual task by leveraging useful information contained in multiple related tasks. It has shown great success in natural language processing (NLP). Currently, a number of MTL architectures and learning mechanisms have been proposed for various NLP tasks; however, there is no systematic, in-depth exploration and comparison of these different MTL architectures and learning mechanisms and their strengths. In this paper, we conduct a thorough examination of typical MTL methods on a broad range of representative NLP tasks. Our primary goal is to understand the merits and demerits of existing MTL methods in NLP tasks, thus devising new hybrid architectures intended to combine their strengths.

Many Task Learning with Task Routing

arXiv (Cornell University), 2019

Typical multi-task learning (MTL) methods rely on architectural adjustments and a large trainable parameter set to jointly optimize over several tasks. However, when the number of tasks increases so do the complexity of the architectural adjustments and resource requirements. In this paper, we introduce a method which applies a conditional feature-wise transformation over the convolutional activations that enables a model to successfully perform a large number of tasks. To distinguish from regular MTL, we introduce Many Task Learning (MaTL) as a special case of MTL where more than 20 tasks are performed by a single model. Our method dubbed Task Routing (TR) is encapsulated in a layer we call the Task Routing Layer (TRL), which applied in an MaTL scenario successfully fits hundreds of classification tasks in one model. We evaluate our method on 5 datasets against strong baselines and state-of-the-art approaches.
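
The Task Routing Layer described applies a task-conditioned, feature-wise transformation to convolutional activations, so that each task uses only a subset of the shared channels. A hedged sketch of that pattern with a fixed per-task binary channel mask; the keep ratio and wiring are illustrative, and the published TRL may generate and store its masks differently.

```python
import torch
import torch.nn as nn

class TaskRoutingLayer(nn.Module):
    """Apply a fixed, per-task binary mask over convolutional channels."""
    def __init__(self, num_tasks, num_channels, keep_ratio=0.6):
        super().__init__()
        masks = (torch.rand(num_tasks, num_channels) < keep_ratio).float()
        self.register_buffer("masks", masks)     # fixed masks, not trained

    def forward(self, x, task_id):
        # x: (batch, channels, H, W); broadcast the task's channel mask
        return x * self.masks[task_id].view(1, -1, 1, 1)

routing = TaskRoutingLayer(num_tasks=20, num_channels=64)
feats = torch.randn(8, 64, 32, 32)
routed = routing(feats, task_id=3)   # only task 3's subset of channels passes through
```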

MBMT-Net: A Multi-Task Learning Based Convolutional Neural Network Architecture for Dense Prediction Tasks

IEEE Access

Recently proposed improvements in the field of Computer Vision refer to enhancing the feature processing capabilities of Single-Task Convolutional Neural Networks. A typical Single-Task network consists of a backbone and a head, where the feature extractor is usually optimised using the gradient provided by the head. Inevitably, the backbone specialises for the given task. This sort of approach does not scale well for learning multiple tasks at once while having the same input. As a response, there is an increasing interest in Multi-Task formulations. Since most Multi-Task architectures employ a single shared backbone, when gradients from different tasks are propagated back to it, the backbone can become oversaturated. This problem may be addressed with Multi-Backbone feature extractors. Hence, as a strategy proposed to compensate for these shortcomings, we introduce MBMT-Net, a Multi-Backbone-Multi-Task-Network architecture based on a development strategy that infuses backbones with more diverse and specialised processing capabilities. MBMT-Net consists of parallel pre-trained backbones whose outputs are concatenated and fed to the Multi-Task heads, which benefit from richer and more diverse features with a decreased number of network parameters when compared to traditional Multi-Task architectures. Our strategy is architecture independent and can be applied to different types of backbones and parsing heads, which greatly extends the domain of configurable features, enhancing existing Single- and Multi-Task model building strategies and outperforming them when using the Multi-Backbone design. As a result, while using 12.16M fewer parameters, MBMT-Net reaches state-of-the-art performance and surpasses the previously best semantic segmentation Multi-Task model in terms of Mean Intersection over Union when evaluated on the NYUv2 data set.
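
The central design, running several backbones in parallel, concatenating their features, and feeding the fused result to multiple task heads, can be sketched with generic encoders. The real MBMT-Net uses pre-trained convolutional backbones and dense-prediction heads, which are simplified here to toy modules.

```python
import torch
import torch.nn as nn

class MultiBackboneMTL(nn.Module):
    """Parallel backbones whose concatenated features feed every task head."""
    def __init__(self, backbones, feat_dims, task_out_dims):
        super().__init__()
        self.backbones = nn.ModuleList(backbones)
        fused = sum(feat_dims)
        self.heads = nn.ModuleList([nn.Linear(fused, d) for d in task_out_dims])

    def forward(self, x):
        feats = torch.cat([b(x) for b in self.backbones], dim=1)  # feature fusion
        return [head(feats) for head in self.heads]

# Toy stand-ins for pre-trained backbones (same input, different specializations).
b1 = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 128), nn.ReLU())
b2 = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 128), nn.ReLU())
model = MultiBackboneMTL([b1, b2], feat_dims=[128, 128], task_out_dims=[13, 40])
outs = model(torch.randn(4, 3, 32, 32))   # one prediction tensor per task
```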