Ben Chamberlain | Imperial College London (original) (raw)
Papers by Ben Chamberlain
The workshop features presentations of accepted contributions to the RecSys Challenge 2021, organ... more The workshop features presentations of accepted contributions to the RecSys Challenge 2021, organized by Politecnico di Bari, ETH Zürich, Jönköping University, and the data set is provided by Twitter. The challenge focuses on a real-world task of tweet engagement prediction in a dynamic environment. For 2021, the challenge considers four different engagement types: Likes, Retweet, Quote, and replies. This year's challenge brings the problem even closer to Twitter's real recommender systems by introducing latency constraints. We also increases the data size to encourage novel methods. Also, the data density is increased in terms of the graph where users are considered to be nodes and interactions as edges. The goal is twofold: to predict the probability of different engagement types of a target user for a set of Tweets based on heterogeneous input data while providing fair recommendations. In fact, multi-goal optimization considering accuracy and fairness is particularly challenging. However, we believed that the recommendation community was nowadays mature enough to face the challenge of providing accurate and, at the same time, fair recommendations. To this end, Twitter has released a public dataset of close to 1 billion data points, > 40 million each day over 28 days. Week 1−3 will be used for training and week 4 for evaluation and testing. Each datapoint contains the tweet along with engagement features, user features, and tweet * Both authors contributed equally to this research.
arXiv (Cornell University), Jun 18, 2020
Graph Neural Networks (GNNs) have recently become increasingly popular due to their ability to le... more Graph Neural Networks (GNNs) have recently become increasingly popular due to their ability to learn complex systems of relations or interactions. These arise in a broad spectrum of problems ranging from biology and particle physics to social networks and recommendation systems. Despite the plethora of different models for deep learning on graphs, few approaches have been proposed for dealing with graphs that are dynamic in nature (e.g. evolving features or connectivity over time). We present Temporal Graph Networks (TGNs), a generic, efficient framework for deep learning on dynamic graphs represented as sequences of timed events. Thanks to a novel combination of memory modules and graph-based operators, TGNs significantly outperform previous approaches while being more computationally efficient. We furthermore show that several previous models for learning on dynamic graphs can be cast as specific instances of our framework. We perform a detailed ablation study of different components of our framework and devise the best configuration that achieves state-of-the-art performance on several transductive and inductive prediction tasks for dynamic graphs.
International Conference on Machine Learning, Jul 18, 2021
We present Graph Neural Diffusion (GRAND) that approaches deep learning on graphs as a continuous... more We present Graph Neural Diffusion (GRAND) that approaches deep learning on graphs as a continuous diffusion process and treats Graph Neural Networks (GNNs) as discretisations of an underlying PDE. In our model, the layer structure and topology correspond to the discretisation choices of temporal and spatial operators. Our approach allows a principled development of a broad new class of GNNs that are able to address the common plights of graph learning models such as depth, oversmoothing, and bottlenecks. Key to the success of our models are stability with respect to perturbations in the data and this is addressed for both implicit and explicit discretisation schemes. We develop linear and nonlinear versions of GRAND, which achieve competitive results on many standard graph benchmarks.
After the success the RecSys 2020 Challenge, we are describing a novel and bigger dataset that wa... more After the success the RecSys 2020 Challenge, we are describing a novel and bigger dataset that was released in conjunction with the ACM RecSys Challenge 2021. This year's dataset is not only bigger (˜1B data points, a 5 fold increase), but for the first time it take into consideration fairness aspects of the challenge. Unlike many static datsets, a lot of effort went into making sure that the dataset was synced with the Twitter platform: if a user deleted their content, the same content would be promptly removed from the dataset too. In this paper, we introduce the dataset and challenge, highlighting some of the issues that arise when creating recommender systems at Twitter scale.
arXiv (Cornell University), 2020
Graph representation learning has recently been applied to a broad spectrum of problems ranging f... more Graph representation learning has recently been applied to a broad spectrum of problems ranging from computer graphics and chemistry to high energy physics and social media. The popularity of graph neural networks has sparked interest, both in academia and in industry, in developing methods that scale to very large graphs such as Facebook or Twitter social networks. In most of these approaches, the computational cost is alleviated by a sampling strategy retaining a subset of node neighbors or subgraphs at training time. In this paper we propose a new, efficient and scalable graph deep learning architecture which sidesteps the need for graph sampling by using graph convolutional filters of different size that are amenable to efficient precomputation, allowing extremely fast training and inference. Our architecture allows using different local graph operators (e.g. motifinduced adjacency matrices or Personalized Page Rank diffusion matrix) to best suit the task at hand. We conduct extensive experimental evaluation on various open benchmarks and show that our approach is competitive with other state-ofthe-art architectures, while requiring a fraction of the training and inference time. Moreover, we obtain state-of-the-art results on ogbn-papers100M, the largest public graph dataset, with over 110 million nodes and 1.5 billion edges. * Equal contribution Preprint. Under review.
Data splits We report transductive node classification results in Tables 2 and 3 in the main pape... more Data splits We report transductive node classification results in Tables 2 and 3 in the main paper. To facilitate direct comparison with many previous works, Table 2 uses the Planetoid splits given by [8]. As discussed in e.g. [7], there are many limitations with results based on these fixed dataset splits and so we report a more robust set of experimental results in Table 3. Here we use 100 random splits with 20 random weight initializations. In each case 20 labelled nodes per class are used at training time, with the remaining labels split between test and validation. This methodology is consistent with [7].
Fourteenth ACM Conference on Recommender Systems, 2020
Word2vec is a powerful machine learning tool that emerged from Natural Language Processing (NLP) ... more Word2vec is a powerful machine learning tool that emerged from Natural Language Processing (NLP) and is now applied in multiple domains, including recommender systems, forecasting, and network analysis. As Word2vec is often used off the shelf, we address the question of whether the default hyperparameters are suitable for recommender systems. The answer is emphatically no. In this paper, we first elucidate the importance of hyperparameter optimization and show that unconstrained optimization yields an average 221% improvement in hit rate over the default parameters. However, unconstrained optimization leads to hyperparameter settings that are very expensive and not feasible for large scale recommendation tasks. To this end, we demonstrate 138% average improvement in hit rate with a runtime budget-constrained hyperparameter optimization. Furthermore, to make hyperparameter optimization applicable for large scale recommendation problems where the target dataset is too large to search over, we investigate generalizing hyperparameters settings from samples. We show that applying constrained hyperparameter optimization using only a 10% sample of the data still yields a 91% average improvement in hit rate over the default parameters when applied to the full datasets. Finally, we apply hyperparameters learned using our method of constrained optimization on a sample to the Who To Follow recommendation service at Twitter and are able to increase follow rates by 15%.
2019 IEEE International Conference on Data Mining (ICDM), 2019
Experimentation and Measurement (E&M) capabilities allow organizations to accurately assess the i... more Experimentation and Measurement (E&M) capabilities allow organizations to accurately assess the impact of new propositions and to experiment with many variants of existing products. However, until now, the question of measuring the measurer, or valuing the contribution of an E&M capability to organizational success has not been addressed. We tackle this problem by analyzing how, by decreasing estimation uncertainty, E&M platforms allow for better prioritization. We quantify this benefit in terms of expected relative improvement in the performance of all new propositions and provide guidance for how much an E&M capability is worth and when organizations should invest in one.
After the success the RecSys 2020 Challenge, we are describing a novel and bigger dataset that wa... more After the success the RecSys 2020 Challenge, we are describing a novel and bigger dataset that was released in conjunction with the ACM RecSys Challenge 2021. This year's dataset is not only bigger (~ 1B data points, a 5 fold increase), but for the first time it take into consideration fairness aspects of the challenge. Unlike many static datsets, a lot of effort went into making sure that the dataset was synced with the Twitter platform: if a user deleted their content, the same content would be promptly removed from the dataset too. In this paper, we introduce the dataset and challenge, highlighting some of the issues that arise when creating recommender systems at Twitter scale.
Fifteenth ACM Conference on Recommender Systems, 2021
The workshop features presentations of accepted contributions to the RecSys Challenge 2021, organ... more The workshop features presentations of accepted contributions to the RecSys Challenge 2021, organized by Politecnico di Bari, ETH Zürich, Jönköping University, and the data set is provided by Twitter. The challenge focuses on a real-world task of tweet engagement prediction in a dynamic environment. For 2021, the challenge considers four different engagement types: Likes, Retweet, Quote, and replies. This year’s challenge brings the problem even closer to Twitter’s real recommender systems by introducing latency constraints. We also increases the data size to encourage novel methods. Also, the data density is increased in terms of the graph where users are considered to be nodes and interactions as edges. The goal is twofold: to predict the probability of different engagement types of a target user for a set of Tweets based on heterogeneous input data while providing fair recommendations. In fact, multi-goal optimization considering accuracy and fairness is particularly challenging....
ArXiv, 2020
Graph Neural Networks (GNNs) have recently become increasingly popular due to their ability to le... more Graph Neural Networks (GNNs) have recently become increasingly popular due to their ability to learn complex systems of relations or interactions arising in a broad spectrum of problems ranging from biology and particle physics to social networks and recommendation systems. Despite the plethora of different models for deep learning on graphs, few approaches have been proposed thus far for dealing with graphs that present some sort of dynamic nature (e.g. evolving features or connectivity over time). In this paper, we present Temporal Graph Networks (TGNs), a generic, efficient framework for deep learning on dynamic graphs represented as sequences of timed events. Thanks to a novel combination of memory modules and graph-based operators, TGNs are able to significantly outperform previous approaches being at the same time more computationally efficient. We furthermore show that several previous models for learning on dynamic graphs can be cast as specific instances of our framework. W...
ArXiv, 2021
We propose a novel class of graph neural networks based on the discretised Beltrami flow, a non-E... more We propose a novel class of graph neural networks based on the discretised Beltrami flow, a non-Euclidean diffusion PDE. In our model, node features are supplemented with positional encodings derived from the graph topology and jointly evolved by the Beltrami flow, producing simultaneously continuous feature learning and topology evolution. The resulting model generalises many popular graph neural networks and achieves state-of-the-art results on several benchmarks.
We present Graph Neural Diffusion (GRAND) that approaches deep learning on graphs as a continuous... more We present Graph Neural Diffusion (GRAND) that approaches deep learning on graphs as a continuous diffusion process and treats Graph Neural Networks (GNNs) as discretisations of an underlying PDE. In our model, the layer structure and topology correspond to the discretisation choices of temporal and spatial operators. Our approach allows a principled development of a broad new class of GNNs that are able to address the common plights of graph learning models such as depth, oversmoothing, and bottlenecks. Key to the success of our models are stability with respect to perturbations in the data and this is addressed for both implicit and explicit discretisation schemes. We develop linear and nonlinear versions of GRAND, which achieve competitive results on many standard graph benchmarks.
Graph representation learning has recently been applied to a broad spectrum of problems ranging f... more Graph representation learning has recently been applied to a broad spectrum of problems ranging from computer graphics and chemistry to high energy physics and social media. The popularity of graph neural networks has sparked interest, both in academia and in industry, in developing methods that scale to very large graphs such as Facebook or Twitter social networks. In most of these approaches, the computational cost is alleviated by a sampling strategy retaining a subset of node neighbors or subgraphs at training time. In this paper we propose a new, efficient and scalable graph deep learning architecture which sidesteps the need for graph sampling by using graph convolutional filters of different size that are amenable to efficient precomputation, allowing extremely fast training and inference. Our architecture allows using different local graph operators (e.g. motif-induced adjacency matrices or Personalized Page Rank diffusion matrix) to best suit the task at hand. We conduct ex...
RecSysChallenge '21: Proceedings of the Recommender Systems Challenge 2021
Fourteenth ACM Conference on Recommender Systems
Word2vec is a powerful machine learning tool that emerged from Natural Language Processing (NLP) ... more Word2vec is a powerful machine learning tool that emerged from Natural Language Processing (NLP) and is now applied in multiple domains, including recommender systems, forecasting, and network analysis. As Word2vec is often used off the shelf, we address the question of whether the default hyperparameters are suitable for recommender systems. The answer is emphatically no. In this paper, we first elucidate the importance of hyperparameter optimization and show that unconstrained optimization yields an average 221% improvement in hit rate over the default parameters. However, unconstrained optimization leads to hyperparameter settings that are very expensive and not feasible for large scale recommendation tasks. To this end, we demonstrate 138% average improvement in hit rate with a runtime budget-constrained hyperparameter optimization. Furthermore, to make hyperparameter optimization applicable for large scale recommendation problems where the target dataset is too large to search over, we investigate generalizing hyperparameters settings from samples. We show that applying constrained hyperparameter optimization using only a 10% sample of the data still yields a 91% average improvement in hit rate over the default parameters when applied to the full datasets. Finally, we apply hyperparameters learned using our method of constrained optimization on a sample to the Who To Follow recommendation service at Twitter and are able to increase follow rates by 15%.
The workshop features presentations of accepted contributions to the RecSys Challenge 2021, organ... more The workshop features presentations of accepted contributions to the RecSys Challenge 2021, organized by Politecnico di Bari, ETH Zürich, Jönköping University, and the data set is provided by Twitter. The challenge focuses on a real-world task of tweet engagement prediction in a dynamic environment. For 2021, the challenge considers four different engagement types: Likes, Retweet, Quote, and replies. This year's challenge brings the problem even closer to Twitter's real recommender systems by introducing latency constraints. We also increases the data size to encourage novel methods. Also, the data density is increased in terms of the graph where users are considered to be nodes and interactions as edges. The goal is twofold: to predict the probability of different engagement types of a target user for a set of Tweets based on heterogeneous input data while providing fair recommendations. In fact, multi-goal optimization considering accuracy and fairness is particularly challenging. However, we believed that the recommendation community was nowadays mature enough to face the challenge of providing accurate and, at the same time, fair recommendations. To this end, Twitter has released a public dataset of close to 1 billion data points, > 40 million each day over 28 days. Week 1−3 will be used for training and week 4 for evaluation and testing. Each datapoint contains the tweet along with engagement features, user features, and tweet * Both authors contributed equally to this research.
arXiv (Cornell University), Jun 18, 2020
Graph Neural Networks (GNNs) have recently become increasingly popular due to their ability to le... more Graph Neural Networks (GNNs) have recently become increasingly popular due to their ability to learn complex systems of relations or interactions. These arise in a broad spectrum of problems ranging from biology and particle physics to social networks and recommendation systems. Despite the plethora of different models for deep learning on graphs, few approaches have been proposed for dealing with graphs that are dynamic in nature (e.g. evolving features or connectivity over time). We present Temporal Graph Networks (TGNs), a generic, efficient framework for deep learning on dynamic graphs represented as sequences of timed events. Thanks to a novel combination of memory modules and graph-based operators, TGNs significantly outperform previous approaches while being more computationally efficient. We furthermore show that several previous models for learning on dynamic graphs can be cast as specific instances of our framework. We perform a detailed ablation study of different components of our framework and devise the best configuration that achieves state-of-the-art performance on several transductive and inductive prediction tasks for dynamic graphs.
International Conference on Machine Learning, Jul 18, 2021
We present Graph Neural Diffusion (GRAND) that approaches deep learning on graphs as a continuous... more We present Graph Neural Diffusion (GRAND) that approaches deep learning on graphs as a continuous diffusion process and treats Graph Neural Networks (GNNs) as discretisations of an underlying PDE. In our model, the layer structure and topology correspond to the discretisation choices of temporal and spatial operators. Our approach allows a principled development of a broad new class of GNNs that are able to address the common plights of graph learning models such as depth, oversmoothing, and bottlenecks. Key to the success of our models are stability with respect to perturbations in the data and this is addressed for both implicit and explicit discretisation schemes. We develop linear and nonlinear versions of GRAND, which achieve competitive results on many standard graph benchmarks.
After the success the RecSys 2020 Challenge, we are describing a novel and bigger dataset that wa... more After the success the RecSys 2020 Challenge, we are describing a novel and bigger dataset that was released in conjunction with the ACM RecSys Challenge 2021. This year's dataset is not only bigger (˜1B data points, a 5 fold increase), but for the first time it take into consideration fairness aspects of the challenge. Unlike many static datsets, a lot of effort went into making sure that the dataset was synced with the Twitter platform: if a user deleted their content, the same content would be promptly removed from the dataset too. In this paper, we introduce the dataset and challenge, highlighting some of the issues that arise when creating recommender systems at Twitter scale.
arXiv (Cornell University), 2020
Graph representation learning has recently been applied to a broad spectrum of problems ranging f... more Graph representation learning has recently been applied to a broad spectrum of problems ranging from computer graphics and chemistry to high energy physics and social media. The popularity of graph neural networks has sparked interest, both in academia and in industry, in developing methods that scale to very large graphs such as Facebook or Twitter social networks. In most of these approaches, the computational cost is alleviated by a sampling strategy retaining a subset of node neighbors or subgraphs at training time. In this paper we propose a new, efficient and scalable graph deep learning architecture which sidesteps the need for graph sampling by using graph convolutional filters of different size that are amenable to efficient precomputation, allowing extremely fast training and inference. Our architecture allows using different local graph operators (e.g. motifinduced adjacency matrices or Personalized Page Rank diffusion matrix) to best suit the task at hand. We conduct extensive experimental evaluation on various open benchmarks and show that our approach is competitive with other state-ofthe-art architectures, while requiring a fraction of the training and inference time. Moreover, we obtain state-of-the-art results on ogbn-papers100M, the largest public graph dataset, with over 110 million nodes and 1.5 billion edges. * Equal contribution Preprint. Under review.
Data splits We report transductive node classification results in Tables 2 and 3 in the main pape... more Data splits We report transductive node classification results in Tables 2 and 3 in the main paper. To facilitate direct comparison with many previous works, Table 2 uses the Planetoid splits given by [8]. As discussed in e.g. [7], there are many limitations with results based on these fixed dataset splits and so we report a more robust set of experimental results in Table 3. Here we use 100 random splits with 20 random weight initializations. In each case 20 labelled nodes per class are used at training time, with the remaining labels split between test and validation. This methodology is consistent with [7].
Fourteenth ACM Conference on Recommender Systems, 2020
Word2vec is a powerful machine learning tool that emerged from Natural Language Processing (NLP) ... more Word2vec is a powerful machine learning tool that emerged from Natural Language Processing (NLP) and is now applied in multiple domains, including recommender systems, forecasting, and network analysis. As Word2vec is often used off the shelf, we address the question of whether the default hyperparameters are suitable for recommender systems. The answer is emphatically no. In this paper, we first elucidate the importance of hyperparameter optimization and show that unconstrained optimization yields an average 221% improvement in hit rate over the default parameters. However, unconstrained optimization leads to hyperparameter settings that are very expensive and not feasible for large scale recommendation tasks. To this end, we demonstrate 138% average improvement in hit rate with a runtime budget-constrained hyperparameter optimization. Furthermore, to make hyperparameter optimization applicable for large scale recommendation problems where the target dataset is too large to search over, we investigate generalizing hyperparameters settings from samples. We show that applying constrained hyperparameter optimization using only a 10% sample of the data still yields a 91% average improvement in hit rate over the default parameters when applied to the full datasets. Finally, we apply hyperparameters learned using our method of constrained optimization on a sample to the Who To Follow recommendation service at Twitter and are able to increase follow rates by 15%.
2019 IEEE International Conference on Data Mining (ICDM), 2019
Experimentation and Measurement (E&M) capabilities allow organizations to accurately assess the i... more Experimentation and Measurement (E&M) capabilities allow organizations to accurately assess the impact of new propositions and to experiment with many variants of existing products. However, until now, the question of measuring the measurer, or valuing the contribution of an E&M capability to organizational success has not been addressed. We tackle this problem by analyzing how, by decreasing estimation uncertainty, E&M platforms allow for better prioritization. We quantify this benefit in terms of expected relative improvement in the performance of all new propositions and provide guidance for how much an E&M capability is worth and when organizations should invest in one.
After the success the RecSys 2020 Challenge, we are describing a novel and bigger dataset that wa... more After the success the RecSys 2020 Challenge, we are describing a novel and bigger dataset that was released in conjunction with the ACM RecSys Challenge 2021. This year's dataset is not only bigger (~ 1B data points, a 5 fold increase), but for the first time it take into consideration fairness aspects of the challenge. Unlike many static datsets, a lot of effort went into making sure that the dataset was synced with the Twitter platform: if a user deleted their content, the same content would be promptly removed from the dataset too. In this paper, we introduce the dataset and challenge, highlighting some of the issues that arise when creating recommender systems at Twitter scale.
Fifteenth ACM Conference on Recommender Systems, 2021
The workshop features presentations of accepted contributions to the RecSys Challenge 2021, organ... more The workshop features presentations of accepted contributions to the RecSys Challenge 2021, organized by Politecnico di Bari, ETH Zürich, Jönköping University, and the data set is provided by Twitter. The challenge focuses on a real-world task of tweet engagement prediction in a dynamic environment. For 2021, the challenge considers four different engagement types: Likes, Retweet, Quote, and replies. This year’s challenge brings the problem even closer to Twitter’s real recommender systems by introducing latency constraints. We also increases the data size to encourage novel methods. Also, the data density is increased in terms of the graph where users are considered to be nodes and interactions as edges. The goal is twofold: to predict the probability of different engagement types of a target user for a set of Tweets based on heterogeneous input data while providing fair recommendations. In fact, multi-goal optimization considering accuracy and fairness is particularly challenging....
ArXiv, 2020
Graph Neural Networks (GNNs) have recently become increasingly popular due to their ability to le... more Graph Neural Networks (GNNs) have recently become increasingly popular due to their ability to learn complex systems of relations or interactions arising in a broad spectrum of problems ranging from biology and particle physics to social networks and recommendation systems. Despite the plethora of different models for deep learning on graphs, few approaches have been proposed thus far for dealing with graphs that present some sort of dynamic nature (e.g. evolving features or connectivity over time). In this paper, we present Temporal Graph Networks (TGNs), a generic, efficient framework for deep learning on dynamic graphs represented as sequences of timed events. Thanks to a novel combination of memory modules and graph-based operators, TGNs are able to significantly outperform previous approaches being at the same time more computationally efficient. We furthermore show that several previous models for learning on dynamic graphs can be cast as specific instances of our framework. W...
ArXiv, 2021
We propose a novel class of graph neural networks based on the discretised Beltrami flow, a non-E... more We propose a novel class of graph neural networks based on the discretised Beltrami flow, a non-Euclidean diffusion PDE. In our model, node features are supplemented with positional encodings derived from the graph topology and jointly evolved by the Beltrami flow, producing simultaneously continuous feature learning and topology evolution. The resulting model generalises many popular graph neural networks and achieves state-of-the-art results on several benchmarks.
We present Graph Neural Diffusion (GRAND) that approaches deep learning on graphs as a continuous... more We present Graph Neural Diffusion (GRAND) that approaches deep learning on graphs as a continuous diffusion process and treats Graph Neural Networks (GNNs) as discretisations of an underlying PDE. In our model, the layer structure and topology correspond to the discretisation choices of temporal and spatial operators. Our approach allows a principled development of a broad new class of GNNs that are able to address the common plights of graph learning models such as depth, oversmoothing, and bottlenecks. Key to the success of our models are stability with respect to perturbations in the data and this is addressed for both implicit and explicit discretisation schemes. We develop linear and nonlinear versions of GRAND, which achieve competitive results on many standard graph benchmarks.
Graph representation learning has recently been applied to a broad spectrum of problems ranging f... more Graph representation learning has recently been applied to a broad spectrum of problems ranging from computer graphics and chemistry to high energy physics and social media. The popularity of graph neural networks has sparked interest, both in academia and in industry, in developing methods that scale to very large graphs such as Facebook or Twitter social networks. In most of these approaches, the computational cost is alleviated by a sampling strategy retaining a subset of node neighbors or subgraphs at training time. In this paper we propose a new, efficient and scalable graph deep learning architecture which sidesteps the need for graph sampling by using graph convolutional filters of different size that are amenable to efficient precomputation, allowing extremely fast training and inference. Our architecture allows using different local graph operators (e.g. motif-induced adjacency matrices or Personalized Page Rank diffusion matrix) to best suit the task at hand. We conduct ex...
RecSysChallenge '21: Proceedings of the Recommender Systems Challenge 2021
Fourteenth ACM Conference on Recommender Systems
Word2vec is a powerful machine learning tool that emerged from Natural Language Processing (NLP) ... more Word2vec is a powerful machine learning tool that emerged from Natural Language Processing (NLP) and is now applied in multiple domains, including recommender systems, forecasting, and network analysis. As Word2vec is often used off the shelf, we address the question of whether the default hyperparameters are suitable for recommender systems. The answer is emphatically no. In this paper, we first elucidate the importance of hyperparameter optimization and show that unconstrained optimization yields an average 221% improvement in hit rate over the default parameters. However, unconstrained optimization leads to hyperparameter settings that are very expensive and not feasible for large scale recommendation tasks. To this end, we demonstrate 138% average improvement in hit rate with a runtime budget-constrained hyperparameter optimization. Furthermore, to make hyperparameter optimization applicable for large scale recommendation problems where the target dataset is too large to search over, we investigate generalizing hyperparameters settings from samples. We show that applying constrained hyperparameter optimization using only a 10% sample of the data still yields a 91% average improvement in hit rate over the default parameters when applied to the full datasets. Finally, we apply hyperparameters learned using our method of constrained optimization on a sample to the Who To Follow recommendation service at Twitter and are able to increase follow rates by 15%.