XGBoost | Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
XGBoost: A Scalable Tree Boosting System
Published: 13 August 2016
Abstract
Tree boosting is a highly effective and widely used machine learning method. In this paper, we describe a scalable end-to-end tree boosting system called XGBoost, which is used widely by data scientists to achieve state-of-the-art results on many machine learning challenges. We propose a novel sparsity-aware algorithm for sparse data and weighted quantile sketch for approximate tree learning. More importantly, we provide insights on cache access patterns, data compression and sharding to build a scalable tree boosting system. By combining these insights, XGBoost scales beyond billions of examples using far fewer resources than existing systems.
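To make the idea of weighted quantiles for approximate tree learning more concrete, here is a minimal sketch of how candidate split points for a single feature might be chosen so that each bucket carries a similar share of total example weight. It is an illustration under stated assumptions, not the paper's algorithm: the function name `weighted_quantile_candidates` and the parameter `eps` are hypothetical, the weights are assumed to be non-negative per-example values (in the paper they come from second-order gradient statistics), and the code sorts everything in memory rather than maintaining a mergeable streaming summary.

```python
from typing import List, Sequence


def weighted_quantile_candidates(
    values: Sequence[float], weights: Sequence[float], eps: float
) -> List[float]:
    """Return candidate split points for one feature.

    Hypothetical helper: exact and in-memory, not a streaming sketch.
    A new candidate is emitted each time the weight accumulated since the
    previous candidate reaches eps * total_weight. The minimum and maximum
    feature values are always included.
    """
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    if not 0.0 < eps <= 1.0:
        raise ValueError("eps must be in (0, 1]")
    if len(values) == 0:
        return []

    # Sort examples by feature value so weight can be accumulated in order.
    pairs = sorted(zip(values, weights))
    total = sum(w for _, w in pairs)

    candidates = [pairs[0][0]]      # always keep the smallest value
    cum = 0.0                       # weight seen so far
    next_threshold = eps * total    # weight at which to emit a candidate
    for v, w in pairs:
        cum += w
        if cum >= next_threshold and v != candidates[-1]:
            candidates.append(v)
            next_threshold = cum + eps * total

    # Always keep the largest value as the final candidate.
    if candidates[-1] != pairs[-1][0]:
        candidates.append(pairs[-1][0])
    return candidates


if __name__ == "__main__":
    # Uniform weights: candidates are spread evenly over the examples.
    print(weighted_quantile_candidates([1, 2, 3, 4], [1, 1, 1, 1], 0.5))  # [1, 2, 4]
    # One heavy example absorbs most of the weight, so fewer candidates
    # are placed among the light examples.
    print(weighted_quantile_candidates([1, 2, 3, 4], [1, 1, 1, 5], 0.5))  # [1, 4]
```

The second call shows why the weighting matters: with the same feature values, a single heavy example shifts where the candidate boundaries fall.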
Published In
KDD '16: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
August 2016
2176 pages
Copyright © 2016 ACM.
Publisher
Association for Computing Machinery
New York, NY, United States
Qualifiers
- Research-article
Acceptance Rates
KDD '16 Paper Acceptance Rate: 66 of 1,115 submissions, 6%
Overall Acceptance Rate: 1,133 of 8,635 submissions, 13%
Affiliations
Tianqi Chen
University of Washington, Seattle, WA, USA
Carlos Guestrin
University of Washington, Seattle, WA, USA