Bottou, L., Curtis, F.E., Nocedal, J.: Optimization methods for large-scale machine learning. SIAM Rev. 60(2), 223–311 (2018)
Boyd, S., Parikh, N., Chu, E., Peleato, B., Eckstein, J.: Distributed optimization and statistical learning via the alternating direction method of multipliers. Found. Trends Mach. Learn. 3(1), 1–122 (2011)
Cauwenberghs, G.: A fast stochastic error-descent algorithm for supervised learning and optimization. In: Advances in Neural Information Processing Systems, pp. 244–251 (1993)
Darken, C., Moody, J.: Fast adaptive K-means clustering: some empirical results. In: International Joint Conference on Neural Networks, pp. 233–238. IEEE (1990)
Dentcheva, D., Penev, S., Ruszczyński, A.: Statistical estimation of composite risk functionals and risk optimization problems. Ann. Inst. Stat. Math. 69(4), 737–760 (2017)
Hu, J., Zhou, E., Fan, Q.: Model-based annealing random search with stochastic averaging. ACM Trans. Model. Comput. Simul. 24(4), 21 (2014)
Huo, Z., Gu, B., Huang, H.: Accelerated method for stochastic composition optimization with nonsmooth regularization. arXiv preprint arXiv:1711.03937 (2017)
Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Conference on Learning Theory, pp. 545–604 (2018)
Jin, C., Kakade, S.M., Netrapalli, P.: Provable efficient online matrix completion via non-convex stochastic gradient descent. In: Advances in Neural Information Processing Systems, pp. 4520–4528 (2016)
Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. In: Advances in Neural Information Processing Systems, pp. 315–323 (2013)
Kiefer, J., Wolfowitz, J.: Stochastic estimation of the maximum of a regression function. Ann. Math. Stat. 23(3), 462–466 (1952)
Le, Q.V., Ngiam, J., Coates, A., Lahiri, A., Prochnow, B., Ng, A.Y.: On optimization methods for deep learning. In: Proceedings of the 28th International Conference on Machine Learning, pp. 265–272. Omnipress (2011)
LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. Nature 521(7553), 436–444 (2015)
LeCun, Y.A., Bottou, L., Orr, G.B., Müller, K.-R.: Efficient BackProp. In: Montavon, G., Orr, G.B., Müller, K.-R. (eds.) Neural Networks: Tricks of the Trade. LNCS, vol. 7700, pp. 9–48. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-35289-8_3
Lian, X., Wang, M., Liu, J.: Finite-sum composition optimization via variance reduced gradient descent. In: Artificial Intelligence and Statistics, pp. 1159–1167 (2017)
Mandt, S., Hoffman, M.D., Blei, D.M.: Stochastic gradient descent as approximate Bayesian inference. J. Mach. Learn. Res. 18(1), 4873–4907 (2017)
Needell, D., Ward, R., Srebro, N.: Stochastic gradient descent, weighted sampling, and the randomized Kaczmarz algorithm. In: Advances in Neural Information Processing Systems, pp. 1017–1025 (2014)
Ravikumar, P., Lafferty, J., Liu, H., Wasserman, L.: Sparse additive models. J. R. Stat. Soc.: Ser. B (Stat. Methodol.) 71(5), 1009–1030 (2009)
Robbins, H., Monro, S.: A stochastic approximation method. In: Lai, T.L., Siegmund, D. (eds.) Herbert Robbins Selected Papers, pp. 102–109. Springer, New York (1985)
Schmidt, M., Le Roux, N., Bach, F.: Minimizing finite sums with the stochastic average gradient. Math. Program. 162(1–2), 83–112 (2017)
Shamir, O.: Convergence of stochastic gradient descent for PCA. In: International Conference on Machine Learning, pp. 257–265 (2016)
Shapiro, A., Dentcheva, D., Ruszczyński, A.: Lectures on Stochastic Programming: Modeling and Theory. SIAM, Philadelphia (2009)
Sutton, R.S., Barto, A.G.: Reinforcement Learning: An Introduction. MIT Press, Cambridge (1998)
Tan, C., Ma, S., Dai, Y.H., Qian, Y.: Barzilai-Borwein step size for stochastic gradient descent. In: Advances in Neural Information Processing Systems, pp. 685–693 (2016)
Tibshirani, R.: Regression shrinkage and selection via the lasso. J. R. Stat. Soc.: Ser. B (Methodol.) 58(1), 267–288 (1996)
Wang, L., Yang, Y., Min, R., Chakradhar, S.: Accelerating deep neural network training with inconsistent stochastic gradient descent. Neural Netw. 93, 219–229 (2017)
Wang, M., Fang, E.X., Liu, H.: Stochastic compositional gradient descent: algorithms for minimizing compositions of expected-value functions. Math. Program. 161(1–2), 419–449 (2017)
Wang, M., Liu, J., Fang, E.: Accelerating stochastic composition optimization. In: Advances in Neural Information Processing Systems, pp. 1714–1722 (2016)
Yu, Y., Huang, L.: Fast stochastic variance reduced ADMM for stochastic composition optimization. arXiv preprint arXiv:1705.04138 (2017)
Yuan, M., Lin, Y.: Model selection and estimation in the Gaussian graphical model. Biometrika 94(1), 19–35 (2007)
Zhang, T.: Solving large scale linear prediction problems using stochastic gradient descent algorithms. In: Proceedings of the 21st International Conference on Machine Learning, p. 116. ACM (2004)
Zhao, S.Y., Li, W.J.: Fast asynchronous parallel stochastic gradient descent: a lock-free approach with convergence guarantee. In: AAAI, pp. 2379–2385 (2016)
Zinkevich, M., Weimer, M., Li, L., Smola, A.J.: Parallelized stochastic gradient descent. In: Advances in Neural Information Processing Systems, pp. 2595–2603 (2010)