Bottou, L., Curtis, F.E., Nocedal, J.: Optimization methods for large-scale machine learning. SIAM Rev. 60(2), 223–311 (2018)
Boyd, S., Parikh, N., Chu, E., Peleato, B., Eckstein, J.: Distributed optimization and statistical learning via the alternating direction method of multipliers. Found. Trends Mach. Learn. 3(1), 1–122 (2011)
Cauwenberghs, G.: A fast stochastic error-descent algorithm for supervised learning and optimization. In: Advances in Neural Information Processing Systems, pp. 244–251 (1993)
Darken, C., Moody, J.: Fast adaptive K-means clustering: some empirical results. In: International Joint Conference on Neural Networks, pp. 233–238. IEEE (1990)
Dentcheva, D., Penev, S., Ruszczyński, A.: Statistical estimation of composite risk functionals and risk optimization problems. Ann. Inst. Stat. Math. 69(4), 737–760 (2017)
Hu, J., Zhou, E., Fan, Q.: Model-based annealing random search with stochastic averaging. ACM Trans. Model. Comput. Simul. 24(4), 21 (2014)
Huo, Z., Gu, B., Huang, H.: Accelerated method for stochastic composition optimization with nonsmooth regularization. arXiv preprint arXiv:1711.03937 (2017)
Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Conference on Learning Theory, pp. 545–604 (2018)
Jin, C., Kakade, S.M., Netrapalli, P.: Provable efficient online matrix completion via non-convex stochastic gradient descent. In: Advances in Neural Information Processing Systems, pp. 4520–4528 (2016)
Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. In: Advances in Neural Information Processing Systems, pp. 315–323 (2013)
Kiefer, J., Wolfowitz, J.: Stochastic estimation of the maximum of a regression function. Ann. Math. Stat. 23(3), 462–466 (1952)
Le, Q.V., Ngiam, J., Coates, A., Lahiri, A., Prochnow, B., Ng, A.Y.: On optimization methods for deep learning. In: Proceedings of the 28th International Conference on Machine Learning, pp. 265–272. Omnipress (2011)
LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. Nature 521(7553), 436–444 (2015)
LeCun, Y.A., Bottou, L., Orr, G.B., Müller, K.-R.: Efficient BackProp. In: Montavon, G., Orr, G.B., Müller, K.-R. (eds.) Neural Networks: Tricks of the Trade. LNCS, vol. 7700, pp. 9–48. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-35289-8_3
Lian, X., Wang, M., Liu, J.: Finite-sum composition optimization via variance reduced gradient descent. In: Artificial Intelligence and Statistics, pp. 1159–1167 (2017)
Mandt, S., Hoffman, M.D., Blei, D.M.: Stochastic gradient descent as approximate Bayesian inference. J. Mach. Learn. Res. 18(1), 4873–4907 (2017)
Needell, D., Ward, R., Srebro, N.: Stochastic gradient descent, weighted sampling, and the randomized Kaczmarz algorithm. In: Advances in Neural Information Processing Systems, pp. 1017–1025 (2014)
Ravikumar, P., Lafferty, J., Liu, H., Wasserman, L.: Sparse additive models. J. R. Stat. Soc.: Ser. B (Stat. Methodol.) 71(5), 1009–1030 (2009)
Robbins, H., Monro, S.: A stochastic approximation method. In: Lai, T.L., Siegmund, D. (eds.) Herbert Robbins Selected Papers, pp. 102–109. Springer, New York (1985)
Schmidt, M., Le Roux, N., Bach, F.: Minimizing finite sums with the stochastic average gradient. Math. Program. 162(1–2), 83–112 (2017)
Shamir, O.: Convergence of stochastic gradient descent for PCA. In: International Conference on Machine Learning, pp. 257–265 (2016)
Shapiro, A., Dentcheva, D., Ruszczyński, A.: Lectures on Stochastic Programming: Modeling and Theory. SIAM, Philadelphia (2009)
Sutton, R.S., Barto, A.G.: Reinforcement Learning: An Introduction. MIT Press, Cambridge (1998)
Tan, C., Ma, S., Dai, Y.H., Qian, Y.: Barzilai-Borwein step size for stochastic gradient descent. In: Advances in Neural Information Processing Systems, pp. 685–693 (2016)
Tibshirani, R.: Regression shrinkage and selection via the lasso. J. R. Stat. Soc.: Ser. B (Methodol.) 58(1), 267–288 (1996)
Wang, L., Yang, Y., Min, R., Chakradhar, S.: Accelerating deep neural network training with inconsistent stochastic gradient descent. Neural Netw. 93, 219–229 (2017)
Wang, M., Fang, E.X., Liu, H.: Stochastic compositional gradient descent: algorithms for minimizing compositions of expected-value functions. Math. Program. 161(1–2), 419–449 (2017)
Wang, M., Liu, J., Fang, E.: Accelerating stochastic composition optimization. In: Advances in Neural Information Processing Systems, pp. 1714–1722 (2016)
Yu, Y., Huang, L.: Fast stochastic variance reduced ADMM for stochastic composition optimization. arXiv preprint arXiv:1705.04138 (2017)
Yuan, M., Lin, Y.: Model selection and estimation in the Gaussian graphical model. Biometrika 94(1), 19–35 (2007)
Zhang, T.: Solving large scale linear prediction problems using stochastic gradient descent algorithms. In: Proceedings of the 21st International Conference on Machine Learning, p. 116. ACM (2004)
Zhao, S.Y., Li, W.J.: Fast asynchronous parallel stochastic gradient descent: a lock-free approach with convergence guarantee. In: AAAI, pp. 2379–2385 (2016)
Zinkevich, M., Weimer, M., Li, L., Smola, A.J.: Parallelized stochastic gradient descent. In: Advances in Neural Information Processing Systems, pp. 2595–2603 (2010)