Structure-aware methods for expensive derivative-free nonsmooth composite optimization
Abstract
We present new methods for solving a broad class of bound-constrained nonsmooth composite minimization problems. These methods are designed for objectives formed by applying a known mapping to the outputs of a computationally expensive function. We provide accompanying implementations of these methods: in particular, a novel manifold sampling algorithm (MS-P), whose subproblems are in a sense primal versions of the dual subproblems solved by previous manifold sampling methods, and a method (GOOMBAH) that employs more difficult optimization subproblems. For these two methods, we provide rigorous convergence analysis and guarantees, and we demonstrate the methods in extensive testing. Open-source implementations of the methods developed in this manuscript can be found at https://github.com/POptUS/IBCDFO/.
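The problem class in the abstract can be made concrete with a minimal sketch: a composite objective f = h ∘ F, where the inner mapping F is expensive and derivative-free while the outer function h is known, cheap, and nonsmooth. The particular choices of F and h below are hypothetical illustrations, not the paper's test problems or algorithms.

```python
import numpy as np

# Hypothetical expensive "inner" mapping F: R^2 -> R^3. In practice each
# evaluation might be a costly simulation; here a cheap stand-in that
# counts calls, since evaluation counts are the scarce resource.
n_calls = 0

def F(x):
    global n_calls
    n_calls += 1
    return np.array([x[0]**2 - 1.0, x[0] + x[1], np.sin(x[1])])

# Known, cheap nonsmooth "outer" function h: R^3 -> R; here a pointwise
# maximum of absolute values, one example of a piecewise-smooth selection.
def h(z):
    return np.max(np.abs(z))

# Composite objective f = h(F(x)). Derivatives of F are unavailable, so a
# structure-aware method works from pairs (x, F(x)) while exploiting the
# known form of h, rather than treating f as an opaque black box.
def f(x):
    return h(F(x))

value = f(np.array([1.0, 0.0]))  # one call to F per evaluation of f
```

Structure-aware methods exploit that h (and hence the location of its nonsmoothness) is known explicitly, even though F is accessible only through function values.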
Data availability
Matlab scripts to reproduce all experiments and figures in this manuscript are available at https://github.com/POptUS/IBCDFO.
Code availability
Software implementing the algorithms and numerical tests in this paper is available online at https://github.com/POptUS/IBCDFO/.
Notes
- For practical purposes, one should attempt to define/construct \({\mathfrak {H}}\) such that \({\mathfrak {H}}\) contains only functions that are essentially active somewhere in the domain of h, \({{\textbf{i}}}{{\textbf{m}}}_F\left( L_{\textrm{max}}\cap \varOmega \right) \).
- For later reference, observe that in the special case of \({\mathbb {G}}^k\) where \(c_1=c_2=0\), a null step \(s=0\) in (5) implies that \(v=f(x^k)\), which in turn implies the objective value of (5) is 0.
- The necessary conditions on approximate solution quality are given in Assumption 5.
Acknowledgements
We thank Geovani Nunes Grapiglia for initial discussions of convergence analysis results.
Funding
This work was supported in part by the U.S. Department of Energy, Office of Science, Office of Advanced Scientific Computing Research and Office of High-Energy Physics, Scientific Discovery through Advanced Computing (SciDAC) Program through the FASTMath Institute and the CAMPA Project under Contract No. DE-AC02-06CH11357.
Author information
Author notes
- Jeffrey Larson and Matt Menickelly have contributed equally to this work.
Authors and Affiliations
- Mathematics and Computer Science Division, Argonne National Laboratory, 9700 S. Cass Ave., Lemont, IL, 60439, USA
Jeffrey Larson & Matt Menickelly
Authors
- Jeffrey Larson
- Matt Menickelly
Corresponding author
Correspondence to Jeffrey Larson.
Ethics declarations
Conflict of interest
The authors declare that they have no conflicts of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendices
Table of notation
\({\mathbb {A}}\!\left( z \right) \):
Set of indices of essentially active functions at a point z
\({\mathcal {B}}\):
Euclidean ball
D:
Sampled gradients of f used in numerical tests
F:
Expensive inner function, with components \(F_i\)
G:
Matrix with columns of vectors from \(g_j^k\)
\({\mathbb {G}}\):
Set of indices used to generate G
H:
Model Hessian \(H^k\)
\({\mathfrak {H}}\):
Set of selection functions
\(I_n\):
The identity matrix for \({\mathcal {R}}^n\)
K:
Used for Lipschitz constants, (e.g., \(K_h, K_{\nabla \! F}, K_{\nabla \! F_i}\) )
\({\mathbb {K}}\):
Special sets of iterates
\(L\):
Level set
\(L_{\textrm{max}}\):
Level set plus \(\varDelta _{\textrm{max}}\) padding
\({\mathcal {L}}\):
The Lagrangian function
\({\mathbb {L}}_{\infty }\):
\(\{i\in 1,\dots ,n: \ell _i = -\infty \}\)
M:
Vector mapping of the n models \(m^{F_i}\)
\({\mathbb {N}}\):
The set of integers
Q:
\(Q_i\) is used to define quadratics in test functions
\({\mathcal {R}}\):
The set of real numbers
S:
Set of points \(S^t\) sampled around each x evaluated by a method
\({\mathbb {U}}_{\infty }\):
\(\triangleq \{i\in 1,\dots ,n: u_i = \infty \}\)
Y:
A collection of points y from the domain of F that have been evaluated
a:
The value \( [a^k]_j \triangleq f(x^k) - f_j(x^k) + \beta _{j,k}\)
b:
\(b_i\) is used to define quadratics in test functions
c:
\(c_i\) are the censors for the censored-L1 loss function. Also, \(c_1,c_2\) are algorithmic constants
d:
\(d_i\) are the data for the censored-L1 loss function
e:
Vector of all ones
f:
Composite objective function \(f \triangleq h \circ F\). Also, sometimes \(f_j \triangleq h_j \circ F\)
g:
The generators, \(g_j^k = \left[ \nabla m^F_k\right] \nabla h_j(F(x^k))\)
h:
Nonsmooth, outer piecewise-selection function
\(h_j\):
Smooth selection functions defining h
i:
General index
j:
General index
k:
Iteration of the algorithm
\(\ell \):
Lower bounds
\(m^{F_i}\):
A model of \(F_i\)
n:
Dimension of domain of F (and f)
p:
Dimension of domain of h
s:
Trust-region subproblem step, \(s^*\) or \(s^k\) or \({\tilde{s}}^k\)
t:
Index for points evaluated by methods (not necessarily the iterate k)
\(u\):
Upper bound on domain
v:
Primal variables for the problem
x:
Points in the domain of F
y:
Points in the domain of F
z:
Points in the domain of h
\(\beta \):
A nonnegative offset added to affine functions in primal model
\(\gamma _{\textrm{d}}\):
Trust-region decrease factor
\(\gamma _{\textrm{i}}\):
Trust-region increase factor
\(\varDelta \):
Trust region radius
\(\varDelta _{\textrm{max}}\):
Upper bound on trust region radius
\(\eta \):
Algorithmic acceptability tolerances
\(\kappa \):
Bounds on errors (between models/functions, fraction of Cauchy decrease) \(\kappa _{i, \textrm{eg}}, \kappa _{\textrm{g}}, \kappa _{\textrm{H}}, \kappa _{\textrm{fcd}}\)
\(\lambda \):
Dual variables (\(\lambda _a, \lambda _{\ell }, \lambda _u\))
\(\pi \):
Used to denote projection problem solution
\(\rho \):
Ratio of actual-versus-predicted decrease
\(\sigma \):
A mapping between two convex sets
\(\tau \):
Tolerance used for data profiles
\(\upchi \):
Stationary measure
\(\varOmega \):
Domain of test problems, either \({\mathcal {R}}^n\) or \([\ell ,u]\)
\(\partial _{\textrm{C}}\):
Clarke subdifferential
Operations:
\({{\textbf{c}}}{{\textbf{l}}}\left( {\mathcal {S}} \right) \):
Closure of a set \({\mathcal {S}}\)
\(\textbf{int}\left( {\mathcal {S}} \right) \):
Interior of a set \({\mathcal {S}}\)
\({{\textbf{c}}}{{\textbf{o}}}\left( {\mathcal {S}}\right) \):
Convex hull of a set \({\mathcal {S}}\)
\(\textbf{proj}\left( 0 , {\mathcal {S}} \right) \):
Projection of zero onto a set \({\mathcal {S}}\)
\({{\textbf{i}}}{{\textbf{m}}}_F\left( {\mathcal {S}} \right) \):
Image of set \({\mathcal {S}}\) under F
Additional data profiles
Fig. 3
Data profiles for subproblems with the pointwise-minimum-squared function \(h_1\), unconstrained and bound-constrained, for three values of \(\tau \)
Fig. 4
Data profiles for subproblems with the pointwise-maximum-squared function \(h_2\), unconstrained and bound-constrained, for three values of \(\tau \)
Fig. 5
Data profiles for subproblems with the censored-L1-loss function \(h_3\), unconstrained and bound-constrained, for three values of \(\tau \)
Fig. 6
Data profiles for subproblems with the piecewise-quadratic function \(h_4\), unconstrained and bound-constrained, for three values of \(\tau \)
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Larson, J., Menickelly, M. Structure-aware methods for expensive derivative-free nonsmooth composite optimization. Math. Prog. Comp. 16, 1–36 (2024). https://doi.org/10.1007/s12532-023-00245-5
- Received: 15 July 2022
- Accepted: 06 June 2023
- Published: 19 August 2023
- Version of record: 19 August 2023
- Issue date: March 2024
- DOI: https://doi.org/10.1007/s12532-023-00245-5