Structure-aware methods for expensive derivative-free nonsmooth composite optimization
Abstract
We present new methods for solving a broad class of bound-constrained nonsmooth composite minimization problems. These methods are designed for objectives formed by applying a known mapping to the outputs of a computationally expensive function. We provide accompanying implementations of these methods: in particular, a novel manifold sampling algorithm (MS-P), whose subproblems are in a sense primal versions of the dual subproblems solved by previous manifold sampling methods, and a method (GOOMBAH) that employs more difficult optimization subproblems. For these two methods, we provide rigorous convergence analysis and guarantees, and we demonstrate the methods in extensive testing. Open-source implementations of the methods developed in this manuscript can be found at https://github.com/POptUS/IBCDFO/.
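The problem class in the abstract can be made concrete with a minimal sketch: a composite objective f = h ∘ F, where the inner mapping F is expensive and derivative-free while the outer function h is known, cheap, and nonsmooth. The particular choices of F and h below are hypothetical illustrations, not the paper's test problems or algorithms.

```python
import numpy as np

# Hypothetical expensive "inner" mapping F: R^2 -> R^3. In practice each
# evaluation might be a costly simulation; here a cheap stand-in that
# counts calls, since evaluation counts are the scarce resource.
n_calls = 0

def F(x):
    global n_calls
    n_calls += 1
    return np.array([x[0]**2 - 1.0, x[0] + x[1], np.sin(x[1])])

# Known, cheap nonsmooth "outer" function h: R^3 -> R; here a pointwise
# maximum of absolute values, one example of a piecewise-smooth selection.
def h(z):
    return np.max(np.abs(z))

# Composite objective f = h(F(x)). Derivatives of F are unavailable, so a
# structure-aware method works from pairs (x, F(x)) while exploiting the
# known form of h, rather than treating f as an opaque black box.
def f(x):
    return h(F(x))

value = f(np.array([1.0, 0.0]))  # one call to F per evaluation of f
```

Structure-aware methods exploit that h (and hence the location of its nonsmoothness) is known explicitly, even though F is accessible only through function values.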
Data availability
Matlab scripts to reproduce all experiments and figures in this manuscript are available at https://github.com/POptUS/IBCDFO.
Code availability
Software implementing the algorithms and numerical tests in this paper is available online at https://github.com/POptUS/IBCDFO/.
Notes
- For practical purposes, one should attempt to define/construct \({\mathfrak {H}}\) such that \({\mathfrak {H}}\) contains only functions that are essentially active somewhere in the domain of h, \({{\textbf{i}}}{{\textbf{m}}}_F\left( L_{\textrm{max}}\cap \varOmega \right) \).
- For later reference, observe that in the special case of \({\mathbb {G}}^k\) where \(c_1=c_2=0\), a null step \(s=0\) in (5) implies that \(v=f(x^k)\), which in turn implies the objective value of (5) is 0.
- The necessary conditions on approximate solution quality are given in Assumption 5.
Acknowledgements
We thank Geovani Nunes Grapiglia for initial discussions of convergence analysis results.
Funding
This work was supported in part by the U.S. Department of Energy, Office of Science, Office of Advanced Scientific Computing Research and Office of High-Energy Physics, Scientific Discovery through Advanced Computing (SciDAC) Program through the FASTMath Institute and the CAMPA Project under Contract No. DE-AC02-06CH11357.
Author information
Author notes
- Jeffrey Larson and Matt Menickelly have contributed equally to this work.
Authors and Affiliations
- Mathematics and Computer Science Division, Argonne National Laboratory, 9700 S. Cass Ave., Lemont, IL, 60439, USA
Jeffrey Larson & Matt Menickelly
Authors
- Jeffrey Larson
- Matt Menickelly
Corresponding author
Correspondence to Jeffrey Larson.
Ethics declarations
Conflict of interest
The authors declare that they have no conflicts of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendices
Table of notation
\({\mathbb {A}}\!\left( z \right) \):
Set of indices of essentially active functions at a point z
\({\mathcal {B}}\):
Euclidean ball
D:
Sampled gradients of f used in numerical tests
F:
Expensive inner function, with components \(F_i\)
G:
Matrix with columns of vectors from \(g_j^k\)
\({\mathbb {G}}\):
Set of indices used to generate G
H:
Model Hessian \(H^k\)
\({\mathfrak {H}}\):
Set of selection functions
\(I_n\):
The identity matrix for \({\mathcal {R}}^n\)
K:
Used for Lipschitz constants, (e.g., \(K_h, K_{\nabla \! F}, K_{\nabla \! F_i}\) )
\({\mathbb {K}}\):
Special sets of iterates
\(L\):
Level set
\(L_{\textrm{max}}\):
Level set plus \(\varDelta _{\textrm{max}}\) padding
\({\mathcal {L}}\):
The Lagrangian function
\({\mathbb {L}}_{\infty }\):
\(\{i\in 1,\dots ,n: \ell _i = -\infty \}\)
M:
Vector mapping of the n models \(m^{F_i}\)
\({\mathbb {N}}\):
The set of integers
Q:
\(Q_i\) is used to define quadratics in test functions
\({\mathcal {R}}\):
The set of real numbers
S:
Set of points \(S^t\) sampled around each x evaluated by a method
\({\mathbb {U}}_{\infty }\):
\(\triangleq \{i\in 1,\dots ,n: u_i = \infty \}\)
Y:
A collection of points y from the domain of F that have been evaluated
a:
The value \( [a^k]_j \triangleq f(x^k) - f_j(x^k) + \beta _{j,k}\)
b:
\(b_i\) is used to define quadratics in test functions
c:
\(c_i\) are the censors for the censored-L1 loss function. Also, \(c_1,c_2\) are algorithmic constants
d:
\(d_i\) are the data for the censored-L1 loss function
e:
Vector of all ones
f:
Composite objective function \(f \triangleq h \circ F\). Also, sometimes \(f_j \triangleq h_j \circ F\)
g:
The generators, \(g_j^k = \left[ \nabla m^F_k\right] \nabla h_j(F(x^k))\)
h:
Nonsmooth, outer piecewise-selection function
\(h_j\):
Smooth selection functions defining h
i:
General index
j:
General index
k:
Iteration of the algorithm
\(\ell \):
Lower bounds
\(m^{F_i}\):
A model of \(F_i\)
n:
Dimension of domain of F (and f)
p:
Dimension of domain of h
s:
Trust-region subproblem step, \(s^*\) or \(s^k\) or \({\tilde{s}}^k\)
t:
Index for points evaluated by methods (not necessarily the iterate k)
\(u\):
Upper bound on domain
v:
Primal variables for the problem
x:
Points in the domain of F
y:
Points in the domain of F
z:
Points in the domain of h
\(\beta \):
A nonnegative offset added to affine functions in primal model
\(\gamma _{\textrm{d}}\):
Trust-region decrease factor
\(\gamma _{\textrm{i}}\):
Trust-region increase factor
\(\varDelta \):
Trust region radius
\(\varDelta _{\textrm{max}}\):
Upper bound on trust region radius
\(\eta \):
Algorithmic acceptability tolerances
\(\kappa \):
Bounds on errors (between models/functions, fraction of Cauchy decrease) \(\kappa _{i, \textrm{eg}}, \kappa _{\textrm{g}}, \kappa _{\textrm{H}}, \kappa _{\textrm{fcd}}\)
\(\lambda \):
Dual variables (\(\lambda _a, \lambda _{\ell }, \lambda _u\))
\(\pi \):
Used to denote projection problem solution
\(\rho \):
Ratio of actual-versus-predicted decrease
\(\sigma \):
A mapping between two convex sets
\(\tau \):
Tolerance used for data profiles
\(\upchi \):
Stationary measure
\(\varOmega \):
Domain of test problems, either \({\mathcal {R}}^n\) or \([\ell ,u]\)
\(\partial _{\textrm{C}}\):
Clarke subdifferential
Operations:
\({{\textbf{c}}}{{\textbf{l}}}\left( {\mathcal {S}} \right) \):
Closure of a set \({\mathcal {S}}\)
\(\textbf{int}\left( {\mathcal {S}} \right) \):
Interior of a set \({\mathcal {S}}\)
\({{\textbf{c}}}{{\textbf{o}}}\left( {\mathcal {S}}\right) \):
Convex hull of a set \({\mathcal {S}}\)
\(\textbf{proj}\left( 0 , {\mathcal {S}} \right) \):
Projection of zero onto a set \({\mathcal {S}}\)
\({{\textbf{i}}}{{\textbf{m}}}_F\left( {\mathcal {S}} \right) \):
Image of set \({\mathcal {S}}\) under F
Additional data profiles
Fig. 3
Data profiles for subproblems with the pointwise-minimum-squared function \(h_1\), unconstrained and bound-constrained, for three values of \(\tau \)
Fig. 4
Data profiles for subproblems with the pointwise-maximum-squared function \(h_2\), unconstrained and bound-constrained, for three values of \(\tau \)
Fig. 5
Data profiles for subproblems with the censored-L1-loss function \(h_3\), unconstrained and bound-constrained, for three values of \(\tau \)
Fig. 6
Data profiles for subproblems with the piecewise-quadratic function \(h_4\), unconstrained and bound-constrained, for three values of \(\tau \)
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Larson, J., Menickelly, M. Structure-aware methods for expensive derivative-free nonsmooth composite optimization. Math. Prog. Comp. 16, 1–36 (2024). https://doi.org/10.1007/s12532-023-00245-5
- Received: 15 July 2022
- Accepted: 06 June 2023
- Published: 19 August 2023
- Version of record: 19 August 2023
- Issue date: March 2024
- DOI: https://doi.org/10.1007/s12532-023-00245-5