Regularized Machine Learning in the Genetic Prediction of Complex Traits


Figure 2

Penalty terms and loss functions.

(A) Penalty terms: the _L_0-norm imposes the most explicit constraint on model complexity, as it effectively counts the number of nonzero entries in the model parameter vector. While prediction models can be trained under an _L_0 penalty using, e.g., greedy or other discrete optimization methods, the problem is mathematically challenging owing to the nonconvexity of the constraint, especially when a loss function other than the squared loss is used. The convexity of the _L_1 and _L_2 norms makes them easier to optimize. While the _L_2 norm has good regularization properties, it must be combined with the _L_0 or _L_1 norm to perform feature selection.

(B) Loss functions: the plain classification error is difficult to minimize directly because it is nonconvex and discontinuous, so one typically resorts to better-behaved convex surrogates, such as the hinge loss used with SVMs, the cross-entropy used with logistic regression, or the squared error used with regularized least-squares classification and regression. These surrogates in turn differ both in how closely they approximate the classification error and in the optimization machinery with which they can be minimized (Text S1).
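The feature-selection contrast in panel A can be seen numerically. The following is a minimal sketch, not taken from the article, using scikit-learn's `Lasso` (_L_1 penalty) and `Ridge` (_L_2 penalty) on synthetic data in which only a few features carry signal; the data dimensions and the `alpha` value are illustrative assumptions.

```python
# Minimal sketch of panel A (assumes NumPy and scikit-learn are installed):
# the L1 penalty drives most coefficients exactly to zero (feature selection),
# while the L2 penalty only shrinks them toward zero.
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(0)
n_samples, n_features, n_informative = 100, 200, 10  # illustrative sizes

# Synthetic data: only the first 10 features influence the trait.
X = rng.standard_normal((n_samples, n_features))
w_true = np.zeros(n_features)
w_true[:n_informative] = rng.standard_normal(n_informative)
y = X @ w_true + 0.1 * rng.standard_normal(n_samples)

lasso = Lasso(alpha=0.1).fit(X, y)  # L1-penalized least squares
ridge = Ridge(alpha=0.1).fit(X, y)  # L2-penalized least squares

print("nonzero coefficients, L1:", int(np.sum(lasso.coef_ != 0)))  # sparse
print("nonzero coefficients, L2:", int(np.sum(ridge.coef_ != 0)))  # dense
```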
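The surrogates in panel B can likewise be written out as functions of the margin _m_ = _yf_(_x_) with labels _y_ ∈ {−1, +1}. The sketch below uses the standard textbook definitions of these losses, which are assumed (not verified) to match the parameterization plotted in the figure.

```python
# Minimal sketch of panel B (NumPy only): the 0/1 classification error and its
# convex surrogates, each evaluated as a function of the margin m = y * f(x).
import numpy as np

def zero_one_loss(m):
    return (m <= 0).astype(float)      # plain classification error

def hinge_loss(m):
    return np.maximum(0.0, 1.0 - m)    # used with SVMs

def cross_entropy_loss(m):
    return np.log1p(np.exp(-m))        # used with logistic regression

def squared_loss(m):
    return (1.0 - m) ** 2              # used with regularized least squares

margins = np.linspace(-2.0, 2.0, 5)
for name, loss in [("0/1", zero_one_loss), ("hinge", hinge_loss),
                   ("cross-entropy", cross_entropy_loss),
                   ("squared", squared_loss)]:
    print(f"{name:14s}", np.round(loss(margins), 3))
```

Evaluating the losses on a grid of margins makes the trade-off visible: the hinge and cross-entropy losses upper-bound the 0/1 error and flatten out for confidently correct predictions, whereas the squared loss also penalizes margins well above 1.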


doi: https://doi.org/10.1371/journal.pgen.1004754.g002