Surette Bierman - Academia.edu

Papers by Surette Bierman

Variable Selection for Support Vector Machines

Communications in Statistics - Simulation and Computation, Jul 7, 2009

Variable Selection for Support Vector Machines

Communications in Statistics - Simulation and Computation, 2009

The support vector machine (SVM) is a powerful binary classification tool with high accuracy and great flexibility. It has achieved great success, but its performance can be seriously impaired if many redundant covariates are included. Some effort has been devoted to studying variable selection for SVMs, but asymptotic properties such as variable selection consistency are largely unknown when the number of predictors diverges to infinity. We establish a unified theory for a general class of non-convex penalized SVMs. We first prove that, in ultrahigh dimensions, there is one local minimizer of the objective function of non-convex penalized SVMs having the desired oracle property. We further address the problem of non-unique local minimizers by showing that the local linear approximation algorithm is guaranteed to converge to the oracle estimator even in the ultrahigh-dimensional setting, provided an appropriate initial estimator is available. This condition on the initial estimator is verified to be automatically valid as long as the dimensions are moderately high. Numerical examples provide supportive evidence.
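
For context, the non-convex penalized SVMs discussed in this abstract build on an objective of the following general form: hinge loss plus a coordinate-wise penalty. The display below is the standard formulation with a generic penalty p_lambda (e.g. SCAD or MCP); it is a sketch for orientation, not an excerpt from the paper.

```latex
% General form of a penalized SVM objective: average hinge loss plus a
% (possibly non-convex) coordinate-wise penalty p_lambda, e.g. SCAD or MCP.
% Standard formulation for orientation, not an excerpt from the paper.
\min_{\beta_0,\,\boldsymbol{\beta}}\;
  \frac{1}{n}\sum_{i=1}^{n}
    \bigl[\,1 - y_i\bigl(\beta_0 + \mathbf{x}_i^{\top}\boldsymbol{\beta}\bigr)\bigr]_{+}
  \;+\; \sum_{j=1}^{p} p_{\lambda}\bigl(\lvert\beta_j\rvert\bigr),
  \qquad [u]_{+} = \max(u, 0),\quad y_i \in \{-1, +1\}.
```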

A bias-variance analysis of ensemble learning for classification

A decomposition of the expected prediction error into bias and variance components is useful when investigating the accuracy of a predictor. However, in classification such a decomposition is not as straightforward as in the case of squared-error loss in regression. As a result, various definitions of bias and variance for classification can be found in the literature. In this paper these definitions are reviewed, and an empirical study of a particular bias-variance decomposition is presented for ensemble classifiers.
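
To make one such definition concrete, the sketch below estimates bias and variance under 0-1 loss in the style of Domingos (2000), taking the "main prediction" at each test point to be the modal class over models fitted to resampled training sets. The learner interface, the resampling scheme, and the use of observed labels in place of the optimal prediction are illustrative assumptions; the paper reviews several competing decompositions, and this need not be the one it ultimately studies.

```python
# Minimal sketch of a Domingos-style bias-variance estimate under 0-1 loss.
# The learner interface and resampling scheme are illustrative choices,
# not the exact experimental setup of the paper.
import numpy as np
from collections import Counter

def bias_variance_01(train_sets, X_test, y_test, fit):
    """fit(X, y) returns a model with .predict(X); train_sets is a list of (X, y)."""
    # Predictions of each resampled model on the fixed test set.
    preds = np.array([fit(X, y).predict(X_test) for X, y in train_sets])
    n_models, n_test = preds.shape
    # "Main" prediction: the modal class at each test point.
    main = np.array([Counter(preds[:, i]).most_common(1)[0][0]
                     for i in range(n_test)])
    # Bias: main prediction disagrees with the label (observed label standing
    # in for the optimal prediction, i.e. assuming negligible label noise).
    bias = np.mean(main != np.asarray(y_test))
    # Variance: average disagreement of individual models with the main prediction.
    variance = np.mean(preds != main[None, :])
    return bias, variance
```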

Feature selection and kernel specification for support vector machines using multi-objective genetic algorithms

Support Vector Machines (SVMs) have proven popular for classification problems. Several tuning parameters must be specified before fitting an SVM. Genetic algorithms (GAs) have been used as an optimization algorithm for selecting these parameters, but most applications exclude the selection of a kernel function. A further extension, the multi-objective GA, allows multiple criteria to be specified, with the fitness of candidate solutions determined by their level of dominance. The use of a multi-objective GA applied to SVMs is demonstrated, where the optimization criteria are prediction error, the number of variables and the number of support vectors. The kernel function, kernel parameters and cost parameter (C) form part of the member definition of the GA. Benchmark and simulated data sets are used to show how this approach provides a range of solutions that are trade-offs of the various optimization criteria. For the standard GA where prediction error is used as fitness criterion...
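
The heart of such a multi-objective GA is the Pareto-dominance comparison used to rank candidate solutions. The sketch below shows this for the three minimisation criteria named in the abstract; the candidate encoding (kernel, kernel parameters, cost C) and the fitness helpers named in the usage comment are hypothetical placeholders, not the paper's implementation.

```python
# Minimal sketch of Pareto dominance and non-dominated filtering for the three
# minimisation criteria named in the abstract. The candidate encoding and the
# fitness helpers in the usage comment are hypothetical placeholders.
def dominates(a, b):
    """a, b: tuples (error, n_variables, n_support_vectors), all to be minimised.
    True if a is no worse than b everywhere and strictly better somewhere."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(population, fitness):
    """Keep the members whose fitness is not dominated by any other member."""
    scored = [(member, fitness(member)) for member in population]
    return [m for m, f in scored
            if not any(dominates(g, f) for _, g in scored)]

# Hypothetical usage, with cv_error / n_vars / n_sv as placeholder evaluators:
# front = pareto_front(configs, lambda c: (cv_error(c), n_vars(c), n_sv(c)))
```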

Interpretable multi-label classification by means of multivariate linear regression

CITATION: Bierman, S. 2019. Interpretable multi-label classification by means of multivariate linear regression. South African Statistics Journal, 53(1):1-13.
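
A minimal sketch of the idea suggested by the title, assuming the method fits one multivariate linear regression of the binary label-indicator matrix on the inputs and thresholds the fitted values; the interface and the 0.5 threshold are illustrative assumptions, not necessarily the paper's rule. Interpretability comes from each label's score being linear in the inputs.

```python
# Minimal sketch: multi-label prediction via multivariate linear regression.
# Y is an n x L binary indicator matrix (one column per label). The 0.5
# threshold is an illustrative choice, not necessarily the paper's rule.
import numpy as np

def fit_multilabel_lr(X, Y):
    X1 = np.hstack([np.ones((X.shape[0], 1)), X])   # add intercept column
    B, *_ = np.linalg.lstsq(X1, Y, rcond=None)      # one least-squares fit, all labels
    return B                                        # shape (p + 1, L)

def predict_multilabel(B, X, threshold=0.5):
    X1 = np.hstack([np.ones((X.shape[0], 1)), X])
    # Each label's score is linear in X, so coefficients are directly interpretable.
    return (X1 @ B >= threshold).astype(int)
```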

Variable Selection for Kernel Classifiers: A Feature-to-Input Space Approach

A measure of post variable selection error in multiple linear regression, and its estimation

Journal of Statistical Computation and Simulation

Variable Selection for Kernel Classification

Communications in Statistics - …, 2011

Cites, among others: Mika, S., Rätsch, G., Weston, J., Schölkopf, B., Müller, K.-R. (1999). Fisher discriminant analysis with kernels. In: Hu, Y.-H., Larsen, J., Wilson, E., Douglas, S., eds. Neural Networks for Signal Processing, Vol. IX. New York: IEEE Press.

Variable selection for kernel methods with application to binary classification

The problem of variable selection in binary kernel classification is addressed in this thesis. Kernel methods are fairly recent additions to the statistical toolbox, having originated approximately two decades ago in machine learning and artificial intelligence. These methods are growing in popularity and are already frequently applied in regression and classification problems.

Contents:
Chapter 1: Introduction
  1.1 Notation
  1.2 Overview of the thesis
Chapter 2: Variable selection for kernel methods
  2.1 Introduction
  2.2 An overview of kernel methods (2.2.1 Basic concepts; 2.2.2 Kernel functions and the kernel trick; 2.2.3 Constructing a kernel classifier; 2.2.4 A regularisation perspective)
  2.3 Variable selection in binary classification: important aspects (2.3.1 The relevance of variables; 2.3.2 Selection strategies and criteria)
  2.4 Variable selection for kernel methods (2.4.1 The need for variable selection; 2.4.2 Complicating factors and possible approaches)
  2.5 Summary
Chapter 3: Kernel variable selection in input space
  3.4 Monte Carlo simulation study (3.4.1 Experimental design; 3.4.2 Steps in each simulation repetition; 3.4.3 Generating the training and test data; 3.4.4 Hyperparameter specification; 3.4.5 The variable selection procedures; 3.4.6 Results and conclusions)
  3.5 Summary
Chapter 4: Algorithm-independent and algorithm-dependent selection in feature space
  4.1 Introduction
  4.2 Support vector machines (4.2.1 The training data are linearly separable in input space; 4.2.2 The training data are linearly separable in feature space; 4.2.3 Handling noisy data)
  4.3 Kernel Fisher discriminant analysis (4.3.1 Linear discriminant analysis; 4.3.2 The kernel Fisher discriminant function)
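
Since the kernel trick (Section 2.2.2 above) underlies every method in the thesis, a minimal sketch may help orient readers: a kernel classifier never computes the feature map explicitly; it only evaluates feature-space inner products through a kernel function. The Gaussian (RBF) kernel below is one standard choice, and the bandwidth gamma is an illustrative value.

```python
# Minimal sketch of the kernel trick: inner products in feature space are
# computed via a kernel function, never via an explicit feature map.
# The Gaussian (RBF) kernel and the bandwidth gamma are illustrative choices.
import numpy as np

def rbf_kernel_matrix(X, Z, gamma=1.0):
    """K[i, j] = exp(-gamma * ||X[i] - Z[j]||^2)."""
    sq = (np.sum(X**2, axis=1)[:, None]
          + np.sum(Z**2, axis=1)[None, :]
          - 2.0 * X @ Z.T)
    return np.exp(-gamma * np.clip(sq, 0.0, None))  # clip guards tiny negatives

# A kernel classifier is then the sign of a kernel expansion over training points:
#   f(x) = sign( sum_i alpha_i * y_i * k(x_i, x) + b ),
# with the coefficients alpha obtained from, e.g., an SVM or KFDA fit.
```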

A meta-analysis of research in random forests for classification

2016 Pattern Recognition Association of South Africa and Robotics and Mechatronics International Conference (PRASA-RobMech)
