Reliability of Cross-Validation for SVMs in High-Dimensional, Low Sample Size Scenarios
Abstract
Given two-class data, a support vector machine (SVM) learns a classifier that aims for good generalisation by maximising the minimal margin between the two classes. Its performance can be evaluated using cross-validation. With low sample sizes, however, high dimensionality can cause strong side effects that significantly bias the estimated performance of the classifier. On simulated data, we illustrate the effects of high dimensionality on cross-validation of both hard- and soft-margin SVMs. Based on theoretical proofs for the limit as the dimensionality tends to infinity, we derive heuristics that can easily be used to check whether a given data set is subject to these effects.
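The setting described in the abstract can be explored with a small simulation. The following is a minimal sketch, not the authors' experimental setup: it draws Gaussian noise with labels assigned independently of the features, so no classifier can do better than 50% error on new data, and it estimates the error by leave-one-out cross-validation. The sample size, the dimensionality, all function names, and the training loop (a simple MinOver-style update used as a stand-in for a hard-margin SVM solver, without a bias term) are assumptions made for illustration.

```python
import random


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def train_min_over(X, y, iterations=100):
    """Stand-in for a hard-margin linear SVM (assumption, no bias term):
    repeatedly add the sample with the smallest margin y_i * <w, x_i>."""
    w = [0.0] * len(X[0])
    for _ in range(iterations):
        i = min(range(len(X)), key=lambda k: y[k] * dot(w, X[k]))
        w = [wj + y[i] * xj for wj, xj in zip(w, X[i])]
    return w


def predict(w, x):
    """Classify x by the side of the hyperplane it falls on."""
    return 1 if dot(w, x) >= 0 else -1


def leave_one_out_error(X, y, iterations=100):
    """Fraction of samples misclassified when each is held out in turn."""
    errors = 0
    for i in range(len(X)):
        X_train = X[:i] + X[i + 1:]
        y_train = y[:i] + y[i + 1:]
        w = train_min_over(X_train, y_train, iterations)
        if predict(w, X[i]) != y[i]:
            errors += 1
    return errors / len(X)


def simulate(n_samples=10, n_dims=200, seed=0):
    """Pure-noise data: Gaussian features, balanced labels unrelated to them."""
    rng = random.Random(seed)
    X = [[rng.gauss(0.0, 1.0) for _ in range(n_dims)] for _ in range(n_samples)]
    y = [1 if i % 2 == 0 else -1 for i in range(n_samples)]
    return X, y


X, y = simulate()
loo = leave_one_out_error(X, y)
print(f"leave-one-out error on pure-noise data: {loo:.2f}")
```

Because the labels carry no information, a leave-one-out estimate far from 0.5 would reflect the evaluation procedure rather than genuine predictive power. Varying `n_samples` and `n_dims` gives a rough feel for how the estimate behaves as the dimensionality grows relative to the sample size.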
Author information
Authors and Affiliations
- Institute for Neuro- and Bioinformatics, University of Lübeck,
Sascha Klement, Amir Madany Mamlouk & Thomas Martinetz
Editor information
Véra Kůrková, Roman Neruda, Jan Koutník
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Klement, S., Madany Mamlouk, A., Martinetz, T. (2008). Reliability of Cross-Validation for SVMs in High-Dimensional, Low Sample Size Scenarios. In: Kůrková, V., Neruda, R., Koutník, J. (eds) Artificial Neural Networks - ICANN 2008. ICANN 2008. Lecture Notes in Computer Science, vol 5163. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-87536-9_5
- DOI: https://doi.org/10.1007/978-3-540-87536-9_5
- Publisher Name: Springer, Berlin, Heidelberg
- Print ISBN: 978-3-540-87535-2
- Online ISBN: 978-3-540-87536-9
- eBook Packages: Computer Science, Computer Science (R0), Springer Nature Proceedings Computer Science