Review
Relief-based feature selection: Introduction and review
Ryan J Urbanowicz et al. J Biomed Inform. 2018 Sep.
Abstract
Feature selection plays a critical role in biomedical data mining, driven by increasing feature dimensionality in target problems and growing interest in advanced but computationally expensive methodologies able to model complex associations. Specifically, there is a need for feature selection methods that are computationally efficient, yet sensitive to complex patterns of association, e.g. interactions, so that informative features are not mistakenly eliminated prior to downstream modeling. This paper focuses on Relief-based algorithms (RBAs), a unique family of filter-style feature selection algorithms that have gained appeal by striking an effective balance between these objectives while flexibly adapting to various data characteristics, e.g. classification vs. regression. First, this work broadly examines types of feature selection and defines RBAs within that context. Next, we introduce the original Relief algorithm and associated concepts, emphasizing the intuition behind how it works, how feature weights generated by the algorithm can be interpreted, and why it is sensitive to feature interactions without evaluating combinations of features. Lastly, we include an expansive review of RBA methodological research beyond Relief and its popular descendant, ReliefF. In particular, we characterize branches of RBA research, and provide comparative summaries of RBA algorithms including contributions, strategies, functionality, time complexity, adaptation to key data characteristics, and software availability.
Keywords: Epistasis; Feature interaction; Feature selection; Feature weighting; Filter; ReliefF.
Copyright © 2018 Elsevier Inc. All rights reserved.
Figures
Figure 1:
Typical stages of a data mining analysis pipeline. Feature selection is starred as it is the focus of this review. The dotted line indicates how model performance can be fed back into feature processing, iteratively removing irrelevant features or seeking to construct relevant ones.
Figure 2:
Relief updating W[A] for a given target instance when it is compared to its nearest miss and nearest hit. In this example, features are discrete with possible values of X, Y, or Z, and the endpoint is binary with a value of 0 or 1. Notice that when the value of a feature differs, the corresponding feature weight increases by 1/m for the nearest miss and decreases by 1/m for the nearest hit.
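The update in the Figure 2 caption can be sketched in a few lines of Python with NumPy, and a toy XOR problem shows why Relief can credit interacting features without evaluating feature combinations. This is a minimal sketch under our own assumptions, not the authors' code: diff() is 1 when values differ and 0 otherwise, neighbors are found by Hamming distance with ties going to the lowest row index, and when `m` is omitted every instance serves once as the target instead of being sampled at random. The function name `relief` is hypothetical.

```python
import itertools

import numpy as np


def relief(X, y, m=None, rng=None):
    """Sketch of two-class Relief for discrete features.

    Assumptions: diff() is 1 when values differ and 0 otherwise, distance is
    the sum of diffs (Hamming), ties go to the lowest row index, and when m is
    None every instance is used once as the target instead of being sampled.
    """
    X = np.asarray(X)
    y = np.asarray(y)
    n, n_features = X.shape
    if m is None:
        targets = np.arange(n)                      # deterministic pass over all instances
    else:
        rng = np.random.default_rng() if rng is None else rng
        targets = rng.integers(0, n, size=m)        # random targets, with replacement
    m = len(targets)
    W = np.zeros(n_features)
    for i in targets:
        diffs = (X != X[i]).astype(float)           # per-feature diff to every instance
        dist = diffs.sum(axis=1)                    # Hamming distance to the target
        dist[i] = np.inf                            # the target is not its own neighbor
        same = y == y[i]
        hit = np.argmin(np.where(same, dist, np.inf))    # nearest hit (same class)
        miss = np.argmin(np.where(~same, dist, np.inf))  # nearest miss (other class)
        W -= diffs[hit] / m     # differing from the nearest hit lowers W[A] by 1/m
        W += diffs[miss] / m    # differing from the nearest miss raises W[A] by 1/m
    return W


# XOR toy problem: class = A xor B, feature C is irrelevant.
X = np.array(list(itertools.product([0, 1], repeat=3)))  # columns A, B, C
y = X[:, 0] ^ X[:, 1]

# Neither A nor B is associated with the class on its own...
print(y[X[:, 0] == 0].mean(), y[X[:, 0] == 1].mean())  # 0.5 0.5
# ...yet Relief scores both above the irrelevant C.
print(relief(X, y))
```

On this 8-row dataset the weights come out as 0.5, 0.5, and -1.0 for A, B, and C. The even split between A and B follows from the lowest-index tie-break, and C is negative only because every nearest hit here differs from its target in C alone; on larger data an irrelevant feature's weight tends to stay near zero.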
Figure 3:
Illustrations of RBA neighbor selection and/or instance weighting schemes. Methods with a red/yellow gradient adopt an instance weighting scheme while other methods identify instances as ‘near’ or ‘far’ which then contribute fully to feature weight updates. These illustrations are conceptual and are not drawn to scale.
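Two of the neighbor selection ideas contrasted in Figure 3 can be sketched side by side: a fixed number k of nearest neighbors, as in ReliefF, versus every instance closer than a distance threshold. The threshold rule shown, the mean of all pairwise distances, is our addition (it is the idea behind SURF, which this caption does not name). This sketch only selects neighbors; a full RBA would then split them into hits and misses. The function names and the use of Manhattan distance are assumptions for illustration.

```python
import numpy as np


def pairwise_manhattan(X):
    """Manhattan distances between all rows of X (shape n x n)."""
    X = np.asarray(X, dtype=float)
    return np.abs(X[:, None, :] - X[None, :, :]).sum(axis=2)


def k_nearest(D, i, k):
    """ReliefF-style: indices of the k instances closest to target i."""
    order = np.argsort(D[i], kind="stable")   # stable sort keeps ties in index order
    order = order[order != i]                 # exclude the target itself
    return order[:k]


def within_mean_distance(D, i):
    """SURF-style (assumed): all instances closer to i than the mean pairwise distance."""
    n = D.shape[0]
    threshold = D[np.triu_indices(n, k=1)].mean()  # each unordered pair counted once
    idx = np.flatnonzero(D[i] < threshold)
    return idx[idx != i]


# Two clusters on a line: {0, 1, 2} and {10, 11}; mean pairwise distance is 6.2.
X = np.array([[0], [1], [2], [10], [11]])
D = pairwise_manhattan(X)
print(k_nearest(D, 3, 2))            # [4 2]  always k neighbors, even a distant one
print(within_mean_distance(D, 3))    # [4]    only neighbors inside the threshold
```

The design difference is visible for the target at 10: a fixed k forces in the distant point at 2, while the threshold rule lets the neighbor count vary with local density.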
Figure 4:
Illustrations of the basic concepts behind key iterative and efficiency approaches including TuRF, Iterative Relief/I-RELIEF, and VLSReliefF. Features are represented as squares, where darker shading indicates a lower feature weight/score.
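The iterative idea behind TuRF in Figure 4 can be sketched as a wrapper loop: score the surviving features, discard the lowest-scoring fraction, and rescore until a target number remains. This is a minimal sketch under stated assumptions, not the published implementation: to stay short it swaps ReliefF for a compact original-Relief scorer (Hamming distance, 0/1 diff, each instance used once as the target, ties to the lowest row index), and `turf`, `remove_frac`, and `n_keep` are hypothetical names. Iterative Relief/I-RELIEF and VLSReliefF are not shown.

```python
import itertools

import numpy as np


def relief_scores(X, y):
    """Compact Relief: each instance once as target, Hamming distance, 0/1 diff."""
    X, y = np.asarray(X), np.asarray(y)
    n = len(y)
    W = np.zeros(X.shape[1])
    for i in range(n):
        diffs = (X != X[i]).astype(float)
        dist = diffs.sum(axis=1)
        dist[i] = np.inf                                      # skip the target itself
        hit = np.argmin(np.where(y == y[i], dist, np.inf))    # nearest same-class
        miss = np.argmin(np.where(y != y[i], dist, np.inf))   # nearest other-class
        W += (diffs[miss] - diffs[hit]) / n
    return W


def turf(X, y, score_fn=relief_scores, remove_frac=0.5, n_keep=2):
    """TuRF-style loop: rescore survivors, drop the lowest-scoring fraction, repeat."""
    X = np.asarray(X)
    remaining = np.arange(X.shape[1])            # original column indices still in play
    while len(remaining) > n_keep:
        W = score_fn(X[:, remaining], y)         # rescore on the reduced feature set
        n_drop = max(1, int(len(remaining) * remove_frac))
        n_drop = min(n_drop, len(remaining) - n_keep)
        worst = np.argsort(W, kind="stable")[:n_drop]  # positions of the lowest scores
        remaining = np.delete(remaining, worst)
    return remaining


# XOR of columns 0 and 1 plus three irrelevant binary columns (all 32 combinations).
X = np.array(list(itertools.product([0, 1], repeat=5)))
y = X[:, 0] ^ X[:, 1]
print(turf(X, y))   # [0 1]
```

Rescoring after each removal is the point of the loop: once weak features are gone, the neighbors of each target are chosen on a cleaner feature set, which is the effect Figure 4 depicts with progressively fewer shaded squares.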
Similar articles
- Benchmarking relief-based feature selection methods for bioinformatics data mining. Urbanowicz RJ, Olson RS, Schmitt P, Meeker M, Moore JH. J Biomed Inform. 2018 Sep;85:168-188. doi: 10.1016/j.jbi.2018.07.015. Epub 2018 Jul 17. PMID: 30030120. Free PMC article.
- Assessing the limitations of relief-based algorithms in detecting higher-order interactions. Freda PJ, Ye S, Zhang R, Moore JH, Urbanowicz RJ. BioData Min. 2024 Oct 1;17(1):37. doi: 10.1186/s13040-024-00390-0. PMID: 39354639. Free PMC article.
- Assessing the Limitations of Relief-Based Algorithms in Detecting Higher-Order Interactions. Freda PJ, Ye S, Zhang R, Moore JH, Urbanowicz RJ. Res Sq [Preprint]. 2024 Sep 2:rs.3.rs-4870116. doi: 10.21203/rs.3.rs-4870116/v1. PMID: 39281873. Free PMC article. Updated. Preprint.
- Feature selection methods for big data bioinformatics: A survey from the search perspective. Wang L, Wang Y, Chang Q. Methods. 2016 Dec 1;111:21-31. doi: 10.1016/j.ymeth.2016.08.014. Epub 2016 Aug 31. PMID: 27592382. Review.
- Statistical Approaches to Candidate Biomarker Panel Selection. Spratt HM, Ju H. Adv Exp Med Biol. 2016;919:463-492. doi: 10.1007/978-3-319-41448-5_22. PMID: 27975231. Free PMC article. Review.
Cited by
- PSO-Based Evolutionary Approach to Optimize Head and Neck Biomedical Image to Detect Mesothelioma Cancer. Praveen S, Tyagi N, Singh B, Karetla GR, Thalor MA, Joshi K, Tsegaye M. Biomed Res Int. 2022 Aug 5;2022:3618197. doi: 10.1155/2022/3618197. eCollection 2022. PMID: 36033562. Free PMC article. Retracted.
- Early Detection of Freezing of Gait during Walking Using Inertial Measurement Unit and Plantar Pressure Distribution Data. Pardoel S, Shalin G, Nantel J, Lemaire ED, Kofman J. Sensors (Basel). 2021 Mar 23;21(6):2246. doi: 10.3390/s21062246. PMID: 33806984. Free PMC article.
- Classification of Game Demand and the Presence of Experimental Pain Using Functional Near-Infrared Spectroscopy. Fairclough SH, Dobbins C, Stamp K. Front Neuroergon. 2021 Dec 21;2:695309. doi: 10.3389/fnrgo.2021.695309. eCollection 2021. PMID: 38235227. Free PMC article.
- Transforming Motor Imagery Analysis: A Novel EEG Classification Framework Using AtSiftNet Method. Xu H, Haider W, Aziz MZ, Sun Y, Yu X. Sensors (Basel). 2024 Oct 7;24(19):6466. doi: 10.3390/s24196466. PMID: 39409506. Free PMC article.
- Classification of Multiple H&E Images via an Ensemble Computational Scheme. Longo LHDC, Roberto GF, Tosta TAA, de Faria PR, Loyola AM, Cardoso SV, Silva AB, do Nascimento MZ, Neves LA. Entropy (Basel). 2023 Dec 28;26(1):34. doi: 10.3390/e26010034. PMID: 38248160. Free PMC article.
Grants and funding
- R01 EY022300/EY/NEI NIH HHS/United States
- U01 TR001263/TR/NCATS NIH HHS/United States
- U01 DK112217/DK/NIDDK NIH HHS/United States
- R01 LM011360/LM/NLM NIH HHS/United States
- P30 ES013508/ES/NIEHS NIH HHS/United States
- R01 LM009012/LM/NLM NIH HHS/United States
- UC4 DK112217/DK/NIDDK NIH HHS/United States
- R01 HL134015/HL/NHLBI NIH HHS/United States
- R01 LM010098/LM/NLM NIH HHS/United States
- R01 AI116794/AI/NIAID NIH HHS/United States