A hybrid feature subset selection algorithm for analysis of high correlation proteomic data

2012, Journal of Medical Signals & Sensors

A major problem in the treatment of cancer is the lack of a suitable technique for early diagnosis of the disease. Ovarian cancer is widespread among women, and its early diagnosis can greatly reduce the mortality rate. [1] With current diagnostic tools, the disease is diagnosed at an advanced clinical stage in more than 80% of patients, for whom the 5-year survival rate is only 35% after late-stage presentation. [2] It is known that pathological changes within an organ can be reflected as proteomic patterns in biological fluids such as plasma, serum, and urine. [3] Surface-enhanced laser desorption and ionization time-of-flight mass spectrometry (SELDI-TOF MS) has been used to provide proteomic profiles from biological fluids. [4-6] Mass spectrum data analysis is a fast and rather inexpensive procedure for diagnosing the disease, and it may potentially allow cancer screening without any complication at the time of diagnosis. In many screening tasks, the input data are represented by a very large number of features of ...

Abstract

Pathological changes within an organ can be reflected as proteomic patterns in biological fluids such as plasma, serum, and urine. Surface-enhanced laser desorption and ionization time-of-flight mass spectrometry (SELDI-TOF MS) has been used to generate proteomic profiles from biological fluids. Mass spectrometry yields redundant, noisy data in which most data points are irrelevant features for differentiating between cancer and normal cases. In this paper, we propose a hybrid feature subset selection algorithm based on maximum discrimination and minimum correlation, coupled with peak scoring criteria. Our algorithm has been applied to two independent SELDI-TOF MS datasets of ovarian cancer obtained from the NCI-FDA clinical proteomics databank. The proposed algorithm has been used to extract a set of proteins as potential biomarkers in each dataset. We applied linear discriminant analysis to identify the important biomarkers. The selected biomarkers successfully distinguished the ovarian cancer patients from the noncancer control group with an accuracy of 100%, a sensitivity of 100%, and a specificity of 100% in the two datasets. The hybrid algorithm has the advantage of increasing the reproducibility of the selected biomarkers, and it is able to find a small set of proteins with high discrimination power.
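The abstract does not specify the scoring or selection rules, so the following is only a minimal sketch of the general maximum-discrimination, minimum-correlation idea, not the authors' algorithm. It assumes that discrimination is measured by an absolute Welch t-statistic per feature and that redundancy is measured by absolute Pearson correlation; the function name `select_features` and the parameters `k` and `max_corr` are hypothetical, and the peak scoring step is omitted.

```python
import numpy as np


def select_features(X, y, k=5, max_corr=0.7):
    """Greedy max-discrimination / min-correlation selection (illustrative).

    X: (n_samples, n_features) array of spectrum intensities.
    y: binary labels, 1 = cancer, 0 = control.
    Returns up to k column indices, best-scoring first.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    cancer, control = X[y == 1], X[y == 0]

    # Assumed discrimination score: absolute Welch t-statistic per feature.
    se = np.sqrt(cancer.var(axis=0, ddof=1) / len(cancer)
                 + control.var(axis=0, ddof=1) / len(control))
    score = np.abs(cancer.mean(axis=0) - control.mean(axis=0)) / (se + 1e-12)

    # Absolute Pearson correlation between every pair of features.
    corr = np.abs(np.corrcoef(X, rowvar=False))

    selected = []
    for j in np.argsort(-score):  # most discriminative first
        # Keep a feature only if it is weakly correlated with all kept ones.
        if all(corr[j, s] < max_corr for s in selected):
            selected.append(int(j))
        if len(selected) == k:
            break
    return selected
```

The returned column indices could then be passed to a classifier such as linear discriminant analysis. For full spectra with many thousands of m/z points, the complete correlation matrix may be too large to hold in memory, so correlations would need to be computed only against the already selected features.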
