Vimal Kumar Dubey - Academia.edu
Volume 3 Issue 4 by Vimal Kumar Dubey
Citation/Export
MLA
Saxena, Amit Kumar, and Vimal Kumar Dubey. "A Survey on Feature Selection Algorithms." International Journal on Recent and Innovation Trends in Computing and Communication (IJRITCC), vol. 3, no. 4, Apr. 2015, pp. 1895-1899. ISSN 2321-8169. DOI: 10.17762/ijritcc2321-8169.160431.
APA
Saxena, A. K., & Dubey, V. K. (2015). A survey on feature selection algorithms. International Journal on Recent and Innovation Trends in Computing and Communication (IJRITCC), 3(4), 1895-1899. ISSN 2321-8169. https://doi.org/10.17762/ijritcc2321-8169.160431
Papers by Vimal Kumar Dubey
2016 International Conference on Control, Computing, Communication and Materials (ICCCCM)
Filter-based feature selection techniques are less complex than wrapper-based techniques in the case of high-dimensional datasets. In this paper, we propose a filter feature selection method, the Cosine Similarity-based Filter feature selection technique (CSF), for high-dimensional datasets. In this method, the absolute cosine similarity of each feature with respect to the class label is used to order the features, and a user-defined number of features is then selected from the ordered list. The dataset with the selected features is tested for classification accuracy using a multi-classifier system (K-Nearest Neighbor (KNN), Classification and Regression Tree (CART), Naive Bayes (NB) and Support Vector Machine (SVM)). The method is applied to four high-dimensional binary-class datasets, and the obtained accuracies show that it is better than or equivalent to other existing methods.
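The CSF idea reduces, in essence, to ranking feature columns by their absolute cosine similarity with the class vector and keeping the top k. The sketch below illustrates that ranking step on a synthetic dataset; the dataset, the choice of k, and the KNN evaluation are assumptions made for illustration, not the paper's exact setup.

```python
# Minimal sketch of a cosine-similarity filter (illustrative, not the paper's exact code).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

def cosine_filter(X, y, k):
    """Rank features by |cosine similarity| with the class vector and keep the top k."""
    num = X.T @ y
    den = np.linalg.norm(X, axis=0) * np.linalg.norm(y) + 1e-12
    scores = np.abs(num / den)
    return np.argsort(scores)[::-1][:k]          # indices of the k highest-scoring features

# Synthetic stand-in for a high-dimensional binary-class dataset (illustrative only).
X, y = make_classification(n_samples=100, n_features=500, n_informative=20, random_state=0)
idx = cosine_filter(X, y, k=30)
acc = cross_val_score(KNeighborsClassifier(), X[:, idx], y, cv=5).mean()
print(f"KNN accuracy with 30 selected features: {acc:.3f}")
```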
International Journal of Data Mining, Modelling and Management
Hybrid methods are very important for feature selection in the classification of high-dimensional datasets. In this paper, we propose two hybrid methods that combine filter-based feature selection with genetic-algorithm and sequential random search methods. The first proposed method is a hybridisation of information gain and a genetic algorithm: the features are first ranked by information gain, a user-defined number of top-ranked features is selected, and a genetic algorithm is then applied to these features to find an optimal feature subset. It is applied with two types of fitness functions, one single-objective and one multi-objective. The second model is a hybridisation of information gain and sequential random K-nearest neighbour (SRKNN). Here, information gain is again used to rank the features and a user-defined number of top-ranked features is selected. A population of binary individuals over these selected features is generated, and a sequential search is applied to each individual to maximise classification accuracy. The methods are applied to 21 high-dimensional multi-class datasets. The obtained results show that each method performs better on some of the datasets, and the results obtained by the proposed methods are compared with those reported for other methods.
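A rough sketch of the first hybrid (an information-gain filter followed by a genetic-algorithm wrapper) is given below. Information gain is approximated with scikit-learn's mutual_info_classif, the fitness is single-objective cross-validated KNN accuracy, and the GA parameters and dataset are illustrative assumptions rather than the settings used in the paper.

```python
# Sketch of filter ranking followed by a small GA wrapper (illustrative assumptions throughout).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import mutual_info_classif
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=150, n_features=300, n_informative=15, random_state=0)

# Stage 1: filter -- keep the top-m features by (approximate) information gain.
m = 40
top = np.argsort(mutual_info_classif(X, y, random_state=0))[::-1][:m]
Xf = X[:, top]

def fitness(mask):
    """Single-objective fitness: cross-validated KNN accuracy on the masked features."""
    if mask.sum() == 0:
        return 0.0
    return cross_val_score(KNeighborsClassifier(), Xf[:, mask.astype(bool)], y, cv=3).mean()

# Stage 2: a small GA over binary masks of the m pre-selected features.
pop = rng.integers(0, 2, size=(20, m))
for gen in range(15):
    fit = np.array([fitness(ind) for ind in pop])
    parents = pop[np.argsort(fit)[::-1][:10]]         # truncation selection
    children = []
    for _ in range(10):
        a, b = parents[rng.integers(10)], parents[rng.integers(10)]
        cut = rng.integers(1, m)                      # single-point crossover
        child = np.concatenate([a[:cut], b[cut:]])
        flip = rng.random(m) < 0.02                   # bit-flip mutation
        children.append(np.where(flip, 1 - child, child))
    pop = np.vstack([parents, children])

best = pop[np.argmax([fitness(ind) for ind in pop])]
print("selected features:", top[best.astype(bool)])
```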
Journal of Information Technology Research
A novel hybrid method based on cosine similarity and mutual information is presented to find a relevant feature subset. Initially, the supervised cosine similarity of each feature is calculated with respect to the class vector, and the features are grouped based on the obtained similarity values. From each group, the feature with the highest mutual information is selected. The selected feature subset is tested for classification accuracy using three classifiers, namely Naïve Bayes (NB), K-Nearest Neighbor (KNN) and Classification and Regression Trees (CART). The proposed method is applied to various high-dimensional datasets, and the obtained results show that it is capable of eliminating redundant and irrelevant features.
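The grouping step can be illustrated as follows. The sketch bins features by their cosine-similarity value and keeps the highest-mutual-information feature from each bin; the equal-width binning rule and all numerical settings are assumptions made for illustration, as the paper's exact grouping criterion is not reproduced here.

```python
# Condensed sketch of the cosine-similarity-then-mutual-information idea (assumed grouping rule).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import mutual_info_classif

X, y = make_classification(n_samples=120, n_features=200, n_informative=12, random_state=1)

# Supervised cosine similarity of every feature with the class vector.
cos = np.abs(X.T @ y) / (np.linalg.norm(X, axis=0) * np.linalg.norm(y) + 1e-12)

# Group features whose similarity values fall in the same bin (assumed equal-width bins).
n_groups = 10
bins = np.digitize(cos, np.linspace(cos.min(), cos.max(), n_groups + 1)[1:-1])

# From each group keep the feature with the highest mutual information with the class.
mi = mutual_info_classif(X, y, random_state=1)
selected = [int(np.where(bins == g)[0][np.argmax(mi[bins == g])])
            for g in np.unique(bins)]
print("selected feature indices:", selected)
```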
2016 IEEE International Conference on Current Trends in Advanced Computing (ICCTAC), 2016
In this paper, we propose a hybrid classification model that combines a correlation-based filter feature selection algorithm with a support vector machine (SVM) classifier. In this method, features are ordered according to their absolute correlation with the class attribute, and the top K features are selected from the ordered list to form a reduced dataset. Classification accuracy is measured using SVM classifiers with and without extending the features of the reduced dataset. The proposed model is applied to five high-dimensional binary-class datasets. It is observed that the proposed method yields higher classification accuracies on three out of the five datasets with a reasonably small number of features.
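A minimal sketch of the correlation-filter-plus-SVM pipeline is shown below. The feature-extension step mentioned in the abstract is omitted, and the dataset, K, and linear kernel are assumptions for illustration.

```python
# Correlation filter followed by an SVM classifier (illustrative settings, extension step omitted).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = make_classification(n_samples=100, n_features=400, n_informative=15, random_state=2)

# Absolute Pearson correlation of each feature with the class attribute.
corr = np.abs([np.corrcoef(X[:, j], y)[0, 1] for j in range(X.shape[1])])
K = 25
top_k = np.argsort(corr)[::-1][:K]            # indices of the K most correlated features

acc = cross_val_score(SVC(kernel="linear"), X[:, top_k], y, cv=5).mean()
print(f"SVM accuracy with top-{K} correlated features: {acc:.3f}")
```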
2015 39th National Systems Conference (NSC), 2015
Due to the day-to-day use of information processing in society, databases have grown tremendously in size. It has been realized that, most of the time, not all parameters (called features here) are required to decide the outcome (or decision) of an instance; feature selection is therefore an important step in data processing. In this paper, a novel method is presented to select features. In the method, the cosine similarity of each feature of the database with the respective class is computed and stored in an array in descending order. The first feature of this array is then combined with the remaining features sequentially, one by one. If the classification accuracy of the combination increases, the combination is accepted; otherwise the responsible features are eliminated from it. In this manner all features are tested and a final subset of features is obtained. The results obtained from rigorous experiments with the proposed method on high-dimensional databases, and comparisons with other methods reported so far, are encouraging. It is therefore recommended that the proposed method be applied to high-dimensional data processing.
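The sequential search can be sketched as follows: features are ordered by cosine similarity with the class, and each is kept only if adding it improves cross-validated accuracy. The classifier, dataset, and exact accept/reject rule are illustrative assumptions.

```python
# Sketch of the sequential cosine-ordered search (paraphrased accept/reject rule).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=120, n_features=250, n_informative=10, random_state=3)

cos = np.abs(X.T @ y) / (np.linalg.norm(X, axis=0) * np.linalg.norm(y) + 1e-12)
order = np.argsort(cos)[::-1]                 # features in descending similarity

def cv_acc(cols):
    return cross_val_score(KNeighborsClassifier(), X[:, cols], y, cv=3).mean()

subset = [order[0]]                           # start from the most similar feature
best = cv_acc(subset)
for f in order[1:]:
    acc = cv_acc(subset + [f])
    if acc > best:                            # keep the feature only if accuracy improves
        subset, best = subset + [f], acc
print(f"{len(subset)} features kept, CV accuracy {best:.3f}")
```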
International Journal on Recent and Innovation Trends in Computing and Communication, 2015
One major component of machine learning is feature analysis, which mainly comprises two processes: feature selection and feature extraction. Owing to its applications in several areas, including data mining, soft computing and big data analysis, feature selection has gained considerable importance. This paper presents the introductory concepts of feature selection along with its main approaches. It surveys the historic developments reported in feature selection for both supervised and unsupervised methods, and summarizes recent developments and the state of the art in feature selection algorithms, including their hybridizations.
2016 International Conference on ICT in Business Industry & Government (ICTBIG), 2016
In this paper, a feature selection method is presented for multiclass datasets. The method is a hybridization of k-means clustering, using cosine similarity as the distance measure, and information gain. Unsupervised cosine similarity is used for grouping the features, i.e., k-means clustering forms clusters of features, and information gain is then employed to select the most relevant feature from each cluster. The dataset with the selected features is tested for classification accuracy using a cross-validation approach. Three classifiers, namely Naïve Bayes (NB), K-Nearest Neighbor (KNN) and Classification and Regression Trees (CART), have been used as the base classifiers. The obtained results are compared with a filter-based feature selection technique (information gain).
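The cluster-then-select idea can be sketched as below. Cosine-distance k-means is approximated by running ordinary k-means on L2-normalised feature columns (a common trick), and information gain is approximated by mutual_info_classif; the number of clusters and the dataset are assumptions for illustration.

```python
# Sketch of clustering features with (approximate) cosine-distance k-means, then picking
# the most informative feature per cluster. All settings are illustrative.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_classification
from sklearn.feature_selection import mutual_info_classif
from sklearn.preprocessing import normalize

X, y = make_classification(n_samples=150, n_features=300, n_informative=20,
                           n_classes=3, n_clusters_per_class=1, random_state=4)

# Cluster the features (columns), not the samples: normalising each column makes
# Euclidean k-means behave like cosine-distance clustering.
cols = normalize(X.T)                         # shape: (n_features, n_samples)
labels = KMeans(n_clusters=15, n_init=10, random_state=4).fit_predict(cols)

# Pick the most informative feature from every cluster.
mi = mutual_info_classif(X, y, random_state=4)
selected = [int(np.where(labels == c)[0][np.argmax(mi[labels == c])])
            for c in np.unique(labels)]
print("one feature per cluster:", selected)
```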
IEEE Transactions on Vehicular Technology, Jul 1, 2003
In this paper, the integration of power control and turbo coding is adopted to achieve reliable communications over Ka-band code-division multiple-access (CDMA)-based low earth orbit (LEO) satellite systems. The effect of imperfect power control on the bit error ratio (BER) performance is analyzed, and upper bounds on the BER are derived for the cases of slow and fast imperfect power control. The analytical and simulation results show that power-control error (PCE) degrades the BER performance of turbo-decoded systems significantly in the waterfall region. In the error-floor region, the degradation slows down and the BER obtained under imperfect power control eventually converges to that under perfect power control. Moreover, in the waterfall region, the correlation characteristics of the PCE fluctuation do not degrade the BER performance as anticipated.
Proceedings of IEEE Singapore International Conference on Networks / International Conference on Information Engineering '93, Sep 6, 1993
Proceedings of the 2003 Joint Conference of the Fourth International Conference on Information, Communications and Signal Processing and the Fourth Pacific Rim Conference on Multimedia, 2003
Proceedings of the 2003 Joint Conference of the Fourth International Conference on Information, Communications and Signal Processing and the Fourth Pacific Rim Conference on Multimedia, 2003
Proceedings of the 2003 Joint Conference of the Fourth International Conference on Information, Communications and Signal Processing and the Fourth Pacific Rim Conference on Multimedia, 2003
Journal of Plant Interactions, 2015
Proceedings of GLOBECOM '95, 1995
The 8-state Ungerboeck (1982) code is modified by the output symbol re-assignment technique with the Euclidean distance redistribution strategy. Two versions of this new design are presented, one of which performs consistently better than the existing 8-state Ungerboeck code in Rician and land-mobile satellite fading channels.
IEEE/AFCEA EUROCOMM 2000. Information Systems for Enhanced Public Safety and Security (Cat. No.00EX405), 2000
Proceedings of Vehicular Technology Conference - VTC, 1996
We study the urban mobile communication channel through the time-varying impulse response h(τ,t) using random linear time-variant (LTV) filter theory. The scattering function, which contains much valuable information about the mobile communication channel, is generated and analyzed in the time-frequency domain. The characteristics of the channel can be efficiently extracted from the scattering function. A measurement-based simulation method and ...
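As a generic illustration of how a scattering function can be estimated from a sampled h(τ,t) under a WSSUS assumption, the sketch below Fourier-transforms each delay tap over time and takes the magnitude squared. The synthetic two-tap channel with Jakes-like Doppler is purely illustrative and is not the measurement-based model of the paper.

```python
# Generic estimate of a scattering function from a sampled time-varying impulse response.
import numpy as np

fs_t = 1000.0                      # rate at which snapshots of h(tau, t) are taken (Hz)
n_t, n_tau = 1024, 8               # number of time snapshots and delay bins
t = np.arange(n_t) / fs_t

# Synthetic h(tau, t): two taps, each a sum of Doppler-shifted complex exponentials.
rng = np.random.default_rng(0)
h = np.zeros((n_tau, n_t), dtype=complex)
for tap, (gain, fd) in enumerate([(1.0, 40.0), (0.5, 25.0)]):
    phases = rng.uniform(0, 2 * np.pi, 32)
    dopplers = fd * np.cos(rng.uniform(0, 2 * np.pi, 32))      # Jakes-like Doppler spread
    h[tap] = gain * np.exp(1j * (2 * np.pi * np.outer(dopplers, t) + phases[:, None])).mean(axis=0)

# Scattering-function estimate: Fourier transform of each delay tap over time,
# magnitude squared -> power versus (delay, Doppler).
S = np.abs(np.fft.fftshift(np.fft.fft(h, axis=1), axes=1)) ** 2 / n_t
doppler_axis = np.fft.fftshift(np.fft.fftfreq(n_t, d=1 / fs_t))
print("scattering-function matrix:", S.shape, "Doppler bins from",
      doppler_axis[0], "to", doppler_axis[-1], "Hz")
```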
Proceedings of GLOBECOM '95, 1995
IEEE VTS 53rd Vehicular Technology Conference, Spring 2001. Proceedings (Cat. No.01CH37202), 2001
IEEE International Conference on Communications, 2003. ICC '03., 2003
The 8th International Conference on Communication Systems (ICCS 2002), 2002
We analyze the inter-carrier interference (ICI) caused by multiple Rician paths and Doppler spreading in orthogonal frequency division multiplexing (OFDM) systems. A general form for the ICI power is derived by taking the correlation function of the Rician channel into consideration, and an equation is derived for estimating the frequency-tracking parameter. We also present examples using ideal beamforming at the mobile. For a multiple-path Rician channel, the results show that the reduction in ICI power is not as large as in the Rayleigh case; however, in the presence of only one Rician path, the reduction in ICI power is marginally better than for the Rayleigh channel.
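A textbook-style numerical estimate of the ICI power can be obtained from the channel time-correlation function; the sketch below uses the standard relation ICI = 1 − E|H₀|² with a Rician correlation model R(τ) = (K·e^{j2πf_losτ} + J₀(2πf_dτ))/(K+1). The Rice factor, Doppler values, and subcarrier count are assumptions for illustration, and this is not the general form derived in the paper.

```python
# Numerical ICI-power estimate for OFDM over a time-varying Rician channel
# (standard correlation-based expression, illustrative parameters).
import numpy as np
from scipy.special import j0

def ici_power(N, fd_T, K=5.0, f_los_T=None):
    """ICI power for one OFDM symbol with N subcarriers.
    fd_T: diffuse Doppler spread times the useful symbol duration.
    f_los_T: LOS Doppler times the symbol duration (defaults to fd_T)."""
    if f_los_T is None:
        f_los_T = fd_T
    n = np.arange(N)
    d = (n[:, None] - n[None, :]) / N          # (n - m) * Ts / T = (n - m) / N
    R = (K * np.exp(1j * 2 * np.pi * f_los_T * d) + j0(2 * np.pi * fd_T * d)) / (K + 1)
    return 1.0 - np.real(R.sum()) / N ** 2     # unit total power minus desired-carrier power

for fd_T in (0.02, 0.05, 0.1):
    print(f"fd*T = {fd_T}: ICI power ~ {ici_power(256, fd_T):.4f}")
```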