Vimal Kumar Dubey - Academia.edu

Volume 3 Issue 4 by Vimal Kumar Dubey

A Survey on Feature Selection Algorithms

Citation/Export
MLA

Dr. Amit Kumar Saxena, Vimal Kumar Dubey, “A Survey on Feature Selection Algorithms”, April 15 Volume 3 Issue 4, International Journal on Recent and Innovation Trends in Computing and Communication (IJRITCC), ISSN: 2321-8169, PP: 1895 - 1899, DOI: 10.17762/ijritcc2321-8169.160431
APA

Dr. Amit Kumar Saxena, Vimal Kumar Dubey, April 15 Volume 3 Issue 4, “A Survey on Feature Selection Algorithms”, International Journal on Recent and Innovation Trends in Computing and Communication (IJRITCC), ISSN: 2321-8169, PP: 1895 - 1899, DOI: 10.17762/ijritcc2321-8169.160431

Papers by Vimal Kumar Dubey

Cosine similarity based filter technique for feature selection

2016 International Conference on Control, Computing, Communication and Materials (ICCCCM)

Filter-based feature selection techniques are less complex than wrapper-based feature selection techniques for high-dimensional datasets. In this paper, we propose a filter method, the Cosine Similarity-based Filter (CSF) feature selection technique, for high-dimensional datasets. In this method, the absolute cosine similarity of each feature with respect to the class label is used to order the features, and a user-defined number of features is selected from the ordered list. The dataset with the selected features is tested for classification accuracy using a multi-classifier system (K-Nearest Neighbor (KNN), Classification and Regression Tree (CART), Naive Bayes (NB) and Support Vector Machine (SVM)). The method is applied to four high-dimensional binary-class datasets, and the obtained accuracy shows that it is better than or equivalent to other existing methods.
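
The ranking step described in this abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the names `cosine_similarity` and `csf_rank` are hypothetical, the class labels are assumed to be numerically encoded (here as +1/-1), and ties are broken by feature index.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # assumption: an all-zero vector is treated as uninformative
    return dot / (norm_u * norm_v)


def csf_rank(X, y, k):
    """Return the indices of the k features with the largest |cos(feature, y)|.

    X is a list of rows (one row per sample); y holds numeric class labels.
    """
    n_features = len(X[0])
    scores = []
    for j in range(n_features):
        column = [row[j] for row in X]
        scores.append((abs(cosine_similarity(column, y)), j))
    # Highest absolute similarity first; equal scores keep the lower index first.
    scores.sort(key=lambda s: (-s[0], s[1]))
    return [j for _, j in scores[:k]]


# Toy data: feature 0 follows the label, features 1 and 2 do not.
X = [[1, 0, 5], [-1, 0, 5], [1, 1, 5], [-1, 1, 5]]
y = [1, -1, 1, -1]
top_two = csf_rank(X, y, 2)
```

The selected column indices would then be used to build the reduced dataset that is passed to the classifiers named in the abstract.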

Hybrid feature selection methods for high-dimensional multi-class datasets

International Journal of Data Mining, Modelling and Management

Hybrid methods are very important for feature selection in the classification of high-dimensional datasets. In this paper, we propose two hybrid methods that combine filter-based feature selection, a genetic algorithm, and sequential random search. The first method is a hybridisation of information gain and a genetic algorithm: the features are first ranked by information gain, and a user-defined number of top-ranked features is selected. A genetic algorithm is then applied to these selected features to find an optimal feature subset, using two types of fitness function, single-objective and multi-objective. The second method is a hybridisation of information gain and sequential random K-nearest neighbour (SRKNN). Here again, information gain is used to rank the features and a user-defined number of top-ranked features is selected. A population of binary vectors (each covering all the features selected by the user) is generated, and a sequential search is applied to each member of the population to maximise classification accuracy. These methods are applied to 21 high-dimensional multi-class datasets. The results show that the first method performs well on some datasets and the second method on others. The results obtained by the proposed methods are compared with results reported for other methods.

A Cosine-Similarity Mutual-Information Approach for Feature Selection on High Dimensional Datasets

Journal of Information Technology Research

A novel hybrid method based on cosine similarity and mutual information is presented for finding a relevant feature subset. Initially, the supervised cosine similarity of each feature is calculated with respect to the class vector, and the features are then grouped based on the obtained cosine similarity values. From each group, the feature with the highest mutual information is selected. The selected feature subset is tested for classification accuracy using three classifiers, namely Naïve Bayes (NB), K-Nearest Neighbor (KNN) and Classification and Regression Trees (CART). The proposed method is applied to various high-dimensional datasets. The results show that the proposed method is capable of eliminating redundant and irrelevant features.
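
A rough Python sketch of the group-then-select idea follows. The abstract does not say how the groups are formed, so the equal-width binning of cosine values used here is purely an assumption, as are the function names (`cosine`, `mutual_information`, `select_by_groups`) and the restriction to discrete features so that mutual information can be estimated by counting.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity of two equal-length numeric vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def mutual_information(x, y):
    """Empirical mutual information I(X;Y) in bits for two discrete sequences."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    return sum((c / n) * math.log2(c * n / (px[a] * py[b]))
               for (a, b), c in pxy.items())


def select_by_groups(X, y, n_groups):
    """Bin features by cosine similarity to y, then keep the max-MI feature per bin."""
    cols = [[row[j] for row in X] for j in range(len(X[0]))]
    sims = [cosine(col, y) for col in cols]
    lo, hi = min(sims), max(sims)
    width = (hi - lo) / n_groups
    if width == 0:
        width = 1.0  # all similarities equal: everything falls into one bin
    groups = {}
    for j, s in enumerate(sims):
        g = min(int((s - lo) / width), n_groups - 1)  # assumed equal-width bins
        groups.setdefault(g, []).append(j)
    return sorted(max(members, key=lambda j: mutual_information(cols[j], y))
                  for members in groups.values())


# Toy data with four binary features and a binary class vector.
X = [[1, 1, 0, 0],
     [1, 1, 0, 1],
     [0, 1, 1, 1],
     [0, 0, 1, 1]]
y = [1, 1, 0, 0]
chosen = select_by_groups(X, y, 2)
```

With two bins, features 0 and 1 land in the high-similarity bin and features 2 and 3 in the low-similarity bin; one feature survives from each, which is how redundancy within a group is reduced in this sketch.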

Hybrid classification model of correlation-based feature selection and support vector machine

2016 IEEE International Conference on Current Trends in Advanced Computing (ICCTAC), 2016

In this paper, we propose a hybrid classification model that combines a correlation-based filter feature selection algorithm with a support vector machine (SVM) classifier. In this method, features are ordered according to their absolute correlation with the class attribute. The top K features are then selected from the ordered list to form a reduced dataset. Classification accuracy is measured using SVM classifiers with and without extending the features of the reduced dataset. The proposed model is applied to five high-dimensional binary-class datasets. It is observed that the proposed method yields higher classification accuracy on three of the five datasets with a reasonably small number of features.

A sequential cosine similarity based feature selection technique for high dimensional datasets

2015 39th National Systems Conference (NSC), 2015

Owing to the day-to-day use of information processing in society, databases have become extremely large. It has been realized that, most of the time, not all parameters (here called features) are required to decide the outcome (or decision) of an instance. Feature selection is therefore an important step in data processing. In this paper, a novel feature selection method is presented. In the method, the cosine similarity of each feature of the database with the respective class is computed and stored in an array in descending order. The first feature of this array is combined with the remaining features sequentially, one by one. If the classification accuracy of the combination increases, the combination is accepted; otherwise the responsible feature is eliminated from the combination. In this manner all features are tested and a final subset of features is obtained. The results of experiments with the proposed method on high-dimensional databases, compared with other reported methods, are encouraging. It is therefore recommended that the proposed method be applied to high-dimensional data processing.
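
The accept-or-eliminate loop in this abstract can be illustrated with the Python sketch below. It is a minimal sketch under assumptions: the abstract does not name the classifier, so a leave-one-out 1-nearest-neighbour accuracy (`loo_1nn_accuracy`) stands in as a hypothetical evaluator, and `sequential_cosine_selection` is an illustrative name rather than the authors' code.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length numeric vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def loo_1nn_accuracy(X, y, subset):
    """Leave-one-out accuracy of a 1-nearest-neighbour rule on the given columns."""
    correct = 0
    for i, xi in enumerate(X):
        best_d, best_label = None, None
        for j, xj in enumerate(X):
            if i == j:
                continue
            d = sum((xi[f] - xj[f]) ** 2 for f in subset)
            if best_d is None or d < best_d:
                best_d, best_label = d, y[j]
        correct += best_label == y[i]
    return correct / len(X)


def sequential_cosine_selection(X, y, evaluate=loo_1nn_accuracy):
    """Rank features by cosine similarity to y, then add them one at a time,
    keeping a feature only if the evaluated accuracy strictly increases."""
    n_features = len(X[0])
    order = sorted(range(n_features),
                   key=lambda j: -cosine([row[j] for row in X], y))
    selected = [order[0]]            # start from the top-ranked feature
    best = evaluate(X, y, selected)
    for j in order[1:]:
        acc = evaluate(X, y, selected + [j])
        if acc > best:               # accept the combination
            selected.append(j)
            best = acc
        # otherwise feature j is dropped and the subset is left unchanged
    return selected, best


# Toy data: feature 0 separates the classes on its own.
X = [[1.0, 0.5, 0.9],
     [0.9, 0.1, 0.8],
     [0.0, 0.5, 0.3],
     [0.1, 0.1, 0.2]]
y = [1, 1, 0, 0]
subset, accuracy = sequential_cosine_selection(X, y)
```

Passing a different `evaluate` callable (for example, cross-validated accuracy of another classifier) changes only the acceptance test, not the search order.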

A Survey on Feature Selection Algorithms

International Journal on Recent and Innovation Trends in Computing and Communication, 2015

One major component of machine learning is feature analysis, which mainly comprises two processes: feature selection and feature extraction. Owing to its applications in several areas, including data mining, soft computing and big data analysis, feature selection has gained reasonable importance. This paper introduces the concept of feature selection and its various approaches. It surveys historic developments reported in feature selection with supervised and unsupervised methods. Recent developments and the state of the art in feature selection algorithms, including their hybridizations, are also summarized.

A cluster-filter feature selection approach

2016 International Conference on ICT in Business Industry & Government (ICTBIG), 2016

In this paper, a feature selection method is presented for multiclass datasets. The method is a hybridization of k-means clustering, using cosine similarity as the distance measure, and information gain. Unsupervised cosine similarity is used to group the features, i.e. k-means clustering forms clusters of features, and information gain is then employed to select the most relevant feature from each cluster. The dataset with the selected features is tested for classification accuracy using cross-validation. Three classifiers, namely Naïve Bayes (NB), K-Nearest Neighbor (KNN) and Classification and Regression Trees (CART), are used as the base classifiers. The results are compared with a filter-based feature selection technique (information gain).

The Performance of Turbo Coding Over Power-Controlled Fading Channel in Ka-Band LEO Satellite Systems

IEEE Transactions on Vehicular Technology, Jul 1, 2003

In this paper, the integration of power control and turbo coding is adopted to achieve reliable communications over Ka-band code-division multiple-access (CDMA)-based low earth orbit (LEO) satellite systems. The effect of imperfect power control on the bit error ratio (BER) performance is analyzed, and the upper bounds on BER are also derived for the case of slow and fast imperfect power control. The analytical and simulation results show that power-control error (PCE) degrades the BER performance of turbo-decoded systems significantly in the waterfall region. In the region of error floor, the degradation effect slows down and the BER obtained under imperfect power control eventually converges with the BER under perfect power control. Moreover, in the waterfall region, the correlation characteristics of the PCE fluctuation do not degrade the BER performance as anticipated.

Optical fiber system for video and telemetry signal transmission

Proceedings of IEEE Singapore International Conference on Networks International Conference on Information Engineering 93, Sep 6, 1993

Performance analysis for downlink MC-CDMA systems with space-time block codes in frequency-selective Rayleigh fading channels

Fourth International Conference on Information Communications and Signal Processing 2003 and the Fourth Pacific Rim Conference on Multimedia Proceedings of the 2003 Joint, 2003

Geometric channel model for multipath propagation in city-street grids

Fourth International Conference on Information Communications and Signal Processing 2003 and the Fourth Pacific Rim Conference on Multimedia Proceedings of the 2003 Joint, 2003

The accuracy of geometric channel modeling methods

Fourth International Conference on Information Communications and Signal Processing 2003 and the Fourth Pacific Rim Conference on Multimedia Proceedings of the 2003 Joint, 2003

Expression of coat protein gene of Cucumber mosaic virus (CMV-subgroup IA) Gladiolus isolate in Nicotiana tabacum

Journal of Plant Interactions, 2015

Modified 8-state Ungerboeck code for band-limited satellite fading channels

Proceedings of GLOBECOM '95, 1995

The 8-State Ungerboeck (1982) code is modified by the output symbol re-assignment technique with the Euclidean distance redistribution strategy. Two versions of this new design are presented, one of which performs consistently better than the existing 8-State Ungerboeck code in the Rician and land-mobile satellite fading channels.

Performance of frequency and time domain coded OFDM over fast fading LEO channels

IEEE/AFCEA EUROCOMM 2000. Information Systems for Enhanced Public Safety and Security (Cat. No.00EX405), 2000

Generation of scattering functions by computer simulation for mobile communication channels

Proceedings of Vehicular Technology Conference - VTC, 1996

We study the urban mobile communication channel by the time-varying impulse response h(τ,t) using the random linear time-variant (LTV) filter theory. The scattering function, which contains much valuable information of the mobile communication channel, is generated and analyzed in the time-frequency domain. The characteristics of the channel can be efficiently extracted from the scattering function. A measurement-based simulation method and

Performance of pitch synchronous multi-band (PSMB) speech coder with error-correction coding

Proceedings of GLOBECOM '95, 1995

The integration of power control and turbo coding in Ka-band CDMA based LEO satellite system

IEEE VTS 53rd Vehicular Technology Conference, Spring 2001. Proceedings (Cat. No.01CH37202), 2001

Application of angular diversity in OFDM systems

IEEE International Conference on Communications, 2003. ICC '03., 2003

Effect of employing beamforming on OFDM systems in Rician channel

The 8th International Conference on Communication Systems, 2002. ICCS 2002., 2002

We analyze the inter-carrier interference (ICI) caused by multiple Rician paths and Doppler spreading in orthogonal frequency division multiplexing (OFDM) systems. The general form for the ICI power is derived by taking into consideration the correlation equation of the Rician channel. We have derived an equation for estimating the frequency tracking parameter. We also present examples using ideal beamforming at the mobile. For a channel with multiple Rician paths, the results show that the reduction in ICI power is not as large as in the case of the Rayleigh channel. However, in the presence of only one Rician path, the reduction in ICI power is marginally better than for the Rayleigh channel.