Bichitrananda Patra - Academia.edu
Papers by Bichitrananda Patra
A major challenge in biomedical studies in recent years has been the classification of gene expression profiles into categories, such as cases and controls. This is done by first training a classifier on a labeled training set containing samples from the two populations, and then using that classifier to predict the labels of new samples. Such predictions have recently been shown to improve diagnosis and treatment selection for several diseases. This procedure is complicated, however, by the high dimensionality of the data. While microarrays can measure the levels of thousands of genes per sample, case-control microarray studies usually involve no more than several dozen samples. Standard classifiers do not work well in these situations, where the number of features (gene expression levels measured in these microarrays) far exceeds the number of samples. Selecting only the features that are most relevant for discriminating between the two categories can help construct better classifiers, in terms of both accuracy and efficiency. This paper compares a dimension reduction technique, namely the Partial Least Squares (PLS) method, with a hybrid feature selection scheme, and evaluates the relative performance of four supervised classification procedures incorporating those methods: Radial Basis Function Network (RBFN), Multilayer Perceptron Network (MLP), Support Vector Machine with a polynomial kernel function (Polynomial-SVM), and Support Vector Machine with an RBF kernel function (RBF-SVM). Experimental results show that the Partial Least Squares (PLS) regression method is an appropriate feature selection method, and that a combined use of different classification and feature selection approaches makes it possible to construct high-performance classification models for microarray data.
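As a rough illustration of how PLS reduces many gene columns to a few class-aware scores, the sketch below computes only the first PLS1 component: the weight vector is proportional to the covariance between each centered gene and the centered class label. The function name `first_pls_component`, the ±1 label coding, and the toy expression values are assumptions made for this example; they are not taken from the paper.

```python
import math

def first_pls_component(X, y):
    """Return (weights, scores) of the first PLS1 component.

    X is a list of samples (rows) by genes (columns); y holds numeric labels.
    Assumes at least one gene co-varies with y, so the norm below is non-zero.
    """
    n, p = len(X), len(X[0])
    col_means = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - col_means[j] for j in range(p)] for row in X]
    yc = [v - y_mean for v in y]
    # Weight of gene j is its covariance with the label (up to scaling).
    w = [sum(Xc[i][j] * yc[i] for i in range(n)) for j in range(p)]
    norm = math.sqrt(sum(v * v for v in w))
    w = [v / norm for v in w]
    # Each sample is projected onto the weight vector: one score per sample.
    scores = [sum(Xc[i][j] * w[j] for j in range(p)) for i in range(n)]
    return w, scores

# Hypothetical toy data: 4 samples, 3 genes; labels coded +1 (case), -1 (control).
X = [[2.0, 0.5, 1.0],
     [2.2, 0.4, 0.9],
     [0.5, 0.5, 2.0],
     [0.3, 0.6, 2.1]]
y = [1, 1, -1, -1]
weights, scores = first_pls_component(X, y)
```

In this toy data the first gene receives the largest absolute weight and the single score already separates the two classes by sign; a classifier such as an SVM would then be trained on a handful of such scores instead of thousands of raw gene columns.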
A novel approach to combining clustering and feature selection is presented. Feature selection for clustering is a problem rarely addressed in the literature. Although there has recently been some work in the area, there is a lack of extensive empirical evaluation to assess the potential of each method. The proposed approach implements a wrapper strategy for feature selection, in the sense that the features are selected directly by optimizing the discriminative power of the partitioning algorithm used. Experiments with real-world datasets demonstrate that our method is able to infer both meaningful partitions and meaningful subsets of features. In this paper, we present a comparative study of four feature selection heuristics by applying them to two sets of data. The first set consists of gene expression profiles from colon biopsy samples, and the second of gene expression profiles from the ALL/AML data set. Based on the features chosen by these methods, error rates of several clustering algorithms were obtained for analysis. Results confirm the utility of feature selection for clustering.
Classification, a data mining task, is an effective method for categorizing data in the process of Knowledge Data Discovery. Classification algorithms are widely used in the medical field to classify medical data for diagnosis. Feature selection increases the accuracy of the classifier because it eliminates irrelevant attributes. This paper analyzes the performance of neural network classifiers with and without feature selection, in terms of accuracy and efficiency, to build a model on four different datasets. This paper provides a rough feature selection scheme and evaluates the relative performance of four neural network classification procedures incorporating those methods: the Learning Vector Quantisation (LVQ) variants LVQ1, LVQ3, and optimized-learning-rate LVQ1 (OLVQ1), and the Self-Organizing Map (SOM). Experimental results show that LVQ3 neural classification is an appropriate classification method that makes it possible to construct high-performance classification models for microarray data.
Microarray data with reference to gene expression profiles have provided some valuable results related to a variety of problems, and contributed to advances in clinical medicine. Microarray data characteristically have a high dimension and small sample size, which makes it difficult for a general classification method to obtain correct data for classification. However, not every gene is potentially relevant for distinguishing the sample class. Thus, in order to analyze gene expression profiles correctly, feature (gene) selection is crucial for the classification process, and an effective gene extraction method is necessary for eliminating irrelevant genes and decreasing the classification error rate. The purpose of gene expression analysis is to discriminate between classes of samples, and to predict the relative importance of each gene for sample classification. In this paper, irrelevant genes are eliminated in two stages. The first phase employs correlation-based feature selection (CFS) as an evaluator and binary particle swarm optimization (BPSO) as a search technique. In the second phase of elimination, this combination (CFS+PSO) is joined with the Quick-Reduct algorithm to form an integrated filter method. Since the data consist of a large number of redundant features, an initial redundancy reduction of the genes is done to enable faster convergence. Then rough set theory is employed to generate reducts, which represent the minimal sets of non-redundant genes capable of discerning between all objects, in a multi-objective framework. The effectiveness of the proposed approach was verified on two different multi-class microarray datasets using K-nearest neighbour (K-NN), Naïve Bayes (NB), and J48 classifiers with the leave-one-out cross-validation (LOOCV) method.
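The reduct-generation step can be sketched with the textbook greedy QuickReduct procedure from rough set theory: attributes are added one at a time, each time choosing the one that most increases the dependency degree (the fraction of samples whose equivalence class is pure in the class label). This is a minimal sketch on already-discretized expression levels; the function names and the toy table are assumptions for illustration, and the CFS+BPSO pre-filtering and multi-objective framework described in the abstract are not reproduced here.

```python
from collections import defaultdict

def partition(rows, attrs):
    """Group row indices into equivalence classes by their values on attrs."""
    blocks = defaultdict(list)
    for i, row in enumerate(rows):
        blocks[tuple(row[a] for a in attrs)].append(i)
    return list(blocks.values())

def dependency(rows, labels, attrs):
    """Dependency degree: share of rows lying in label-pure equivalence classes."""
    positive = 0
    for block in partition(rows, attrs):
        if len({labels[i] for i in block}) == 1:
            positive += len(block)
    return positive / len(rows)

def quick_reduct(rows, labels):
    """Greedily add attributes until the full-set dependency degree is reached."""
    all_attrs = list(range(len(rows[0])))
    target = dependency(rows, labels, all_attrs)
    reduct = []
    while dependency(rows, labels, reduct) < target:
        candidates = [a for a in all_attrs if a not in reduct]
        best = max(candidates,
                   key=lambda a: dependency(rows, labels, reduct + [a]))
        reduct.append(best)
    return reduct

# Hypothetical discretized expression table: 5 samples, 3 genes (columns 0..2).
rows = [(0, 1, 0), (0, 1, 1), (1, 0, 0), (1, 0, 1), (0, 0, 0)]
labels = ["A", "A", "B", "B", "B"]
print(quick_reduct(rows, labels))  # → [1]
```

In this toy table gene 1 alone discerns the two classes, so the greedy search stops after a single step. Greedy QuickReduct returns a reduct that preserves the dependency degree but is not guaranteed to be the smallest one.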
Gene expression data usually contain a large number of genes but a small number of samples. Feature selection for gene expression data aims at finding a set of genes that best discriminates biological samples of different types. Classification of tissue samples as tumor or normal is one of the applications of microarray technology. When classifying tissue samples, gene selection plays an important role. In this paper, we propose a two-stage selection algorithm for genomic data by combining some existing statistical gene selection techniques with the ROC scores of SVM and k-NN classifiers. The motivation for using a Support Vector Machine is that DNA microarray problems can be very high-dimensional and have very few training samples, a situation particularly well suited to an SVM approach. In the first stage, genes with similar expression profiles are grouped into distinct clusters, the cluster quality and the discriminative score of each gene are calculated using statistical techniques, and informative genes are then selected from these clusters based on the cluster quality and discriminative score. In the second stage, the effectiveness of this technique is investigated by comparing the ROC scores of SVMs that use different kernel functions and of k-NN classifiers. Leave-One-Out Cross-Validation (LOOCV) is then used to validate the techniques.
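Leave-one-out cross-validation suits such small sample sizes because each sample is held out exactly once while all remaining samples are used for training. The sketch below pairs LOOCV with a plain k-NN classifier using Euclidean distance and majority voting; the function names, the choice of k = 3, and the two-gene toy samples are assumptions for illustration, and it reports accuracy rather than the ROC score used in the paper.

```python
import math
from collections import Counter

def knn_predict(train_x, train_y, query, k=3):
    """Predict the majority label among the k nearest training samples."""
    order = sorted(range(len(train_x)),
                   key=lambda i: math.dist(train_x[i], query))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]

def loocv_accuracy(xs, ys, k=3):
    """Hold out each sample once, train on the rest, and average the hits."""
    correct = 0
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        if knn_predict(train_x, train_y, xs[i], k) == ys[i]:
            correct += 1
    return correct / len(xs)

# Hypothetical expression values of two selected genes for 8 tissue samples.
xs = [(1.0, 1.1), (0.9, 1.0), (1.2, 0.8), (1.1, 1.2),
      (5.0, 5.1), (4.8, 5.2), (5.1, 4.9), (5.3, 5.0)]
ys = ["normal"] * 4 + ["tumor"] * 4
print(loocv_accuracy(xs, ys))  # → 1.0
```

Note that for an unbiased estimate the gene selection itself would have to be repeated inside each fold; this sketch assumes the two genes were chosen beforehand.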
In this paper we study how to reduce the time complexity of Earliest Deadline First (EDF), a global scheduling scheme for real-time system tasks on a multiprocessor system. Several admission control algorithms for earliest deadline first are presented, for both hard and soft real-time tasks. The average performance of these admission control algorithms is compared with the performance of known partitioning schemes. We have applied some modifications to the global earliest deadline first algorithm to decrease the number of task migrations and to add predictability to its behavior. The aim of this work is to provide a sensitivity analysis in the context of task deadlines on a multiprocessor system by using a new approach, the EFDF (Earliest Feasible Deadline First) algorithm. In order to decrease the number of migrations, we prevent a job from moving from one processor to another if it is among the higher-priority jobs. Therefore, a job will continue its execution on the same processor if possible (processor affinity). The results of these comparisons outline some situations where one scheme is preferable to the other. Partitioning schemes are better suited for hard real-time systems, while a global scheme is preferable for soft real-time systems.
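The effect of processor affinity on migrations can be illustrated with a small discrete-time simulation of global EDF. At every tick the jobs with the earliest deadlines are selected, one per processor; with affinity enabled, a selected job that has run before stays on its previous processor whenever that processor is still free. This is a minimal sketch under simplifying assumptions (unit time slots, no admission control, hypothetical job set and function name); it is not the EFDF algorithm of the paper.

```python
def global_edf(jobs, m, horizon, affinity=True):
    """Simulate global EDF on m processors for `horizon` unit time slots.

    jobs maps a name to (release, execution_time, absolute_deadline).
    Returns (schedule, migrations), where schedule[t] maps processor -> job.
    """
    remaining = {name: c for name, (_, c, _) in jobs.items()}
    last_cpu = {}      # processor each job last ran on
    migrations = 0
    schedule = []
    for t in range(horizon):
        ready = [n for n, (r, _, _) in jobs.items()
                 if r <= t and remaining[n] > 0]
        ready.sort(key=lambda n: (jobs[n][2], n))   # earliest deadline first
        chosen = ready[:m]
        assignment = {}
        if affinity:
            # Pass 1: keep a job on its previous processor if it is still free.
            for n in chosen:
                p = last_cpu.get(n)
                if p is not None and p not in assignment:
                    assignment[p] = n
        # Pass 2: place the remaining chosen jobs on the free processors.
        free = [p for p in range(m) if p not in assignment]
        for n in chosen:
            if n not in assignment.values():
                p = free.pop(0)
                if n in last_cpu and last_cpu[n] != p:
                    migrations += 1
                assignment[p] = n
        for p, n in assignment.items():
            last_cpu[n] = p
            remaining[n] -= 1
        schedule.append(assignment)
    return schedule, migrations

# Hypothetical job set: name -> (release, execution time, deadline).
jobs = {"J1": (0, 3, 4), "J2": (0, 2, 5), "J3": (1, 1, 2)}
with_affinity = global_edf(jobs, m=2, horizon=4, affinity=True)
without_affinity = global_edf(jobs, m=2, horizon=4, affinity=False)
print(with_affinity[1], without_affinity[1])  # → 0 2
```

In this example the arrival of J3 at time 1 pushes J1 to the other processor when jobs are placed purely in deadline order, whereas the affinity pass leaves J1 where it was and gives the newcomer the free processor.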