Anirban Mukhopadhyay | University of Kalyani
Papers by Anirban Mukhopadhyay
Pancreatic Ductal Adenocarcinoma (PDAC) is the most lethal type of pancreatic cancer (PC), and its late detection leads to therapeutic failure. This study aims to identify key regulatory genes and their impact on the progression of the disease, shedding light on an etiology that is still largely unknown. We leverage time-series gene expression data for this disease, so that the identified key regulators capture the characteristics of gene activity patterns as the cancer progresses. We have identified the key modules and predicted the functions of top genes from the compiled gene association network (GAN). We used the natural cubic spline regression model (splineTimeR) to identify differentially expressed genes (DEG) from the PDAC microarray time-series data downloaded from the Gene Expression Omnibus (GEO). First, we identified key transcriptomic regulators (TR) and DNA binding transcription factors (DbTF). Subseq...
Procedia Technology, 2013
Clustering ensemble refers to the problem of obtaining a final clustering of a data set from a set of input clustering solutions. In this article, the clustering ensemble problem has been modeled as a multiobjective optimization problem, and a multiobjective evolutionary algorithm has been used to solve it. The proposed multiobjective evolutionary clustering ensemble algorithm (MOECEA) evolves a clustering solution from the input clusterings by optimizing two criteria simultaneously. The first objective is to maximize the similarity of the resultant clustering with all the input clusterings, where the similarity between two clustering solutions is computed using the adjusted Rand index. The second objective is to minimize the standard deviation among the similarity scores, in order to prevent the evolved clustering solution from being very similar to one of the input clusterings and very dissimilar to the others. The performance of the proposed algorithm has been compared with that of other well-known cluster ensemble algorithms on a number of artificial and real-life data sets.
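The two criteria described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names are hypothetical, and aggregating the similarity scores with a mean and using the population standard deviation are assumptions made here for illustration, since the abstract does not specify either choice.

```python
from collections import Counter
from math import comb
from statistics import mean, pstdev


def adjusted_rand_index(a, b):
    """Adjusted Rand index between two label sequences of equal length."""
    if len(a) != len(b):
        raise ValueError("label sequences must have the same length")
    total_pairs = comb(len(a), 2)
    if total_pairs == 0:  # fewer than two points: nothing to compare
        return 1.0
    # Pairs of points placed together in both, in a only, and in b only.
    sum_cells = sum(comb(c, 2) for c in Counter(zip(a, b)).values())
    sum_rows = sum(comb(c, 2) for c in Counter(a).values())
    sum_cols = sum(comb(c, 2) for c in Counter(b).values())
    expected = sum_rows * sum_cols / total_pairs
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:  # degenerate case, e.g. both partitions trivial
        return 1.0
    return (sum_cells - expected) / (max_index - expected)


def ensemble_objectives(candidate, input_clusterings):
    """Return (similarity to maximize, spread to minimize) for a candidate.

    Assumption: similarity is the mean ARI over the input clusterings and
    spread is the population standard deviation of those ARI scores.
    """
    scores = [adjusted_rand_index(candidate, c) for c in input_clusterings]
    return mean(scores), pstdev(scores)


if __name__ == "__main__":
    inputs = [[0, 0, 1, 1], [1, 1, 0, 0], [0, 1, 0, 1]]
    print(ensemble_objectives([0, 0, 1, 1], inputs))
```

A multiobjective evolutionary algorithm would evaluate each candidate with both values and keep the non-dominated ones; that search loop is outside the scope of this sketch.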
Proceedings of the 12th International Conference on Algorithms in Bioinformatics, Sep 10, 2012
Studies in Computational Intelligence, 2014
The result of one clustering algorithm can be very different from that of another for the same input dataset, as the other input parameters of an algorithm can substantially affect its behavior and execution. Cluster validity indices measure the goodness of a clustering solution. Cluster validation is a very important issue in clustering analysis because the result of clustering needs to be validated in most applications. In most clustering algorithms, the number of clusters is set as a user parameter, and there are a number of approaches to finding the best number of clusters. Validity measures can be used to find the partitioning that best fits the underlying data, that is, to find how good the clustering is. This chapter describes an application (CLUSTER) developed in the Matlab/GUI environment that provides an interface between the user and the results of various clustering algorithms. The user selects the algorithm, internal validity index, external validity index, number of clusters, number of iterations, etc. from the active windows. In this package we compare the results of k-means, fuzzy c-means, hierarchical clustering and multiobjective clustering with support vector machine (MocSvm). The MATLAB Graphical User Interface (GUI) allows the user to easily find the goodness of a clustering solution and immediately see the differences between those algorithms graphically. The application package is implemented with the Matlab (R2008a) Graphical User Interface.
Lecture Notes in Computer Science, 2009
Fuzzy clustering is an important tool for analyzing microarray cancer data sets in order to classify the tissue samples. This article describes a real-coded Genetic Algorithm (GA) based fuzzy clustering method that combines with popular Artificial Neural Network (ANN) / ...
2008 IEEE Congress on Evolutionary Computation (IEEE World Congress on Computational Intelligence), 2008
An important approach to unsupervised pixel classification in remote sensing satellite imagery is to use clustering in the spectral domain. In this article, a recently proposed multiobjective fuzzy clustering scheme has been combined with artificial neural ...
2011 IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology (CIBCB), 2011
MicroRNAs (miRNAs) are small non-coding RNAs that have been shown to play important roles in gene regulation and various biological processes. The abnormal expression of some specific miRNAs often results in the development of cancer. In this ...
Multiobjective Genetic Algorithms for Clustering, 2011
Microarray technology has a significant impact on cancer research. It is utilized in cancer diagnosis by means of classification of tissue samples. When microarray datasets are organized as gene vs. sample, they are very helpful in classifying different types of tissues and in identifying those genes whose expression levels are good diagnostic indicators. When the tissue samples in a microarray represent cancerous (malignant) and non-cancerous (benign) cells, classifying them amounts to binary cancer classification.
Multiobjective Genetic Algorithms for Clustering, 2011
An important goal in microarray data analysis is to identify sets of genes with similar expression profiles. Clustering algorithms have been applied to microarray data either to group the genes across experimental conditions/samples [396, 44, 302, 355, 448] or to group the samples across the genes [317, 339, 400, 384]. Clustering techniques that aim to find clusters of genes over all experimental conditions may fail to discover genes that have similar expression patterns over only a subset of conditions. Similarly, a clustering algorithm that groups conditions/samples across all the genes may not capture a group of samples that have similar expression values for only a subset of genes.
IEEE Congress on Evolutionary Computation, 2010
2009 IEEE Congress on Evolutionary Computation, 2009
In this article, a novel multiobjective variable string length real coded genetic fuzzy clustering scheme for clustering microarray gene expression data has been proposed. The proposed technique automatically evolves the number of clusters along with the clustering result. ...
2010 International Conference on Systems in Medicine and Biology, 2010
Identifying possible viral-host protein-protein interactions is an important and useful approach in developing new drugs targeting those interactions. In this article, a recently published dataset containing records of interactions between a set of HIV-1 proteins and a ...
Advances in Intelligent Systems and Computing, 2015
It is important to understand the interaction mechanism among co-expressed and co-regulated genes in stem cells in order to restrict the abnormal growth of cell tissue (tumor), which may lead to cancer. In this article, differentially co-expressed and co-regulated genes present in normal stem cells and stem cell derived tumors are identified from sample bone marrow microarray data. First, we identified differentially expressed genes (DEG) by performing a statistical t-test between sample groups. Up-regulated (UR) and down-regulated (DR) genes were then separated by setting a p-value cutoff of 0.001. After identifying the differentially expressed genes, we found distinct co-expressed up-regulated and down-regulated genes. Subsequently, we constructed pair-wise co-expression networks from the co-expressed genes. Finally, we studied the significance of the co-expressed genes with gene ontology (GO) and found significant GO ids. This study is expected to help in finding pathways for diseases.
Microarray technology facilitates the simultaneous monitoring of the expression levels of thousands of genes over different experimental conditions. Clustering is a popular data mining tool that can be applied to microarray gene expression data to identify co-expressed genes. Most traditional clustering methods optimize a single clustering goodness criterion and thus may not perform well on all kinds of datasets. Motivated by this, in this article a multiobjective clustering technique that optimizes cluster compactness and separation simultaneously has been improved through a novel support vector machine classification based cluster ensemble method. The superiority of MOCSVMEN (MultiObjective Clustering with Support Vector Machine based ENsemble) has been established by comparing its performance with that of several well-known microarray data clustering algorithms. Two real-life benchmark gene expression datasets have been used to compare the performance of the different algorithms. A recently developed metric called the Biological Homogeneity Index (BHI), which computes clustering goodness with respect to functional annotation, has been used for the comparison.
Algorithms in Bioinformatics, 2012
In an attempt to analyse coexpression in a time series microarray gene expression dataset, we introduce here a novel, fast triclustering algorithm, δ-TRIMAX, that aims to find a group of genes that are coexpressed over a subset of samples across a subset of time-points. We define a novel mean-squared residue score for such a 3D dataset. The algorithm first uses a greedy approach to find triclusters that have a mean-squared residue score below a threshold δ by deleting nodes from the dataset, and in the next step it adds some nodes back while keeping the mean-squared residue score of the resultant tricluster below δ. The goal of the algorithm is thus to find large and coherent triclusters in the 3D gene expression dataset. Additionally, we have defined an affirmation score to measure the performance of our triclustering algorithm on an artificial dataset. To show the biological significance of the triclusters we have conducted GO enrichment analysis. We have also performed enrichment analysis of transcription factor binding sites to establish coregulation of a group of coexpressed genes.
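To make the idea of a mean-squared residue score on a 3D dataset concrete, here is a minimal sketch. The abstract does not give the formula, so the residue used below (a three-way analogue of the Cheng and Church biclustering residue) is an assumption and may differ from the paper's exact definition; the function name `mean_squared_residue` and the nested-list layout `cube[gene][sample][time]` are also hypothetical.

```python
def mean_squared_residue(cube):
    """Mean-squared residue of a tricluster given as cube[gene][sample][time].

    Assumed residue for each cell:
        r = a_ijk - mean_over_gene_i - mean_over_sample_j
                  - mean_over_time_k + 2 * overall_mean
    A perfectly additive (coherent) tricluster scores 0.
    """
    n_g, n_s, n_t = len(cube), len(cube[0]), len(cube[0][0])
    cells = [(i, j, k) for i in range(n_g) for j in range(n_s) for k in range(n_t)]
    overall = sum(cube[i][j][k] for i, j, k in cells) / len(cells)
    # Mean of each gene slice, sample slice and time-point slice.
    gene_mean = [
        sum(cube[i][j][k] for j in range(n_s) for k in range(n_t)) / (n_s * n_t)
        for i in range(n_g)
    ]
    sample_mean = [
        sum(cube[i][j][k] for i in range(n_g) for k in range(n_t)) / (n_g * n_t)
        for j in range(n_s)
    ]
    time_mean = [
        sum(cube[i][j][k] for i in range(n_g) for j in range(n_s)) / (n_g * n_s)
        for k in range(n_t)
    ]
    squared = 0.0
    for i, j, k in cells:
        r = cube[i][j][k] - gene_mean[i] - sample_mean[j] - time_mean[k] + 2 * overall
        squared += r * r
    return squared / len(cells)


if __name__ == "__main__":
    # Additive pattern: value = gene effect + sample effect + time effect.
    coherent = [[[g + s + t for t in (0, 100, 200)] for s in (0, 10)] for g in (0, 1)]
    print(mean_squared_residue(coherent))
```

Under this assumed score, a greedy deletion step would repeatedly drop the gene, sample or time point whose removal lowers the score most until it falls below δ; that loop is omitted here.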
Mathematical Modelling and Scientific Computation, 2012
This article presents how the genetic algorithm (GA) method can be efficiently applied to the goal programming (GP) formulation of the Economic-Environmental Power Dispatch (EEPD) problem with target intervals in a power system operation and planning environment.
2011 International Conference on Communication and Industrial Application, 2011
This paper presents a Genetic Algorithm (GA) based fuzzy goal programming (FGP) technique for multiobjective optimal planning of the electric power generation and dispatch problem in the power system operation and planning phases. In the proposed approach, the fuel cost, environmental emission and voltage deviation objectives of the optimal power flow calculation are fuzzily described. In the solution process, the GA method is used iteratively to satisfice the goal levels on the basis of the needs and desires of the power system operation and planning perspective. In the GA based solution search process, the conventional roulette wheel selection scheme, single-point crossover and bit-by-bit mutation are used to reach a satisfactory decision. The developed method has been tested on the IEEE 6-generator 30-bus system.
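The three GA operators named in this abstract are standard and can be sketched generically. The code below is an illustration only, not the paper's implementation: the function names are hypothetical, chromosomes are assumed to be lists of 0/1 bits, and fitness values are assumed to be non-negative quantities to maximize, since the abstract does not describe the encoding or the fitness function.

```python
import random


def roulette_wheel_select(population, fitness, rng):
    """Pick one individual with probability proportional to its fitness."""
    total = sum(fitness)
    if total <= 0:
        raise ValueError("fitness values must be non-negative with a positive sum")
    pick = rng.uniform(0, total)
    running = 0.0
    for individual, fit in zip(population, fitness):
        running += fit
        if pick < running:
            return individual
    return population[-1]  # guard against floating-point round-off


def single_point_crossover(parent_a, parent_b, rng):
    """Swap the tails of two equal-length chromosomes at one random cut point."""
    if len(parent_a) != len(parent_b) or len(parent_a) < 2:
        raise ValueError("parents must have equal length of at least 2")
    point = rng.randint(1, len(parent_a) - 1)
    return parent_a[:point] + parent_b[point:], parent_b[:point] + parent_a[point:]


def bitwise_mutation(chromosome, rate, rng):
    """Flip each bit independently with probability `rate`."""
    return [1 - bit if rng.random() < rate else bit for bit in chromosome]


if __name__ == "__main__":
    rng = random.Random(0)
    population = [[0, 0, 0, 0, 0, 0], [1, 1, 1, 1, 1, 1], [1, 0, 1, 0, 1, 0]]
    fitness = [1.0, 3.0, 2.0]
    mother = roulette_wheel_select(population, fitness, rng)
    father = roulette_wheel_select(population, fitness, rng)
    child_a, child_b = single_point_crossover(mother, father, rng)
    print(bitwise_mutation(child_a, 0.05, rng), bitwise_mutation(child_b, 0.05, rng))
```

Passing an explicit `random.Random` instance keeps runs reproducible, which is convenient when comparing GA settings.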
Computational Intelligence and Pattern Analysis in Biological Informatics, 2010
Mukhopadhyay, A., Maulik, U. and Bandyopadhyay, S. (2010) Identifying Potential Gene Markers Using SVM Classifier Ensemble, in Computational Intelligence and Pattern Analysis in Biological Informatics (eds U. Maulik, S. Bandyopadhyay and JTL Wang), John Wiley & ...
TENCON 2008 - 2008 IEEE Region 10 Conference, 2008