Abdul Nazeer K A | National Institute of Technology, Calicut
Papers by Abdul Nazeer K A
2019 IEEE 9th International Conference on Advanced Computing (IACC)
Recent research results show that ontology can be used to improve the accuracy of document clustering. Previous studies mainly focused on using ontology in the preprocessing of text documents. In this paper, we propose a hybrid approach that concentrates on both the preprocessing task and the clustering algorithm. The objective is to reduce the number of features and the execution time, eliminate synonymy problems, and enhance clustering accuracy. Cosine similarity is used as the similarity measure. The preprocessing part uses a WordNet ontology-based feature extraction method. In clustering, the initial centroids are found by applying a Red-Black Tree-based sorting method. Data points are allocated to suitable clusters using a novel approach that maintains the path of similarity between data points and the nearest cluster centroids. Several existing clustering algorithms using cosine similarity are compared experimentally with our clustering technique. Results show that the proposed hybrid approach performs better on the Newsgroup dataset, with considerable improvements in dimensionality reduction, running time, and accuracy.
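To make the two ingredients of this abstract concrete, the sketch below collapses synonyms into one feature and then assigns each document to the centroid with the highest cosine similarity. It is an illustration under stated assumptions, not the authors' implementation: the `SYNONYMS` table is a hypothetical stand-in for a WordNet lookup, and `to_vector`, `cosine_similarity` and `assign_to_centroids` are illustrative names. The paper's own assignment step (tracking a "path of similarity") and its Red-Black Tree initialization are not reproduced here.

```python
import math
from typing import Dict, List

# Hypothetical stand-in for a WordNet lookup: maps synonyms to one canonical
# term so that "automobile" and "car" become the same feature.
SYNONYMS: Dict[str, str] = {"automobile": "car", "auto": "car", "motor": "engine"}


def to_vector(tokens: List[str]) -> Dict[str, float]:
    """Term-frequency vector after collapsing synonyms to a canonical term."""
    vec: Dict[str, float] = {}
    for tok in tokens:
        term = SYNONYMS.get(tok, tok)
        vec[term] = vec.get(term, 0.0) + 1.0
    return vec


def cosine_similarity(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Cosine of the angle between two sparse term-weight vectors."""
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat empty documents as dissimilar to everything
    return dot / (norm_a * norm_b)


def assign_to_centroids(docs: List[Dict[str, float]],
                        centroids: List[Dict[str, float]]) -> List[int]:
    """Index of the most similar centroid (highest cosine) for each document."""
    return [max(range(len(centroids)),
                key=lambda j: cosine_similarity(doc, centroids[j]))
            for doc in docs]


if __name__ == "__main__":
    docs = [to_vector(["automobile", "engine"]),
            to_vector(["gene", "protein", "gene"]),
            to_vector(["car", "wheel"])]
    centroids = [{"car": 1.0}, {"gene": 1.0}]
    print(assign_to_centroids(docs, centroids))  # [0, 1, 0]
```

Because "automobile" is mapped to "car" before vectorization, the first and third documents share a feature and land in the same cluster, which is the effect the ontology-based preprocessing aims for.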
International Journal of Management & Information Technology
High-quality clustering is very important in the modern era of information processing. Clustering is one of the most important data analysis methods, and k-means clustering is commonly used in diverse applications. Despite its simplicity and ease of implementation, the k-means algorithm is computationally expensive, and the quality of the resulting clusters depends on the random choice of initial centroids. Various methods have been proposed to improve the accuracy and efficiency of the k-means algorithm. In this paper, we propose a new approach that improves the accuracy of clustering microarray-based gene expression data sets. In the proposed method, the initial centroids are determined using a Red-Black Tree, and an improved heuristic approach is used to assign data items to the nearest centroids. Experimental results show that the proposed algorithm performs better than other existing algorithms.
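The abstract does not spell out how the Red-Black Tree yields initial centroids. One plausible, deterministic scheme, assumed here rather than taken from the paper, is to order points by their distance from the origin, cut the ordering into k contiguous chunks, and use each chunk's mean as a starting centroid. A Red-Black Tree keyed on that distance produces the same order through an in-order traversal, so the sketch below simply uses Python's `sorted()`. The function name `initial_centroids` is illustrative, and the paper's improved assignment heuristic is not shown.

```python
import math
from typing import List, Sequence


def initial_centroids(points: Sequence[Sequence[float]], k: int) -> List[List[float]]:
    """Deterministic initial centroids from a sorted ordering of the data.

    Assumed scheme: order points by Euclidean distance from the origin, cut the
    ordering into k contiguous chunks of near-equal size, and use each chunk's
    mean as a starting centroid.
    """
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    # A Red-Black Tree keyed on this distance yields the same order through an
    # in-order traversal; sorted() is used here for brevity.
    ordered = sorted(points, key=lambda p: math.sqrt(sum(x * x for x in p)))
    n, dim = len(ordered), len(ordered[0])
    centroids = []
    for j in range(k):
        chunk = ordered[j * n // k:(j + 1) * n // k]  # never empty since k <= n
        centroids.append([sum(p[d] for p in chunk) / len(chunk) for d in range(dim)])
    return centroids


if __name__ == "__main__":
    data = [[10.0, 10.0], [0.0, 0.0], [11.0, 11.0], [1.0, 1.0]]
    print(initial_centroids(data, 2))  # [[0.5, 0.5], [10.5, 10.5]]
```

Unlike random seeding, this gives the same starting centroids on every run, which removes one source of run-to-run variation in k-means results.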
Proceedings of the Seventh International Conference on Mathematics and Computing, 2022
2020 6th IEEE Congress on Information Science and Technology (CiSt), 2020
Recursive neural networks based on dependency trees (Tree-RNNs) are ubiquitous in modeling sentence meaning because they effectively capture semantic relationships between non-neighboring words. However, recognizing semantically dissimilar sentences that share the same words and syntax is still a challenge for Tree-RNNs. This work proposes an improvement to the Dependency Tree-RNN (DT-RNN) that uses the grammatical relationship type identified in the dependency parse. Our experiments on semantic relatedness scoring (SRS) and recognizing textual entailment (RTE) for sentence pairs from the SICK (Sentences Involving Compositional Knowledge) dataset show encouraging results. The model achieved a 2% improvement in classification accuracy on the RTE task over the DT-RNN model. In addition, the Pearson and Spearman correlations between the model's predicted similarity scores and human ratings are higher than those of standard DT-RNNs.
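One way to let the grammatical relation type influence composition is to transform each child's hidden state with a matrix selected by its dependency label. The sketch below assumes exactly that (one matrix per relation, a fallback matrix for unseen labels, no bias, `tanh` nonlinearity); it is a hypothetical illustration, not the paper's model. `Node`, `encode` and the toy weights are illustrative names and values, not trained parameters.

```python
import math
from dataclasses import dataclass, field
from typing import Dict, List

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Plain matrix-vector product."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


@dataclass
class Node:
    word_vec: Vector                      # embedding of the word at this node
    relation: str = "root"                # dependency label to this node's head
    children: List["Node"] = field(default_factory=list)


def encode(node: Node, w_word: Matrix, w_rel: Dict[str, Matrix], w_default: Matrix) -> Vector:
    """Bottom-up hidden state: tanh(W_word x + sum_c W_rel[c.relation] h_c).

    Each child's hidden state is transformed by a matrix selected by its
    grammatical relation (w_default for unseen labels). Bias is omitted.
    """
    total = matvec(w_word, node.word_vec)
    for child in node.children:
        h_child = encode(child, w_word, w_rel, w_default)
        contrib = matvec(w_rel.get(child.relation, w_default), h_child)
        total = [a + b for a, b in zip(total, contrib)]
    return [math.tanh(x) for x in total]


if __name__ == "__main__":
    identity = [[1.0, 0.0], [0.0, 1.0]]
    negated = [[-1.0, 0.0], [0.0, -1.0]]
    w_rel = {"nsubj": identity, "dobj": negated}   # toy, untrained weights
    as_subject = Node([0.0, 0.0], children=[Node([1.0, 0.0], "nsubj")])
    as_object = Node([0.0, 0.0], children=[Node([1.0, 0.0], "dobj")])
    # Same words and tree shape, different relation label, different vectors.
    print(encode(as_subject, identity, w_rel, identity))
    print(encode(as_object, identity, w_rel, identity))
```

The two toy trees contain identical words in identical positions; only the relation label differs, yet the root vectors come out with opposite signs. That is the kind of distinction a label-blind composition cannot make.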
Cancer subtype discovery from omics data requires techniques to estimate the number of natural clusters in the data. Automatically estimating the number of clusters has been a challenging problem in machine learning. Using clustering algorithms together with internal cluster validity indexes has been a popular method of estimating the number of clusters in biomolecular data. We propose a Hierarchical Agglomerative Clustering algorithm, named SilHAC, which can automatically estimate the number of natural clusters and find the associated clustering solution. SilHAC is parameterless. We also present two hybrids of SilHAC, with Spectral Clustering and K-Means respectively as components. SilHAC and the hybrids could find reasonable estimates for the number of clusters and the associated clustering solutions when applied to a collection of cancer gene expression datasets. The proposed methods are better alternatives to the ‘clustering algorithm - internal cluster validity index’ pipelines for estim...
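For context, the kind of ‘clustering algorithm - internal cluster validity index’ pipeline that SilHAC is positioned against can be sketched as follows: run average-linkage agglomerative clustering, score every level of the hierarchy with the mean silhouette width, and keep the best-scoring level. This is the baseline pipeline, not SilHAC itself (the abstract does not describe SilHAC's internals). The choice of average linkage and Euclidean distance, and the function names, are assumptions for illustration; `math.dist` requires Python 3.8 or later.

```python
import math
from itertools import combinations
from typing import List, Sequence, Tuple

Point = Sequence[float]


def silhouette(points: List[Point], labels: List[int]) -> float:
    """Mean silhouette width; singleton clusters contribute 0 (Rousseeuw's convention)."""
    clusters = set(labels)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    scores = []
    for i, p in enumerate(points):
        own = [math.dist(p, q) for j, q in enumerate(points)
               if labels[j] == labels[i] and j != i]
        if not own:
            scores.append(0.0)
            continue
        a = sum(own) / len(own)  # mean distance within the point's own cluster
        b = min(                 # mean distance to the nearest other cluster
            sum(math.dist(p, q) for j, q in enumerate(points) if labels[j] == c)
            / labels.count(c)
            for c in clusters if c != labels[i])
        scores.append(0.0 if max(a, b) == 0 else (b - a) / max(a, b))
    return sum(scores) / len(scores)


def average_linkage(points: List[Point], c1: List[int], c2: List[int]) -> float:
    """Mean pairwise distance between two clusters of point indices."""
    return sum(math.dist(points[i], points[j]) for i in c1 for j in c2) / (len(c1) * len(c2))


def hac_with_silhouette(points: List[Point]) -> Tuple[int, List[int]]:
    """Merge clusters bottom-up and keep the level with the best silhouette."""
    if len(points) < 3:
        raise ValueError("need at least three points")
    clusters = [[i] for i in range(len(points))]
    best_score, best_labels = -math.inf, []
    while len(clusters) > 2:
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: average_linkage(points, clusters[ij[0]], clusters[ij[1]]))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # safe: combinations yields i < j
        labels = [0] * len(points)
        for c, members in enumerate(clusters):
            for m in members:
                labels[m] = c
        score = silhouette(points, labels)
        if score > best_score:
            best_score, best_labels = score, labels
    return len(set(best_labels)), best_labels
```

On three well-separated pairs of points, such as `[(0, 0), (0, 1), (10, 10), (10, 11), (20, 0), (20, 1)]`, this pipeline picks k = 3. Its outcome still depends on the chosen linkage and validity index, which is the coupling that a parameterless method aims to remove.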
Computational Biology and Chemistry, 2018
Differential DNA methylation has recently been found to affect the regulatory mechanisms of biological pathways. A pathway encompasses a set of interacting genes or gene products that together perform a given biological function. Pathways often encode strong methylation signatures that are capable of distinguishing biologically distinct subtypes. Although Next Generation Sequencing techniques such as MeDIP-seq and MBD-isolated genome sequencing (MiGS) allow for genome-wide identification of clinical and biological subtypes, there is a pressing need for computational methods to compare epigenetic signatures across pathways. This paper proposes a novel alignment method, called DEEPAligner (Deep Encoded Epigenetic Pathway Aligner), that finds functionally consistent and topologically sound alignments of epigenetic signatures from pathway networks. A deep embedding framework is used to obtain epigenetic signatures from pathways, which are then aligned for functional consistency and ...
Computers in Biology and Medicine, 2016
Identification of pathways that show a significant difference in activity between disease and control samples has been an interesting topic of research for over a decade. Pathways so identified serve as potential indicators of aberrations in phenotype or of a disease condition. Recently, epigenetic mechanisms such as DNA methylation have been found to play an important role in altering the regulatory mechanisms of biological pathways. It is reasonable to expect that a set of genes showing significant differences in expression and methylation interact to form a network of pathways. Existing pathway identification methods fail to capture the complex interplay between interacting pathways. This paper proposes a novel framework to identify biological pathways that are dysregulated by epigenetic mechanisms. Experiments on four benchmark cancer datasets and comparison with state-of-the-art pathway identification methods reveal the effectiveness of the proposed approach. The proposed framework incorporates both the topology and the biological relationships of pathways. Comparison with state-of-the-art techniques reveals promising results. Epigenetic signatures identified from pathway interaction networks can help advance Molecular Pathological Epidemiology (MPE) research efforts by predicting tumor molecular changes.
Proceedings of the World Congress on …, 2009
The emergence of modern techniques for scientific data collection has resulted in large-scale accumulation of data pertaining to diverse fields. Conventional database querying methods are inadequate for extracting useful information from huge data banks. Cluster analysis is one ...
2014 IEEE 4th International Conference on Computational Advances in Bio and Medical Sciences (ICCABS), 2014
Motivation: The identification of new therapeutic uses for existing drugs, or drug repositioning, offers the possibility of faster drug development, reduced risk, lower cost and shorter paths to approval. The advent of high-throughput microarray technology has enabled comprehensive monitoring of the transcriptional response associated with various disease states and drug treatments. These data can be used to characterize disease and drug effects and thereby give a measure of the association between a given drug and a disease. Several computational methods proposed in the literature use publicly available transcriptional data to reposition drugs against diseases. Method: In this work, we carry out a data mining process using publicly available gene expression data sets associated with a few diseases and drugs to identify existing drugs that could target the genes implicated in lung cancer and breast cancer. Results: Three strong candidates for repurposing have been identified: Letrozole and GDC-0941 against lung cancer, and Ribavirin against breast cancer. Letrozole and GDC-0941 are currently used in breast cancer treatment, and Ribavirin is used in the treatment of Hepatitis C.
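A common way to turn transcriptional data into a drug-disease association measure is to ask whether a drug pushes genes in the opposite direction to the disease. The sketch below scores that reversal as the negated Pearson correlation over shared genes. This measure is an assumption chosen for illustration; the abstract does not state which association measure the study used. Gene identifiers such as `"G1"` and the drug names are hypothetical placeholders, not data from the paper.

```python
import math
from typing import Dict, List


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation; returns 0.0 when either vector is constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return 0.0 if sx == 0 or sy == 0 else cov / (sx * sy)


def reversal_score(disease: Dict[str, float], drug: Dict[str, float]) -> float:
    """Higher when the drug moves shared genes opposite to the disease.

    Signatures map gene -> differential expression (e.g. log fold change).
    """
    shared = sorted(set(disease) & set(drug))
    if len(shared) < 2:
        raise ValueError("need at least two shared genes")
    return -pearson([disease[g] for g in shared], [drug[g] for g in shared])


def rank_drugs(disease: Dict[str, float], drugs: Dict[str, Dict[str, float]]) -> List[str]:
    """Drug names ordered from strongest to weakest predicted reversal."""
    return sorted(drugs, key=lambda name: reversal_score(disease, drugs[name]), reverse=True)


if __name__ == "__main__":
    disease = {"G1": 2.0, "G2": -1.0, "G3": 0.5}          # hypothetical signature
    drugs = {
        "mimic": {"G1": 1.5, "G2": -0.5, "G3": 0.2},
        "reverser": {"G1": -2.0, "G2": 1.0, "G3": -0.5, "G4": 3.0},
    }
    print(rank_drugs(disease, drugs))  # ['reverser', 'mimic']
```

Only genes measured in both signatures contribute, so `"G4"` is ignored. A drug whose profile exactly mirrors the disease profile scores 1.0, and one that mimics it scores below zero.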
Bioinformation, 2013
Recent progress in bioinformatics research has led to the accumulation of huge quantities of biological data at various data sources. DNA microarray technology makes it possible to analyze a large number of genes simultaneously across different samples. Clustering microarray data can reveal hidden gene expression patterns in large quantities of expression data, which in turn offers tremendous possibilities in functional genomics, comparative genomics, disease diagnosis and drug development. The k-means clustering algorithm is widely used in many practical applications. However, the original k-means algorithm has several drawbacks: it is computationally expensive and generates locally optimal solutions that depend on the random choice of the initial centroids. Several methods have been proposed in the literature for improving the performance of the k-means algorithm. A meta-heuristic optimization algorithm named harmony search helps find near-global optimal solutions by searching the entire solution space. The low clustering accuracy of existing algorithms limits their use in many crucial applications in the life sciences. In this paper, we propose a novel Harmony Search-K means Hybrid (HSKH) algorithm for clustering gene expression data. Experimental results show that the proposed algorithm produces clusters with better accuracy than the existing algorithms.
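The general idea of combining harmony search with k-means can be sketched as follows. Harmony search explores candidate centroid sets globally using memory consideration, pitch adjustment and random selection, and its best harmony then seeds ordinary k-means iterations. This is a generic sketch, not the HSKH algorithm as published. The parameter values (`hms`, `hmcr`, `par`, `bw`, `iters`), the SSE objective and the function names are assumptions for illustration, and a fixed `seed` keeps runs reproducible.

```python
import math
import random
from typing import List, Sequence

Point = Sequence[float]


def sse(points: List[Point], centroids: List[List[float]]) -> float:
    """Sum of squared distances from each point to its nearest centroid."""
    return sum(min(math.dist(p, c) ** 2 for c in centroids) for p in points)


def kmeans(points: List[Point], centroids: List[List[float]], iters: int = 100) -> List[List[float]]:
    """Standard Lloyd iterations starting from the given centroids."""
    centroids = [list(c) for c in centroids]
    for _ in range(iters):
        groups: List[List[Point]] = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
            groups[nearest].append(p)
        # Empty clusters keep their previous centroid.
        updated = [[sum(col) / len(g) for col in zip(*g)] if g else c
                   for g, c in zip(groups, centroids)]
        if updated == centroids:
            break
        centroids = updated
    return centroids


def harmony_search(points: List[Point], k: int, hms: int = 10, hmcr: float = 0.9,
                   par: float = 0.3, bw: float = 0.5, iters: int = 300,
                   seed: int = 0) -> List[List[float]]:
    """Search for k centroids minimising SSE; each harmony is a flat k*dim vector."""
    rng = random.Random(seed)
    dim = len(points[0])
    lo = [min(p[d] for p in points) for d in range(dim)]
    hi = [max(p[d] for p in points) for d in range(dim)]

    def split(h: List[float]) -> List[List[float]]:
        return [h[j * dim:(j + 1) * dim] for j in range(k)]

    def cost(h: List[float]) -> float:
        return sse(points, split(h))

    memory = [[rng.uniform(lo[i % dim], hi[i % dim]) for i in range(k * dim)]
              for _ in range(hms)]
    for _ in range(iters):
        new = []
        for i in range(k * dim):
            d = i % dim
            if rng.random() < hmcr:
                value = rng.choice(memory)[i]          # memory consideration
                if rng.random() < par:                 # pitch adjustment
                    value = min(max(value + rng.uniform(-bw, bw), lo[d]), hi[d])
            else:
                value = rng.uniform(lo[d], hi[d])      # random selection
            new.append(value)
        worst = max(range(hms), key=lambda h: cost(memory[h]))
        if cost(new) < cost(memory[worst]):
            memory[worst] = new                        # replace the worst harmony
    return split(min(memory, key=cost))


def hs_kmeans(points: List[Point], k: int, seed: int = 0) -> List[List[float]]:
    """Harmony search for a good starting point, then k-means to refine it."""
    return kmeans(points, harmony_search(points, k, seed=seed))
```

The division of labor is the point of the design. Harmony search only has to land each centroid near the right region, which keeps k-means away from poor local optima caused by unlucky random seeding. The cheap Lloyd iterations then finish the job exactly.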