vishan gupta - Academia.edu

Papers by vishan gupta

Prediction of COVID-19 confirmed, death, and cured cases in India using random forest model

Big Data Mining and Analytics, 2021

A novel coronavirus (SARS-CoV-2) causes an unusual viral pneumonia in patients. It was first identified in late December 2019 and was later declared a pandemic by the World Health Organization because of its severe effects on public health. At present, COVID-19 cases are increasing exponentially day by day across the whole world. Here, we predict COVID-19 cases, i.e., confirmed, death, and cured cases, in India only. We base this analysis on the cases occurring in the different states of India, ordered by date. Our dataset contains multiple classes, so we perform multi-class classification. On this dataset, we first performed data cleansing and feature selection, and then forecast all classes using random forest, linear model, support vector machine, decision tree, and neural network models. Because the random forest model outperformed the others, it is used for prediction and for the analysis of all the results. K-fold cross-validation is performed to measure the consistency of the model.
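K-fold cross-validation, which the abstract uses to check model consistency, partitions the data into k folds and trains and evaluates the model k times, each time holding out a different fold. The sketch below, written under stated assumptions, only generates the train/test index splits; model fitting is left to a hypothetical `fit_and_score` callback, and the fold counts shown are illustrative because the abstract does not state k.

```python
import random
from typing import Callable, List, Sequence, Tuple


def k_fold_indices(n: int, k: int, seed: int = 0) -> List[Tuple[List[int], List[int]]]:
    """Return k (train_idx, test_idx) pairs whose test sets partition range(n)."""
    if not 2 <= k <= n:
        raise ValueError("k must be between 2 and n")
    idx = list(range(n))
    random.Random(seed).shuffle(idx)  # shuffle once: random but reproducible folds
    # Spread the remainder so fold sizes differ by at most one.
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    splits, start = [], 0
    for size in sizes:
        test = idx[start:start + size]
        train = idx[:start] + idx[start + size:]
        splits.append((train, test))
        start += size
    return splits


def cross_validate(n: int, k: int,
                   fit_and_score: Callable[[Sequence[int], Sequence[int]], float]) -> List[float]:
    """Score a model on each held-out fold; fit_and_score is a hypothetical callback."""
    return [fit_and_score(train, test) for train, test in k_fold_indices(n, k)]


if __name__ == "__main__":
    for train, test in k_fold_indices(n=10, k=5):
        print(len(train), len(test))
```

A low spread of the k fold scores is what "consistency of the model" would look like in practice.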

OUP accepted manuscript

The Computer Journal, 2020

In silico toxicity prediction techniques are useful for reducing testing on rodents (in vivo). The authors propose a computational (in silico) method for predicting the toxicity of small drug molecules that can bind to antioxidant response elements (AREs), using their various physicochemical properties (molecular descriptors). The PaDEL-Descriptor software is used to extract the features of the drug molecules. The ARE dataset contains 7439 drug molecules in total, of which 1147 are active and 6292 are inactive, and each drug molecule has 1444 features. We propose a novel ensemble-based model that can efficiently classify the active (binding) and inactive (non-binding) compounds of the dataset. We first performed feature selection using the random forest importance algorithm in R, and then resolved the class imbalance within the ensemble learning method itself by dividing the dataset into five data frames, which have an almost equal number of ac...
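The imbalance-handling step can be illustrated by splitting the majority (inactive) class into near-equal chunks and pairing each chunk with the minority (active) molecules, giving one roughly balanced data frame per base model. This is a sketch under an assumption: every frame here reuses the full active set, since the abstract is truncated before it says how the actives are distributed. With 6292 inactives split five ways, each frame holds about 1258 inactives alongside 1147 actives.

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def balanced_frames(actives: Sequence[T], inactives: Sequence[T],
                    n_frames: int) -> List[Tuple[List[T], List[T]]]:
    """Split the inactive (majority) set into n_frames near-equal chunks and
    pair each chunk with every active (minority) sample (assumed scheme)."""
    base, extra = divmod(len(inactives), n_frames)
    frames, start = [], 0
    for i in range(n_frames):
        size = base + (1 if i < extra else 0)  # first `extra` frames take one more
        chunk = list(inactives[start:start + size])
        frames.append((list(actives), chunk))
        start += size
    return frames


if __name__ == "__main__":
    # Placeholder IDs standing in for the 1147 active / 6292 inactive ARE molecules.
    actives = [f"A{i}" for i in range(1147)]
    inactives = [f"I{i}" for i in range(6292)]
    for act, inact in balanced_frames(actives, inactives, n_frames=5):
        print(len(act), len(inact))
```

Each frame would then train its own base classifier, and the ensemble combines their outputs.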

Activity Assessment of Small Drug Molecules in Estrogen Receptor using Multilevel Prediction Model

IET Systems Biology, 2019

The authors propose an efficient multilevel prediction model for better activity assessment, to test whether certain chemical compounds can disrupt processes in the human body in ways that may have negative health effects. Here, a computational (in silico) method is proposed for predicting the quality of drugs in terms of their activity, activity score, potency, and efficacy for estrogen receptors (ERs), using various physicochemical properties (molecular descriptors). PaDEL-Descriptor is used for feature extraction. The ER dataset has 8481 drug molecules, of which 1084 are active and 7397 are inactive, and each drug molecule has 1444 features. This dataset is highly imbalanced and has a large number of features. First, the class imbalance problem is resolved with the synthetic minority oversampling technique (SMOTE) algorithm, and feature selection is performed using the FSelector library of R. A machine learning based multilevel prediction model is developed in which classification is performed at the first level and regression at the second level. By using all these strategies together, higher accuracy is achieved than with many other computational approaches. K-fold cross-validation is performed to measure the consistency of the model for all the target classes. Finally, the validity of the proposed method is demonstrated on some drug molecules used in AIDS therapy.
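SMOTE creates synthetic minority samples by interpolating between a minority sample and one of its k nearest minority-class neighbours. The following is a minimal pure-Python sketch of that idea, not the R tooling the authors used; k = 5, Euclidean distance, and the seed are assumptions, and real descriptor vectors (1444 features here) would normally be scaled first.

```python
import math
import random
from typing import List, Sequence

Vector = List[float]


def smote(minority: Sequence[Sequence[float]], n_new: int, k: int = 5,
          seed: int = 0) -> List[Vector]:
    """Generate n_new synthetic samples from the minority class (basic SMOTE)."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic: List[Vector] = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        # Brute-force k nearest minority neighbours of x (excluding x itself).
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: math.dist(x, minority[j]))
        neighbour = minority[rng.choice(others[:k])]
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(x, neighbour)])
    return synthetic


if __name__ == "__main__":
    toy_minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    for point in smote(toy_minority, n_new=3, k=2):
        print([round(v, 3) for v in point])
```

Balancing the ER set exactly would mean requesting 7397 - 1084 = 6313 synthetic actives, although the abstract does not say how much oversampling was applied. The brute-force neighbour search is fine for a sketch but slow on thousands of high-dimensional vectors.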

Survey paper on various techniques of recognition and tracking

2015 International Conference on Advances in Computer Engineering and Applications, 2015

This paper describes different methods used for identifying and tracking players and license plates. Object (player and license plate) recognition has posed many challenges because the algorithms did not give correct results, but over time several techniques have been developed to identify such objects. In this survey paper, we describe and compare many identification techniques using a table and a bar graph.

Diagnosis of Breast Cancer using Decision Tree Data Mining Technique

International Journal of Computer Applications, 2014

Cancer is a major issue all around the world. It is a disease that is fatal in many cases, has affected the lives of many people, and will continue to affect the lives of many more. Breast cancer is the second leading cause of cancer deaths in women today and has become the most common cancer among women in both the developed and the developing world in recent years. Each year, 40,000 women die from this disease, which is roughly one woman every 13 minutes. Breast cancer is far easier to cure when it is detected early. This paper presents a decision tree based data mining technique for the early detection of breast cancer. Breast cancer diagnosis distinguishes benign breast tumors (which lack the ability to invade neighboring tissue) from malignant ones (which can invade neighboring tissue). This paper also discusses various data mining approaches that have been used for breast cancer diagnosis, and summarizes breast cancer in general (types, risk factors, symptoms, and treatment).
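At the core of a decision tree is the choice of split: for each candidate threshold on a feature, the impurity of the two resulting groups is measured and the lowest-impurity threshold is kept. The sketch below does this for one numeric feature with benign/malignant labels. Gini impurity as the criterion and the toy feature values are assumptions, since the abstract names neither the splitting criterion nor the attributes used.

```python
from collections import Counter
from typing import Optional, Sequence, Tuple


def gini(labels: Sequence[str]) -> float:
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(values: Sequence[float],
               labels: Sequence[str]) -> Tuple[Optional[float], float]:
    """Return (threshold, weighted Gini) of the best 'value <= threshold' split."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best_t, best_score = None, gini(labels)  # baseline: no split at all
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # identical values cannot be separated
        t = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best_score:
            best_t, best_score = t, score
    return best_t, best_score


if __name__ == "__main__":
    # Hypothetical single feature; not data from the paper.
    feature = [1.0, 1.5, 2.0, 4.0, 4.5, 5.0]
    diagnosis = ["benign"] * 3 + ["malignant"] * 3
    print(best_split(feature, diagnosis))  # → (3.0, 0.0)
```

A full tree applies this search recursively to each resulting group until the groups are pure or a depth limit is reached.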

Toxicity prediction of small drug molecules of aryl hydrocarbon receptor using a proposed ensemble model

Turkish Journal of Electrical Engineering & Computer Sciences, 2019

Quantitative structure-activity relationships and quantitative structure-property relationships have proved useful for predicting the toxicities of drug molecules with respect to their biological activities. In silico toxicity prediction techniques are essential for reducing testing on rodents (in vivo) and offer a faster, more cost-efficient alternative for identifying toxic effects at an early stage of drug development. The authors aim to build a prediction model for better toxicity assessment that can quickly and efficiently test whether certain chemical compounds have the potential to disrupt processes in the human body in ways that may adversely affect human health. Here, we propose a computational (in silico) method for predicting the toxicity of small drug molecules that can bind to the aryl hydrocarbon receptor, using their various physicochemical properties (molecular descriptors). The Pharmaceutical Data Exploration Laboratory (PaDEL) software is used to extract the features of the drug molecules. The aryl hydrocarbon receptor dataset contains 9008 drug molecules, of which 1063 are active and 7945 are inactive, and each drug molecule has 1444 features. The proposed prediction model is a novel ensemble learning approach that can efficiently classify the active (binding) and inactive (nonbinding) compounds of the dataset. In our proposed ensemble model, we first performed feature selection using the Boruta library in R, and then resolved the class imbalance problem within the ensemble learning itself by dividing the dataset into seven data frames, each with approximately equal numbers of active and inactive drug molecules. An ensemble model based on the votes of seven random forest models is proposed, which achieves an accuracy of 93.76%. K-fold cross-validation is conducted to measure the consistency of the model.
Finally, the validity of the proposed ensemble model is demonstrated on some drug molecules used in acquired immune deficiency syndrome (AIDS) therapy and on the androgen receptor.
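The final step, a vote over seven base models, can be sketched as a simple majority vote. Each base model below is a hypothetical callable standing in for one of the seven random forests trained on its own data frame; the tie-breaking rule is an assumption, and with seven binary voters a tie cannot occur. The partition arithmetic from the abstract also works out evenly: 7945 inactives split seven ways gives exactly 1135 per frame, close to the 1063 actives.

```python
from collections import Counter
from typing import Callable, List, Sequence

Model = Callable[[Sequence[float]], str]  # descriptor vector -> "active" / "inactive"


def majority_vote(models: Sequence[Model], x: Sequence[float]) -> str:
    """Return the label chosen by most base models."""
    votes = Counter(model(x) for model in models)
    # most_common breaks ties by first occurrence; seven binary voters cannot tie.
    return votes.most_common(1)[0][0]


def threshold_model(feature: int, cutoff: float) -> Model:
    """Hypothetical stand-in for a trained random forest: a one-feature rule."""
    return lambda x: "active" if x[feature] > cutoff else "inactive"


if __name__ == "__main__":
    # Seven toy base models with different cutoffs on feature 0.
    cutoffs = (0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7)
    ensemble: List[Model] = [threshold_model(0, c) for c in cutoffs]
    print(majority_vote(ensemble, [0.45]))  # → active (4 of 7 votes)
    print(majority_vote(ensemble, [0.25]))  # → inactive (2 of 7 votes)
```

Because each forest sees a different slice of the inactives, the vote lets the ensemble use the whole majority class without any single model being swamped by it.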

Toxicity Prediction of Pre-Clinical Trial Drugs using Physicochemical Properties and Computational Intelligence Approaches

Toxicity prediction of small drug molecules of androgen receptor using multilevel ensemble model

Journal of Bioinformatics and Computational Biology, 2019

In this study, we develop a quantitative structure-activity relationship (QSAR)-based model for predicting toxicity, in order to reduce animal testing, time, and cost in the early stages of drug development. An efficient machine learning model is developed to predict the toxicity of drug molecules that bind to the androgen receptor (AR). Toxicity prediction is performed in terms of activity, activity score, potency, and efficacy, using various physicochemical properties. A multilevel ensemble model is proposed: its first level performs ensemble-based classification of activity, and its second level performs ensemble-based regression of activity score, potency, and efficacy for only those drug molecules found active at the classification level. The AR dataset has 10,273 drug molecules, of which 461 are active and 9812 are inactive, and each drug molecule has 1444 features. Therefore, our dataset is ...
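The multilevel design, classification first and regression only for molecules predicted active, can be sketched as a small pipeline. The classifier and the three regressors are hypothetical callables standing in for the paper's ensemble models; only the control flow, in which regression is skipped for molecules classified as inactive, reflects the abstract.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Sequence

Features = Sequence[float]


@dataclass
class Prediction:
    active: bool
    # Filled only for molecules the first level classifies as active.
    regression: Optional[Dict[str, float]] = None


def multilevel_predict(x: Features,
                       classify: Callable[[Features], bool],
                       regressors: Dict[str, Callable[[Features], float]]) -> Prediction:
    """Level 1: activity classification. Level 2: regression for actives only."""
    if not classify(x):
        return Prediction(active=False)  # level 2 is never run for inactives
    return Prediction(active=True,
                      regression={name: reg(x) for name, reg in regressors.items()})


if __name__ == "__main__":
    # Hypothetical stand-ins for trained ensemble models.
    def is_active(x: Features) -> bool:
        return x[0] > 0.5

    regs = {
        "activity_score": lambda x: 100 * x[0],
        "potency": lambda x: 10 * x[1],
        "efficacy": lambda x: x[0] + x[1],
    }
    print(multilevel_predict([0.8, 0.3], is_active, regs))
    print(multilevel_predict([0.2, 0.3], is_active, regs))
```

Restricting the second level to predicted actives keeps the regressors from being trained and evaluated mostly on the 9812 inactive molecules, for which activity score, potency, and efficacy carry little signal.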
