Aditya Dubey CSEIT | Itm University
Papers by Aditya Dubey CSEIT
2022 IEEE World Conference on Applied Intelligence and Computing (AIC)
Advanced Theory and Simulations
2022 IEEE World Conference on Applied Intelligence and Computing (AIC)
2019 International Conference on Communication and Signal Processing (ICCSP), 2019
Lecture Notes in Electrical Engineering
Scientific Reports, 2021
For most statistical methods in bioinformatics, particularly gene expression data classification, prognosis, and prediction, a complete dataset is required. A gene sample value can be missing due to hardware failure, software failure, or manual mistakes. Missing data in gene expression research dramatically affects the analysis of the collected data. Consequently, this has become a critical problem that requires an efficient imputation algorithm. This paper proposes a technique that considers the local similarity structure and predicts the missing data using clustering and top K nearest neighbor approaches. A similarity-based spectral clustering approach is used in combination with K-means. The spectral clustering parameters, cluster size, and weighting factors are optimized, and after that, missing values are predicted. For imputing each cluster’s missing value, the top K nearest neighbor approach utilizes the concept o...
2021 Fifth International Conference on I-SMAC (IoT in Social, Mobile, Analytics and Cloud) (I-SMAC), 2021
2021 Fifth International Conference on I-SMAC (IoT in Social, Mobile, Analytics and Cloud) (I-SMAC), 2021
With the advent of the novel coronavirus pandemic, doctors, health workers, and the government are doing their best to deal with the situation. It is understandable that people react strongly when they lose someone close, but accusing and harming doctors and health workers is morally wrong, since those saving so many lives are putting their own lives in danger. With the growth of technology and the way social media has brought the world closer, many social media users are expressing views either in support of or in opposition to the saviors of this pandemic, the doctors and health care workers. These views are enough to create a good or bad impression of a doctor in people's minds and can even provoke hostile behavior toward that doctor. The sole purpose of this paper is to analyze a person's stance on the ongoing violence against health workers through multimodal emotional analysis, using a Multimodal Transformer model that combines visual and textual data. Apart from this main aim, the paper also examines whether more information on social media is carried by text or by posted images. The paper also explains the use of an appropriate loss function for imbalanced data.
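The abstract above does not name the loss function it uses for imbalanced data. As a minimal sketch of one common option, and only an assumption about what such a loss could look like, the following shows class-weighted cross-entropy with inverse-frequency weights. The function names and the toy labels are hypothetical and chosen for illustration.

```python
import math

def inverse_frequency_weights(labels, num_classes):
    """Weight each class by n / (num_classes * class_count); rare classes get larger weights."""
    n = len(labels)
    counts = [labels.count(c) for c in range(num_classes)]
    return [n / (num_classes * cnt) if cnt else 0.0 for cnt in counts]

def weighted_cross_entropy(probs, labels, weights):
    """Weighted mean of -log p(true class), normalized by the sum of the applied weights."""
    total = sum(-weights[y] * math.log(p[y]) for p, y in zip(probs, labels))
    return total / sum(weights[y] for y in labels)

# Toy imbalanced label set (hypothetical): three samples of class 0, one of class 1.
labels = [0, 0, 0, 1]
weights = inverse_frequency_weights(labels, num_classes=2)

# A model that predicts 50/50 for every sample.
uniform_probs = [[0.5, 0.5]] * len(labels)
uniform_loss = weighted_cross_entropy(uniform_probs, labels, weights)

# A model that is confident about the majority class but wrong on the minority sample.
biased_probs = [[0.9, 0.1]] * len(labels)
biased_loss = weighted_cross_entropy(biased_probs, labels, weights)
```

With these weights, the single minority sample contributes as much to the loss as the three majority samples combined, so a model that always favors the majority class is penalized more heavily than under unweighted cross-entropy.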
Today’s era is based on IT technologies, so data storage is increasing day by day. As a result, a large amount of data is stored in databases and warehouses. Data mining has therefore become popular for exploring and analyzing databases to find interesting and unknown patterns and rules, known as association rule mining. Association rule mining is one of the essential descriptive tasks, which finds meaningful patterns in large collections of data. Mining frequent item sets is the basic principle of association rule mining. Many algorithms have been proposed over the years, including Efficient Mining of Frequent Item Sets on Large Uncertain Databases. An efficient approach for the implementation of FP Tree computes the minimum support for mining frequent patterns. Nowadays, various techniques face problems of data redundancy, candidate generation, memory consumption (FP-tree algorithms), and other frequent-pattern problems. Because of retailer indus...
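To make the notions of frequent item set and minimum support concrete, here is a minimal sketch that enumerates candidate item sets by brute force and keeps those whose support meets a threshold. It is not the FP-tree approach the abstract refers to, and the function names, the `max_size` limit, and the toy transactions are assumptions made for illustration.

```python
from itertools import combinations

def frequent_itemsets(transactions, min_support, max_size=2):
    """Return {itemset: support} for item sets with support >= min_support.

    Support is the fraction of transactions containing every item in the set.
    Candidates are enumerated by brute force, which is only practical for tiny inputs.
    """
    n = len(transactions)
    items = sorted({item for t in transactions for item in t})
    result = {}
    for size in range(1, max_size + 1):
        for candidate in combinations(items, size):
            count = sum(1 for t in transactions if set(candidate) <= t)
            support = count / n
            if support >= min_support:
                result[frozenset(candidate)] = support
    return result

def confidence(itemsets, antecedent, consequent):
    """Confidence of the rule antecedent -> consequent: support(A and C) / support(A)."""
    both = itemsets[frozenset(antecedent) | frozenset(consequent)]
    return both / itemsets[frozenset(antecedent)]

# Hypothetical market-basket transactions.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
found = frequent_itemsets(transactions, min_support=0.5)
rule_confidence = confidence(found, {"bread"}, {"milk"})
```

The candidate-generation cost visible here (every combination is counted against every transaction) is the kind of overhead that tree-based methods such as FP-tree are designed to avoid.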
2022 International Conference for Advancement in Technology (ICONAT), 2022
2022 International Conference for Advancement in Technology (ICONAT), 2022
In the era of big data, a significant amount of data is produced in many application areas. However, due to various reasons including sensor failures, communication failures, environmental disruptions, and human errors, missing values are found frequently. These missing data pose a challenge for other data mining approaches, requiring the missing data to be handled at the preprocessing stage of data mining. Several approaches for handling missing data have been proposed in the past. These approaches consider the whole dataset for making a prediction, which makes the imputation cumbersome. This paper proposes a procedure that makes use of the local similarity structure of the dataset for imputation. The K-means clustering technique, along with weighted KNN, imputes the missing values efficiently. The results are compared against imputations by mean substitution and Fuzzy C Means (FCM). The proposed imputation technique sho...
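The idea of clustering first and then imputing from weighted nearest neighbors inside the matching cluster can be sketched as follows. This is an illustrative reading of the abstract, not the authors' implementation: the seeding of centroids with the first k rows, the inverse-distance weights, the use of `None` for missing entries, and all function names are assumptions.

```python
import math

def dist(a, b, idx):
    """Euclidean distance restricted to the feature positions in idx."""
    return math.sqrt(sum((a[i] - b[i]) ** 2 for i in idx))

def kmeans(rows, k, iters=20):
    """Plain k-means on complete rows; the first k rows seed the centroids (an assumption)."""
    dims = range(len(rows[0]))
    centroids = [list(r) for r in rows[:k]]
    labels = [0] * len(rows)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: dist(r, centroids[c], dims)) for r in rows]
        for c in range(k):
            members = [r for r, l in zip(rows, labels) if l == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels

def impute(row, complete_rows, k_clusters=2, k_neighbors=2):
    """Fill None entries of row using weighted KNN inside the nearest cluster."""
    observed = [i for i, v in enumerate(row) if v is not None]
    missing = [i for i, v in enumerate(row) if v is None]
    centroids, labels = kmeans(complete_rows, k_clusters)
    # Pick the cluster whose centroid is closest on the observed features only.
    cluster = min(range(k_clusters), key=lambda c: dist(row, centroids[c], observed))
    members = [r for r, l in zip(complete_rows, labels) if l == cluster]
    neighbors = sorted(members, key=lambda r: dist(row, r, observed))[:k_neighbors]
    # Inverse-distance weights; the small constant avoids division by zero.
    weights = [1.0 / (dist(row, r, observed) + 1e-9) for r in neighbors]
    filled = list(row)
    for i in missing:
        filled[i] = sum(w * r[i] for w, r in zip(weights, neighbors)) / sum(weights)
    return filled

# Hypothetical data: two well-separated groups of complete rows.
complete = [
    [1.0, 1.0, 10.0], [9.0, 9.0, 90.0],
    [1.2, 0.8, 12.0], [8.8, 9.2, 88.0],
    [0.9, 1.1, 9.0], [9.1, 8.9, 91.0],
]
filled = impute([1.1, 0.9, None], complete)
```

Because only the rows of one cluster are searched for neighbors, the imputation uses local structure instead of the whole dataset, which is the contrast the abstract draws with earlier approaches.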
International Journal of Advanced Computer Science and Applications, 2021
An outlier is a data observation that deviates considerably from the rest of the dataset. Outliers present in a dataset may compromise its integrity. Implementing machine learning techniques in various real-world applications and applying those techniques to healthcare-related datasets can completely change the present scenario of the field. These applications can highlight physiological data with anomalous behavior, which can lead to a fast and necessary response and help to gather more critical knowledge about the particular area. A broad body of work is available on the performance of anomaly detection techniques applied to popular public datasets, but there is only a minimal amount of analytical work on supervised and unsupervised methods applied to physiological datasets. The breast cancer dataset is both a universal and a numeric dataset. This paper utilizes and analyzes four machine learning techniques and their capacity to distinguish anomalies in the breast cancer dataset.
2021 Third International Conference on Inventive Research in Computing Applications (ICIRCA), 2021
Missing values create issues during the analysis of a dataset. Learning algorithms applied to an asymmetrical dataset can produce an overrated classification accuracy due to a bias towards the majority class at the expense of the minority class. Missing values in the dataset have a negative impact on imputation accuracy and can therefore lead to a different output. Some algorithms cannot handle missing values properly, while some techniques estimate the missing values efficiently. It is very important to handle missing data because the performance of many machine learning algorithms degrades when values are missing. Original datasets may have missing data due to factors such as data not being kept in a file or data having been corrupted. In this paper, some techniques employed to impute missing data are discussed and compared in terms of their merits and drawbacks.
2021 Second International Conference on Electronics and Sustainable Communication Systems (ICESC), 2021
Through semi-supervised learning with graphs, the machine learning community has achieved many advantages in extracting information from a large volume of data under inadequate initial label information. Recent research has shown that weighting the labelled samples can improve accuracy. Instead of giving equal consideration to all labelled samples, sample weighting assigns higher weights to samples located at the border of multiple classes than to labelled samples located far from the boundary. This article proposes a faster way to calculate the sample weights by reducing the multiple clustering methods to a single clustering. The new method of sample weighting is verified using a 2D feature set so that sample weighting can be easily visualized. The proposed method does not reduce the time complexity, but it can reduce the number of steps required for weighting the samples. The obtained results show that this method can improve the speed with acceptable accuracy.
2019 Third International conference on I-SMAC (IoT in Social, Mobile, Analytics and Cloud) (I-SMAC), 2019
Communications in Computer and Information Science, 2020
International Journal of Engineering Trends and Technology, 2021
International Journal of Advanced Computer Science, Jan 29, 2015
Clustering algorithms are one way of extracting valuable information from a large database by partitioning it. The main goal of all these clustering algorithms is to find clusters by maximizing the similarity within clusters and reducing the similarity between different clusters. Besides their main goal, these algorithms work on different problem domains. In this paper, two algorithms, K-means and spectral clustering, are described. Both algorithms are tested and evaluated on different application-driven datasets. The silhouette index is used to calculate the efficiency of the clustering algorithms. The performance and accuracy of both clustering algorithms are presented and compared using the validity index.
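The silhouette index mentioned above rewards exactly the goal stated in the abstract: high similarity within a cluster and low similarity between clusters. A minimal sketch of the standard definition follows, assuming Euclidean distance and at least two clusters; the function name and the toy points are chosen for illustration.

```python
import math

def silhouette_index(points, labels):
    """Mean silhouette coefficient over all points, using Euclidean distance.

    For each point, a is the mean distance to the other points in its own cluster
    and b is the smallest mean distance to the points of any other cluster.
    The point's score is (b - a) / max(a, b). Requires at least two clusters.
    """
    clusters = {}
    for p, l in zip(points, labels):
        clusters.setdefault(l, []).append(p)
    scores = []
    for p, l in zip(points, labels):
        own = clusters[l]
        if len(own) == 1:
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        # The distance from p to itself is 0, so dividing by len(own) - 1 excludes it.
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        b = min(
            sum(math.dist(p, q) for q in other) / len(other)
            for m, other in clusters.items() if m != l
        )
        scores.append(0.0 if max(a, b) == 0 else (b - a) / max(a, b))
    return sum(scores) / len(scores)

# Hypothetical 2D points forming two obvious groups.
points = [(0, 0), (0, 1), (10, 0), (10, 1)]
good = silhouette_index(points, [0, 0, 1, 1])  # labels match the groups
bad = silhouette_index(points, [0, 1, 0, 1])   # labels cut across the groups
```

Values close to 1 indicate compact, well-separated clusters, while negative values indicate points that sit closer to another cluster than to their own, so the same score can be used to compare the output of two different clustering algorithms on one dataset.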
2020 IEEE 9th International Conference on Communication Systems and Network Technologies (CSNT), 2020
IoT is a mainstream technology in today’s world and has recently shown significant potential for modernizing the lifestyle of modern societies. Since the term IoT was coined in 1999 by Kevin Ashton, there has been a tremendous increase in the number of devices connected to the internet. This paper provides a literature survey on various cloud services used in integration with IoT and a study of fog computing used in various application areas. Due to seamless data exchange between IoT devices and sensors, there was a need for platforms for data management, storage, and analysis. Thus cloud computing and fog computing are often used interchangeably for providing high storage capacities and processing capabilities. Different types of IoT-based cloud platforms are discussed in this paper according to their applicability, along with their pros and cons.