Neetu Sardana - Academia.edu

Papers by Neetu Sardana

MSD-Apriori: Discovering borderline-rare items using association mining

2017 Tenth International Conference on Contemporary Computing (IC3), 2017

Naïve Bayes Approach for Predicting Missing Links in Ego Networks

2016 IEEE International Symposium on Nanoelectronic and Information Systems (iNIS), 2016

Empirical Analysis of Ensemble Machine Learning Techniques for Bug Triaging

2019 Twelfth International Conference on Contemporary Computing (IC3), 2019

Bug reports are an inescapable part of software development. Nowadays, software vendors release beta versions of their software to collect bug reports from users. The collected bug reports are then handled by software developers to make subsequent releases more reliable and robust. However, the high frequency of incoming bug reports makes bug fixing a difficult and tedious process. Bug triaging is an essential component of the issue-handling process; it deals with selecting a suitable software developer for a reported bug, such that the assigned developer is able to fix the reported issue. In the literature, various semi-automated and fully automated techniques have been proposed to facilitate developer selection in bug repositories. These techniques use historical information about fixed bugs from bug repositories to classify new incoming bugs. In recent years, ensemble-based classification techniques have gained popularity. These techniques combine multiple classifiers to make a prediction and have been shown to outperform classical machine learning classifiers. In this paper, we present an empirical study of ensemble-based techniques for classifying new incoming bug reports. We studied five ensemble classification techniques, namely Bagging, Boosting, Majority Voting, Average Voting, and Stacking, using 25 different machine learning classifiers as base classifiers. The experimental results showed that ensemble classifiers outperform classical machine learning algorithms in selecting a suitable developer to handle a bug report.
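Of the ensemble techniques listed above, majority voting is the simplest to illustrate. The sketch below is not the paper's implementation; it assumes each base classifier has already predicted a developer for a bug report and simply returns the most frequent prediction. The developer names are hypothetical.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the label predicted by the most base classifiers.

    `predictions` holds one label (e.g. a developer ID) per base classifier.
    Ties go to the label encountered first, because Counter.most_common
    orders equal counts by first appearance.
    """
    if not predictions:
        raise ValueError("need at least one prediction")
    return Counter(predictions).most_common(1)[0][0]


# Hypothetical outputs of five base classifiers for one bug report.
votes = ["alice", "bob", "alice", "carol", "bob"]
print(majority_vote(votes))  # alice (tied with bob, but seen first)
```

A real study would usually delegate this to a library ensemble class; the hand-written version only makes the voting rule explicit.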

Characterization Study of Developers in Non-Reproducible Bugs

2018 Eleventh International Conference on Contemporary Computing (IC3), 2018

Software bugs are inevitable. The increasing size and complexity of software systems make bug handling a cumbersome and time-consuming task. In such scenarios, Non-Reproducible (NR) bug reports are an additional overhead for software developers, as these bugs are hard to reproduce. The difficulty in reproducing NR bugs has varied causes, such as the absence of the information required to recreate the same test environment, resource and time constraints, and the inability of the assigned developer to fix the issue. However, when NR-marked bug reports are reconsidered, a small percentage of them are eventually Fixed (NRF). These fixes occur either because new solutions are tried to reproduce the bug or because a different developer is assigned to the bug report. To find whether a change in developer helps in resolving NR bugs, this paper investigates the developers associated with NR bug reports. We gauged developer similarity (in terms of the developers marking a bug report as NR and as Fixed), tossing trends, the presence of isBack, and the expertise level of the developers who marked NRF bug reports as NR and as Fixed. We studied the change history of 24,995 NR bug reports of the Mozilla Firefox project. Our results show that 87.34% of NRF bug reports are fixed by a developer who had not initially marked the bug report as NR. We also found that the average tossing path length in NRF bug reports is three times that in NR bug reports. This higher rate of bug tossing results in a higher probability of NR bug reports being fixed. It has also been observed that developers who fix NR bugs possess higher expertise than developers who mark bug reports as NR.
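The tossing path length compared in this abstract can be illustrated from a bug's assignment history. The sketch below assumes a simplified definition (the number of times a report is reassigned to a different developer) and a hypothetical chronological list of assignees; the paper's exact extraction from Mozilla's change history may differ.

```python
def tossing_path_length(assignees):
    """Count reassignments (tosses) in a chronological list of assignees.

    Consecutive repeats of the same assignee are not counted, since
    re-recording the same developer is not a toss (assumed definition).
    """
    tosses = 0
    for previous, current in zip(assignees, assignees[1:]):
        if current != previous:
            tosses += 1
    return tosses


def average_tossing_path_length(histories):
    """Mean tossing path length over many bug reports."""
    if not histories:
        return 0.0
    return sum(tossing_path_length(h) for h in histories) / len(histories)


# Hypothetical assignment histories, not data from the study.
nr_history = ["dev1", "dev1"]
nrf_history = ["dev1", "dev2", "dev3", "dev2"]
print(tossing_path_length(nr_history), tossing_path_length(nrf_history))  # 0 3
```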

Extraction of Influencers Across Twitter Using Credibility and Trend Analysis

2018 Eleventh International Conference on Contemporary Computing (IC3), 2018

Influence maximization facilitates the selection of individuals who can help diffuse information to the maximum number of people in the least time. Credible individuals are selected based on their Twitter or influencer score. This paper proposes a novel method to find influencers. Scores are computed from the features of individuals; generally, these features are based on a user's activity, authority, and audience on Twitter. First, a person's influence score is computed using features such as retweets, followers, and posts. Second, a tweet score is computed: the user's tweets are mined to find their opinion about the subject. Next, a trend score is computed from public opinion, extracted by textual data mining, to gain better insight into the subject in context. Finally, both the influence score and the tweet score of a person are correlated with the trend score to infer the final influencers.

ECLogger: Cross-Project Catch-Block Logging Prediction Using Ensemble of Classifiers

e Informatica Softw. Eng. J., 2017

Background: Software developers insert log statements in the source code to record program execution information. However, optimizing the number of log statements in the source code is challenging. Machine learning based within-project logging prediction tools, proposed in previous studies, may not be suitable for new or small software projects. For such projects, cross-project logging prediction can be used. Aim: The aim of the study presented here is to investigate cross-project logging prediction methods and techniques. Method: The proposed method is ECLogger, a novel, ensemble-based, cross-project, catch-block logging prediction model. Nine base classifiers were used and combined using ensemble techniques. The performance of ECLogger was evaluated on three open-source Java projects: Tomcat, CloudStack and Hadoop. Results: ECLoggerBagging, ECLoggerAverageVote, and ECLoggerMajorityVote show a considerable improvement in the average Logged F-...
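ECLoggerAverageVote combines its base classifiers by averaging. As a rough illustration, not ECLogger's code, the sketch below averages each classifier's predicted probability that a catch block should be logged and thresholds the mean; the probabilities and the 0.5 threshold are assumptions.

```python
def average_vote(probabilities, threshold=0.5):
    """Soft-voting ensemble for a binary 'log this catch block?' decision.

    `probabilities` holds each base classifier's estimated probability that
    the catch block is logged. Returns (mean_probability, should_log).
    """
    if not probabilities:
        raise ValueError("need at least one probability")
    mean = sum(probabilities) / len(probabilities)
    return mean, mean >= threshold


# Hypothetical outputs of three base classifiers for one catch block.
mean, should_log = average_vote([0.9, 0.4, 0.6])
print(round(mean, 3), should_log)
```

Unlike majority voting, averaging lets one confident classifier outweigh two weakly opposed ones, which is the usual argument for soft voting when base classifiers produce calibrated probabilities.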

Machine Learning or Information Retrieval Techniques for Bug Triaging: Which is better?

e Informatica Softw. Eng. J., 2017

Bugs are an inevitable part of a software system. Nowadays, large software development projects even release beta versions of their products to gather bug reports from users. The collected bug reports are then worked upon by various developers in order to resolve the defects and make the final software product more reliable. The high frequency of incoming bugs makes bug handling a difficult and time-consuming task. Bug assignment, an integral part of bug triaging, is the process of assigning a reported bug to a suitable developer who corrects the source code in order to resolve it. There are various semi-automated and fully automated techniques to ease the task of bug assignment. This paper presents the current state of the art of techniques used for bug report assignment. Through exhaustive research, the authors have observed that machine learning and information retrieval based bug assignment approaches are the most popular in the literature. A deeper investigation h...

Evaluating the Performance of Navigation Prediction Model Based on Varied Session Length

Web navigation prediction plays a vital role on the web due to its broad research applications. It can be used for personalization, improving website design, and business intelligence. The main aim of these applications is to increase the satisfaction of users visiting the website. A web navigation prediction model tries to predict users' next set of webpages from their historical navigations. The past navigations are collected in the web server log file. Navigations form sessions of varied length, which are used to build the navigation model. Selecting very long or very short sessions degrades the model's performance. Thus, selecting an optimal session length is essential, as it can improve model performance. This paper presents pre-investigation measures such as page loss, branching factor and session length. We investigate the performance of the prediction model on two different ranges of session length. The first range considered is three to ...
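Restricting a model to one session-length range can be sketched as a simple filter. The snippet below is illustrative only: it assumes sessions are lists of page IDs, and it uses "fraction of page views discarded by the filter" as a stand-in for page loss; the paper's own definition of page loss may differ, and the sessions and bounds are hypothetical.

```python
def filter_sessions(sessions, min_len, max_len):
    """Keep sessions whose length lies in [min_len, max_len].

    Also returns the fraction of page views discarded, used here as an
    assumed stand-in for the 'page loss' measure.
    """
    kept = [s for s in sessions if min_len <= len(s) <= max_len]
    total_views = sum(len(s) for s in sessions)
    kept_views = sum(len(s) for s in kept)
    page_loss = 1 - kept_views / total_views if total_views else 0.0
    return kept, page_loss


# Hypothetical sessions: one too short, two in range, one very long.
sessions = [["A", "B"], ["A", "B", "C"], ["B", "C", "D", "E"], ["A"] * 12]
kept, loss = filter_sessions(sessions, 3, 10)
print(len(kept), round(loss, 3))
```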

DPSN: A Novel approach for Disease Prediction based on Social Networks

2019 International Conference on Signal Processing and Communication (ICSC), 2019

Disease prediction has always been a challenging and critical research area. Many machine learning approaches have been developed in the past to address this problem. In this paper, a novel approach for disease prediction is proposed. The approach is based on a social network of diseases, constructed using the similarity among patient symptoms. The edges are weighted by the cosine similarity between diseases. Social network metrics such as degree and closeness centrality are used to predict the most probable disease. The proposed method has been compared with existing machine learning approaches. The experimental results show that, when predicting a disease from a set of 18 possible diseases with 30 possible symptoms, the proposed technique performs better than traditional machine learning methods.
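The network construction described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not DPSN itself: each disease is represented by a hypothetical binary symptom vector, edges are weighted by cosine similarity (zero-weight pairs are dropped), and weighted degree stands in for the degree metric; closeness centrality is not shown.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def build_disease_network(profiles):
    """Map each disease pair to an edge weight equal to their cosine similarity."""
    edges = {}
    for (d1, v1), (d2, v2) in combinations(profiles.items(), 2):
        weight = cosine(v1, v2)
        if weight > 0:
            edges[(d1, d2)] = weight
    return edges


def weighted_degree(edges):
    """Sum of incident edge weights per disease (isolated diseases are omitted)."""
    degree = {}
    for (d1, d2), weight in edges.items():
        degree[d1] = degree.get(d1, 0.0) + weight
        degree[d2] = degree.get(d2, 0.0) + weight
    return degree


# Hypothetical symptom vectors: [fever, cough, rash, headache]
profiles = {
    "flu": [1, 1, 0, 1],
    "measles": [1, 1, 1, 0],
    "migraine": [0, 0, 0, 1],
}
edges = build_disease_network(profiles)
degree = weighted_degree(edges)
print(max(degree, key=degree.get))  # flu
```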

Comparative Study of Multilabel Classifiers on Software Engineering Q&A Community for Tag Recommendation

In this paper, we analyze and classify the tags of the textual content of the Software Engineering Stack Exchange Q&A website using text pre-processing and classification algorithms. These methods were chosen because the tags provided by the dataset were too general to contextualize the questions. The selected classification methods are then compared by accuracy and used to predict the tags. LinearSVM achieves the highest accuracy among all the classifiers, and its tag predictions give the best result, with a ROC score of 96%.

Performance Analysis of Naïve Bayes Classifier Over Similarity Score-Based Techniques for Missing Link Prediction in Ego Networks

Keywords: Ego Network, Link Prediction, Machine Learning, Performance Analysis, Similarity Score, Social Network

Detection of iOS Malware apps based on Significant Services Identification using Borda count

2021 Thirteenth International Conference on Contemporary Computing (IC3-2021), 2021

In today's era, smartphones are part of daily life because they are ubiquitous and provide internet connectivity everywhere. The primary reason for the increased usage of smartphones is their functional expandability through third-party apps, which span a wide range of categories including books, social networking, and instant messaging. Users are compelled to use these feature-rich apps. As a result, the threats posed by these apps, which are potentially risky for users' privacy, have increased. Information on smartphones is arguably more personal than data stored on desktop computers, because smartphones remain with individuals throughout the day and generate contextual data through sensors, which makes them an easy target for intruders. Both Android and iOS follow a permission-based access control mechanism to protect the privacy of their users, where an app has to specify the permissions it will use during its run-time. However, users are unaware whether an app is breaching their privacy or sharing their data with third-party apps. Much work has been conducted on detecting malicious Android apps using feature selection techniques because of the availability of a large permission set and labeled data sets. However, minimal work has been conducted for the iOS platform because of its limited permission set, limited labeled data, and closed-source platform. To address this problem, we propose an approach to detect malicious iOS apps based on the app's category, using static analysis of app permissions to identify the most significant permissions. In this work, several feature ranking techniques, such as Correlation, Gain ratio, Info gain, OneR, and ReliefF, have been employed on a data set of 1,150 iOS apps to identify the riskiest permissions across 12 different app categories.
To improve the permission selection process and the precision of classifiers, the Borda count method has been utilized. Our empirical analysis shows that the proposed approach effectively identifies the top 5 risky permissions within different categories.
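The Borda count step can be sketched as follows. This is a generic Borda aggregation, not the paper's code: it assumes each feature ranking technique outputs a full ranking of the same permissions, awards n - 1 - i points to the item at position i, and breaks ties alphabetically. The permission names and rankings are hypothetical.

```python
def borda_count(rankings):
    """Aggregate several rankings with the Borda count.

    Each ranking lists items from most to least important. An item at
    position i of a ranking of length n earns n - 1 - i points. Items are
    returned by descending total points, ties broken alphabetically.
    """
    scores = {}
    for ranking in rankings:
        n = len(ranking)
        for i, item in enumerate(ranking):
            scores[item] = scores.get(item, 0) + (n - 1 - i)
    return sorted(scores, key=lambda item: (-scores[item], item))


# Hypothetical permission rankings from three feature ranking techniques.
rankings = [
    ["location", "contacts", "camera", "microphone"],
    ["contacts", "location", "microphone", "camera"],
    ["location", "camera", "contacts", "microphone"],
]
print(borda_count(rankings)[:2])  # ['location', 'contacts']
```

Borda aggregation rewards items that are consistently near the top across rankers, so a permission that one technique ranks first but others rank last does not dominate the combined list.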

Exploratory study of existing approaches for analyzing epidemics

Leveraging Artificial Intelligence in Global Epidemics, 2021

The outbreak of epidemic diseases such as COVID-19, H1N1 swine flu, Ebola, and dengue has raised concern across communities about preventing and controlling infectious diseases, as well as about methods to reduce the rate of disease propagation. Epidemics are generally contagious, and the number of cases increases at a very rapid rate. They often result in loss of life, as they can affect the respiratory tract and lungs and even cause multi-organ failure. Hence, it is imperative to analyze the spread of any virus in order to devise strategies for situational awareness and intervention. Researchers and medical practitioners have performed many studies modeling the behavior of viruses from varied perspectives. These studies have guided the analysis of the pattern and speed of virus spread. This chapter presents an exploratory study of existing approaches, such as classical epidemic approaches and Machine Learning approaches, that are useful for studying the outbreak patterns of epidemics. In addition, the chapter highlights the available epidemic datasets and describes various visualization charts that can help in understanding the patterns of virus spread.
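As an illustration of what a classical epidemic approach looks like, the sketch below integrates the standard SIR compartmental model with forward Euler steps. It is a textbook example, not necessarily one of the models surveyed in the chapter, and the population size, transmission rate beta, and recovery rate gamma are hypothetical.

```python
def simulate_sir(susceptible, infected, recovered, beta, gamma, days, dt=1.0):
    """Forward-Euler integration of the classical SIR model.

    dS/dt = -beta*S*I/N, dI/dt = beta*S*I/N - gamma*I, dR/dt = gamma*I,
    where N = S + I + R stays constant. Returns one (S, I, R) tuple per
    time step, including the initial state.
    """
    n = susceptible + infected + recovered
    s, i, r = float(susceptible), float(infected), float(recovered)
    history = [(s, i, r)]
    for _ in range(int(days / dt)):
        new_infections = beta * s * i / n * dt
        new_recoveries = gamma * i * dt
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
        history.append((s, i, r))
    return history


# Hypothetical parameters: 1,000 people, one initial case, beta=0.3, gamma=0.1.
trajectory = simulate_sir(999, 1, 0, beta=0.3, gamma=0.1, days=160)
peak_day, (_, peak_infected, _) = max(enumerate(trajectory), key=lambda t: t[1][1])
print(peak_day, round(peak_infected))
```

The peak day and height printed here depend entirely on the assumed parameters; fitting beta and gamma to observed case counts is where the surveyed data-driven and Machine Learning approaches come in.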

Feature Ranking and Aggregation for Bug Triaging in Open-Source Issue Tracking Systems

2021 11th International Conference on Cloud Computing, Data Science & Engineering (Confluence), 2021

The increasing complexity of software and the growth of team-based projects have led to the rise of various project management tools and techniques. One of the important components of open-source project management is the use of bug tracking systems. In the last few decades, software projects have experienced an unavoidable influx of bug reports. One of the main challenges in handling these incoming bugs is the triaging of bug reports. Bug triaging can be considered a mechanism for selecting a suitable software developer for a reported bug, who will work towards resolving the bug in a timely fashion. Several semi-automated and fully automated bug triaging techniques exist in the literature. These techniques often consider varied bug parameters for selecting an appropriate developer. Past researchers have found different parameters to be of prime importance in the optimal developer selection task. However, a common ranking scale depicting the relative importance of different bug parameters for bug triaging is not available. This paper presents a methodology to rank the non-textual bug parameters using feature ranking and aggregation techniques. The presented methodology has been evaluated on four open-source systems, namely, Mozilla Firefox, Eclipse, GNOME and Open Office. From the experimental evaluation, it has been observed that the ranking of bug parameters is consistent among the different open-source projects of the Bugzilla repository.
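One common way to quantify whether two projects rank bug parameters consistently is Spearman's rank correlation. The abstract does not say which consistency measure was used, so the sketch below is an assumption; the parameter names and both rankings are hypothetical, not results from the paper.

```python
def spearman_rho(rank_a, rank_b):
    """Spearman's rank correlation between two orderings of the same items.

    Uses rho = 1 - 6 * sum(d^2) / (n * (n^2 - 1)), which is exact when
    neither ranking contains ties.
    """
    if len(rank_a) != len(set(rank_a)) or set(rank_a) != set(rank_b):
        raise ValueError("rankings must be tie-free orderings of the same items")
    n = len(rank_a)
    if n < 2:
        raise ValueError("need at least two items")
    position_b = {item: i for i, item in enumerate(rank_b)}
    d_squared = sum((i - position_b[item]) ** 2 for i, item in enumerate(rank_a))
    return 1 - 6 * d_squared / (n * (n * n - 1))


# Hypothetical parameter rankings for two projects.
project_a = ["severity", "priority", "component", "product", "os"]
project_b = ["severity", "component", "priority", "product", "os"]
print(round(spearman_rho(project_a, project_b), 3))  # 0.9
```

A value near 1 indicates near-identical orderings, while -1 would mean one project ranks the parameters in exactly the reverse order.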

Analysis and Visualization of User Navigations on Web

Data Visualization and Knowledge Engineering, 2019

The web is the largest repository of data. Users frequently navigate the web to access information. These navigational patterns are stored in weblogs, which are growing exponentially with time. This increase in voluminous weblog data raises major challenges concerning handling big data and understanding navigation patterns and the structural complexity of the web. Visualization presents large, complex web data graphically to help address these challenges. This chapter describes the various aspects of visualization from which novel insights can be drawn in the area of web navigation mining. To analyze user navigations, visualization can be applied in two stages: after pre-processing and after pattern discovery. The first stage analyses the website structure, website evolution, user navigation behaviour, and frequent and rare patterns, and detects noise. The second stage analyses the interesting patterns obtained from prediction modelling of web data. The chapter also highlights popular visualization tools for analyzing weblog data.

Logging Analysis and Prediction in Open Source Java Project

Research Anthology on Usage and Development of Open Source Software, 2021

Log statements present in source code provide important information to software developers, because they are useful in various software development activities such as debugging, anomaly detection, and remote issue resolution. Most previous studies on logging analysis and prediction provide insights and results after analyzing only a few code constructs. In this chapter, the authors perform an in-depth, focused, and large-scale analysis of logging code constructs at two levels: the file level and the catch-block level. They answer several research questions related to statistical and content analysis. Statistical and content analysis reveals the presence of differentiating properties between logged and non-logged code constructs. Based on these findings, the authors propose a machine-learning-based model for catch-block logging prediction. The model is found to be effective in catch-block logging prediction.

ORFDetector: Ensemble Learning Based Online Recruitment Fraud Detection

2019 Twelfth International Conference on Contemporary Computing (IC3), 2019

Online recruitment fraud (ORF) is a new challenge in the cyber security area. In ORF, scammers offer job seekers lucrative jobs and in return steal their money and personal information. In India, scammers have stolen millions from innocent job seekers. Hence, it is important to find a solution to this problem. In this paper, we propose ORFDetector, an ensemble learning based model for ORF detection. We test the proposed model on a publicly available dataset of 17,860 annotated jobs. The proposed model is found to be effective, giving an average F1-score and accuracy of 94% and 95.4%, respectively. Additionally, it increases specificity by 8% compared to the baseline classifiers.
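The metrics reported above (F1-score, accuracy, specificity) follow standard confusion-matrix definitions. The sketch below computes them for a binary task with "fraudulent" as the positive class; the counts are hypothetical, and the paper's "average" F1 may be averaged across classes rather than computed for one class as here.

```python
def classification_metrics(tp, fp, tn, fn):
    """Binary classification metrics from confusion-matrix counts.

    Positive class = fraudulent job posting (assumption for this example).
    """
    total = tp + fp + tn + fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "specificity": tn / (tn + fp) if tn + fp else 0.0,
    }


# Hypothetical counts, not the paper's results.
metrics = classification_metrics(tp=80, fp=10, tn=90, fn=20)
print({name: round(value, 3) for name, value in metrics.items()})
```

Specificity (true-negative rate) measures how rarely legitimate postings are flagged as fraud, which is why it is reported alongside F1 and accuracy.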

PKM3: an optimal Markov model for predicting future navigation sequences of the web surfers

Pattern Analysis and Applications, 2020

Predicting the browsing behavior of users on the web has gained significant importance, as it improves the productivity of website owners and also raises the interest of web users. The Markov model has been used extensively for predicting users' web navigation. To enhance the coverage and accuracy of the Markov model, higher order Markov models are integrated with lower order models. However, this integration results in large state-space complexity. To reduce the state-space complexity, this paper proposes a novel technique, namely the Pruned all-Kth modified Markov model (PKM3). PKM3 eliminates irrelevant states, which contribute negligibly to prediction, from the higher order model. The proposed model is evaluated on four standard weblogs: BMS, MSWEB, CTI and MSNBC. PKM3 performed best on websites whose pages are closely placed and highly interlinked. This pruning-based optimal model achieves a significant reduction in state-space complexity while maintaining comparable accuracy.
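The idea of combining Markov models of orders 1 to K and then pruning higher-order states can be sketched as below. This is not PKM3: the pruning rule here (drop higher-order contexts observed fewer than `min_support` times) is an assumed stand-in for the paper's relevance criterion, and the sessions are hypothetical. Prediction falls back from the longest surviving context to shorter ones.

```python
from collections import Counter, defaultdict


def build_all_kth_model(sessions, max_order):
    """Count next-page frequencies for every context of length 1..max_order."""
    model = defaultdict(Counter)
    for session in sessions:
        for k in range(1, max_order + 1):
            for start in range(len(session) - k):
                context = tuple(session[start:start + k])
                model[context][session[start + k]] += 1
    return model


def prune(model, min_support):
    """Keep all first-order states; drop higher-order states with low support."""
    return {
        context: counts
        for context, counts in model.items()
        if len(context) == 1 or sum(counts.values()) >= min_support
    }


def predict(model, history, max_order):
    """Predict the next page from the longest context present in the model."""
    for k in range(min(max_order, len(history)), 0, -1):
        context = tuple(history[-k:])
        if context in model:
            return model[context].most_common(1)[0][0]
    return None


# Hypothetical navigation sessions.
sessions = [["A", "B", "C"], ["A", "B", "C"], ["A", "B", "D"], ["X", "B", "D"]]
full = build_all_kth_model(sessions, max_order=2)
pruned = prune(full, min_support=2)
print(len(full), len(pruned), predict(pruned, ["A", "B"], 2))
```

Pruning shrinks the state space at the cost of occasionally losing a specific context: here the rare context ("X", "B") is removed, so prediction for it falls back to the first-order state for "B".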

IM NRFixer: A hybrid approach to alleviate class‐imbalance problem for predicting the fixability of Non‐Reproducible bugs

Journal of Software: Evolution and Process, 2020

Analyzing Performance of Deep Learning Techniques for Web Navigation Prediction

Procedia Computer Science, 2020

The weblog is dynamic, and its size, in terms of navigation sessions, is growing exponentially with time. These stored sessions are used for Web Navigation Prediction (WNP). Each user behaves differently on the web, and so do their navigation sessions. With a variety of large, dynamic sessions, the task of navigation prediction is becoming challenging. There is a need for an effective method to handle large sessions with multiple labels for predicting user-desired information. This paper analyses the performance of Deep Learning techniques such as Multi-Layer Perceptron and Long Short-Term Memory, based on parameters such as the number of hidden units, number of layers, activation function, optimization function, learning rate, and batch size. The networks were trained on six experimental parameter setups to form 216 models. The performance of these models is evaluated on two real datasets: BMS and CTI. It has been observed that Long Short-Term Memory performs best on most of the setups.

Research paper thumbnail of MSD-Apriori: Discovering borderline-rare items using association mining

2017 Tenth International Conference on Contemporary Computing (IC3), 2017

Research paper thumbnail of Naïve Bayes Approach for Predicting Missing Links in Ego Networks

2016 IEEE International Symposium on Nanoelectronic and Information Systems (iNIS), 2016

Research paper thumbnail of Empirical Analysis of Ensemble Machine Learning Techniques for Bug Triaging

2019 Twelfth International Conference on Contemporary Computing (IC3), 2019

Bug reports are an inescapable part of the software product framework. Nowadays, software advance... more Bug reports are an inescapable part of the software product framework. Nowadays, software advancements have led to the creation of beta versions of software in order to assemble the bug reports from clients. The assembled bug reports are then handled by software developers to make consequent software more reliable as well as robust. However, high recurrence of approaching bug reports forges the process of bug fixing to be a troublesome & tedious process. Bug triaging is an essential component of issue handling process and it deals with the selection of a suitable software developer for handling of reported bug such that the assigned developer is able to fix the reported issue. In the literature, different semi and fully mechanized procedures are proposed to facilitate the endeavor of developer selection in bug repositories. These techniques use historically fixed information from bug repositories to classify any new incoming bugs. In the recent years, ensemble-based classification techniques have gained popularity. These techniques use multiple classifiers for making a prediction and has proved to be outperforming classical machine learning classification. In this paper, we present an empirical study of ensemble-based techniques for classification of new incoming bug reports. We studied 5 ensemble classification techniques, namely Bagging, Boosting, Majority Voting, Average Voting, and Stacking using 25 different machine learning classifiers as base classifiers. The experimental results showed that ensemble classifiers outperform classical machine learning algorithms for selection of suitable developer for handling the bug report.

Research paper thumbnail of Characterization Study of Developers in Non-Reproducible Bugs

2018 Eleventh International Conference on Contemporary Computing (IC3), 2018

Software bugs are inevitable. Increasing size and complexity of software systems makes bug handli... more Software bugs are inevitable. Increasing size and complexity of software systems makes bug handling a cumbersome and time-consuming task. In such scenarios, Non-Reproducible (NR) bug reports are an additional overhead for software developers as these bugs are hard-to-reproduce. This difficulty in reproducing NR bugs is due to varied reasons such as the absence of information required to create the same test environment, resource and time constraints, inability of the assigned developer to fix the issue, etc. However, when NR marked bug reports are reconsidered, a few percentages of these bug reports get Fixed (NRF). This fixation occurs either due to the trial of new solutions to reproduce the bug or due to the assignment of a different developer for the bug report. To find whether a change in developer helps in resolving NR bugs, this paper investigates the developers associated with NR bug reports. We gauged developer similarity (in terms of developer marking bug report as NR and Fix), tossing trends, presence of isBack and expertise level of developers who marked NRF bug reports as NR and Fix. We studied the change history of 24, 995 NR bug reports of Mozilla Firefox project. Our results show that 87.34% NRF bug reports are fixed by a developer who had not marked the bug report initially as NR. Also, we found that average tossing path length in NRF bug reports is three times higher than tossing path length in NR bug reports. This higher rate of bug tossing results in higher fixation probability of NR bug reports. It has also been observed that developers who fix NR bugs possess higher expertise than developers who marks bug reports as NR.

Research paper thumbnail of Extraction of Influencers Across Twitter Using Credibility and Trend Analysis

2018 Eleventh International Conference on Contemporary Computing (IC3), 2018

Influence maximization facilitates in selection of individuals that can help in diffusing the inf... more Influence maximization facilitates in selection of individuals that can help in diffusing the information to maximum people in least time. Credible individuals are selected based on twitter or influencer score. This paper proposes a novel method to find the influencers. Scoring is computed using the features of individuals. Generally these features are based on activity; authority and audience of a user on twitter. First, influence score of a person has been computed using the features like retweets, followers, posts etc. Second, tweet score is computed. For tweet score, user tweets are mined to find their opinion about the subject. Further, Trend score is computed using the opinion of public that are extracted by textual data mining to get better insight about the subject in context. Finally, both influence score and tweet score of a person are correlated with the trend score to infer the final influencers.

Research paper thumbnail of ECLogger: Cross-Project Catch-Block Logging Prediction Using Ensemble of Classifiers

e Informatica Softw. Eng. J., 2017

Background: Software developers insert log statements in the source code to record program execut... more Background: Software developers insert log statements in the source code to record program execution information. However, optimizing the number of log statements in the source code is challenging. Machine learning based within-project logging prediction tools, proposed in previous studies, may not be suitable for new or small software projects. For such software projects, we can use cross-project logging prediction. Aim: The aim of the study presented here is to investigate cross-project logging prediction methods and techniques. Method: The proposed method is ECLogger, which is a novel, ensemble-based, cross-project, catch-block logging prediction model. In the research We use 9 base classifiers were used and combined using ensemble techniques. The performance of ECLogger was evaluated on on three open-source Java projects: Tomcat, CloudStack and Hadoop. Results: ECLoggerBagging, ECLoggerAverageVote, and ECLoggerMajorityVote show a considerable improvement in the average Logged F-...

Research paper thumbnail of Machine Learning or Information Retrieval Techniques for Bug Triaging: Which is better?

e Informatica Softw. Eng. J., 2017

Bugs are the inevitable part of a software system. Nowadays, large software development projects ... more Bugs are the inevitable part of a software system. Nowadays, large software development projects even release beta versions of their products to gather bug reports from users. The collected bug reports are then worked upon by various developers in order to resolve the defects and make the final software product more reliable. The high frequency of incoming bugs makes the bug handling a difficult and time consuming task. Bug assignment is an integral part of bug triaging that aims at the process of assigning a suitable developer for the reported bug who corrects the source code in order to resolve the bug. There are various semi and fully automated techniques to ease the task of bug assignment. This paper presents the current state of the art of various techniques used for bug report assignment. Through exhaustive research, the authors have observed that machine learning and information retrieval based bug assignment approaches are most popular in literature. A deeper investigation h...

Research paper thumbnail of Evaluating the Performance of Navigation Prediction Model Based on Varied Session Length

Web navigation prediction plays a vital role in web, due to its broad research applications. It c... more Web navigation prediction plays a vital role in web, due to its broad research applications. It can be used for personalization, improvise website design, and business intelligence. Main aim of these applications is to enhance user’s satisfaction levels who are visiting the website. Web navigation prediction model tries to predict the future set of the webpage from their historical navigations. The past navigations are collected in the web server log file. Navigations form the sessions of varied length which are used for building the navigation model. Selecting very long sessions or very small sessions degrades the model performance. Thus, selecting an optimal session length is mandate as it would impact the model performance positively. This paper presents pre-investigation measures like page loss, branching factor and session length. We investigate the performance of prediction model based on two different ranges of session length. First range that has been considered is three to ...

DPSN: A Novel approach for Disease Prediction based on Social Networks

2019 International Conference on Signal Processing and Communication (ICSC), 2019

Disease prediction has always been a challenging and critical research area. Many machine learning approaches have been developed in the past to address this problem. In this paper, a novel approach for disease prediction is proposed. The approach is based on a social network of diseases, constructed using the similarity among patient symptoms. The edges are weighted by the cosine similarity between diseases. Social network metrics like degree and closeness centrality are used to predict highly probable diseases. The proposed method has been compared with existing machine learning approaches. The experimental results show that, when predicting a disease from a set of 18 possible diseases with 30 possible symptoms, the proposed technique performs better than traditional machine learning methods.
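As an illustration only, the two building blocks named above (a disease network whose edge weights are cosine similarities between symptom vectors, and degree and closeness centrality on that network) can be sketched in plain Python. The disease names, the four-symptom vectors, and the rule "add an edge whenever similarity exceeds a threshold (default 0)" are hypothetical; the abstract does not say how the paper combines the centrality scores into a prediction, so the sketch stops at computing them.

```python
import math
from collections import deque


def cosine(a, b):
    """Cosine similarity between two equal-length symptom vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def build_network(diseases, threshold=0.0):
    """Return {disease: {neighbour: weight}}, adding an edge wherever similarity > threshold.

    The threshold is an assumption for this sketch, not a parameter from the paper.
    """
    names = list(diseases)
    graph = {n: {} for n in names}
    for i, u in enumerate(names):
        for v in names[i + 1:]:
            w = cosine(diseases[u], diseases[v])
            if w > threshold:
                graph[u][v] = w
                graph[v][u] = w
    return graph


def weighted_degree(graph):
    """Degree centrality taken as the sum of incident edge weights."""
    return {n: sum(nbrs.values()) for n, nbrs in graph.items()}


def closeness(graph):
    """Hop-count closeness: (reachable nodes) / (sum of BFS distances to them)."""
    result = {}
    for src in graph:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in graph[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total = sum(dist.values())
        result[src] = (len(dist) - 1) / total if total else 0.0
    return result


# Hypothetical toy data; symptom order: fever, cough, headache, rash.
DISEASES = {
    "flu": [1, 1, 1, 0],
    "cold": [0, 1, 1, 0],
    "measles": [1, 0, 0, 1],
    "dengue": [1, 0, 1, 1],
}

network = build_network(DISEASES)
degree = weighted_degree(network)
close = closeness(network)
for name in sorted(DISEASES, key=lambda d: (-degree[d], d)):
    print(f"{name}: degree={degree[name]:.3f}, closeness={close[name]:.2f}")
```

In this toy network "cold" and "measles" share no symptom, so they are not directly linked and each reaches the other only in two hops, which lowers their closeness relative to "flu" and "dengue".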

Comparative Study of Multilabel Classifiers on Software Engineering Q&A Community for Tag Recommendation

In this paper, we analyze and classify the tags of the textual content of the Software Engineering Stack Exchange Q&A website using text pre-processing and classification algorithms. These methods were chosen because the tags provided by the dataset were too general to contextualize the questions. The selected classification methods are compared using accuracy and then used to predict the tags. LinearSVM achieves the highest accuracy among all the classifiers, and its tag predictions give the best result, with an ROC score of 96%.

Performance Analysis of Naïve Bayes Classifier Over Similarity Score-Based Techniques for Missing Link Prediction in Ego Networks

Keywords: Ego Network, Link Prediction, Machine Learning, Performance Analysis, Similarity Score, Social Network

Detection of iOS Malware apps based on Significant Services Identification using Borda count

2021 Thirteenth International Conference on Contemporary Computing (IC3-2021), 2021

In today's era, smartphones are part of daily life because they are ubiquitous and provide internet connectivity everywhere. The primary reason for the increased usage of smartphones is their functional expandability through third-party apps, which span a wide range of categories including books, social networking, and instant messaging. Users are drawn to these feature-rich apps. As a result, the threats posed by these apps, which are potentially risky for users' privacy, have increased. Information on smartphones is arguably more personal than data stored on desktop computers, because smartphones remain with individuals throughout the day and generate contextual data through sensors, which makes them an easy target for intruders. Both Android and iOS follow a permission-based access control mechanism to protect the privacy of their users, in which an app has to specify the permissions it will use at run time. However, users are unaware whether an app is breaching their privacy or sharing their data with third-party apps. A lot of work on detecting malicious Android apps using feature selection techniques has been conducted because of the availability of a large permission set and labeled data sets. However, minimal work has been conducted for the iOS platform because of its limited permission set, limited labeled data, and closed-source nature. To combat this problem, in this paper we propose an approach to detect malicious iOS apps based on the app's category, using static analysis of app permissions to identify the most significant permissions. In this work, several feature ranking techniques, namely Correlation, Gain ratio, Info gain, OneR, and ReliefF, have been employed on a data set of 1150 iOS apps to identify the riskiest permissions across 12 different app categories.
To improve the permission selection process and the precision of the classifiers, the Borda count method has been used. Our empirical analysis shows that the proposed approach effectively identifies the top 5 risky permissions within different categories.
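Borda count merges several rankings by giving each item points according to its position in every list and summing the points. The sketch below assumes the common scoring rule in which an item at position i (0-based) in a list of n items earns n - 1 - i points; the permission names and the five rankings (one per feature ranking technique listed above) are hypothetical and are not results from the paper.

```python
from collections import defaultdict


def borda_count(rankings):
    """Aggregate best-first rankings with Borda scoring.

    Assumes every ranking orders the same set of items. An item at
    0-based position i in a list of n items earns n - 1 - i points.
    """
    scores = defaultdict(int)
    for ranking in rankings:
        n = len(ranking)
        for position, item in enumerate(ranking):
            scores[item] += n - 1 - position
    # Highest total first; ties broken alphabetically for a deterministic order.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


def top_k(rankings, k):
    """Return the k items with the highest aggregated Borda score."""
    return [item for item, _ in borda_count(rankings)[:k]]


# Hypothetical per-technique rankings of four permissions (best first).
RANKINGS = {
    "Correlation": ["Location", "Contacts", "Camera", "Microphone"],
    "GainRatio": ["Contacts", "Location", "Microphone", "Camera"],
    "InfoGain": ["Location", "Camera", "Contacts", "Microphone"],
    "OneR": ["Location", "Contacts", "Microphone", "Camera"],
    "ReliefF": ["Camera", "Location", "Contacts", "Microphone"],
}

print(borda_count(RANKINGS.values()))
print(top_k(RANKINGS.values(), 2))
```

Here "Location" is ranked first by three of the five techniques and second by the other two, so it collects 13 of the 30 available points and heads the aggregated list even though no single technique's ranking is adopted wholesale.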

Exploratory study of existing approaches for analyzing epidemics

Leveraging Artificial Intelligence in Global Epidemics, 2021

The outbreak of epidemic diseases such as COVID-19, H1N1 swine flu, Ebola, and dengue has caused different communities to raise their concern over preventing and controlling infectious diseases, as well as determining methods to reduce the rate of disease propagation. Epidemics are generally contagious, and the number of cases increases at a very rapid rate. They often result in loss of lives, as they affect the respiratory tract and lungs and can even cause multiorgan failure. Hence, it is imperative to analyze the spread of any virus to devise strategies for situational awareness and intervention. Researchers and medical practitioners have performed many studies to model the behavior of viruses from varied perspectives. These studies have helped in analyzing the pattern and speed of virus spread. This chapter presents an exploratory study of existing approaches, such as classical epidemic approaches and machine learning approaches, that are useful for studying the outbreak patterns of epidemics. Besides, the chapter highlights the available epidemic datasets and describes the various visualization charts that can help in understanding the patterns of virus spread.

Feature Ranking and Aggregation for Bug Triaging in Open-Source Issue Tracking Systems

2021 11th International Conference on Cloud Computing, Data Science & Engineering (Confluence), 2021

The increasing complexity of software and the prevalence of team-based projects have led to the rise of various project management tools and techniques. One of the important components of open-source project management is the use of bug tracking systems. In the last few decades, software projects have experienced an inescapable influx of bug reports. One of the main challenges in handling these incoming bugs is the triaging of bug reports. Bug triaging can be considered a mechanism for selecting a suitable software developer for a reported bug, who will work towards resolving the bug in a timely fashion. Several semi- and fully automated bug triaging techniques exist in the literature. These techniques often consider varied bug parameters for selecting a prominent developer. Past researchers have concluded that different parameters are of prime importance in the optimal developer selection task. However, a common ranking scale depicting the relative importance of different bug parameters for bug triaging is not available. This paper presents a methodology to rank the non-textual bug parameters using feature ranking and aggregation techniques. The presented methodology has been evaluated on four open-source systems, namely Mozilla Firefox, Eclipse, GNOME, and OpenOffice. The experimental evaluation shows that the ranking of bug parameters is consistent across the different open-source projects of the Bugzilla repository.

Analysis and Visualization of User Navigations on Web

Data Visualization and Knowledge Engineering, 2019

The web is the largest repository of data. Users frequently navigate the web to access information. These navigational patterns are stored in weblogs, which are growing exponentially with time. This increase in voluminous weblog data raises major challenges concerning the handling of big data, understanding navigation patterns, and the structural complexity of the web. Visualization is the process of viewing complex, large web data graphically to address these challenges. This chapter describes the various aspects of visualization from which novel insights can be drawn in the area of web navigation mining. To analyze user navigations, visualization can be applied in two stages: after pre-processing and after pattern discovery. The first stage analyzes the website structure, website evolution, user navigation behaviour, and frequent and rare patterns, and detects noise. The second stage analyzes the interesting patterns obtained from prediction modelling of web data. The chapter also highlights popular visualization tools for analyzing weblog data.

Logging Analysis and Prediction in Open Source Java Project

Research Anthology on Usage and Development of Open Source Software, 2021

Log statements in source code provide important information to software developers because they are useful in various software development activities such as debugging, anomaly detection, and remote issue resolution. Most previous studies on logging analysis and prediction provide insights and results after analyzing only a few code constructs. In this chapter, the authors perform an in-depth, focused, and large-scale analysis of logging code constructs at two levels: the file level and the catch-block level. They answer several research questions related to statistical and content analysis. This analysis reveals differentiating properties between logged and non-logged code constructs. Based on these findings, the authors propose a machine-learning-based model for catch-block logging prediction, which is found to be effective.

ORFDetector: Ensemble Learning Based Online Recruitment Fraud Detection

2019 Twelfth International Conference on Contemporary Computing (IC3), 2019

Online recruitment fraud (ORF) is a new challenge in the cyber security area. In ORF, scammers give job seekers lucrative job offers and in return steal their money and personal information. In India, scammers have stolen millions from innocent job seekers. Hence, it is important to find a solution to this problem. In this paper, we propose ORFDetector, an ensemble learning based model for ORF detection. We test the proposed model on a publicly available dataset of 17,860 annotated jobs. The proposed model is found to be effective, giving an average F1-score and accuracy of 94% and 95.4%, respectively. Additionally, it increases specificity by 8% compared to the baseline classifiers.

PKM3: an optimal Markov model for predicting future navigation sequences of the web surfers

Pattern Analysis and Applications, 2020

Predicting the browsing behavior of users on the web has gained significant importance, as it improves the productivity of website owners and also raises the interest of web users. The Markov model has been used extensively for predicting users' web navigation. To enhance the coverage and accuracy of the Markov model, higher-order Markov models are integrated with lower-order models. However, this integration results in large state-space complexity. To reduce the state-space complexity, this paper proposes a novel technique, namely the Pruned all-Kth modified Markov model (PKM3). PKM3 eliminates irrelevant states from a higher-order model that make a negligible contribution toward prediction. The proposed model is evaluated on four standard weblogs: BMS, MSWEB, CTI, and MSNBC. PKM3 performed optimally on websites whose pages were closely placed and highly interlinked. This pruning-based optimal model achieves a significant reduction in state-space complexity while maintaining comparable accuracy.
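The general idea of combining Markov models of orders 1 to K and pruning higher-order states can be sketched as follows. This is not PKM3 itself: the pruning rule used here (drop any context of length greater than 1 that was observed fewer than `min_support` times) is an assumed stand-in for PKM3's criterion, and the page names and sessions are hypothetical. Prediction uses the longest surviving context that matches the end of the user's current path and falls back to lower orders otherwise.

```python
from collections import Counter, defaultdict


def train(sessions, max_order):
    """Count next-page frequencies for every context of length 1..max_order."""
    model = defaultdict(Counter)  # context tuple -> Counter of next pages
    for session in sessions:
        for i in range(1, len(session)):
            for k in range(1, max_order + 1):
                if i - k < 0:
                    break
                context = tuple(session[i - k:i])
                model[context][session[i]] += 1
    return model


def prune(model, min_support):
    """Drop higher-order contexts seen fewer than min_support times (assumed rule).

    First-order contexts are always kept so that a fallback prediction exists.
    """
    return {ctx: nxt for ctx, nxt in model.items()
            if len(ctx) == 1 or sum(nxt.values()) >= min_support}


def predict(model, path, max_order):
    """Predict the next page from the longest surviving context matching the path."""
    for k in range(min(max_order, len(path)), 0, -1):
        context = tuple(path[-k:])
        if context in model:
            return model[context].most_common(1)[0][0]
    return None


# Hypothetical navigation sessions over pages A..F.
SESSIONS = [
    ["A", "B", "C"],
    ["A", "B", "C"],
    ["A", "B", "D"],
    ["E", "B", "D"],
    ["F", "B", "D"],
]

full = train(SESSIONS, max_order=2)
pruned = prune(full, min_support=2)
print(len(full), "states before pruning,", len(pruned), "after")
print(predict(pruned, ["A", "B"], max_order=2))
print(predict(pruned, ["E", "B"], max_order=2))
```

In this toy log the second-order state (A, B) survives and predicts C, while the first-order state (B,) alone would predict D; the rarely seen states (E, B) and (F, B) are pruned, and their paths fall back to (B,), which yields the same prediction they would have given.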

IM NRFixer: A hybrid approach to alleviate class-imbalance problem for predicting the fixability of Non-Reproducible bugs

Journal of Software: Evolution and Process, 2020

Analyzing Performance of Deep Learning Techniques for Web Navigation Prediction

Procedia Computer Science, 2020

The weblog is dynamic, and its size is growing exponentially with time in terms of navigation sessions. These stored sessions are used for Web Navigation Prediction (WNP). Each user behaves differently on the web, and so do their navigation sessions. With a variety of large, dynamic sessions, the task of navigation prediction is becoming challenging. An effective method is needed to handle large sessions with multiple labels for predicting user-desired information. This paper analyzes the performance of deep learning techniques like Multi-Layer Perceptron and Long Short-Term Memory based on parameters such as the number of hidden units, number of layers, activation function, optimization function, learning rate, and batch size. The networks were trained on six experimental parameter setups to form 216 models. The performance of these models is evaluated on two real datasets: BMS and CTI. It has been observed that Long Short-Term Memory performs best in most of the setups.