Forecasting the Number of Bugs and Vulnerabilities in Software Components using Neural Network Models

Forecasting number of vulnerabilities using long short-term neural memory network

International Journal of Electrical and Computer Engineering (IJECE), 2021

Cyber-attacks are launched by exploiting existing vulnerabilities in software, hardware, systems and/or networks. Machine learning algorithms can be used to forecast the number of post-release vulnerabilities. Traditional neural networks act as a black box, so it is unclear how past data points are used to infer subsequent ones. The long short-term memory (LSTM) network, a variant of the recurrent neural network, addresses this limitation through recurrent connections and gated memory cells that retain past data points and use them in future calculations. Building on previous findings, we further improve the prediction of the number of vulnerabilities by developing a time series-based sequential model using an LSTM neural network. Specifically, this study develops a supervised, non-linear sequential time series forecasting model with an LSTM neural network to predict the number of vulnerabilities for the three vendors with the most vulnerabilities published in the National Vulnerability Database (NVD), namely Microsoft, IBM and Oracle. Our proposed model outperforms existing models, with a prediction root mean squared error (RMSE) as low as 0.072.
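
As a concrete illustration of the supervised framing behind such a forecaster, the sketch below (a simplification in plain Python, not the paper's actual model) turns a univariate series of hypothetical monthly vulnerability counts into sliding-window training pairs and computes the RMSE metric the abstract reports:

```python
import math

def make_windows(series, window):
    """Turn a univariate series into (input window, next value) pairs,
    the supervised framing a sequential forecaster is trained on."""
    X, y = [], []
    for i in range(len(series) - window):
        X.append(series[i:i + window])
        y.append(series[i + window])
    return X, y

def rmse(actual, predicted):
    """Root mean squared error, the metric the paper reports."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

# Hypothetical monthly vulnerability counts (illustrative only).
counts = [12, 15, 14, 20, 18, 22, 25, 24, 30, 28]
X, y = make_windows(counts, window=3)
# e.g. X[0] == [12, 15, 14] predicts y[0] == 20
```

An LSTM would then be trained to map each window to its next value; the windowing and the metric are framework-independent.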

Detecting Software Vulnerabilities Using Neural Networks

ICMLC, 2021

As software vulnerabilities remain prevalent, automatically detecting them is crucial for software security. Recently, neural networks have been shown to be a promising tool for detecting software vulnerabilities. In this paper, we use neural networks trained on program slices, which capture the syntactic and semantic characteristics of program source code, to detect software vulnerabilities in C/C++ programs. To achieve a strong prediction model, we combine different types of program slices and optimize different types of neural networks. Our results show that combining different types of source code characteristics and using a balanced ratio of vulnerable to non-vulnerable program slices yields balanced accuracy in predicting both vulnerable and non-vulnerable code. Among the different neural networks, BGRU performs best, detecting software vulnerabilities with an accuracy of 94.89%.
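
A minimal sketch of the class-balancing step, assuming program slices are already represented as token lists (the slices, the undersampling strategy and the function name are illustrative, not the paper's actual pipeline):

```python
import random

def balance(vulnerable, non_vulnerable, seed=0):
    """Undersample the majority class so both classes contribute
    equally: one simple way to obtain a balanced ratio of vulnerable
    to non-vulnerable program slices for training."""
    rng = random.Random(seed)
    n = min(len(vulnerable), len(non_vulnerable))
    return rng.sample(vulnerable, n), rng.sample(non_vulnerable, n)

# Hypothetical program slices, here just token lists.
vuln = [["strcpy", "(", "buf", ",", "src", ")"]]
safe = [["strncpy", "(", "buf", ",", "src", ",", "n", ")"],
        ["memset", "(", "buf", ",", "0", ",", "n", ")"]]
v, s = balance(vuln, safe)
# len(v) == len(s) == 1
```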

Cybersecurity: Time Series Predictive Modeling of Vulnerabilities of Desktop Operating System Using Linear and Non-Linear Approach

Journal of Information Security, 2017

Vulnerability forecasting models help us predict the number of vulnerabilities that may occur in the future for a given operating system (OS). The few existing models that quantify future vulnerabilities do so without considering the trend, level, seasonality and non-linear components of vulnerability data. Unlike traditional approaches, we propose a vulnerability analytic prediction model based on linear and non-linear approaches via time series analysis. We have developed models based on autoregressive integrated moving average (ARIMA), artificial neural network (ANN), and support vector machine (SVM) settings. The model with the minimum error rate is selected for prediction of future vulnerabilities. Using a time series approach, this study has developed a predictive analytic model for three popular desktop operating systems, namely Windows 7, Mac OS X, and Linux Kernel, based on their vulnerabilities reported in the National Vulnerability Database (NVD). From these reported vulnerabilities, we forecast future behavior so that OS vendors can make strategic and operational decisions such as secure deployment of the OS, backup provisioning, disaster recovery, diversity planning, and maintenance scheduling. The models also help in assessing current security risks, estimating the resources needed to handle potential security breaches, and anticipating future releases of security patches. The proposed non-linear analytic models produce very good prediction results in comparison to linear time series models.
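
The paper's selection rule, fitting several candidate models and keeping the one with the minimum error, can be sketched in plain Python. The candidate models below (naive last-value, mean, linear trend) are simple stand-ins for the ARIMA/ANN/SVM settings the study actually compares:

```python
import math

def rmse(actual, predicted):
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def naive(train, horizon):          # repeat the last observation
    return [train[-1]] * horizon

def mean_model(train, horizon):     # repeat the training mean
    m = sum(train) / len(train)
    return [m] * horizon

def trend(train, horizon):          # least-squares linear trend
    n = len(train)
    xs = range(n)
    xbar, ybar = (n - 1) / 2, sum(train) / n
    b = sum((x - xbar) * (y - ybar) for x, y in zip(xs, train)) / \
        sum((x - xbar) ** 2 for x in xs)
    a = ybar - b * xbar
    return [a + b * (n + h) for h in range(horizon)]

def best_model(series, holdout, candidates):
    """Hold out the tail of the series, score each candidate by RMSE,
    and keep the one with the minimum error: the selection rule the
    paper applies to its candidate models."""
    train, test = series[:-holdout], series[-holdout:]
    return min(candidates, key=lambda f: rmse(test, f(train, holdout)))

series = [10, 12, 14, 16, 18, 20, 22, 24]   # hypothetical yearly counts
winner = best_model(series, holdout=2, candidates=[naive, mean_model, trend])
# a perfectly linear series is fit exactly by the trend model
```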

Towards a Neural Network based Reliability Prediction Model via Bugs and Changes

Proceedings of the 16th International Conference on Software Technologies, 2021

Nowadays, software systems have become larger and more complex than ever, and a system failure can threaten the safety of human life. Discovering bugs as early as possible during software development and investigating the effect of a change in the software system are two main concerns of software developers seeking to increase a system's reliability. Our approach employs a neural network to predict reliability via post-release defects and changes applied during the software development life cycle. The CK metrics are used as predictor variables, whereas the target variable is composed of both bugs and changes with different weights. This paper empirically investigates various prediction models, considering different weights for the components of the target variable, using five open-source projects. Two major cross-project perspectives are explored: identifying the optimum weight values for bugs and changes, and discovering the best training project for a selected weight. The results show that for both cross-project experiments, the best accuracy is obtained by the models with the highest weight for bugs (75% bugs and 25% changes) and that the best-fitted training project is the PDE project.
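
The composite target variable can be sketched directly; the 75%/25% split is the best-performing weighting the paper reports, while the per-class counts and the helper name are illustrative:

```python
def reliability_target(bugs, changes, w_bugs=0.75):
    """Compose the prediction target as a weighted sum of post-release
    bugs and changes per class. The 0.75/0.25 split is the weighting
    the paper found best; the data below are made up."""
    w_changes = 1.0 - w_bugs
    return [w_bugs * b + w_changes * c for b, c in zip(bugs, changes)]

# Hypothetical per-class bug and change counts.
bugs = [4, 0, 2]
changes = [8, 4, 2]
target = reliability_target(bugs, changes)
# target == [5.0, 1.0, 2.0]
```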

Measuring, analyzing and predicting security vulnerabilities in software systems

Computers & Security, 2007

In this work we examine the feasibility of quantitatively characterizing some aspects of security. In particular, we investigate if it is possible to predict the number of vulnerabilities that can potentially be present in a software system but may not have been found yet. We use several major operating systems as representatives of complex software systems. The data on vulnerabilities discovered in these systems are analyzed. We examine the results to determine if the density of vulnerabilities in a program is a useful measure. We also address the question of what fraction of software defects are security related, i.e., are vulnerabilities. We examine the dynamics of vulnerability discovery, hypothesizing that it may lead us to an estimate of the magnitude of the undiscovered vulnerabilities still present in the system. We consider the vulnerability discovery rate to see if models can be developed to project future trends. Finally, we use the data for both commercial and open-source systems to determine whether the key observations are generally applicable. Our results indicate that the values of vulnerability densities fall within a range of values, just like the commonly used measure of defect density for general defects. Our examination also reveals that it is possible to model the vulnerability discovery using a logistic model that can sometimes be approximated by a linear model.
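
The logistic discovery model can be made concrete with a generic logistic curve: discovery starts slowly, accelerates, and saturates as the pool of undiscovered vulnerabilities empties. The parameterisation and values below are illustrative, not necessarily the exact model fitted in the paper:

```python
import math

def logistic(t, B, k, t_mid):
    """Cumulative vulnerabilities discovered by time t under a generic
    logistic curve: slow start, near-linear middle, saturation at B.
    Illustrative parameterisation, not necessarily the paper's."""
    return B / (1.0 + math.exp(-k * (t - t_mid)))

early  = logistic(0,  B=100, k=0.5, t_mid=10)
middle = logistic(10, B=100, k=0.5, t_mid=10)
late   = logistic(40, B=100, k=0.5, t_mid=10)
# middle == 50.0 (half the pool found at the midpoint);
# late approaches the saturation level 100
```

The near-linear region around the midpoint is where the paper's logistic model "can sometimes be approximated by a linear model".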

Secure Environment via Prediction of Software Vulnerabilities-Severity

2019

Prediction of software vulnerability severity is of particular importance. Its most important application is that managers with limited resources can deal with the most dangerous vulnerabilities first. This research shows how past patterns of a software product's vulnerability severity, combined with machine learning methods, can be used to predict the severity of its future vulnerabilities. To this end, we used the SVM, decision tree (DT), random forest (RF), k-nearest neighbors (KNN), bagging and AdaBoost algorithms along with the already reported vulnerabilities of Google Android applications, Apple Safari and Flash Player. The experimental results showed that the bagging algorithm can predict Google Android vulnerability severity with an accuracy of 78.21% and an f1-measure of 77%, Flash Player vulnerability severity with an accuracy of 82.37% and an f1-measure of 87.73%, and Apple Safari vulnerability severity with an accuracy of 70...
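
Bagging's bootstrap-and-vote scheme can be sketched in a few lines. Here 1-nearest-neighbour on a single hypothetical numeric feature stands in for the paper's base learners, and the training pairs are made up:

```python
import random
from collections import Counter

def one_nn(sample):
    """1-nearest-neighbour on one numeric feature: an illustrative
    stand-in for the paper's base learners."""
    return lambda x: min(sample, key=lambda p: abs(p[0] - x))[1]

def bagging_predict(train, x, base_learner, n_estimators=5, seed=0):
    """Train each base learner on a bootstrap resample (drawn with
    replacement) and return the majority vote over their predictions,
    which is the core of the bagging scheme."""
    rng = random.Random(seed)
    votes = []
    for _ in range(n_estimators):
        sample = [rng.choice(train) for _ in train]
        votes.append(base_learner(sample)(x))
    return Counter(votes).most_common(1)[0][0]

# Hypothetical (severity score, severity label) training pairs.
train = [(2.0, "low"), (3.0, "low"), (8.0, "high"), (9.0, "high")]
label = bagging_predict(train, 8.5, one_nn)
```

Because the seed is fixed, the vote is reproducible; averaging over bootstrap resamples is what reduces the variance of the unstable base learners.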

Searching for a Needle in a Haystack: Predicting Security Vulnerabilities for Windows Vista

2010 Third International Conference on Software Testing, Verification and Validation, 2010

Many factors are believed to increase the vulnerability of software systems; for example, the more widely deployed or popular a software system is, the more likely it is to be attacked. Early identification of defects has been a widely investigated topic in software engineering research, and early identification of software vulnerabilities can help mitigate such attacks to a large degree by focusing security verification efforts on the affected components. Predicting vulnerabilities is complicated by the fact that vulnerabilities are, most often, few in number, which introduces significant bias by creating a sparse dataset in the population. As a result, vulnerability prediction can be thought of as, proverbially, "searching for a needle in a haystack." In this paper, we present a large-scale empirical study on Windows Vista, where we empirically evaluate the efficacy of classical metrics such as complexity, churn, coverage, dependency measures, and the organizational structure of the company to predict vulnerabilities, and assess how well these software measures correlate with vulnerabilities. We observed in our experiments that classical software measures predict vulnerabilities with high precision but low recall. The actual dependencies, however, predict vulnerabilities with lower precision but substantially higher recall.
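
The precision/recall asymmetry the study reports is easy to make concrete; the tiny labelled sample below is illustrative only:

```python
def precision_recall(actual, predicted):
    """Precision and recall for a binary vulnerability predictor.
    With few vulnerable components (the 'needles'), a model can score
    high precision yet miss most vulnerabilities, i.e. low recall."""
    tp = sum(1 for a, p in zip(actual, predicted) if a and p)
    fp = sum(1 for a, p in zip(actual, predicted) if not a and p)
    fn = sum(1 for a, p in zip(actual, predicted) if a and not p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# 1 = vulnerable. A cautious model flags only one true vulnerability:
actual    = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
p, r = precision_recall(actual, predicted)
# p == 1.0 (every flag is correct) but r == 0.25 (3 of 4 missed)
```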

Learning from what we know: How to perform vulnerability prediction using noisy historical data

Empirical Software Engineering

Vulnerability prediction refers to the problem of identifying the system components that are most likely to be vulnerable. Typically, this problem is tackled by training binary classifiers on historical data. Unfortunately, recent research has shown that such approaches underperform for two reasons: a) the imbalanced nature of the problem, and b) the inherently noisy historical data, i.e., most vulnerabilities are discovered much later than they are introduced. This misleads classifiers, as they learn to recognize actually vulnerable components as non-vulnerable. To tackle these issues, we propose TROVON, a technique that learns from known vulnerable components rather than from both vulnerable and non-vulnerable components, as is typically done. We do this by contrasting the known vulnerable components with their respective fixed components. This way, TROVON manages to learn from the things we know, i.e., vulnerabilities, hence reducing the effects of noisy and unbalanced data. We ...
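
The contrastive data construction can be sketched as pairing removed (vulnerable) fragments with their added (fixed) replacements in each vulnerability-fixing commit; the commit format and code strings below are illustrative, not TROVON's actual representation:

```python
def make_training_pairs(fix_commits):
    """Build (vulnerable, fixed) fragment pairs from vulnerability-fixing
    commits, so the model learns only from components known to have been
    vulnerable. Field names are illustrative, not TROVON's data format."""
    pairs = []
    for commit in fix_commits:
        for before, after in zip(commit["removed"], commit["added"]):
            pairs.append((before, after))
    return pairs

# Hypothetical fix commit: an unchecked copy replaced by a bounded one.
commits = [{"removed": ["strcpy(buf, src);"],
            "added":   ["strncpy(buf, src, sizeof(buf) - 1);"]}]
pairs = make_training_pairs(commits)
# pairs[0] == ("strcpy(buf, src);", "strncpy(buf, src, sizeof(buf) - 1);")
```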

Prediction of software reliability using neural networks

Proceedings. 1991 International Symposium on Software Reliability Engineering, 1991

Software reliability growth models have achieved considerable importance in estimating reliability of software products. This paper explores the use of feed-forward neural networks as a model for software reliability growth prediction. To empirically evaluate the predictive capability of this new approach, data sets from different software projects are used. The neural networks approach exhibits a consistent behavior in prediction and the predictive performance is comparable to that of parametric models.
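
A toy version of such a feed-forward predictor, trained by plain gradient descent to map normalised execution time to cumulative failures. The network size, learning rate and data are our illustration, not the paper's setup:

```python
import math, random

def train_srgm_net(times, failures, hidden=3, lr=0.05, epochs=2000, seed=1):
    """A 1-hidden-layer feed-forward network fit to a reliability
    growth curve with hand-written stochastic gradient descent."""
    rng = random.Random(seed)
    w1 = [rng.uniform(-0.5, 0.5) for _ in range(hidden)]
    b1 = [0.0] * hidden
    w2 = [rng.uniform(-0.5, 0.5) for _ in range(hidden)]
    b2 = 0.0

    def forward(t):
        h = [math.tanh(w1[j] * t + b1[j]) for j in range(hidden)]
        return sum(w2[j] * h[j] for j in range(hidden)) + b2, h

    for _ in range(epochs):
        for t, y in zip(times, failures):
            out, h = forward(t)
            err = out - y                     # d(0.5*err^2)/d(out)
            for j in range(hidden):
                dpre = err * w2[j] * (1 - h[j] ** 2)
                w2[j] -= lr * err * h[j]
                b1[j] -= lr * dpre
                w1[j] -= lr * dpre * t
            b2 -= lr * err
    return lambda t: forward(t)[0]

# Hypothetical cumulative-failure curve, scaled to [0, 1].
times    = [0.0, 0.25, 0.5, 0.75, 1.0]
failures = [0.0, 0.40, 0.65, 0.80, 0.90]
predict = train_srgm_net(times, failures)
```

In practice any differentiable framework would replace the hand-written gradients; the point is only the shape of the model the paper evaluates.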

Beyond heuristics: Learning to classify vulnerabilities and predict exploits

2010

The security demands on modern system administration are enormous and getting worse. Chief among these demands, administrators must monitor the continual ongoing disclosure of software vulnerabilities that have the potential to compromise their systems in some way. Such vulnerabilities include buffer overflow errors, improperly validated inputs, and other unanticipated attack modalities. In 2008, over 7,400 new vulnerabilities were disclosed, well over 100 per week. While no enterprise is affected by all of these disclosures, administrators commonly face many outstanding vulnerabilities across the software systems they manage. Vulnerabilities can be addressed by patches, reconfigurations, and other workarounds; however, these actions may incur down-time or unforeseen side-effects. Thus, a key question for systems administrators is which vulnerabilities to prioritize. From publicly available databases that document past vulnerabilities, we show how to train classifiers that predict whether and how soon a vulnerability is likely to be exploited. As input, our classifiers operate on high dimensional feature vectors that we extract from the text fields, time stamps, cross-references, and other entries in existing vulnerability disclosure reports. Compared to current industry-standard heuristics based on expert knowledge and static formulas, our classifiers predict much more accurately whether and how soon individual vulnerabilities are likely to be exploited.
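
The text-field portion of such a feature vector can be sketched as a bag-of-words count over a fixed vocabulary; the vocabulary and report below are illustrative, and the time stamps and cross-references the paper also uses are omitted:

```python
from collections import Counter

def disclosure_features(report, vocabulary):
    """Turn a vulnerability disclosure report into a bag-of-words count
    vector over a fixed vocabulary: the kind of high-dimensional text
    feature the classifiers consume. Vocabulary and report are made up."""
    counts = Counter(report.lower().split())
    return [counts[word] for word in vocabulary]

vocab = ["buffer", "overflow", "remote", "denial", "service"]
report = "Remote buffer overflow allows arbitrary code execution; buffer unchecked."
vec = disclosure_features(report, vocab)
# vec == [2, 1, 1, 0, 0]
```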