CREST: The Centre for Research in Engineering Software Technologies Crest

Papers by CREST: The Centre for Research in Engineering Software Technologies Crest

ESEM '21: Proceedings of the 15th ACM / IEEE International Symposium on Empirical Software Engineering and Measurement (ESEM), 2021

Background: Static Application Security Testing (SAST) tools purport to assist developers in detecting security issues in source code. These tools typically use rule-based approaches to scan source code for security vulnerabilities. However, due to the significant shortcomings of these tools (i.e., high false positive rates), learning-based approaches for Software Vulnerability Prediction (SVP) are becoming a popular approach. Aims: Despite the similar objectives of these two approaches, their comparative value is unexplored. We provide an empirical analysis of SAST tools and SVP models, to identify their relative capabilities for source code security analysis. Method: We evaluate the detection and assessment performance of several common SAST tools and SVP models on a variety of vulnerability datasets. We further assess the viability and potential benefits of combining the two approaches. Results: SAST tools and SVP models provide similar detection capabilities, but SVP models exhibit better overall performance for both detection and assessment. Unification of the two approaches is difficult due to a lack of synergies. Conclusions: Our study generates 12 main findings which provide insights into the capabilities and synergy of these two approaches. Through these observations we provide recommendations for use and improvement. CCS CONCEPTS • Security and privacy → Software security engineering; • Computing methodologies → Machine learning; • Software and its engineering → Software testing and debugging.

European Conference on Software Architecture, 2021

Blockchains have been increasingly employed in use cases at the network's edge, such as autonomous vehicles and edge computing. These use cases usually establish new blockchain networks due to operation costs, performance constraints, and the lack of reliable connectivity to public blockchains. The design of these edge blockchain networks heavily influences the quality attributes of blockchain-oriented software deployed upon them. This paper presents a taxonomy of edge blockchain network designs successfully utilized by the existing literature and analyzes their availability when facing failures at nodes and networks. This taxonomy benefits practitioners and researchers by offering a design guide for establishing blockchain networks for edge use cases.

ESEC/FSE 2021: Proceedings of the 29th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering, 2021

Several disastrous security attacks can be attributed to delays in patching software vulnerabilities. While researchers and practitioners have paid significant attention to automating the vulnerability identification and patch development activities of software security patch management, there has been relatively little effort dedicated to gaining an in-depth understanding of the socio-technical aspects, e.g., coordination of interdependent activities of the patching process and patching decisions, that may cause delays in applying security patches. We report on a Grounded Theory study of the role of coordination in security patch management. The reported theory consists of four interrelated dimensions, i.e., causes, breakdowns, constraints, and mechanisms. The theory explains the causes that define the need for coordination among interdependent software/hardware components and multiple stakeholders' decisions, the constraints that can negatively impact coordination, the breakdowns in coordination, and the potential corrective measures. This study provides potentially useful insights for researchers and practitioners who can carefully consider the needs of and devise suitable solutions for supporting the coordination of interdependencies involved in security patch management. CCS CONCEPTS • Software and its engineering → System administration; • Security and privacy → Software security engineering; Vulnerability management; Social aspects of security and privacy.

ACM / IEEE International Symposium on Empirical Software Engineering and Measurement (ESEM) (ESEM ’21), October 11–15, 2021, Bari, Italy. ACM, New York, NY, USA, 2021

Background: Security tools play a vital role in enabling developers to build secure software. However, it can be quite challenging to introduce and fully leverage security tools without affecting the speed or frequency of deployments in the DevOps paradigm. Aims: We aim to empirically investigate the key challenges practitioners face when integrating security tools into a DevOps workflow in order to provide recommendations for overcoming the challenges. Method: We conducted a study involving 31 systematically selected webinars on integrating security tools in DevOps. We used a qualitative data analysis method, i.e., thematic analysis, to identify the challenges and emerging solutions related to integrating security tools in rapid deployment environments. Results: We find that whilst traditional security tools are unable to cater for the needs of DevOps, the industry is moving towards new generations of security tools that have started focusing on the needs of DevOps. We have developed a DevOps workflow that integrates security tools and a set of guidelines by synthesizing practitioners' recommendations in the analyzed webinars. Conclusion: Whilst the latest security tools are addressing some of the requirements of DevOps, there are many tool-related drawbacks yet to be adequately addressed. CCS CONCEPTS • Security and privacy → Software and application security.

Software traceability plays a critical role in software maintenance and evolution. We conducted a systematic mapping study with six research questions to understand the benefits, costs, and challenges of using traceability in maintenance and evolution. We systematically selected, analyzed, and synthesized 63 studies published between

Evaluating the effectiveness of security awareness and training programs is critical for minimizing organizations' human security risk. Based on a literature review and industry interviews, we discuss current practices and devise guidelines for measuring the effectiveness of security training and awareness initiatives used by organizations.

26th Pacific Rim International Symposium on Dependable Computing (PRDC), 2021

Internet of Things (IoT) based applications face an increasing number of potential security risks, which need to be systematically assessed and addressed. Expert-based manual assessment of IoT security is a predominant approach, which is usually inefficient. To address this problem, we propose an automated security assessment framework for IoT networks. Our framework first leverages machine learning and natural language processing to analyze vulnerability descriptions for predicting vulnerability metrics. The predicted metrics are then input into a two-layered graphical security model, which consists of an attack graph at the upper layer to present the network connectivity and an attack tree for each node in the network at the bottom layer to depict the vulnerability information. This security model automatically assesses the security of the IoT network by capturing potential attack paths. We evaluate the viability of our approach using a proof-of-concept smart building system model which contains a variety of real-world IoT devices and potential vulnerabilities. Our evaluation of the proposed framework demonstrates its effectiveness in terms of automatically predicting the vulnerability metrics of new vulnerabilities with more than 90% accuracy, on average, and identifying the most vulnerable attack paths within an IoT network. The produced assessment results can serve as a guideline for cybersecurity professionals to take further actions and mitigate risks in a timely manner.
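The two-layered model described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the device names, the per-vulnerability exploit probabilities, the OR-gate simplification of the attack tree, and the rule that a path's score is the product of its node probabilities are all assumptions made here for illustration.

```python
from math import prod

def node_probability(vuln_probs):
    """Lower layer (assumed OR-gate attack tree): a node is compromised
    if at least one of its vulnerabilities is exploited."""
    return 1.0 - prod(1.0 - p for p in vuln_probs)

def attack_paths(graph, source, target, path=None):
    """Upper layer: enumerate loop-free paths in the attack graph."""
    path = (path or []) + [source]
    if source == target:
        yield path
        return
    for nxt in graph.get(source, []):
        if nxt not in path:  # skip nodes already visited on this path
            yield from attack_paths(graph, nxt, target, path)

def path_probability(path, vulns):
    """Score a path as the product of its node probabilities.
    The first element is the attacker's entry point and is not counted."""
    return prod(node_probability(vulns[n]) for n in path[1:])

def most_vulnerable_path(graph, vulns, source, target):
    """Return the highest-scoring attack path, or None if none exists."""
    return max(
        attack_paths(graph, source, target),
        key=lambda p: path_probability(p, vulns),
        default=None,
    )

# Hypothetical smart-building network (names and numbers are made up).
graph = {
    "attacker": ["camera", "thermostat"],
    "camera": ["gateway"],
    "thermostat": ["gateway"],
    "gateway": [],
}
vulns = {"camera": [0.5, 0.5], "thermostat": [0.2], "gateway": [0.4]}

best = most_vulnerable_path(graph, vulns, "attacker", "gateway")
```

In the framework described in the abstract, the per-vulnerability numbers would be derived from the predicted vulnerability metrics rather than entered by hand; how those metrics map to probabilities is not stated in the abstract and is left open here.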

2021 IEEE 18th International Conference on Software Architecture Companion (ICSA-C), 2021

Collaborative autonomous systems operating at edges, e.g., TurtleBots, need adaptive security mechanisms (i.e., Confidentiality, Integrity, Availability) that meet the changing mission requirements and the available processing capacities. We have designed and implemented a security platform that supports secure communication among autonomous systems of robots (e.g., TurtleBots), security of the Robot Operating System (ROS) communication network, the integrity of the information exchanged among the robots, and secured availability of the data and access to services. For designing the security platform, we have used architecture and design patterns along with the respective security protocols. Our solution provides seamless security incorporation in heterogeneous collaborative autonomous entities. We have leveraged architectural strategies to incorporate public-key encryption, maintain global order of events and incorporate fault tolerance. We assert that the presented security platform can facilitate easy adoption of secure control, communication and information gathering in collaborative autonomous systems with resource-constrained edge nodes.

The 42nd International Conference on Information Systems (ICIS'21), Austin, Texas, USA, 2021, 17, 2021

Despite sophisticated phishing email detection systems, and training and awareness programs, humans continue to be tricked by phishing emails. In an attempt to better understand why phishing email attacks still work and how best to mitigate them, we have carried out an empirical study to investigate people's thought processes when reading their emails. We used a scenario-based role-play "think aloud" method and follow-up interviews to collect data from 19 participants. The experiment was conducted using a simulated web email client, and real phishing and legitimate emails adapted to the given scenario. The analysis of the collected data has enabled us to identify eleven factors that influence people's response decisions to both phishing and legitimate emails. Furthermore, based on the user study findings, we discuss novel insights into flaws in the general email decision-making behaviors that could make people susceptible to phishing attacks.

Journal of Parallel and Distributed Computing, 2021

With the advent of the Internet of Things (IoT) connecting billions of mobile and stationary devices to serve real-time applications, cloud computing paradigms face some significant challenges such as high latency and jitter, non-supportive location-awareness and mobility, and non-adaptive communication types. To address these challenges, edge computing paradigms, namely Fog Computing (FC), Mobile Edge Computing (MEC) and Cloudlet, have emerged to shift the digital services from centralized cloud computing to computing at edges. In this article, we analyze cloud and edge computing paradigms from features and pillars perspectives to identify the key motivators of the transitions from one type of virtualized computing paradigm to another one. We then focus on computing and network virtualization techniques as the essence of all these paradigms, and delineate why virtualization features, resource richness and application requirements are the primary factors for the selection of virtualization types in IoT frameworks. Based on these features, we compare the state-of-the-art research studies in the IoT domain. We finally investigate the deployment of virtualized computing and networking resources from a performance perspective in an edge-cloud environment, followed by mapping of the existing work to the provided taxonomy for this research domain. The lessons from the reviewed studies are that the selection of virtualization technique, placement and migration of virtualized resources rely on the requirements of IoT services (i.e., latency, scalability, mobility, multi-tenancy, privacy, and security). As a result, there is a need for prioritizing the requirements, integrating different virtualization techniques, and exploiting a hierarchical edge-cloud architecture.

Many studies have developed Machine Learning (ML) approaches to detect Software Vulnerabilities (SVs) in functions and fine-grained code statements that cause such SVs. However, there is little work on leveraging such detection outputs for data-driven SV assessment to give information about exploitability, impact, and severity of SVs. The information is important to understand SVs and prioritize their fixing. Using large-scale data from 1,782 functions of 429 SVs in 200 real-world projects, we investigate ML models for automating function-level SV assessment tasks, i.e., predicting seven Common Vulnerability Scoring System (CVSS) metrics. We particularly study the value and use of vulnerable statements as inputs for developing the assessment models because SVs in functions originate in these statements. We show that vulnerable statements are 5.8 times smaller in size, yet exhibit 7.5-114.5% stronger assessment performance (Matthews Correlation Coefficient (MCC)) than non-vulnerable statements. Incorporating context of vulnerable statements further increases the performance by up to 8.9% (0.64 MCC and 0.75 F1-Score). Overall, we provide the initial yet promising ML-based baselines for function-level SV assessment, paving the way for further research in this direction. CCS CONCEPTS • Security and privacy → Software security engineering.
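For readers unfamiliar with the two metrics reported above, the following Python sketch computes MCC and F1-Score from binary confusion-matrix counts. It shows only the standard binary definitions; CVSS metrics have more than two classes, so the paper presumably uses a multi-class formulation, which the abstract does not specify. The function names and the convention of returning 0.0 for an undefined score are assumptions made here.

```python
import math

def mcc(tp, tn, fp, fn):
    """Binary Matthews Correlation Coefficient, in the range [-1, 1].
    Returns 0.0 when the denominator is zero (a common convention)."""
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return numerator / denominator if denominator else 0.0

def f1_score(tp, fp, fn):
    """Binary F1-Score: harmonic mean of precision and recall."""
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0
```

Unlike F1, MCC also counts true negatives, so it is less easily inflated on imbalanced data where one class dominates.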

ACM Computing Surveys, 2021

Research at the intersection of cybersecurity, Machine Learning (ML), and Software Engineering (SE) has recently taken significant steps in proposing countermeasures for detecting sophisticated data exfiltration attacks. It is important to systematically review and synthesize the ML-based data exfiltration countermeasures for building a body of knowledge on this important topic. Objective: This article aims at systematically reviewing ML-based data exfiltration countermeasures to identify and classify ML approaches, feature engineering techniques, evaluation datasets, and performance metrics used for these countermeasures. This review also aims at identifying gaps in research on ML-based data exfiltration countermeasures. Method: We used the Systematic Literature Review (SLR) method to select and review 92 papers. Results: The review has enabled us to: (a) classify the ML approaches used in the countermeasures into data-driven and behavior-driven approaches; (b) categorize features into six types: behavioral, content-based, statistical, syntactical, spatial, and temporal; (c) classify the evaluation datasets into simulated, synthesized, and real datasets; and (d) identify 11 performance measures used by these studies. Conclusion: We conclude that: (i) The integration of data-driven and behavior-driven approaches should be explored; (ii) There is a need for developing high-quality and large-size evaluation datasets; (iii) Incremental ML model training should be incorporated in countermeasures; (iv) Resilience to adversarial learning should be considered and explored during the development of countermeasures to avoid poisoning attacks; and (v) The use of automated feature engineering should be encouraged for efficiently detecting data exfiltration attacks.

Service-Oriented Computing: 19th International Conference, ICSOC 2021, 2021

Log analysis is an important technique that engineers use for troubleshooting faults of large-scale service-oriented systems. In this study, we propose a novel semi-supervised log-based anomaly detection approach, LogDP, which utilizes the dependency relationships among log events and proximity among log sequences to detect the anomalies in massive unlabeled log data. LogDP divides log events into dependent and independent events, then learns normal patterns of dependent events using dependency and independent events using proximity. Events violating any normal pattern are identified as anomalies. By combining dependency and proximity, LogDP is able to achieve high detection accuracy. Extensive experiments have been conducted on real-world datasets, and the results show that LogDP outperforms six state-of-the-art methods.

Journal of Network and Computer Applications, 2021

Recent years have witnessed the emergence of the Internet of Things (IoT) systems that incorporate blockchain (BC) elements in their architecture. Due to discrepancies between the requirements of IoT systems and the characteristics of BC networks, the motivations and design of these blockchain-enabled IoT systems (BC-IoT) are not only intriguing from a research perspective but also invaluable in practice. This paper presents an inductive study of the "why" and "how" of BC-IoT systems through a Systematic Literature Review of 120 peer-reviewed studies. To capture the diverse nature of BC-IoT integration, we proposed and applied a multi-perspective framework to analyse the existing systems. Regarding their motivations, we studied the improvement objectives and technical problems that drive the integration of BC. Regarding the design, we captured the position of BC within IoT systems as well as the content and processes that IoT systems offload to BC. As these dimensions are not mutually exclusive, they constitute a rich and multi-angle view of BC-IoT integration. Based on these findings, we defined 10 archetypes of BC-IoT systems that embody the core patterns of usage and configuration of BC in IoT systems.

Computer in Human Behaviour Journal, 2021

Mobile health apps (mHealth apps) are being increasingly adopted in the healthcare sector, enabling stakeholders such as medics and patients to utilize health services in a pervasive manner. Despite having several benefits, mHealth apps entail significant security and privacy challenges that can lead to data breaches with serious social, legal, and financial consequences. This research presents an empirical investigation into security awareness of end-users of mHealth apps that are available on major mobile platforms. We conducted end-users' survey-driven case study research in collaboration with two mHealth providers in Saudi Arabia to survey 101 end-users, investigating their security awareness about (i) existing and desired security features, (ii) security-related issues, and (iii) methods to improve security knowledge. The results indicate that while security awareness among the different demographic groups was statistically significant based on their IT knowledge level and education level, security awareness based on gender, age, and frequency of mHealth app usage was not statistically significant. We also found that the majority of the end-users are unaware of the existing security features provided (e.g., restricted app permissions); however, they desire usable security (e.g., biometric authentication) and are concerned about the privacy of their health information (e.g., data anonymization). End-users suggested that protocols such as two-factor authentication positively impact security but compromise usability. Security awareness via peer guidance or training from app providers can increase end-users' trust in mHealth apps. This research investigates human-centric knowledge based on a case study and provides a set of guidelines to develop secure and usable mHealth apps.

IEEE Internet of Things Journal, 2023

A benchmark study of modern distributed databases (e.g., Cassandra, MongoDB, Redis, and MySQL) is an important source of information for selecting the right technology for managing data in edge-cloud deployments. While most of the existing studies have investigated the performance and scalability of distributed databases in cloud computing, there is a lack of focus on resource utilization (e.g., energy, bandwidth, and storage consumption) of workload offloading for distributed databases deployed in edge-cloud environments. For this purpose, we conducted experiments on various physical and virtualized computing nodes including variously powered servers, Raspberry Pi, and hybrid cloud (OpenStack and Azure). Our extensive experimental results reveal insights into which database under which offloading scenario is more efficient in terms of energy, bandwidth, and storage consumption.

Association for Computational Linguistics, 2021

Adversarial Examples (AEs) generated by perturbing original training examples are useful in improving the robustness of Deep Learning (DL) based models. Most prior works generate AEs that are either unconscionable due to lexical errors or semantically or functionally deviant from original examples. In this paper, we present ReinforceBug, a reinforcement learning framework that learns a policy that is transferable on unseen datasets and generates utility-preserving and transferable (on other models) AEs. Our results show that our method is on average 10% more successful as compared to the state-of-the-art attack TextFooler. Moreover, the target models have on average 73.64% confidence in wrong prediction, the generated AEs preserve the functional equivalence and semantic similarity (83.38%) to their original counterparts, and are transferable on other models with an average success rate of 46%.

Journal of Systems and Software, Volume 200, June, 2023

Runtime software patching aims to minimize or eliminate service downtime, user interruptions and potential data losses while deploying a patch. Due to modern software systems' high variance and heterogeneity, no universal solutions are available or proposed to deploy and execute patches at runtime. Existing runtime software patching solutions focus on specific cases, scenarios, programming languages and operating systems. This paper aims to identify, investigate and synthesize state-of-the-art runtime software patching approaches and gives an overview of currently unsolved challenges. It further provides insights into multiple aspects of runtime patching approaches such as patch scales, general strategies and responsibilities. This study identifies seven levels of granularity, two key strategies providing a conceptual model of three responsible entities and four capabilities of runtime patching solutions. Through the analysis of the existing literature, this research also reveals open issues hindering more comprehensive adoption of runtime patching in practice. Finally, it proposes several crucial future directions that require further attention from both researchers and practitioners.

Journal of Systems and Software, 2023

Background: Seeking an appropriate architecture for the design of software is always a challenge. Although microservices are claimed to be a lightweight architecture style that can improve current practices with several characteristics, many practices are based on different circumstances and reflect variant effects. Empirical inquiry gives us a systematic insight into industrial practices and sufferings on microservices. Objective: This study investigates the gaps between ideal visions and real industrial practices in microservices and what expenses microservices bring to industrial practitioners. Method: We carried out a series of industrial interviews with practitioners from 20 software companies. The collected data were then codified using qualitative methods. Results: Eight pairs of common practices and pains of microservices in industry were obtained after synthesizing the rich and detailed data collected. Five aspects that require careful decisions were extracted to help practitioners balance the possible benefits and pains of MSA. Five research directions that need further exploration were identified based on the pains associated with MSA. Conclusion: While the benefits of microservices are confirmed from the point of view of practitioners, decisions should be carefully made and the possible problems identified must be addressed with additional expense from experience. Furthermore, some of the topics and pains outlined, e.g., systematic evaluation and assessment, organizational transformation, decomposition, distributed monitoring, and bug localization, may inspire researchers to conduct further research.

Computer Networks, 2023

The (logically) centralised architecture of software-defined networks makes them an easy target for packet injection attacks. In these attacks, the attacker injects malicious packets into the SDN network to affect the services and performance of the SDN controller and overflow the capacity of the SDN switches. Such attacks have been shown to ultimately stop the network functioning in real-time, leading to network breakdowns. There have been significant works on detecting and defending against similar DoS attacks in non-SDN networks, but detection and protection techniques for SDN against packet injection attacks are still in their infancy. Furthermore, many of the proposed solutions have been shown to be easily bypassed by simple modifications to the attacking packets or by altering the attacking profile. In this paper, we develop novel Graph Convolutional Neural Network models and algorithms for grouping network nodes/users into security classes by learning from network data. We start with two simple classes: nodes that engage in suspicious packet injection attacks and nodes that do not. From these classes, we then partition the network into separate segments with different security policies using distributed Ryu controllers in an SDN network. We show in experiments on an emulated SDN that our detection solution outperforms alternative approaches with above 99% detection accuracy on various types (both old and new) of injection attacks. More importantly, our mitigation solution maintains continuous functions of non-compromised nodes while isolating compromised/suspicious nodes in real-time. All code and data are publicly available for reproducibility of our results.