A multi-stage anomaly detection scheme for augmenting the security in IoT-enabled applications

Highlights

• A new ensemble-based anomaly detection technique, BFA-PDBSCAN, is proposed.
• Relevant features are selected from the dataset using the Boruta algorithm.
• An extended k-medoid algorithm combined with a firefly-inspired approach performs the partitioning.
• LSH and a k-distance graph are employed to solve the traditional problems of DBSCAN.
• Analysis on several datasets demonstrates the effectiveness of the proposed model.
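The Davies–Bouldin (DB) part of the partitioning stage can be illustrated in isolation. The following is a minimal sketch, not the authors' extended k-medoid (EKM) method: it assumes Euclidean distance, uses a plain alternating k-medoids with farthest-point initialisation, and replaces the firefly search with an exhaustive scan over a small range of k, keeping the k with the lowest DB index. The names `k_medoids`, `davies_bouldin`, and `choose_k` are hypothetical.

```python
import numpy as np


def k_medoids(X, k, n_iter=100):
    """Alternating k-medoids (hypothetical stand-in for the paper's EKM)."""
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)  # pairwise distances
    # Farthest-point initialisation: start from the most central point, then
    # repeatedly add the point farthest from every medoid chosen so far.
    medoids = [int(np.argmin(D.sum(axis=1)))]
    while len(medoids) < k:
        medoids.append(int(np.argmax(D[:, medoids].min(axis=1))))
    medoids = np.array(medoids)
    for _ in range(n_iter):
        labels = np.argmin(D[:, medoids], axis=1)  # assign to nearest medoid
        new = medoids.copy()
        for c in range(k):
            members = np.flatnonzero(labels == c)
            if members.size:  # member with the smallest total in-cluster distance
                new[c] = members[np.argmin(D[np.ix_(members, members)].sum(axis=1))]
        if np.array_equal(new, medoids):  # converged
            break
        medoids = new
    return medoids, np.argmin(D[:, medoids], axis=1)


def davies_bouldin(X, labels):
    """Mean over clusters of the worst (S_i + S_j) / d(c_i, c_j) ratio."""
    ids = np.unique(labels)
    cents = np.array([X[labels == c].mean(axis=0) for c in ids])
    S = np.array([np.linalg.norm(X[labels == c] - m, axis=1).mean()
                  for c, m in zip(ids, cents)])  # mean scatter around centroid
    M = np.linalg.norm(cents[:, None, :] - cents[None, :, :], axis=-1)
    np.fill_diagonal(M, np.inf)  # exclude j == i from the max
    return float(((S[:, None] + S[None, :]) / M).max(axis=1).mean())


def choose_k(X, k_range=range(2, 7)):
    """Exhaustive scan over k (assumed stand-in for the firefly search)."""
    scores = {k: davies_bouldin(X, k_medoids(X, k)[1]) for k in k_range}
    return min(scores, key=scores.get), scores
```

A lower DB index indicates tighter, better-separated partitions. On a few well-separated blobs the scan would be expected to pick the true number of blobs, since splitting a blob places two centres close together and merging blobs inflates the within-cluster scatter.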

Abstract

The synergy between data security and high-intensity computing has paved the way for robust anomaly detection schemes, which in turn require efficient data analysis. Data clustering is one of the most important components of data analytics and plays an important role in various Internet of Things (IoT)-enabled applications such as Industrial IoT, Smart Grids, and Connected Vehicles. Density-Based Spatial Clustering of Applications with Noise (DBSCAN) is one such clustering technique and is widely used to detect anomalies in large-scale data. However, the traditional DBSCAN algorithm suffers from nearest neighbor search and parameter selection problems, which may degrade the performance of any solution implemented in this environment. To address these gaps, this paper proposes a multi-stage anomaly detection model that rectifies the problems of traditional DBSCAN. In the first stage of the proposed solution, the Boruta algorithm is used to capture the relevant set of features from the dataset. In the second stage, the firefly algorithm, combined with a Davies–Bouldin Index based K-medoid approach, is used to perform the partitioning. In the third stage, kernel-based locality sensitive hashing is used along with the traditional DBSCAN to solve the nearest neighbor search problem. Finally, the resulting set of nearest neighbors is used in a k-distance graph to determine the DBSCAN parameters, i.e., Eps (maximum radius of the neighborhood) and MinPts (minimum number of points in an Eps neighborhood). Several sets of experiments have been performed on different datasets to demonstrate the effectiveness of the proposed scheme.
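The final step, deriving Eps from a k-distance graph, can be sketched as follows. This sketch assumes brute-force Euclidean neighbours rather than the LSH candidates used in the paper, takes MinPts = k, and locates the knee of the sorted k-distance curve as the point farthest from the straight line joining its endpoints. That knee rule is a common heuristic, not necessarily the one the authors use, and `k_distances` and `knee_eps` are hypothetical names.

```python
import numpy as np


def k_distances(X, k):
    """Distance from each point to its k-th nearest neighbour (self excluded)."""
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)  # brute force
    D.sort(axis=1)  # column 0 holds the zero self-distance
    return D[:, k]


def knee_eps(X, k=4):
    """Eps at the knee of the sorted k-distance graph; MinPts is assumed = k."""
    d = np.sort(k_distances(X, k))
    if d[-1] == d[0]:  # flat curve: no knee to find
        return float(d[0])
    x = np.linspace(0.0, 1.0, len(d))  # normalise both axes to [0, 1]
    y = (d - d[0]) / (d[-1] - d[0])
    # Distance of each point from the chord joining (0, 0) and (1, 1).
    gap = np.abs(x - y) / np.sqrt(2.0)
    return float(d[int(np.argmax(gap))])
```

Points whose k-distance lies well above the returned Eps, such as isolated outliers, have too few neighbours within Eps and are the ones DBSCAN would then be expected to label as noise.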

Introduction

Internet of Things (IoT) is the next revolutionary paradigm, which aims to connect every possible device on Earth to the Internet using IPv6 technology. The connected devices in IoT are referred to as “smart devices”; they have computational and communication capabilities and operate without human-to-human or human-to-computer interaction [1]. This has in turn led to a wide range of IoT applications, including smart homes, e-healthcare, smart cities, smart grids, Industrial IoT (IIoT), and the Internet of Vehicles (IoV) [2], [3]. However, the information generated in these application domains is heterogeneous and massive, and it requires powerful data processing, storage, and analysis to build smart systems and make the best use of IoT [4]. According to estimates shared by Gartner [5], over 20 billion devices were projected to be connected by 2020, which would have an immediate effect on the 3 V’s of Big Data, i.e., Volume, Velocity, and Variety [6]. This will lead to an explosive increase in data movement across smart devices for network activities such as search requests, logs, location data, tweets, e-commerce, and the data footprints of individuals [7].

In addition, the amount of data produced every day by IoT-enabled sensors, actuators, and devices has grown from terabytes to petabytes. These objects are the largest sources of information flow across the Internet, and it is projected that every person will own an average of 6–7 smart devices in the near future [8]. In short, in this age of in-stream data [9] and IoT, there is no limit on the amount of data coming from varied sources. Moreover, neither the complexity of the data nor the amount of noise associated with it is known in advance. Further, the security risks associated with the underlying network traffic have increased manifold in the last decade. Even a minor security risk can trigger serious consequences, ranging from network congestion and downtime to severe data and financial losses [10], [11], [12].

Statistics shared by Vectra Networks in a post-intrusion report revealed a significant rise in the number of network risks and vulnerabilities [13]. In another report [14], Hewlett Packard Enterprise (HPE) officials identified security threats as a major concern for almost every domain in which end devices are interconnected. In this context, anomaly detection has emerged as a way to discover threats and risks at an early stage. In contrast to their traditional counterparts (i.e., signature-based schemes), anomaly detection measures support proactive analysis of network streams to identify unwanted security risks in the underlying network, thereby enhancing the reliability and safety of the systems. Anomaly detection has been a key research area for many years; however, with the emergence and popularity of the Internet of Things (IoT), the speed of data flow among devices continues to increase exponentially. This increase in data movement has led to a significant increase in attack penetration. Hence, research communities are now focused on developing techniques that can efficiently analyze the underlying traffic and detect anomalous patterns [15], [16], [17].

Additionally, different kinds of anomalies exhibit distinct characteristics under varying network scenarios. To detect these anomalies, numerous techniques have been exploited, drawing on statistics, Machine Learning (ML), data mining, and information theory [18], [19]. However, designing generic models for network anomaly detection is often challenging because of the variety of security attack vectors [20], [21]. In this context, model-based approaches are found to be less portable and unsuitable across application domains, since they are sensitive to even minor changes in the attributes of the underlying network traffic. To address these challenges, non-parametric approaches have emerged as possible solutions owing to their ability to learn, adapt, recognize, and optimize decisions from the available data inputs by themselves [22]. In this direction, the McKinsey Global Institute asserted that ML, data mining, and predictive analytics will be the major drivers of the next-generation paradigm for innovation and creativity [23]. These techniques adapt to unknown network conditions to provide a mapping between network states. The pertinent information contained in these mappings is then extracted to accurately estimate network behavior [24].

Section snippets

Several data mining techniques have been proposed for the predictive analysis of anomalies in data. Among these, clustering and classification techniques are the most widely used [25], [26]. Clustering techniques are widely applicable because they provide deep insights into the distribution of data without any prior knowledge of data labels, whereas classification techniques serve the purpose when labeled data is available. For clustering, distance-based

Framework for BFA-PDBSCAN

This section presents the framework for the proposed BFA-PDBSCAN algorithm. The flow of the proposed anomaly detection technique is shown in Fig. 1. This technique works in multiple stages, detailed below:

• Stage 1: Feature selection is considered the most crucial step in predictive modeling. Thus, the Boruta algorithm is employed in the initial stage of the BFA-PDBSCAN technique to reduce the dimensionality of the dataset.
• Stage 2: The resultant set of features from Stage 1 is fed to the

The proposed technique: Boruta firefly aided partitioning DBSCAN (BFA-PDBSCAN)

This section introduces BFA-PDBSCAN, a modified version of the DBSCAN clustering technique that works in multiple stages to detect anomalies.
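The idea of pairing DBSCAN with hashing-based neighbour search can be sketched as below. This is a minimal sketch under stated assumptions: it substitutes plain p-stable (Gaussian random-projection) LSH for the kernel-based LSH named in the paper, uses Euclidean distance, and verifies every hashed candidate exactly against Eps, so the only approximation is that a true neighbour which never shares a bucket with the query is missed. Region queries include the point itself, as in the original DBSCAN formulation. `build_lsh`, `region_query`, `dbscan_lsh`, and the parameters `n_tables`, `n_proj`, and `w` (bucket width) are hypothetical.

```python
import numpy as np
from collections import defaultdict


def build_lsh(X, n_tables=8, n_proj=2, w=4.0, seed=0):
    """Gaussian (p-stable) LSH; assumed stand-in for kernel-based LSH."""
    rng = np.random.default_rng(seed)
    tables = []
    for _ in range(n_tables):
        A = rng.normal(size=(X.shape[1], n_proj))  # random projection directions
        b = rng.uniform(0.0, w, size=n_proj)       # random bucket offsets
        keys = np.floor((X @ A + b) / w).astype(int)  # one key row per point
        buckets = defaultdict(list)
        for i, key in enumerate(map(tuple, keys)):
            buckets[key].append(i)
        tables.append((keys, buckets))
    return tables


def region_query(X, i, eps, tables):
    """Eps-neighbours of X[i], searched only among its bucket-mates."""
    cand = set()
    for keys, buckets in tables:
        cand.update(buckets[tuple(keys[i])])
    cand = np.fromiter(cand, dtype=int)
    # Exact check, so hashing can only miss neighbours, never add false ones.
    return cand[np.linalg.norm(X[cand] - X[i], axis=1) <= eps]


def dbscan_lsh(X, eps, min_pts, **lsh_kw):
    """Standard DBSCAN expansion driven by LSH-based region queries."""
    tables = build_lsh(X, **lsh_kw)
    labels = np.full(len(X), -1)  # -1 marks noise or not-yet-assigned
    visited = np.zeros(len(X), dtype=bool)
    cluster = 0
    for i in range(len(X)):
        if visited[i]:
            continue
        visited[i] = True
        nbrs = region_query(X, i, eps, tables)
        if len(nbrs) < min_pts:  # not core; may become a border point later
            continue
        labels[i] = cluster
        queue = list(nbrs)
        while queue:
            j = queue.pop()
            if labels[j] == -1:  # noise or unassigned point joins this cluster
                labels[j] = cluster
            if visited[j]:
                continue
            visited[j] = True
            nbrs_j = region_query(X, j, eps, tables)
            if len(nbrs_j) >= min_pts:  # j is core: keep expanding through it
                queue.extend(nbrs_j)
        cluster += 1
    return labels
```

More tables or a wider bucket width `w` raise the chance that true neighbours collide, at the cost of more candidates to verify; this is the same accuracy versus query-time trade-off that varying the number of hash tables exposes.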

Experiments and discussions

In this section, the effectiveness of the proposed anomaly detection technique is evaluated as follows. First, the query time to compute nearest neighbors (NNs) is evaluated. Second, precision is computed by varying the number of hash tables. Then, six datasets varying in size, dimensionality, and attributes are used to evaluate the performance of the proposed technique under various evaluation parameters. Further, the rationality of the proposed technique is illustrated by a

Conclusion

In this research work, a novel multi-stage anomaly detection technique (BFA-PDBSCAN) has been proposed for the seamless execution of computations in IoT-enabled applications. The first stage of the BFA-PDBSCAN technique deploys the Boruta algorithm for feature selection, which captures all relevant features from the dataset. The resultant set of features extracted in this phase is then fed to the second stage, where the EKM algorithm is employed to partition the dataset. As KM suffers from

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

This research work was supported by the Visvesvaraya PhD Scheme, Ministry of Electronics and Information Technology (MeitY), Government of India implemented by Digital India Corporation (formerly Media Lab Asia).

Cited by (83)

Smart anomaly detection in sensor systems: A multi-perspective review

2021, Information Fusion
The authors introduce a real-time multi-stage anomaly detection scheme which tries to cover the gaps with the traditional Density-Based Spatial Clustering of Applications with Noise (DBSCAN) algorithm. In [148] an ensemble based anomaly detection technique is proposed to identify the malicious behavior of nodes in the cloud environment. RBM and Unscented Kalman Filter are used for feature selection and optimization.

Sahil Garg received his B.Tech. degree from Maharishi Markandeshwar University, Mullana, India, in 2012; his M.Tech. degree from Punjab Technical University, Jalandhar, India, in 2014; and his Ph.D. from Thapar Institute of Engineering & Technology (Deemed to be University), Patiala, India, in 2018, all in computer science and engineering. He is currently working as a Postdoctoral Research Fellow with the Department of Electrical Engineering, École de technologie supérieure, Université du Québec, Montréal, Canada. He has many research contributions in the areas of Machine Learning, Big Data Analytics, Knowledge Discovery, Cloud Computing, Internet of Things, and Vehicular Ad-hoc Networks. Some of his research findings are published in top-cited journals such as IEEE TII, IEEE TMM, IEEE TVT, IEEE TNSM, IEEE TSUSC, IEEE IoT Journal, IEEE Systems Journal, IEEE Communications Magazine, IEEE Network Magazine, IEEE Wireless Communications, IEEE Consumer Electronics Magazine, FGCS, JPDC, and Information Sciences, and in various international conferences of repute such as IEEE Globecom, IEEE ICC, IEEE WCNC, IEEE VTC, IEEE Infocom Workshops, ACM MobiCom Workshops, and ACM MobiHoc Workshops. He was the recipient of the prestigious Visvesvaraya PhD fellowship from the Ministry of Electronics & Information Technology under the Government of India (2016–2018). For his research, he has also been awarded the IEEE ICC best paper award in 2018 at Kansas City, USA. He is currently serving as an Associate Editor for Wiley’s International Journal of Communication Systems (IJCS) and Springer’s Human-centric Computing and Information Sciences (HCIS). He has been a lead guest editor for a number of special issues in prestigious journals and magazines such as IEEE TII, IEEE T-ITS, IEEE IoT Journal, IEEE Network, and Elsevier’s FGCS. He also serves as a Workshop Chair, Publication Chair, Publicity Co-Chair, and TPC member for a number of international conferences, including IEEE Infocom, IEEE ICC, and IEEE Globecom. He is a member of IEEE, IEEE ComSoc, IEEE Computer, IEEE IES, IEEE Smart Grid Community, ACM, and IAENG.

Kuljeet Kaur received the B.Tech. degree in computer science and engineering from Punjab Technical University, Jalandhar, India, in 2011, and the M.E. (Information Security) and Ph.D. (Computer Science and Engineering) degrees from Thapar Institute of Engineering and Technology (Deemed to be University), Patiala, India, in 2015 and 2018, respectively. She is currently working as an NSERC Postdoctoral Research Fellow with the Department of Electrical Engineering, École de technologie supérieure, Université du Québec, Montréal, Canada. Her main research interests include Cloud Computing, Energy Efficiency, Smart Grid, Frequency Support, and Vehicle-to-Grid. Dr. Kaur has published a number of research articles in top-tier journals such as IEEE Wireless Communications, IEEE TII, IEEE TVT, IEEE TMM, IEEE TSG, IEEE Systems Journal, IEEE IoT Journal, IEEE Communications Magazine, IEEE Network, IEEE PS, and Springer PPNA, and in various international conferences including IEEE Globecom, IEEE ICC, IEEE PES GM, IEEE WCNC, IEEE Infocom Workshops, ACM MobiCom Workshops, and ACM MobiHoc Workshops. During her Ph.D., she received two prestigious fellowships, i.e., the INSPIRE fellowship from the Department of Science & Technology, India (in 2015) and a research scholarship from Tata Consultancy Services (TCS) (2016–2018). Dr. Kaur also received the IEEE ICC best paper award in 2018 at Kansas City, USA. She is a member of IEEE, IEEE Communications Society, IEEE Computer, IEEE Women in Engineering, IEEE Software Defined Networks Community, IEEE Smart Grid Community, ACM, and IAENG.

Shalini Batra received the Ph.D. degree in computer science and engineering from Thapar University, Patiala, India, in 2012. She is currently working as an Associate Professor with the Department of Computer Science and Engineering, Thapar Institute of Engineering and Technology (Deemed University), Patiala, India. She has guided many research scholars leading to Ph.D. and M.E./M.Tech. degrees. She has authored more than 60 research papers published in various conferences and journals. Her research interests include machine learning, web semantics, big data analytics, and vehicular ad-hoc networks.

Georges Kaddoum received the B.Sc. degree in electrical engineering from the École Nationale Supérieure de Techniques Avancées, Brest, France, the M.S. degree in telecommunications and signal processing (circuits, systems, and signal processing) from the Université de Bretagne Occidentale and Telecom Bretagne, Brest, in 2005, and the Ph.D. degree (Hons.) in signal processing and telecommunications from the National Institute of Applied Sciences, University of Toulouse, Toulouse, France, in 2009. Since 2010, he has been a Scientific Consultant of space and wireless telecommunications for several U.S. and Canadian companies. He is currently an Associate Professor of Electrical Engineering with the École de technologie supérieure, University of Quebec, Montréal, QC, Canada. He has authored over 100 journal and conference papers. He holds two pending patents. His recent research activities cover mobile communication systems, secure transmissions, and space communications and navigation. In 2014, he received the ETS Research Chair in physical layer security for wireless networks. He received the Best Paper Award at the 2014 IEEE International Conference on Wireless and Mobile Computing, Networking, and Communications, with three co-authors, and the 2015 IEEE TRANSACTIONS ON COMMUNICATIONS Top Reviewer Award. He is currently serving as an Editor of the IEEE COMMUNICATIONS LETTERS.

Neeraj Kumar received his Ph.D. in CSE from Shri Mata Vaishno Devi University, Katra (J & K), India. He was a postdoctoral research fellow at Coventry University, Coventry, UK. He is currently a full Professor in the Department of Computer Science and Engineering, Thapar Institute of Engineering and Technology (Deemed University), Patiala (Pb.), India. He has published more than 150 technical research papers in leading journals and conferences from IEEE, Elsevier, Springer, John Wiley, etc. Some of his research findings are published in top-cited journals such as IEEE TIE, IEEE TDSC, IEEE TITS, IEEE TCE, IEEE Netw., IEEE Comm., IEEE WC, IEEE IoTJ, IEEE SJ, FGCS, JNCA, and ComCom. He has guided many research scholars leading to Ph.D. and M.E./M.Tech. degrees. His research is supported by funding from Tata Consultancy Services and the Department of Science & Technology.

Azzedine Boukerche is a Full Professor and holds a Canada Research Chair position in distributed simulation and wireless and mobile networking at the University of Ottawa. He is the Founding Director of the PARADISE Research Laboratory at the University of Ottawa. Prior to this, he held a faculty position at the University of North Texas, USA. He worked as a Senior Scientist at the Simulation Sciences Division, Metron Corporation, located in San Diego. He was also a faculty member at the School of Computer Science, McGill University, and taught at Polytechnic of Montreal. He spent a year at the JPL/NASA-California Institute of Technology, where he contributed to a project centered on the specification and verification of the software used to control interplanetary spacecraft operated by the JPL/NASA Laboratory. His current research interests include sensor networks, mobile ad hoc networks, mobile and pervasive computing, wireless multimedia, QoS service provisioning, performance evaluation and modeling of large-scale distributed systems, distributed computing, large-scale distributed interactive simulation, and parallel discrete event simulation. Dr. Boukerche has published several research papers in these areas.
