Antonio Montieri | Università degli Studi di Napoli "Federico II"
Conference Proceedings by Antonio Montieri
International Conference on Wireless and Mobile Computing, Networking and Communications (WiMob), 2024
IEEE International Conference on Communication (IEEE ICC), 2023
The lifestyle change originating from the COVID-19 pandemic has caused a measurable impact on Internet traffic in terms of volume and application mix, with a sudden increase in the usage of communication-and-collaboration apps. In this work, we focus on four of these apps (Skype, Teams, Webex, and Zoom), whose traffic we collect, reliably label at fine (i.e. per-activity) granularity, and analyze from the viewpoint of traffic prediction. The outcome of this analysis is informative for a number of network management tasks, including monitoring, planning, resource provisioning, and (security) policy enforcement. To this aim, we employ state-of-the-art multitask deep learning approaches to assess to which degree the traffic generated by these apps and their different use cases (i.e. activities: audio-call, video-call, and chat) can be forecast at packet level. The experimental analysis investigates the performance of the considered deep learning architectures, in terms of both traffic-prediction accuracy and complexity, and the related trade-off. Equally important, our work is a first attempt at interpreting the results obtained by these predictors via eXplainable Artificial Intelligence (XAI).
The 2nd IEEE International Workshop on "Distributed Intelligent Systems (DistInSys)", 2022
With the increasing popularity of mobile-app services, malicious software is increasing as well. Accordingly, the interest of the scientific community in Machine and Deep Learning solutions for detecting and classifying malware traffic is growing. In this work, we provide a fair assessment of the performance of a number of data-driven strategies to detect and classify Android malware traffic. Three models are taken into account (Decision Tree, Random Forest, and 1-D Convolutional Neural Network), considering both flat (i.e. non-hierarchical) and hierarchical approaches. The experimental analysis, performed using a state-of-the-art dataset (CIC-AAGM2017), reports that Random Forest exhibits the best performance in a flat setup, while moving to a hierarchical approach can cause significant variation in precision and recall. Such results push for further investigation of advanced hierarchical setups and learning schemes.
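As a rough sketch of the flat vs. hierarchical setups compared above (with a hypothetical feature matrix and labels, not the actual CIC-AAGM2017 pipeline), two Random Forests could be arranged as follows:

```python
# Minimal sketch of flat vs. two-level hierarchical classification with
# scikit-learn; X, y_binary (benign/malware), and y_family are hypothetical.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 40))                      # per-flow statistical features
y_family = rng.integers(0, 4, size=2000)             # 0 = benign, 1-3 = malware families
y_binary = (y_family > 0).astype(int)

X_tr, X_te, yf_tr, yf_te, yb_tr, yb_te = train_test_split(
    X, y_family, y_binary, test_size=0.3, random_state=0)

# Flat setup: a single classifier over all classes.
flat = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, yf_tr)
print("flat macro-F1:", f1_score(yf_te, flat.predict(X_te), average="macro"))

# Hierarchical setup: detect malware first, then classify the family.
detector = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, yb_tr)
family = RandomForestClassifier(n_estimators=100, random_state=0).fit(
    X_tr[yb_tr == 1], yf_tr[yb_tr == 1])

pred_bin = detector.predict(X_te)                    # 0 = benign, 1 = malware
mal = pred_bin == 1
pred_full = np.zeros_like(yf_te)
pred_full[mal] = family.predict(X_te[mal])
print("hierarchical macro-F1:", f1_score(yf_te, pred_full, average="macro"))
```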
The 27th IEEE Symposium on Computers and Communications (IEEE ISCC 2022), 2022
The generation of synthetic network traffic is necessary for several fundamental networking activities, ranging from device testing to path monitoring, with implications for security and management. While the literature has focused on high-rate traffic generation, for many use cases accurate traffic generation is what matters instead. These scenarios have expanded with Network Function Virtualization, Software Defined Networking, and Cloud applications, which introduce further causes of alteration of the generated traffic. Such causes are described and experimentally evaluated in this work, where the generation accuracy of D-ITG, an open-source software generator, is investigated in a virtualized environment. A definition of accuracy in terms of the Mean Absolute Percentage Error of the sequences of Payload Lengths (PLs) and Inter-Departure Times (IDTs) is exploited to this end. The tool is found to be accurate for all PLs and for IDTs greater than one millisecond, and, after the correction of a systematic error, also from 100 µs.
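To make the accuracy metric above concrete, a minimal sketch of the MAPE computation over hypothetical requested vs. generated PL and IDT sequences could look as follows (not the actual evaluation tooling):

```python
# Mean Absolute Percentage Error between the requested and the actually
# generated sequences of Payload Lengths (PLs) and Inter-Departure Times (IDTs).
import numpy as np

def mape(requested, generated):
    requested = np.asarray(requested, dtype=float)
    generated = np.asarray(generated, dtype=float)
    return 100.0 * np.mean(np.abs(generated - requested) / np.abs(requested))

pl_req = np.array([512, 512, 1024, 256])        # bytes (hypothetical)
pl_gen = np.array([512, 512, 1024, 256])
idt_req = np.array([1e-3, 2e-3, 1e-3, 5e-4])    # seconds (hypothetical)
idt_gen = np.array([1.1e-3, 2.1e-3, 1.0e-3, 6.0e-4])

print(f"PL  MAPE: {mape(pl_req, pl_gen):.2f}%")
print(f"IDT MAPE: {mape(idt_req, idt_gen):.2f}%")
```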
IEEE International Conference on Computer Communications 2022 (INFOCOM 2022) - The Tenth International Workshop on Security and Privacy in Big Data (BigSecurity), 2022
In recent years, Internet of Things (IoT) traffic has increased dramatically and is expected to grow further in the near future. Because of their vulnerabilities, IoT devices are often the target of cyber-attacks with dramatic consequences. For this reason, there is a strong need for powerful tools to guarantee a good level of security in IoT networks. Machine and deep learning approaches promise good performance for such a complex task. In this work, we employ state-of-the-art traffic classifiers based on deep learning and assess their effectiveness in accomplishing IoT attack classification. We aim to recognize different attack classes and distinguish them from benign network traffic. In more detail, we utilize effective and unbiased input data that allow fast (i.e. "early") detection of anomalies, and we compare performance with that of traditional (i.e. "post-mortem") machine learning classifiers. The experimental results highlight the need for advanced deep learning architectures fed with input data specifically tailored and designed for IoT attack classification. Furthermore, we perform an occlusion analysis to assess the influence on performance of some network-layer fields and the possible bias they may introduce.
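The occlusion analysis mentioned above can be illustrated with a minimal sketch: one group of input fields is zeroed out at a time and the resulting score drop is measured. The classifier, features, and field groups below are hypothetical placeholders, not the paper's deep learning setup:

```python
# Sketch of an occlusion analysis: zero out one group of input fields at a
# time and measure the drop in classification score.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score

rng = np.random.default_rng(1)
X = rng.normal(size=(3000, 12))               # e.g. 4 fields x first 3 packets
y = rng.integers(0, 5, size=3000)             # attack classes + benign

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=1)
clf = RandomForestClassifier(n_estimators=100, random_state=1).fit(X_tr, y_tr)
base = f1_score(y_te, clf.predict(X_te), average="macro")

field_groups = {"payload_len": [0, 1, 2], "iat": [3, 4, 5],
                "win_size": [6, 7, 8], "tcp_flags": [9, 10, 11]}
for name, cols in field_groups.items():
    X_occ = X_te.copy()
    X_occ[:, cols] = 0.0                      # occlude the field group
    drop = base - f1_score(y_te, clf.predict(X_occ), average="macro")
    print(f"occluding {name}: F1 drop = {drop:+.3f}")
```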
IEEE International Workshop on Computer Aided Modeling and Design of Communication Links and Networks (CAMAD), 2021
The lockdowns and lifestyle changes during the COVID-19 pandemic have caused a measurable impact on Internet traffic in terms of volumes and application mix, with a sudden increase in the usage of communication and collaboration apps. In this work, we focus on five such apps, whose traffic we collect, reliably label at fine granularity (per-activity), and analyze from the viewpoint of traffic classification. To this aim, we employ state-of-the-art deep learning approaches to assess to which degree the apps, their different use cases (activities), and the app-activity pairs can be told apart from each other. We investigate the early behavior of the biflows composing the traffic and the effect of tuning the dimension of the input, via a sensitivity analysis. The experimental analysis highlights the figures of merit of the different architectures, in terms of both traffic-classification performance and complexity w.r.t. the different classification tasks, and the related trade-off. The outcome of this analysis is informative for a number of network management tasks, including monitoring, planning, resource provisioning, and (security) policy enforcement.
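A minimal sketch of the kind of input-size sensitivity analysis described above, assuming synthetic biflows represented by their per-packet payload lengths and a generic classifier rather than the paper's deep learning architectures:

```python
# Sketch of a sensitivity analysis on the input size: classify each biflow
# from its first N packets only and track the score as N grows.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(2)
n_biflows, max_pkts = 2000, 20
pkt_sizes = rng.integers(40, 1500, size=(n_biflows, max_pkts)).astype(float)
labels = rng.integers(0, 6, size=n_biflows)   # app or app-activity classes

for n in (2, 4, 8, 12, 20):
    X = pkt_sizes[:, :n]                      # keep only the first n packets
    score = cross_val_score(RandomForestClassifier(n_estimators=50, random_state=2),
                            X, labels, cv=3, scoring="f1_macro").mean()
    print(f"first {n:2d} packets -> macro-F1 = {score:.3f}")
```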
IEEE 6th International Forum on Research and Technology for Society and Industry (RTSI), 2021
In this work, we address the characterization and modeling of the network traffic generated by communication and collaboration apps, which have been the object of a recent traffic surge due to the spread of the COVID-19 pandemic. In detail, focusing on five of the most popular mobile apps (collected via the MIRAGE architecture) used for working/studying during the pandemic time frame, we provide characterization at trace and flow level, and modeling by means of Multimodal Markov Chains for both apps and related activities. The results highlight interesting peculiarities related to both the running applications and the specific activities performed. The outcome of this analysis constitutes the stepping stone toward a number of tasks related to network management and traffic analysis, such as identification/classification and prediction, and modern IT management in general.
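As a simplified stand-in for the Multimodal Markov Chain modeling mentioned above, the sketch below fits a first-order chain over a hypothetical sequence of discretized packet states:

```python
# Sketch of fitting a first-order Markov chain over discretized packet states
# (a simplified stand-in for the Multimodal Markov Chains used in the paper).
# Here a state is a (direction, payload-size bin) pair; the sequence is hypothetical.
import numpy as np

states = ["UP_small", "UP_large", "DOWN_small", "DOWN_large"]
idx = {s: i for i, s in enumerate(states)}
seq = ["UP_small", "DOWN_large", "DOWN_large", "UP_small", "DOWN_small",
       "DOWN_large", "UP_large", "DOWN_large", "UP_small", "DOWN_small"]

counts = np.zeros((len(states), len(states)))
for a, b in zip(seq[:-1], seq[1:]):
    counts[idx[a], idx[b]] += 1

# Row-normalize to obtain the transition-probability matrix.
transitions = counts / np.clip(counts.sum(axis=1, keepdims=True), 1, None)
print(np.round(transitions, 2))
```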
IEEE International Conference on Communications (ICC), 2021
Traffic Classification (TC), i.e. the collection of procedures for inferring applications and/or services generating network traffic, represents the workhorse for service management and the enabler for valuable profiling information. Sadly, the growing trend toward encrypted protocols (e.g. TLS) and the evolving nature of network traffic make TC design solutions based on payload inspection and machine learning, respectively, unsuitable. Conversely, Deep Learning (DL) is currently foreseen as a viable means to design traffic classifiers based on automatically-extracted features, reflecting the complex patterns distilled from the multifaceted (encrypted) traffic nature, which implicitly carries information in a "multimodal" fashion. To this end, in this paper a novel multimodal DL approach for multitask TC is explored. The latter is able to capitalize on traffic data heterogeneity (by learning both intra- and inter-modality dependencies), overcome performance limitations of existing (myopic) single-modality DL-based TC proposals, and solve different traffic categorization problems associated with different providers' desiderata. Based on a real dataset of encrypted traffic, we report performance gains of our proposal over (a) state-of-the-art multitask DL architectures and (b) multitask extensions of single-task DL baselines (both based on a single-modality philosophy).
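A minimal Keras sketch of the multimodal multitask idea (two input modalities, a shared representation, and two task heads); the shapes, layer sizes, and numbers of classes are illustrative assumptions, not the architecture proposed in the paper:

```python
# Minimal Keras sketch of a multimodal, multitask classifier: two input
# modalities feed modality-specific branches, a shared representation is
# learned, and two task-specific heads produce the outputs.
import tensorflow as tf
from tensorflow.keras import layers, Model

payload = layers.Input(shape=(576,), name="payload_bytes")      # first bytes of the biflow
fields = layers.Input(shape=(20, 4), name="per_packet_fields")  # size, dir, IAT, win

b1 = layers.Dense(128, activation="relu")(payload)
b2 = layers.GRU(64)(fields)

shared = layers.Dense(128, activation="relu")(layers.concatenate([b1, b2]))

out_app = layers.Dense(10, activation="softmax", name="app")(shared)
out_category = layers.Dense(5, activation="softmax", name="category")(shared)

model = Model(inputs=[payload, fields], outputs=[out_app, out_category])
model.compile(optimizer="adam",
              loss={"app": "sparse_categorical_crossentropy",
                    "category": "sparse_categorical_crossentropy"},
              metrics=["accuracy"])
model.summary()
```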
4th IEEE International Conference on Computing, Communication and Security (ICCCS), 2019
Network traffic analysis, i.e. the umbrella of procedures for distilling information from network traffic, represents the enabler for highly-valuable profiling information, other than being the workhorse for several key network management tasks. While it is currently being revolutionized in its nature by the rising share of traffic generated by mobile and hand-held devices, existing design solutions are mainly evaluated on private traffic traces, and only a few public datasets are available, thus clearly limiting repeatability and further advances on the topic. To this end, this paper introduces and describes MIRAGE, a reproducible architecture for mobile-app traffic capture and ground-truth creation. The outcome of this system is MIRAGE-2019, a human-generated dataset for mobile traffic analysis (with associated ground truth) having the goal of advancing the state-of-the-art in mobile-app traffic analysis. A first statistical characterization of the mobile-app traffic in the dataset is provided in this paper. Still, MIRAGE is expected to be capitalized on by the networking community for different tasks related to mobile traffic analysis.
IEEE/ACM Network Traffic Measurement and Analysis Conference (TMA 2019), 2019
The spread of handheld devices has led to an unprecedented growth of the traffic volumes traversing both local networks and the Internet, appointing mobile traffic classification as a key tool for gathering highly-valuable profiling information, other than for traffic engineering and service management. However, the nature of mobile traffic severely challenges state-of-the-art Machine-Learning (ML) approaches, since the quickly evolving and expanding set of apps generating traffic hinders ML-based approaches, which require domain-expert design. Deep Learning (DL) represents a promising solution to this issue, but results in higher completion times, in turn suggesting the application of the Big-Data (BD) paradigm. In this paper, we investigate for the first time BD-enabled classification of encrypted mobile traffic using DL from a general standpoint, (a) defining general design guidelines, (b) leveraging a public-cloud platform, and (c) resorting to a realistic experimental setup. We found that, while BD represents a transparent accelerator for some tasks, this is not the case for the training phase of DL architectures for traffic classification, which requires a specific BD-informed design. The experimental setup is built upon a three-dimensional investigation path in BD adoption, namely: (i) completion time, (ii) deployment costs, and (iii) classification performance, highlighting relevant non-trivial trade-offs.
Journal Papers by Antonio Montieri
IEEE Open Journal of the Communications Society, 2024
In today's digital landscape, critical services are increasingly dependent on network connectivity, thus cybersecurity has become paramount. Indeed, the constant escalation of cyberattacks, including zero-day exploits, poses a significant threat. While Network Intrusion Detection Systems (NIDSs) leveraging machine-learning and deep-learning models have proven effective in recent studies, they encounter limitations such as the need for abundant samples of malicious traffic and full retraining upon encountering new attacks. These limitations hinder their adaptability in real-world scenarios. To address these challenges, we design a novel NIDS capable of promptly adapting to classify new attacks and provide timely predictions. Our proposal for attack-traffic classification adopts Few-Shot Class-Incremental Learning (FSCIL) and is based on the Rethinking Few-Shot (RFS) approach, which we experimentally prove to overcome other FSCIL state-of-the-art alternatives based on either meta-learning or transfer learning. We evaluate the proposed NIDS across a wide array of cyberattacks whose traffic is collected in recent publicly available datasets to demonstrate its robustness across diverse network-attack scenarios, including malicious activities in an Internet-of-Things context and cyberattacks targeting servers. We validate various design choices as well, involving the number of traffic samples per attack available, the impact of the features used to represent the traffic objects, and the time to deliver the classification verdict. Experimental results witness that our proposed NIDS effectively retains previously acquired knowledge (with over 94% F1-score) while adapting to new attacks with only a few samples available (with over 98% F1-score). Thus, it outperforms the non-FSCIL state of the art in terms of classification effectiveness and adaptation time. Moreover, our NIDS exhibits high performance even with traffic collected within short time frames, achieving 95% F1-score while reducing the time-to-insight. Finally, we identify possible limitations likely arising in specific application contexts and envision promising research avenues to mitigate them.
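A heavily simplified sketch of the class-incremental few-shot idea underlying this line of work (a frozen embedding plus a nearest-class-mean classifier, in the spirit of RFS); the embeddings and class names below are random stand-ins, not the actual NIDS:

```python
# Sketch of the class-incremental, few-shot idea: keep a frozen embedding
# trained on the base attacks and add a new attack class from only a few
# labeled samples by storing its prototype (mean embedding).
import numpy as np

rng = np.random.default_rng(3)
emb_dim = 64

# Prototypes of the base classes (benign + known attacks), already available.
prototypes = {"benign": rng.normal(size=emb_dim),
              "dos": rng.normal(size=emb_dim),
              "scan": rng.normal(size=emb_dim)}

# A new attack appears: only 5 labeled embeddings are available.
new_class_shots = rng.normal(loc=2.0, size=(5, emb_dim))
prototypes["new_attack"] = new_class_shots.mean(axis=0)   # incremental update

def classify(embedding):
    # Nearest-class-mean decision over all prototypes seen so far.
    names = list(prototypes)
    dists = [np.linalg.norm(embedding - prototypes[n]) for n in names]
    return names[int(np.argmin(dists))]

print(classify(rng.normal(loc=2.0, size=emb_dim)))   # likely "new_attack"
```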
Journal of Information Security and Applications, 2024
The Internet of Things (IoT) is a key enabler for critical systems, but IoT devices are increasingly targeted by cyberattacks due to their diffusion and their hardware and software limitations. This calls for designing and evaluating new effective approaches for protecting IoT systems at the network level. While recent proposals based on machine- and deep-learning provide effective solutions to the problem of attack-traffic classification, their adoption is severely challenged by the amount of labeled traffic they require to train the classification models. In fact, this results in the need for collecting and labeling large amounts of malicious traffic, which may be hindered by the nature of the malware, possibly generating little and hard-to-capture network activity. To tackle this challenge, we adopt few-shot learning approaches for attack-traffic classification, with the objective of improving detection performance for attack classes with few labeled samples. We leverage advanced deep-learning architectures to perform feature extraction and provide an extensive empirical study, using recent and publicly available datasets, comparing the performance of an ample variety of solutions based on different learning paradigms and exploring a number of design choices in depth (impact of the embedding function, number of attack classes, or number of attack samples). In comparison to non-few-shot baselines, we achieve a relative improvement in the F1-score ranging from 8% to 27%.
Transactions on Emerging Telecommunications Technologies, 2024
The Fifth Generation (5G) of mobile communications is one of the most exciting emerging technologies for researchers and scientists seeking to get the full benefit of a network system. However, 5G networks confront massive threats and vulnerabilities concerning protection, privacy, and secrecy. To face these challenges in the increasingly interconnected Internet of Things (IoT) scenario, we aim to leverage state-of-the-art technologies such as Software Defined Networking (SDN) in conjunction with Network Function Virtualization (NFV), Blockchain, and Machine Learning (ML). Indeed, these technologies provide a robust and secure networking platform, enabling the management of several criticalities related to security, privacy, flexibility, and performance. In light of these considerations, in this paper we propose the "BlockSD-5GNet" architecture to efficiently improve the security of a 5G network and to exploit the combined advantages of Blockchain, SDN, NFV, and ML. In the proposed architecture, SDN helps to manage the network by dividing it into a data plane and a control plane, while the Blockchain guarantees improved security and confidentiality. Therefore, the "BlockSD-5GNet" architecture can both secure sensitive data and attain reliable data transfer within and between the 5G network-infrastructure planes. Additionally, an ML module is integrated into the SDN controller to estimate network bandwidth and assist the administrator in making effective decisions and satisfying high-bandwidth demand. We assess the performance of the "BlockSD-5GNet" architecture via an experimental evaluation performed in a simulation environment, and show the effectiveness of the proposed solution in comparison with baseline schemes. Finally, we also demonstrate the capability of different ML models in bandwidth prediction.
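As an illustration of the bandwidth-prediction idea only (the paper's specific ML models and simulation setup are not reproduced), a regressor over lagged bandwidth samples could be sketched as follows, with a synthetic series:

```python
# Sketch of the bandwidth-prediction idea: predict the next bandwidth sample
# from the previous k measurements with a regressor.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error

rng = np.random.default_rng(4)
t = np.arange(2000)
bw = 50 + 10 * np.sin(2 * np.pi * t / 100) + rng.normal(scale=2.0, size=t.size)  # Mbps

k = 8                                                     # number of lags
X = np.column_stack([bw[i:len(bw) - k + i] for i in range(k)])
y = bw[k:]

split = int(0.8 * len(y))
model = RandomForestRegressor(n_estimators=100, random_state=4).fit(X[:split], y[:split])
pred = model.predict(X[split:])
print(f"MAE on held-out samples: {mean_absolute_error(y[split:], pred):.2f} Mbps")
```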
IEEE Open Journal of the Communications Society, 2024
Significant transformations in lifestyle have reshaped the Internet landscape, resulting in notable shifts in both the magnitude of Internet traffic and the diversity of apps utilized. The increased adoption of communication-and-collaboration apps, also fueled by lockdowns in the COVID pandemic years, has heavily impacted the management of network infrastructures and their traffic. A notable characteristic of these apps is their multi-activity nature, e.g., they can be used for chat and (interactive) audio/video in the same usage session: predicting and managing the traffic they generate is an important but especially challenging task. In this study, we focus on real data from four popular apps belonging to the aforementioned category: Skype, Teams, Webex, and Zoom. First, we collect traffic data from these apps, reliably label it with both the app and the specific user activity, and analyze it from the perspective of traffic prediction. Second, we design data-driven models to predict this traffic at the finest granularity (i.e. at packet level), employing four advanced multitask deep learning architectures and investigating three different training strategies. The trade-off between performance and complexity is explored as well. We publish the dataset and release our code as open source to foster the replicability of our analysis. Third, we leverage the packet-level prediction approach to perform aggregate prediction at different timescales. Fourth, our study pioneers the trustworthiness analysis of these predictors via the application of eXplainable Artificial Intelligence to (a) interpret their forecasting results and (b) evaluate their reliability, highlighting the relative importance of different parts of the observed traffic and thus offering insights for future analyses and applications. The insights gained from the analysis provided in this work have implications for various network management tasks, including monitoring, planning, resource allocation, and enforcing security policies.
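The step from packet-level to aggregate prediction can be sketched as follows, assuming hypothetical predicted per-packet sizes and inter-arrival times that are binned into fixed timescales:

```python
# Sketch of turning packet-level predictions into aggregate ones: predicted
# per-packet sizes and inter-arrival times are binned into fixed timescales
# to obtain per-interval volumes.
import numpy as np

rng = np.random.default_rng(5)
pred_sizes = rng.integers(60, 1400, size=500).astype(float)   # bytes per packet
pred_iat = rng.exponential(scale=0.02, size=500)              # seconds between packets

arrival_times = np.cumsum(pred_iat)

for timescale in (0.1, 0.5, 1.0):                             # seconds
    n_bins = int(np.ceil(arrival_times[-1] / timescale))
    bins = np.floor(arrival_times / timescale).astype(int)
    volume = np.bincount(bins, weights=pred_sizes, minlength=n_bins)
    print(f"timescale {timescale:.1f}s -> {len(volume)} intervals, "
          f"mean volume {volume.mean():.0f} B")
```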
IEEE Communications Magazine, 2023
Traffic classification (TC) is pivotal for network traffic management and security. Over time, TC solutions leveraging Artificial Intelligence (AI) have undergone significant advancements, primarily fueled by Machine Learning (ML). This paper analyzes the history and current state of AI-powered TC on the Internet, highlighting unresolved research questions. Indeed, despite extensive research, key desiderata remain unmet on the path to product-line implementations. AI presents untapped potential for addressing the complex and evolving challenges of TC, drawing from successful applications in other domains. We identify novel ML topics and solutions that address unmet TC requirements, shaping a comprehensive research landscape for the future of TC. We also discuss the interdependence of TC desiderata and identify obstacles hindering AI-powered next-generation solutions. Overcoming these roadblocks will unlock two intertwined visions for future networks: self-managed and human-centered networks.
Elsevier Computers and Security, 2023
The Internet of Things (IoT) is a key enabler in closing the loop in Cyber-Physical Systems, providing "smartness" and thus additional value to each monitored/controlled physical asset. Unfortunately, these devices are more and more targeted by cyberattacks because of their diffusion and of their usually limited hardware and software resources. This calls for designing and evaluating new effective approaches for protecting IoT systems at the network level (Network Intrusion Detection Systems, NIDSs). These in turn are challenged by the heterogeneity of IoT devices and the growing volume of transmitted data. To tackle this challenge, we select a Deep Learning architecture to perform unsupervised early anomaly detection. With a data-driven approach, we explore multiple design choices in depth and exploit the appealing structural properties of the selected architecture to enhance its performance. The experimental evaluation is performed on two recent and publicly available IoT datasets (IoT-23 and Kitsune). Finally, we adopt an adversarial approach to investigate the robustness of our solution in the presence of Label Flipping poisoning attacks. The experimental results highlight the improved performance of the proposed architecture, in comparison to both well-known baselines and previous proposals.
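A minimal sketch of reconstruction-based unsupervised anomaly detection, with a tiny scikit-learn MLP standing in for the deep architecture selected in the paper and synthetic features:

```python
# Sketch of reconstruction-based unsupervised anomaly detection: train a tiny
# autoencoder-like model on benign traffic features only and flag samples with
# a large reconstruction error.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(6)
benign = rng.normal(loc=0.0, scale=1.0, size=(3000, 16))
attack = rng.normal(loc=3.0, scale=1.0, size=(300, 16))

# A 4-unit bottleneck forces a compressed representation of benign traffic.
ae = MLPRegressor(hidden_layer_sizes=(8, 4, 8), max_iter=500, random_state=6)
ae.fit(benign, benign)

def recon_error(X):
    return np.mean((ae.predict(X) - X) ** 2, axis=1)

threshold = np.quantile(recon_error(benign), 0.99)      # calibrated on benign only
print("flagged attacks:", np.mean(recon_error(attack) > threshold))
print("false alarms on benign:", np.mean(recon_error(benign) > threshold))
```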
IEEE Transactions on Network and Service Management, 2023
The promise of Deep Learning (DL) in solving hard problems such as network Traffic Classification (TC) is being held back by the severe lack of transparency and explainability of this kind of approach. To cope with this strongly felt issue, the field of eXplainable Artificial Intelligence (XAI) has been recently founded and is providing effective techniques and approaches. Accordingly, in this work we investigate interpretability via XAI-based techniques to understand and improve the behavior of state-of-the-art multimodal and multitask DL traffic classifiers. Using a publicly available security-related dataset (ISCX VPN-NONVPN), we explore and exploit XAI techniques to characterize the considered classifiers, providing global interpretations (rather than sample-based ones), and define a novel classifier, DISTILLER-EVOLVED, optimized along three objectives: performance, reliability, and feasibility. The proposed methodology proves highly appealing, allowing us to greatly simplify the architecture and obtain faster training and shorter classification times, as fewer packets must be collected. This comes at the expense of a negligible (or even positive) impact on classification performance, while understanding and controlling the interplay between inputs, model complexity, performance, and reliability.
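As an example of a global (dataset-level rather than per-sample) interpretation technique, the sketch below uses permutation importance on a placeholder classifier; it illustrates the general XAI idea, not the specific techniques applied to DISTILLER-EVOLVED:

```python
# Sketch of a global (dataset-level) interpretation via permutation importance,
# a model-agnostic XAI technique: shuffle one input at a time and measure the
# score drop.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(7)
X = rng.normal(size=(2000, 10))
y = (X[:, 0] + 0.5 * X[:, 3] + rng.normal(scale=0.5, size=2000) > 0).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=7)
clf = RandomForestClassifier(n_estimators=100, random_state=7).fit(X_tr, y_tr)

result = permutation_importance(clf, X_te, y_te, n_repeats=10, random_state=7)
for i in np.argsort(result.importances_mean)[::-1][:5]:
    print(f"feature {i}: importance {result.importances_mean[i]:.3f}")
```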
Elsevier Computer Networks, 2022
The COVID-19 pandemic has reshaped Internet traffic due to the huge modifications imposed on the lifestyle of people, who resort more and more to collaboration and communication apps to accomplish daily tasks. Accordingly, these dramatic changes call for novel traffic management solutions to adequately counter such unexpected and massive changes in traffic characteristics. In this paper, we focus on communication and collaboration apps whose traffic has experienced a sudden growth during the last two years. Specifically, we consider nine apps whose traffic we collect, reliably label, and publicly release as a new dataset (MIRAGE-COVID-CCMA-2022) to the scientific community. First, we investigate the capability of state-of-the-art single-modal and multimodal Deep Learning-based classifiers in telling the specific app, the activity performed by the user, or both. While we highlight that state-of-the-art solutions report a more-than-satisfactory performance in addressing app classification (96%-98% F-measure), evident shortcomings emerge when tackling activity classification (56%-65% F-measure) with approaches that leverage the transport-layer payload and/or per-packet information attainable from the initial part of the biflows. In line with these limitations, we design a novel set of inputs (namely Context Inputs) providing clues about the nature of a biflow by observing the biflows coexisting simultaneously. Based on these considerations, we propose Mimetic-All, a novel early traffic classification multimodal solution that leverages Context Inputs as an additional modality, achieving ≥ 82% F-measure in activity classification. Also, capitalizing on the multimodal nature of Mimetic-All, we evaluate different combinations of the inputs. Interestingly, experimental results witness that Mimetic-ConSeq, a variant that uses the Context Inputs but does not rely on payload information (thus gaining greater robustness to more opaque encryption sub-layers possibly going to be adopted in the future), experiences only ≈ 1% F-measure drop in performance w.r.t. Mimetic-All and results in a shorter training time.
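The intuition behind the Context Inputs can be sketched as follows: for each biflow, count how many other biflows of the same capture are active when it starts. The intervals below are hypothetical and the paper's exact definition may differ:

```python
# Sketch of the intuition behind the "Context Inputs": for each biflow,
# count how many other biflows of the same capture are active when it starts.
import numpy as np

# (start, end) timestamps in seconds for the biflows of one capture session.
biflows = np.array([(0.0, 4.0), (1.0, 2.5), (1.5, 6.0), (5.0, 7.0), (5.5, 9.0)])

def concurrent_biflows(flows):
    starts, ends = flows[:, 0], flows[:, 1]
    context = []
    for i, s in enumerate(starts):
        # Other biflows already started and not yet ended at time s.
        active = (starts <= s) & (ends > s)
        active[i] = False
        context.append(int(active.sum()))
    return np.array(context)

print(concurrent_biflows(biflows))   # -> [0 1 2 1 2]
```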
Journal of Network and Systems Management, 2022
Blockchain (BC) and Software-Defined Networking (SDN) are leading technologies which have recently found applications in several network-related scenarios and have consequently experienced growing interest in the research community. Indeed, current networks connect a massive number of objects over the Internet and, in this complex scenario, the utilization of BC and SDN has been successfully proposed to ensure security, privacy, confidentiality, and programmability. In this work, we provide a comprehensive survey regarding these two recent research trends and review the related state-of-the-art literature. We first describe the main features of each technology and discuss their most common and used variants. Furthermore, we envision the integration of such technologies to jointly and efficiently take advantage of them. Indeed, we consider their group-wise utilization, named BC-SDN, based on the need for stronger security and privacy. Additionally, we cover the application fields of these technologies both individually and combined. Finally, we discuss the open issues of the reviewed research and describe potential directions for future avenues regarding the integration of BC and SDN. To summarize, the contribution of the present survey spans from an overview of the literature background on BC and SDN to the discussion of the benefits and limitations of BC-SDN integration in different fields, which also raises open challenges and possible future avenues examined herein. To the best of our knowledge, compared to existing surveys, this is the first work that analyzes the aforementioned aspects in light of a broad BC-SDN integration, with a specific focus on security and privacy issues in actual utilization scenarios.
Elsevier Computer Networks, 2021
The prediction of network traffic characteristics helps in understanding this complex phenomenon and enables a number of practical applications, ranging from network planning and provisioning to management, with security implications as well. A significant corpus of work has so far focused on aggregated behavior, e.g., considering traffic volumes observed over a given time interval. Very limited attempts can instead be found tackling prediction at packet-level granularity. This much harder problem (whose solution extends trivially to aggregated prediction) allows finer-grained knowledge and wider possibilities of exploitation. The recent investigation and success of sophisticated Deep Learning algorithms is now providing mature tools to face this challenging but promising goal. In this work, we investigate and specialize a set of architectures selected among Convolutional, Recurrent, and Composite Neural Networks to predict mobile-app traffic at the finest (packet-level) granularity. We discuss and experimentally evaluate the prediction effectiveness of the provided approaches, also assessing the benefits of a number of design choices such as memory size or multimodality, and investigating performance trends at packet level focusing on the head and the tail of biflows. We compare the results with both Markovian and classic Machine Learning approaches, showing increased performance with respect to state-of-the-art predictors (high-order Markov chains and Random Forest Regressor). For the sake of reproducibility and relevance to modern traffic, all evaluations are conducted leveraging two real human-generated mobile traffic datasets including different categories of mobile apps. The experimental results witness remarkable variability in prediction performance among different app categories. The work also provides valuable analysis results and tools to compare different predictors and strike the best balance among the performance measures.
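A minimal sketch of a high-order Markov baseline of the kind compared above, predicting the next (discretized) packet size from the last two observed ones; the packet series is synthetic:

```python
# Sketch of a high-order Markov baseline: packet sizes are discretized into
# bins and the next bin is predicted from the last two observed bins
# (order-2 chain).
import numpy as np
from collections import Counter, defaultdict

rng = np.random.default_rng(8)
bins = rng.integers(0, 8, size=5000)          # discretized packet sizes of one app

order = 2
counts = defaultdict(Counter)
for i in range(order, len(bins) - 1000):      # first part used as "training"
    counts[tuple(bins[i - order:i])][bins[i]] += 1

def predict(history):
    ctx = tuple(history[-order:])
    if ctx in counts:
        return counts[ctx].most_common(1)[0][0]
    return Counter(bins[:len(bins) - 1000]).most_common(1)[0][0]   # fallback: global mode

hits = sum(predict(bins[i - order:i]) == bins[i]
           for i in range(len(bins) - 1000, len(bins)))
print(f"next-bin accuracy on the last 1000 packets: {hits / 1000:.3f}")
```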
International Conference on Wireless and Mobile Computing, Networking and Communications (WiMob), 2024
IEEE International Conference on Communication (IEEE ICC), 2023
The lifestyle change originated from the COVID-19 pandemic has caused a measurable impact on Inte... more The lifestyle change originated from the COVID-19 pandemic has caused a measurable impact on Internet traffic in terms of volume and application mix, with a sudden increase in usage of communication-and-collaboration apps. In this work, we focus on four of these apps (Skype, Teams, Webex, and Zoom), whose traffic we collect, reliably label at fine (i.e. per-activity) granularity, and analyze from the viewpoint of traffic prediction. The outcome of this analysis is informative for a number of network management tasks, including monitoring, planning, resource provisioning, and (security) policy enforcement. To this aim, we employ state-of-the-art multitask deep learning approaches to assess to which degree the traffic generated by these apps and their different use cases (i.e. activities: audio-call, video-call, and chat) can be forecast at packet level. The experimental analysis investigates the performance of the considered deep learning architectures, in terms of both traffic-prediction accuracy and complexity, and the related trade-off. Equally important, our work is a first attempt at interpreting the results obtained by these predictors via eXplainable Artificial Intelligence (XAI).
The 2nd IEEE International Workshop on "Distributed Intelligent Systems (DistInSys)", 2022
With the increasing popularity of mobile-app services, malicious software is increasing as well. ... more With the increasing popularity of mobile-app services, malicious software is increasing as well. Accordingly, the interest of the scientific community in Machine and Deep Learning solutions for detecting and classifying malware traffic is growing. In this work, we provide a fair assessment of the performance of a number of data-driven strategies to detect and classify Android malware traffic. Three models are taken into account (Decision Tree, Random Forest, and 1-D Convolutional Neural Network) considering both flat (i.e. non-hierarchical) and hierarchical approaches. The experimental analysis performed using a stateof-art dataset (CIC-AAGM2017) reports that Random Forest exhibits the best performance in a flat setup, while moving to a hierarchical approach could cause significant variation in precision and recall. Such results push for further investigating advanced hierarchical setups and learning schemes.
The 27th IEEE Symposium on Computers and Communications (IEEE ISCC 2022), 2022
The generation of synthetic network traffic is necessary to several fundamental networking activi... more The generation of synthetic network traffic is necessary to several fundamental networking activities, ranging from device testing to path monitoring, with implications on security and management. While literature focused on high-rate traffic generation, for many use cases accurate traffic generation is of importance instead. These scenarios have expanded with Network Function Virtualization, Software Defined Networking, and Cloud applications, which introduce further causes for alterations of generated traffic. Such causes are described and experimentally evaluated in this work, where the generation accuracy of D-ITG, an open-source software generator, is investigated in a virtualized environment. A definition of accuracy in terms of Mean Absolute Percentage Error of the sequences of Payload Lengths (PLs) and Inter-Departure Times (IDTs) is exploited to this end. The tool is found accurate for all PLs and for IDTs greater than one millisecond, and after the correction of a systematic error, also from 100 µs.
IEEE International Conference on Computer Communications 2022 (INFOCOM 2022) - The Tenth International Workshop on Security and Privacy in Big Data (BigSecurity), 2022
In recent years, Internet of Things (IoT) traffic has increased dramatically and is expected to g... more In recent years, Internet of Things (IoT) traffic has increased dramatically and is expected to grow further in the next future. Because of their vulnerabilities, IoT devices are often the target of cyber-attacks with dramatic consequences. For this reason, there is a strong need for powerful tools to guarantee a good level of security in IoT networks. Machine and deep learning approaches promise good performance for such a complex task. In this work, we employ state-of-art traffic classifiers based on deep learning and assess their effectiveness in accomplishing IoT attack classification. We aim to recognize different attack classes and distinguish them from benign network traffic. In more detail, we utilize effective and unbiased input data that allow fast (i.e. "early") detection of anomalies and we compare performance with that of traditional (i.e. "postmortem") machine learning classifiers. The experimental results highlight the need for advanced deep learning architectures fed with input data specifically tailored and designed for IoT attack classification. Furthermore, we perform an occlusion analysis to assess the influence on the performance of some network layer fields and the possible bias they may introduce.
IEEE International Workshop on Computer Aided Modeling and Design of Communication Links and Networks (CAMAD), 2021
The lockdowns and lifestyle changes during the COVID-19 pandemic have caused a measurable impact ... more The lockdowns and lifestyle changes during the COVID-19 pandemic have caused a measurable impact on Internet traffic in terms of volumes and application mix, with a sudden increase of usage of communication and collaboration apps. In this work, we focus on five such apps, whose traffic we collect, reliably label at fine granularity (per-activity), and analyze from the viewpoint of traffic classification. To this aim, we employ state-of-art deep learning approaches to assess to which degree the apps, their different use cases (activities), and the pairs app-activity can be told apart from each other. We investigate the early behavior of the biflows composing the traffic and the effect of tuning the dimension of the input, via a sensitivity analysis. The experimental analysis highlights the figures of the different architectures, in terms of both traffic-classification performance and complexity w.r.t. different classification tasks, and the related trade-off. The outcome of this analysis is informative for a number of network management tasks, including monitoring, planning, resource provisioning, and (security) policy enforcement.
IEEE 6th International Forum on Research and Technology for Society and Industry (RTSI), 2021
In this work, we address the characterization and modeling of the network traffic generated by co... more In this work, we address the characterization and modeling of the network traffic generated by communication and collaboration apps which have been the object of recent traffic surge due to the COVID-19 pandemic spread. In detail, focusing on five of the top popular mobile apps (collected via the MIRAGE architecture) used for working/studying during the pandemic time frame, we provide characterization at trace and flow level, and modeling by means of Multimodal Markov Chains for both apps and related activities. The results highlight interesting peculiarities related to both the running applications and the specific activities performed. The outcome of this analysis constitutes the stepping stone toward a number of tasks related to network management and traffic analysis, such as identification/classification and prediction, and modern IT management in general.
IEEE International Conference on Communications (ICC), 2021
Traffic Classification (TC), i.e. the collection of procedures for inferring applications and/or ... more Traffic Classification (TC), i.e. the collection of procedures for inferring applications and/or services generating network traffic, represents the workhorse for service management and the enabler for valuable profiling information. Sadly, the growing trend toward encrypted protocols (e.g. TLS) and the evolving nature of network traffic make TC design solutions based on payload-inspection and machine learning, respectively, unsuitable. Conversely, Deep Learning (DL) is currently foreseen as a viable means to design traffic classifiers based on automatically-extracted features, reflecting the complex patterns distilled from the multifaceted (encrypted) traffic nature, implicitly carrying information in "multimodal" fashion. To this end, in this paper a novel multimodal DL approach for multitask TC is explored. The latter is able to capitalize traffic data heterogeneity (by learning both intra-and inter-modality dependencies), overcome performance limitations of existing (myopic) single-modality DL-based TC proposals, and solve different traffic categorization problems associated with different providers' desiderata. Based on a real dataset of encrypted traffic, we report performance gains of our proposal over (a) state-of-art multitask DL architectures and (b) multitask extensions of single-task DL baselines (both based on single-modality philosophy).
4th IEEE International Conference on Computing, Communication and Security (ICCCS), 2019
Network traffic analysis, i.e. the umbrella of procedures for distilling information from network... more Network traffic analysis, i.e. the umbrella of procedures for distilling information from network traffic, represents the enabler for highly-valuable profiling information, other than being the workhorse for several key network management tasks. While it is currently being revolutionized in its nature by the rising share of traffic generated by mobile and hand-held devices, existing design solutions are mainly evaluated on private traffic traces, and only a few public datasets are available, thus clearly limiting repeatability and further advances on the topic. To this end, this paper introduces and describes MIRAGE, a reproducible architecture for mobile-app traffic capture and ground-truth creation. The outcome of this system is MIRAGE-2019, a human-generated dataset for mobile traffic analysis (with associated ground-truth) having the goal of advancing the state-of-the-art in mobile app traffic analysis. A first statistical characterization of the mobile-app traffic in the dataset is provided in this paper. Still, MIRAGE is expected to be capitalized by the networking community for different tasks related to mobile traffic analysis.
IEEE/ACM Network Traffic Measurement and Analysis Conference (TMA 2019), 2019
The spread of handheld devices has led to the unprecedented growth of traffic volumes traversing ... more The spread of handheld devices has led to the unprecedented growth of traffic volumes traversing both local networks and the Internet, appointing mobile traffic classification as a key tool for gathering highly-valuable profiling information, other than traffic engineering and service management. However, the nature of mobile traffic severely challenges state-of-art Machine-Learning (ML) approaches, since the quickly evolving and expanding set of apps generating traffic hinders ML-based approaches, that require domain-expert design. Deep Learning (DL) represents a promising solution to this issue, but results in higher completion times, in turn suggesting the application of the Big-Data (BD) paradigm. In this paper, we investigate for the first time BD-enabled classification of encrypted mobile traffic using DL from a general standpoint, (a) defining general design guidelines , (b) leveraging a public-cloud platform, and (c) resorting to a realistic experimental setup. We found that, while BD represents a transparent accelerator for some tasks, this is not the case for the training phase of DL architectures for traffic classification, requiring a specific BD-informed design. The experimental setup is built upon a three-dimensional investigation path in the BD adoption, namely: (i) completion time, (ii) deployment costs, and (iii) classification performance, highlighting relevant non-trivial trade-offs.
IEEE Open Journal of the Communications Society, 2024
In today's digital landscape, critical services are increasingly dependent on network connectivit... more In today's digital landscape, critical services are increasingly dependent on network connectivity, thus cybersecurity has become paramount. Indeed, the constant escalation of cyberattacks, including zeroday exploits, poses a significant threat. While Network Intrusion Detection Systems (NIDSs) leveraging machine-learning and deep-learning models have proven effective in recent studies, they encounter limitations such as the need for abundant samples of malicious traffic and full retraining upon encountering new attacks. These limitations hinder their adaptability in real-world scenarios. To address these challenges, we design a novel NIDS capable of promptly adapting to classify new attacks and provide timely predictions. Our proposal for attack-traffic classification adopts Few-Shot Class-Incremental Learning (FSCIL) and is based on the Rethinking Few-Shot (RFS) approach, which we experimentally prove to overcome other FSCIL state-of-the-art alternatives based on either meta-learning or transfer learning. We evaluate the proposed NIDS across a wide array of cyberattacks whose traffic is collected in recent publicly available datasets to demonstrate its robustness across diverse network-attack scenarios, including malicious activities in an Internet-of-Things context and cyberattacks targeting servers. We validate various design choices as well, involving the number of traffic samples per attack available, the impact of the features used to represent the traffic objects, and the time to deliver the classification verdict. Experimental results witness that our proposed NIDS effectively retains previously acquired knowledge (with over 94% F1-score) while adapting to new attacks with only few samples available (with over 98% F1-score). Thus, it outperforms non-FSCIL state of the art in terms of classification effectiveness and adaptation time. Moreover, our NIDS exhibits high performance even with traffic collected within short time frames, achieving 95% F1score while reducing the time-to-insight. Finally, we identify possible limitations likely arising in specific application contexts and envision promising research avenues to mitigate them.
Journal of Information Security and Applications, 2024
The Internet of Things (IoT) is a key enabler for critical systems, but IoT devices are increasin... more The Internet of Things (IoT) is a key enabler for critical systems, but IoT devices are increasingly targeted by cyberattacks due to their diffusion and hardware and software limitations. This calls for designing and evaluating new effective approaches for protecting IoT systems at the network level. While recent proposals based on machine-and deep-learning provide effective solutions to the problem of attack-traffic classification, their adoption is severely challenged by the amount of labeled traffic they require to train the classification models. In fact, this results in the need for collecting and labeling large amounts of malicious traffic, which may be hindered by the nature of the malware possibly generating little and hard-to-capture network activity. To tackle this challenge, we adopt few-shot learning approaches for attack-traffic classification, with the objective to improve detection performance for attack classes with few labeled samples. We leverage advanced deep-learning architectures to perform feature extraction and provide an extensive empirical study—using recent and publicly available datasets—comparing the performance of an ample variety of solutions based on different learning paradigms, and exploring a number of design choices in depth (impact of embedding function, number of classes of attacks, or number of attack samples). In comparison to non-few-shot baselines, we achieve a relative improvement in the F1-score ranging from 8% to 27%.
Transactions on Emerging Telecommunications Technologies, 2024
The Fifth Generation (5G) of mobile communications is the most exciting emerging technology for r... more The Fifth Generation (5G) of mobile communications is the most exciting emerging technology for researchers and scientists to get the full benefit of a network system. However, 5G networks confront massive threats and vulnerabilities including protection, privacy, and secrecy. To face these challenges in the increasingly interconnected Internet of Things (IoT) scenario, we aim to leverage state-of-the-art technologies as Software Defined Networking (SDN) in conjunction with Network Function Virtualization (NFV), Blockchain, and Machine Learning (ML). Indeed, these technologies convey a robust and secure setting in the networking platform enabling to manage several criticalities related to security, privacy, flexibility, and performance. In light of these considerations, in this paper, we propose the "BlockSD-5GNet" architecture to efficiently improve the security of a 5G network and to exploit the combined advantages of Blockchain, SDN, NFV, and ML. In the proposed architecture, the SDN helps to manage the network by dividing it into data plane and control plane, while the Blockchain guarantees improved security and confidentiality. Therefore, the "BlockSD-5GNet" architecture can both secure sensitive data and attain reliable data transfer within and between the 5G network-infrastructure planes. Additionally, an ML module is integrated into the SDN controller to estimate network bandwidth and assist the administrator in taking effective decisions and satisfying high-bandwidth demand. We assess the performance of the "BlockSD-5GNet" architecture via an experimental evaluation performed in a simulation environment, and show the effectiveness of the proposed solution in comparison with baseline schemes. Finally, we also demonstrate the capability of different ML models in bandwidth prediction.
IEEE Open Journal of the Communications Society, 2024
Significant transformations in lifestyle have reshaped the Internet landscape, resulting in notab... more Significant transformations in lifestyle have reshaped the Internet landscape, resulting in notable shifts in both the magnitude of Internet traffic and the diversity of apps utilized. The increased adoption of communication-and-collaboration apps, also fueled by lockdowns in the COVID pandemic years, has heavily impacted the management of network infrastructures and their traffic. A notable characteristic of these apps is their multi-activity nature, e.g., they can be used for chat and (interactive) audio/video in the same usage session: predicting and managing the traffic they generate is an important but especially challenging task. In this study, we focus on real data from four popular apps belonging to the aforementioned category: Skype, Teams, Webex, and Zoom. First, we collect traffic data from these apps, reliably label it with both the app and the specific user activity and analyze it from the perspective of traffic prediction. Second, we design data-driven models to predict this traffic at the finest granularity (i.e. at packet level) employing four advanced multitask deep learning architectures and investigating three different training strategies. The trade-off between performance and complexity is explored as well. We publish the dataset and release our code as open source to foster the replicability of our analysis. Third, we leverage the packet-level prediction approach to perform aggregate prediction at different timescales. Fourth, our study pioneers the trustworthiness analysis of these predictors via the application of eXplainable Artificial Intelligence to (a) interpret their forecasting results and (b) evaluate their reliability, highlighting the relative importance of different parts of observed traffic and thus offering insights for future analyses and applications. The insights gained from the analysis provided with this work have implications for various network management tasks, including monitoring, planning, resource allocation, and enforcing security policies.
IEEE Communications Magazine, 2023
Traffic classification (TC) is pivotal for network traffic management and security. Over time, TC... more Traffic classification (TC) is pivotal for network traffic management and security. Over time, TC solutions leveraging Artificial Intelligence (AI) have undergone significant advancements, primarily fueled by Machine Learning (ML). This paper analyzes the history and current state of AI-powered TC on the Internet, highlighting unresolved research questions. Indeed, despite extensive research, key desiderata goals to product-line implementations remain. AI presents untapped potential for addressing the complex and evolving challenges of TC, drawing from successful applications in other domains. We identify novel ML topics and solutions that address unmet TC requirements, shaping a comprehensive research landscape for the TC future. We also discuss the interdependence of TC desiderata and identify obstacles hindering AI-powered next-generation solutions. Overcoming these roadblocks will unlock two intertwined visions for future networks: self-managed and human-centered networks.
Elsevier Computers and Security, 2023
The Internet of Things (IoT) is a key enabler in closing the loop in Cyber-Physical Systems, prov... more The Internet of Things (IoT) is a key enabler in closing the loop in Cyber-Physical Systems, providing "smartness" and thus additional value to each monitored/controlled physical asset. Unfortunately, these devices are more and more targeted by cyberattacks because of their diffusion and of the usually limited hardware and software resources. This calls for designing and evaluating new effective approaches for protecting IoT systems at the network level (Network Intrusion Detection Systems, NIDSs). These in turn are challenged by the heterogeneity of IoT devices and the growing volume of transmitted data. To tackle this challenge, we select a Deep Learning architecture to perform unsupervised early anomaly detection. With a data-driven approach, we explore in-depth multiple design choices and exploit the appealing structural properties of the selected architecture to enhance its performance. The experimental evaluation is performed on two recent and publicly available IoT datasets (IoT-23 and Kitsune). Finally, we adopt an adversarial approach to investigate the robustness of our solution in the presence of Label Flipping poisoning attacks. The experimental results highlight the improved performance of the proposed architecture, in comparison to both well-known baselines and previous proposals.
IEEE Transactions on Network and Service Management, 2023
The promise of Deep Learning (DL) in solving hard problems such as network Traffic Classification... more The promise of Deep Learning (DL) in solving hard problems such as network Traffic Classification (TC) is being held back by the severe lack of transparency and explainability of this kind of approaches. To cope with this strongly felt issue, the field of eXplainable Artificial Intelligence (XAI) has been recently founded, and is providing effective techniques and approaches. Accordingly, in this work we investigate interpretability via XAIbased techniques to understand and improve the behavior of state-of-the-art multimodal and multitask DL traffic classifiers. Using a publicly available security-related dataset (ISCX VPN-NONVPN), we explore and exploit XAI techniques to characterize the considered classifiers providing global interpretations (rather than sample-based ones), and define a novel classifier, DISTILLER-EVOLVED, optimized along three objectives: performance, reliability, feasibility. The proposed methodology proves as highly appealing, allowing to much simplify the architecture to get faster training time and shorter classification time, as fewer packets must be collected. This is at the expenses of negligible (or even positive) impact on classification performance, while understanding and controlling the interplay between inputs, model complexity, performance, and reliability.
Elsevier Computer Networks, 2022
The COVID-19 pandemic has reshaped Internet traffic due to the huge modifications imposed to life... more The COVID-19 pandemic has reshaped Internet traffic due to the huge modifications imposed to lifestyle of people resorting more and more to collaboration and communication apps to accomplish daily tasks. Accordingly, these dramatic changes call for novel traffic management solutions to adequately countermeasure such unexpected and massive changes in traffic characteristics. In this paper, we focus on communication and collaboration apps whose traffic experienced a sudden growth during the last two years. Specifically, we consider nine apps whose traffic we collect, reliably label, and publicly release as a new dataset (MIRAGE-COVID-CCMA-2022) to the scientific community. First, we investigate the capability of state-of-art single-modal and multimodal Deep Learning-based classifiers in telling the specific app, the activity performed by the user, or both. While we highlight that state-of-art solutions reports a more-than-satisfactory performance in addressing app classification (96%-98% Fmeasure), evident shortcomings stem out when tackling activity classification (56%-65% F-measure) when using approaches that leverage the transport-layer payload and/or per-packet information attainable from the initial part of the biflows. In line with these limitations, we design a novel set of inputs (namely Context Inputs) providing clues about the nature of a biflow by observing the biflows coexisting simultaneously. Based on these considerations, we propose Mimetic-All a novel early traffic classification multimodal solution that leverages Context Inputs as an additional modality, achieving ≥ 82% F-measure in activity classification. Also, capitalizing the multimodal nature of Mimetic-All, we evaluate different combinations of the inputs. Interestingly, experimental results witness that Mimetic-ConSeq-a variant that uses the Context Inputs but does not rely on payload information (thus gaining greater robustness to more opaque encryption sub-layers possibly going to be adopted in the future)-experiences only ≈ 1% F-measure drop in performance w.r.t. Mimetic-All and results in a shorter training time.
Journal of Network and Systems Management, 2022
Blockchain (BC) and Software-Defined Networking (SDN) are leading technologies which have recently found applications in several network-related scenarios and have consequently experienced a growing interest in the research community. Indeed, current networks connect a massive number of objects over the Internet and, in this complex scenario, the utilization of BC and SDN has been successfully proposed to ensure security, privacy, confidentiality, and programmability. In this work, we provide a comprehensive survey regarding these two recent research trends and review the related state-of-the-art literature. We first describe the main features of each technology and discuss their most common and widely used variants. Furthermore, we envision the integration of such technologies to jointly and efficiently take advantage of both. Indeed, we consider their combined utilization, named BC-SDN, based on the need for stronger security and privacy. Additionally, we cover the application fields of these technologies both individually and combined. Finally, we discuss the open issues of the reviewed research and describe potential directions for future avenues regarding the integration of BC and SDN. To summarize, the contribution of the present survey spans from an overview of the literature background on BC and SDN to the discussion of the benefits and limitations of BC-SDN integration in different fields, which also raises open challenges and possible future avenues examined herein. To the best of our knowledge, compared to existing surveys, this is the first work that analyzes the aforementioned aspects in light of a broad BC-SDN integration, with a specific focus on security and privacy issues in actual utilization scenarios.
Elsevier Computer Networks, 2021
The prediction of network traffic characteristics helps in understanding this complex phenomenon and enables a number of practical applications, ranging from network planning and provisioning to management, with security implications as well. A significant corpus of work has so far focused on aggregated behavior, e.g., considering traffic volumes observed over a given time interval. Very limited attempts can instead be found tackling prediction at packet-level granularity. This much harder problem (whose solution extends trivially to aggregated prediction) allows finer-grained knowledge and wider possibilities of exploitation. The recent investigation and success of sophisticated Deep Learning algorithms now provide mature tools to face this challenging but promising goal. In this work, we investigate and specialize a set of architectures selected among Convolutional, Recurrent, and Composite Neural Networks, to predict mobile-app traffic at the finest (packet-level) granularity. We discuss and experimentally evaluate the prediction effectiveness of the provided approaches, also assessing the benefits of a number of design choices such as memory size or multimodality, and investigating performance trends at packet level with a focus on the head and the tail of biflows. We compare the results with both Markovian and classic Machine Learning approaches, showing increased performance with respect to state-of-the-art predictors (high-order Markov chains and Random Forest Regressor). For the sake of reproducibility and relevance to modern traffic, all evaluations are conducted on two real human-generated mobile traffic datasets including different categories of mobile apps. The experimental results witness remarkable variability in prediction performance among different app categories. The work also provides valuable analysis results and tools to compare different predictors and strike the best balance among the performance measures.
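To make the packet-level prediction task concrete, the following is a minimal PyTorch sketch (not one of the paper's architectures): a small recurrent model maps a window of past packets, each described by payload length and inter-arrival time, to a prediction of the next packet's values. Window size, features, and data are hypothetical.

```python
# Minimal sketch of packet-level traffic prediction (not the paper's exact
# architectures): an LSTM that maps a window of past packets, each described
# by (payload length, inter-arrival time), to the next packet's values.
import torch
import torch.nn as nn

class PacketPredictor(nn.Module):
    def __init__(self, n_features=2, hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_features)

    def forward(self, x):             # x: (batch, window, n_features)
        out, _ = self.lstm(x)
        return self.head(out[:, -1])  # prediction for the next packet

model = PacketPredictor()
window = torch.randn(32, 10, 2)       # 32 biflows, 10 past packets each (toy data)
target = torch.randn(32, 2)           # next-packet (PL, IAT), toy values
loss = nn.MSELoss()(model(window), target)
loss.backward()                       # one toy training step
print(float(loss))
```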
IEEE Transactions on Network and Service Management, 2021
The increasing diffusion of mobile devices has dramatically changed the network traffic landscape, with Traffic Classification (TC) surging into a fundamental role while facing new and unprecedented challenges. The recent and appealing adoption of Deep Learning (DL) techniques has emerged as the solution overcoming the performance of ML techniques based on tedious and time-consuming handcrafted feature design. Still, the black-box nature of DL models prevents their practical and trustworthy adoption in critical scenarios where the reliability and interpretation of results and policies are of key importance. To cope with these limitations, eXplainable Artificial Intelligence (XAI) techniques have recently attracted the interest of the community. Accordingly, in this work we investigate trustworthiness and interpretability via XAI-based techniques to understand, interpret, and improve the behavior of state-of-the-art multimodal DL traffic classifiers. The proposed methodology, as opposed to common results seen in XAI, aims to provide global interpretations, rather than sample-based ones. Results, based on an open dataset, allow us to complement the above findings with domain knowledge.
IEEE Transactions on Network and Service Management, 2021
Modeling network traffic is an endeavor actively carried on since early digital communications, supporting a number of practical applications that range from network planning and provisioning to security. Accordingly, many theoretical and empirical approaches have been proposed in this long-standing research, most notably Machine Learning (ML) ones. Indeed, recent interest from network equipment vendors is sparking around the evaluation of solid information-theoretical modeling approaches complementary to ML ones, especially applied to new network traffic profiles stemming from the massive diffusion of mobile apps. To cater to these needs, we analyze mobile-app traffic available in the public dataset MIRAGE-2019 adopting two related modeling approaches based on the well-known methodological toolset of Markov models (namely, Markov Chains and Hidden Markov Models). We propose a novel heuristic to reconstruct application-layer messages in the common case of encrypted traffic. We discuss and experimentally evaluate the suitability of the provided modeling approaches for different tasks: characterization of network traffic (at different granularities, such as application, application category, and application version), and prediction of network traffic at both packet and message level. We also compare the results with several ML approaches, showing performance comparable to a state-of-the-art ML predictor (Random Forest Regressor). With this work, we also provide a viable and theoretically sound traffic-analysis toolset that helps improve ML evaluation (and possibly its design), together with a sensible and interpretable baseline.
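A toy sketch of the Markov-chain side of such a toolset is shown below: discretize payload lengths into states, estimate a first-order transition matrix from the observed sequence, and predict the most likely next state. The binning and state definition are hypothetical choices, not those used in the paper.

```python
# Toy sketch of first-order Markov-chain modeling of a packet-level sequence:
# discretize payload lengths into states, estimate the transition matrix from
# observed transitions, and predict the most likely next state.
# The state definition (bins) is a hypothetical choice, not the paper's.
import numpy as np

bins = [0, 100, 600, 1200, 1500]                 # hypothetical PL bins (bytes)
payload_lengths = [60, 1400, 1400, 80, 1400, 52, 1400, 1400]

states = np.digitize(payload_lengths, bins) - 1  # map each PL to a bin index
n = len(bins)
T = np.zeros((n, n))
for cur, nxt in zip(states[:-1], states[1:]):
    T[cur, nxt] += 1                             # count observed transitions
T = T / np.maximum(T.sum(axis=1, keepdims=True), 1)   # row-normalize counts

last = states[-1]
print("predicted next state (bin index):", int(T[last].argmax()))
```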
Elsevier Journal of Network and Computer Applications, 2021
Traffic classification, i.e. the inference of applications and/or services from their network traffic, represents the workhorse for service management and the enabler for valuable profiling information. The growing trend toward encrypted protocols and the fast-evolving nature of network traffic are making obsolete traffic-classification design solutions based on payload inspection or machine learning. Conversely, deep learning is currently foreseen as a viable means to design traffic classifiers based on automatically-extracted features. These reflect the complex patterns distilled from the multifaceted (encrypted) traffic, which implicitly carries information in "multimodal" fashion, and can also be used in application scenarios with diversified network visibility for (simultaneously) tackling multiple classification tasks. To this end, in this paper a novel multimodal multitask deep learning approach for traffic classification is proposed, leading to the Distiller classifier. The latter is able to capitalize on traffic-data heterogeneity (by learning both intra- and inter-modality dependencies), overcome the performance limitations of existing (myopic) single-modal deep learning-based traffic classification proposals, and simultaneously solve different traffic categorization problems associated with different providers' desiderata. Based on a public dataset of encrypted traffic, we evaluate Distiller in a fair comparison with state-of-the-art deep learning architectures proposed for encrypted traffic classification (and based on a single-modality philosophy). Results show the gains of our proposal over both multitask extensions of single-task baselines and native multitask architectures.
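The sketch below conveys the multimodal multitask idea in compact PyTorch form; it is not the actual Distiller architecture. Two modality-specific branches (payload bytes and per-packet fields) feed a shared representation, on top of which one output head is attached per classification task. Input sizes, task names, and class counts are hypothetical.

```python
# Compact sketch of a multimodal, multitask classifier in the spirit described
# above (NOT the actual Distiller architecture): one branch per input modality,
# a shared representation, and one head per traffic-classification task.
import torch
import torch.nn as nn

class MultimodalMultitask(nn.Module):
    def __init__(self, tasks, n_fields=4, n_packets=12):
        super().__init__()
        self.payload_branch = nn.Sequential(                # modality 1: payload bytes
            nn.Conv1d(1, 16, kernel_size=25), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1), nn.Flatten())
        self.fields_branch = nn.Sequential(                 # modality 2: per-packet fields
            nn.Flatten(), nn.Linear(n_fields * n_packets, 64), nn.ReLU())
        self.shared = nn.Sequential(nn.Linear(16 + 64, 128), nn.ReLU())
        self.heads = nn.ModuleDict(
            {name: nn.Linear(128, n_cls) for name, n_cls in tasks.items()})

    def forward(self, payload, fields):
        z = torch.cat([self.payload_branch(payload), self.fields_branch(fields)], dim=1)
        z = self.shared(z)
        return {name: head(z) for name, head in self.heads.items()}

tasks = {"app": 10, "traffic_type": 5}   # hypothetical tasks and class counts
model = MultimodalMultitask(tasks)
payload = torch.randn(8, 1, 576)         # first 576 payload bytes (toy values)
fields = torch.randn(8, 12, 4)           # 4 fields for the first 12 packets
outputs = model(payload, fields)         # one logit tensor per task
print({k: tuple(v.shape) for k, v in outputs.items()})
```

Training such a model typically sums one loss per head, which is how a single shared representation can serve several providers' classification problems at once.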
Journal of Information Security and Applications, 2020
With the increasing utilization of the Internet and its provided services, cyber-attacks aiming to exploit the stored information increase as well. Cloud computing (CC) is a technology widely used to store and maintain users' information thanks to its simplicity and low-cost services. Mobile cloud computing (MCC) is another noteworthy computing model, which reduces the limitations of mobile devices by allowing them to offload certain computations to the remote cloud. The cloud environment may contain critical or essential information of an organization; therefore, a security solution is needed to protect this environment from possible attacks. An intrusion detection system (IDS) is a solution to these security issues. An IDS is a hardware or software device that can examine all inbound and outbound network activities and recognize suspicious patterns that may indicate a network attack, automatically alerting the network (or system) administrator. Because of its ability to detect known/unknown (inside/outside) attacks, an IDS is an excellent choice for securing cloud computing. Various methods are used in intrusion detection systems to recognize attacks more accurately. Unlike survey papers presented so far, this paper aims to present a comprehensive survey of intrusion detection systems that use computational intelligence (CI) methods in a (mobile) cloud environment. We first provide an overview of the CC and MCC paradigms and service models, also reviewing the security threats in these contexts. Previous literature is critically surveyed, highlighting the advantages and limitations of prior work. We then define a taxonomy for IDS and classify CI-based techniques into single and hybrid methods. Finally, we highlight open issues and future directions for research on this topic.
Elsevier Neurocomputing, 2020
Traffic Classification (TC), consisting in inferring the applications generating network traffic, is currently the enabler for valuable profiling information, other than being the workhorse for service differentiation/blocking. Further, TC is fostered by the blooming of mobile (mostly encrypted) traffic volumes, fueled by the huge adoption of hand-held devices. While researchers and network operators still rely on machine learning to pursue accurate inference, we envision the Deep Learning (DL) paradigm as the stepping stone toward the design of practical (and effective) mobile traffic classifiers based on automatically-extracted features, able to operate with encrypted traffic, and reflecting complex traffic patterns. In this context, the paper contribution is four-fold. First, it provides a taxonomy of the key network traffic analysis subjects where DL is foreseen as attractive. Second, it delves into the non-trivial adoption of DL for mobile TC, surfacing potential gains. Third, to capitalize on such gains, it proposes and validates a general framework for DL-based encrypted TC. Two concrete instances originating from our framework are then experimentally evaluated on three mobile datasets of human users' activity. Lastly, our framework is leveraged to point to future research perspectives.
Elsevier Computer Networks, 2019
Mobile Traffic Classification (TC) has nowadays become the enabler for valuable profiling information, other than being the workhorse for service differentiation or blocking. Nonetheless, a main hindrance in the design of accurate classifiers is the adoption of encrypted protocols, compromising the effectiveness of deep packet inspection. Also, the evolving nature of mobile network traffic makes solutions based on Machine Learning (ML), with manually- and expert-originated features, unable to keep pace. These limitations clear the way to Deep Learning (DL) as a viable strategy to design traffic classifiers based on automatically-extracted features, reflecting the complex patterns distilled from the multifaceted traffic nature, which implicitly carries information in "multimodal" fashion. Multimodality in TC allows the traffic to be inspected from complementary views, thus providing an effective solution for the mobile scenario. Accordingly, a novel multimodal DL framework for encrypted TC is proposed, named MIMETIC, able to capitalize on traffic-data heterogeneity (by learning both intra- and inter-modality dependencies), overcome the performance limitations of existing (myopic) single-modality DL-based TC proposals, and support the challenging mobile scenario. Using three (human-generated) datasets of mobile encrypted traffic, we demonstrate the performance improvement of MIMETIC over (a) single-modality DL-based counterparts, (b) state-of-the-art ML-based (mobile) traffic classifiers, and (c) classifier fusion techniques.
Journal of Information Security and Applications, Jun 1, 2024
The Internet of Things (IoT) is a key enabler for critical systems, but IoT devices are increasingly targeted by cyberattacks due to their diffusion and their hardware and software limitations. This calls for designing and evaluating new effective approaches for protecting IoT systems at the network level. While recent proposals based on machine- and deep-learning provide effective solutions to the problem of attack-traffic classification, their adoption is severely challenged by the amount of labeled traffic they require to train the classification models. In fact, this results in the need for collecting and labeling large amounts of malicious traffic, which may be hindered by the nature of the malware, possibly generating little and hard-to-capture network activity. To tackle this challenge, we adopt few-shot learning approaches for attack-traffic classification, with the objective of improving detection performance for attack classes with few labeled samples. We leverage advanced deep-learning architectures to perform feature extraction and provide an extensive empirical study, using recent and publicly available datasets, comparing the performance of an ample variety of solutions based on different learning paradigms and exploring a number of design choices in depth (impact of the embedding function, number of attack classes, or number of attack samples). In comparison to non-few-shot baselines, we achieve a relative improvement in the F1-score ranging from 8% to 27%.
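One popular few-shot paradigm, used here only as an illustrative example (the paper compares several paradigms and embedding functions), is prototypical classification: average the embeddings of the few labeled support samples of each attack class into prototypes, then assign queries to the nearest prototype. The embedding function below is a placeholder.

```python
# Sketch of one few-shot paradigm, prototypical classification: embed the few
# labeled support samples per attack class, average them into class prototypes,
# and assign each query to the nearest prototype. The embedding function is a
# placeholder; the paper studies several paradigms and embeddings.
import numpy as np

def embed(x):
    """Placeholder embedding: in practice a trained deep feature extractor."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + 1e-9)

rng = np.random.default_rng(1)
n_classes, k_shot, dim = 3, 5, 32                   # 3 attack classes, 5 shots each
support = rng.normal(size=(n_classes, k_shot, dim)) # few labeled samples per class
queries = rng.normal(size=(10, dim))                # unlabeled traffic to classify

prototypes = embed(support).mean(axis=1)            # (n_classes, dim)
dists = np.linalg.norm(embed(queries)[:, None, :] - prototypes[None, :, :], axis=-1)
pred = dists.argmin(axis=1)                         # nearest-prototype decision
print(pred)
```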
Transactions on Emerging Telecommunications Technologies, Apr 1, 2024
The Fifth Generation (5G) of mobile communications is among the most exciting emerging technologies for researchers and scientists aiming to get the full benefit of a network system. However, 5G networks face massive threats and vulnerabilities concerning protection, privacy, and secrecy. To face these challenges in the increasingly interconnected Internet of Things (IoT) scenario, we aim to leverage state-of-the-art technologies such as Software Defined Networking (SDN) in conjunction with Network Function Virtualization (NFV), Blockchain, and Machine Learning (ML). Indeed, these technologies provide a robust and secure setting for the networking platform, enabling the management of several criticalities related to security, privacy, flexibility, and performance. In light of these considerations, in this paper we propose the "BlockSD-5GNet" architecture to efficiently improve the security of a 5G network and to exploit the combined advantages of Blockchain, SDN, NFV, and ML. In the proposed architecture, SDN helps to manage the network by dividing it into a data plane and a control plane, while the Blockchain guarantees improved security and confidentiality. Therefore, the "BlockSD-5GNet" architecture can both secure sensitive data and attain reliable data transfer within and between the 5G network-infrastructure planes. Additionally, an ML module is integrated into the SDN controller to estimate network bandwidth and assist the administrator in making effective decisions and satisfying high-bandwidth demand. We assess the performance of the "BlockSD-5GNet" architecture via an experimental evaluation performed in a simulation environment, and show the effectiveness of the proposed solution in comparison with baseline schemes. Finally, we also demonstrate the capability of different ML models in bandwidth prediction.
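A generic sketch of the kind of bandwidth-prediction module mentioned above is given below: a regressor predicts the next throughput sample from a sliding window of recent ones. The model choice, window size, and data are hypothetical placeholders, not the models evaluated in the paper.

```python
# Generic sketch of an ML bandwidth estimator like the module described above:
# predict the next throughput sample from a sliding window of recent ones.
# Model choice, window size, and data are hypothetical placeholders.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(2)
throughput = 50 + 10 * np.sin(np.arange(500) / 20) + rng.normal(0, 2, 500)  # Mbps, toy

window = 8
X = np.array([throughput[i:i + window] for i in range(len(throughput) - window)])
y = throughput[window:]                       # value right after each window

split = 400
model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X[:split], y[:split])
pred = model.predict(X[split:])
mae = np.mean(np.abs(pred - y[split:]))
print(f"MAE on held-out samples: {mae:.2f} Mbps")
```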
IEEE Communications Magazine, Dec 31, 2022
IEEE open journal of the Communications Society, 2024
arXiv (Cornell University), Aug 3, 2022
Computers & Security
The Internet of Things (IoT) is a key enabler in closing the loop in Cyber-Physical Systems, providing "smartness" and thus additional value to each monitored/controlled physical asset. Unfortunately, these devices are increasingly targeted by cyberattacks because of their diffusion and of their usually limited hardware and software resources. This calls for designing and evaluating new effective approaches for protecting IoT systems at the network level (Network Intrusion Detection Systems, NIDSs). These, in turn, are challenged by the heterogeneity of IoT devices and the growing volume of transmitted data. To tackle this challenge, we select a Deep Learning architecture to perform unsupervised early anomaly detection. With a data-driven approach, we explore in depth multiple design choices and exploit the appealing structural properties of the selected architecture to enhance its performance. The experimental evaluation is performed on two recent and publicly available IoT datasets (IoT-23 and Kitsune). Finally, we adopt an adversarial approach to investigate the robustness of our solution in the presence of Label Flipping poisoning attacks. The experimental results highlight the improved performance of the proposed architecture, in comparison to both well-known baselines and previous proposals.
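A minimal PyTorch sketch of reconstruction-based unsupervised anomaly detection is shown below; the architecture and design choices studied in the paper differ. An autoencoder is trained on benign traffic features only, and a sample is flagged as anomalous when its reconstruction error exceeds a threshold derived from the benign error distribution. Feature count, threshold, and data are placeholders.

```python
# Minimal sketch of unsupervised, reconstruction-based anomaly detection
# (the architecture and design choices studied in the paper differ): an
# autoencoder is trained on benign traffic features only, and a sample is
# flagged as anomalous when its reconstruction error exceeds a threshold.
import torch
import torch.nn as nn

ae = nn.Sequential(                       # toy autoencoder over 20 features
    nn.Linear(20, 8), nn.ReLU(),          # encoder
    nn.Linear(8, 20))                     # decoder
opt = torch.optim.Adam(ae.parameters(), lr=1e-3)

benign = torch.randn(1024, 20)            # placeholder benign-traffic features
for _ in range(200):                      # short training loop on benign data only
    opt.zero_grad()
    loss = nn.functional.mse_loss(ae(benign), benign)
    loss.backward()
    opt.step()

with torch.no_grad():
    err = ((ae(benign) - benign) ** 2).mean(dim=1)
    threshold = err.quantile(0.99)        # e.g. 99th percentile of benign error
    new_sample = torch.randn(1, 20) * 5   # toy "suspicious" input
    score = ((ae(new_sample) - new_sample) ** 2).mean(dim=1)
    print("anomalous:", bool(score > threshold))
```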
IEEE Transactions on Network and Service Management, 2023
Computer Networks
2022 IEEE Symposium on Computers and Communications (ISCC)
2022 IEEE Symposium on Computers and Communications (ISCC)
Journal of Network and Systems Management
IEEE INFOCOM 2022 - IEEE Conference on Computer Communications Workshops (INFOCOM WKSHPS)
In recent years, Internet of Things (IoT) traffic has increased dramatically and is expected to grow further in the near future. Because of their vulnerabilities, IoT devices are often the target of cyber-attacks with dramatic consequences. For this reason, there is a strong need for powerful tools to guarantee a good level of security in IoT networks. Machine and deep learning approaches promise good performance for such a complex task. In this work, we employ state-of-the-art traffic classifiers based on deep learning and assess their effectiveness in accomplishing IoT attack classification. We aim to recognize different attack classes and distinguish them from benign network traffic. In more detail, we utilize effective and unbiased input data that allow fast (i.e. "early") detection of anomalies, and we compare performance with that of traditional (i.e. "post-mortem") machine learning classifiers. The experimental results highlight the need for advanced deep learning architectures fed with input data specifically tailored and designed for IoT attack classification. Furthermore, we perform an occlusion analysis to assess the influence of some network-layer fields on performance and the possible bias they may introduce.
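The occlusion analysis mentioned above can be illustrated generically: mask one input field at a time and measure the resulting drop in a validation metric. In the sketch below, the field names, model, and data are hypothetical placeholders, not those used in the paper.

```python
# Generic sketch of an occlusion analysis like the one mentioned above:
# mask (zero out) one input field at a time and measure the resulting drop
# in validation accuracy. Field names, model, and data are placeholders.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(3)
fields = ["pkt_len", "iat", "tcp_window", "direction"]
X = rng.normal(size=(3000, len(fields)))
y = (X[:, 0] + 0.5 * X[:, 2] > 0).astype(int)        # toy label depends on two fields

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
clf = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)
base = clf.score(X_te, y_te)

for i, name in enumerate(fields):
    occluded = X_te.copy()
    occluded[:, i] = 0.0                              # occlude one field
    drop = base - clf.score(occluded, y_te)
    print(f"{name:12s} accuracy drop: {drop:+.3f}")
```

A large drop for a given field suggests the classifier relies heavily on it, which is also how a potentially biasing field can be spotted.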
2021 IEEE 26th International Workshop on Computer Aided Modeling and Design of Communication Links and Networks (CAMAD)
The lockdowns and lifestyle changes during the COVID-19 pandemic have caused a measurable impact on Internet traffic in terms of volumes and application mix, with a sudden increase in the usage of communication and collaboration apps. In this work, we focus on five such apps, whose traffic we collect, reliably label at fine granularity (per-activity), and analyze from the viewpoint of traffic classification. To this aim, we employ state-of-the-art deep learning approaches to assess to which degree the apps, their different use cases (activities), and the app-activity pairs can be told apart from each other. We investigate the early behavior of the biflows composing the traffic and the effect of tuning the dimension of the input, via a sensitivity analysis. The experimental analysis highlights the merits of the different architectures, in terms of both traffic-classification performance and complexity across different classification tasks, and the related trade-off. The outcome of this analysis is informative for a number of network management tasks, including monitoring, planning, resource provisioning, and (security) policy enforcement.
The process of associating (labeling) network traffic with specific applications or application types, known as Traffic Classification (TC), is increasingly challenged by the growing usage of smartphones, which is profoundly changing the kind of traffic that travels over home and enterprise networks and the Internet. TC comes with its own challenges and requirements that are even exacerbated in a mobile-traffic context, such as: (a) the adoption of encrypted protocols, (b) a large number of apps to discriminate from, (c) the dynamic nature of network traffic and, more importantly, (d) the lack of a satisfactory flow-level Ground Truth (GT) to train the classification algorithms on, and to test and compare them against. For this reason, this work proposes a novel self-supervised TC architecture composed of two main blocks: (i) an automatic GT generation tool and (ii) a Multi-Classifier System (MCS). The first block automatically produces a corpus of traffic traces with flow-level labeling, the label being the package name and version (uniquely identifying the mobile app); this is exploited to rapidly train (or re-train), in a supervised way, the proposed MCS, which is then employed for the classification of real (human-generated) mobile traffic. In more detail, in the first block of the proposed system each app package of interest is automatically installed and run on a (physical or virtual) device connected to a network where all traffic generated or received by the device can be captured. Then the Graphical User Interface (GUI) of the app is explored, generating events such as taps and keystrokes and causing the generation of network traffic. The GUI explorer is based on Android GUI Ripper, a tool implementing both Random and Active Learning techniques. The device is instrumented with a logger that records all network-related system calls originated by the exercised app, so as to properly associate traffic flows with the originating process names, thus avoiding mislabeling traffic from other apps or from the operating system. The traffic generated by the device is captured on a host (wireless access point) from which the device can also be controlled (e.g. via USB). The second block is an MCS which intelligently combines decisions from state-of-the-art (base) classifiers specifically devised for mobile- and encrypted-traffic classification. The MCS is intended to overcome the deficiencies of each single classifier (not improvable over a certain bound, despite efforts in "tuning") and provide improved performance with respect to any of the base classifiers. The proposed MCS is not restricted to a specific set of classification algorithms and also allows for modularity in the selection of classifiers in the pool. Additionally, the MCS can adopt several types of combiners (based on both hard and soft approaches) developed in the literature, constituting a wide spectrum of achievable performance, operational complexity, and training-set requirements. Preliminary results show that our system is able to: (i) automatically run mobile apps, making them generate sufficient traffic to train an MCS; (ii) obtain promising results in terms of classification accuracy on new mobile-app traffic.
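As a toy illustration of a "soft" combiner of the kind mentioned above, the sketch below averages the class-posterior estimates produced by several base classifiers and picks the class with the highest combined posterior. The base classifiers, features, and labels are placeholders, not the ones used in the proposed MCS.

```python
# Toy sketch of a "soft" combiner of the kind mentioned above: the class
# posteriors estimated by the base classifiers are averaged and the class
# with the highest combined posterior wins. Base classifiers are placeholders.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(4)
X = rng.normal(size=(2000, 16))                       # placeholder flow features
y = rng.integers(0, 5, size=2000)                     # 5 hypothetical app classes
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

base = [RandomForestClassifier(random_state=0), LogisticRegression(max_iter=1000),
        GaussianNB()]
for clf in base:
    clf.fit(X_tr, y_tr)

posteriors = np.mean([clf.predict_proba(X_te) for clf in base], axis=0)
combined_pred = posteriors.argmax(axis=1)             # soft (mean-posterior) combiner
print("combined accuracy:", (combined_pred == y_te).mean())
```

Hard combiners would instead vote on the predicted labels; the modular pool structure makes it easy to swap either the base classifiers or the combination rule.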
2019 Network Traffic Measurement and Analysis Conference (TMA), 2019
ICC 2021 - IEEE International Conference on Communications, 2021