Leonardo Aniello | Università degli Studi "La Sapienza" di Roma
Papers by Leonardo Aniello
Proceedings of the 28th Annual ACM Symposium on Applied Computing - SAC '13, 2013
Future Generation Computer Systems, 2015
Today's business workflows are very likely to include batch computations that periodically analyze subsets of data within specific time ranges to provide strategic information for stakeholders and other interested parties. The frequency of these batch computations provides an effective measure of the freshness of the data analytics available to decision makers. Nevertheless, the amounts of data to process in a batch are typically so large that a single computation can take a very long time. Considering that a new batch usually starts only when the previous one has completed, the frequency of such batches can thus be very low.
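To make the freshness argument concrete (the numbers below are illustrative, not taken from the paper): if each batch takes a time d to complete and a new batch can start only once the previous one has finished, then results can be refreshed at most once every d. For example, a batch that needs 6 hours to analyze a day's worth of data limits decision makers to at most four refreshes per day, and the figures they see can be several hours stale.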
Proceedings of the 7th ACM international conference on Distributed event-based systems - DEBS '13, 2013
Today we are witnessing a dramatic shift toward a data-driven economy, where the ability to analyze huge amounts of data efficiently and promptly marks the difference between industrial success stories and catastrophic failures. In this scenario Storm, an open-source distributed real-time computation system, represents a disruptive technology that is quickly gaining the favor of big players such as Twitter and Groupon. A Storm application is modeled as a topology, i.e., a graph where nodes are operators and edges represent data flows among those operators. A key aspect of tuning Storm performance lies in the strategy used to deploy a topology, i.e., how Storm schedules the execution of each topology component on the available computing infrastructure. In this paper we propose two advanced generic schedulers for Storm that provide improved performance for a wide range of application topologies. The first scheduler works offline by analyzing the topology structure and adapting the deployment to it; the second enhances this approach by continuously monitoring system performance and rescheduling the deployment at run time. Experimental results show that these algorithms produce schedules that achieve significantly better performance than those produced by Storm's default scheduler.
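For readers unfamiliar with Storm, the sketch below shows how such a topology is declared in Java (Storm 1.x+ package names; class, stream, and component names are ours for illustration, not from the paper). The schedulers proposed in the paper decide on which cluster nodes each of the spout and bolt executors declared here is placed; custom placement logic is typically plugged in by implementing Storm's IScheduler interface.

```java
import java.util.HashMap;
import java.util.Map;

import org.apache.storm.Config;
import org.apache.storm.LocalCluster;
import org.apache.storm.spout.SpoutOutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.BasicOutputCollector;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.TopologyBuilder;
import org.apache.storm.topology.base.BaseBasicBolt;
import org.apache.storm.topology.base.BaseRichSpout;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Tuple;
import org.apache.storm.tuple.Values;
import org.apache.storm.utils.Utils;

public class WordCountTopology {

    // Spout: a source node of the topology graph.
    public static class SentenceSpout extends BaseRichSpout {
        private SpoutOutputCollector out;
        public void open(Map conf, TopologyContext ctx, SpoutOutputCollector out) { this.out = out; }
        public void nextTuple() {
            Utils.sleep(100);
            out.emit(new Values("the quick brown fox"));
        }
        public void declareOutputFields(OutputFieldsDeclarer d) { d.declare(new Fields("sentence")); }
    }

    // Bolt: an operator node that splits sentences into words.
    public static class SplitBolt extends BaseBasicBolt {
        public void execute(Tuple t, BasicOutputCollector out) {
            for (String w : t.getStringByField("sentence").split(" ")) out.emit(new Values(w));
        }
        public void declareOutputFields(OutputFieldsDeclarer d) { d.declare(new Fields("word")); }
    }

    // Bolt: an operator node that counts words.
    public static class CountBolt extends BaseBasicBolt {
        private final Map<String, Integer> counts = new HashMap<>();
        public void execute(Tuple t, BasicOutputCollector out) {
            counts.merge(t.getStringByField("word"), 1, Integer::sum);
        }
        public void declareOutputFields(OutputFieldsDeclarer d) { /* sink: no output stream */ }
    }

    public static void main(String[] args) throws Exception {
        TopologyBuilder builder = new TopologyBuilder();
        builder.setSpout("sentences", new SentenceSpout(), 2);
        builder.setBolt("split", new SplitBolt(), 4)
               .shuffleGrouping("sentences");                     // edge: sentences -> split
        builder.setBolt("count", new CountBolt(), 4)
               .fieldsGrouping("split", new Fields("word"));      // edge: split -> count
        Config conf = new Config();
        conf.setNumWorkers(2); // worker processes onto which the scheduler maps executors
        new LocalCluster().submitTopology("word-count", conf, builder.createTopology());
    }
}
```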
Proceedings of the 8th ACM International Conference on Distributed Event-Based Systems - DEBS '14, 2014
While pub/sub communication middleware has become mainstream in many application domains, little has been done to assess its weaknesses from a security standpoint. Complex attacks are usually planned by attackers who carefully analyze the victim to identify those systems that, if successfully targeted, would yield the most effective result. In this paper we show that some pub/sub middleware are inherently vulnerable to a specific kind of preparatory attack, the Overlay Scan Attack, which a malicious user could exploit to infer the internal topology of a system: sensitive information that could be used to plan further attacks. The topology inference is performed using only the standard primitives provided by the pub/sub middleware and assuming minimal knowledge of the target system. The practicality of this attack has been shown both in a simulated environment and through a test performed on a SIENA pub/sub deployment.
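The abstract does not spell out the inference procedure, but a schematic way to see how standard primitives alone can leak structure is a timing probe, sketched below. Everything here is an assumption for illustration: PubSubClient, subscribe, publish, and awaitDelivery are hypothetical names, not the SIENA API, and the paper's actual technique may differ.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical client-side API exposing only standard pub/sub primitives.
interface PubSubClient {
    void subscribe(String topic);
    void publish(String topic, byte[] payload);
    void awaitDelivery(String topic) throws InterruptedException; // blocks until a matching notification arrives
}

final class OverlayProbe {
    /** Measures the delivery delay from one broker attachment point to another. */
    static Duration probe(PubSubClient publisherAtA, PubSubClient subscriberAtB,
                          String probeTopic) throws InterruptedException {
        subscriberAtB.subscribe(probeTopic);
        Instant start = Instant.now();
        publisherAtA.publish(probeTopic, new byte[0]);
        subscriberAtB.awaitDelivery(probeTopic); // only standard primitives are used
        return Duration.between(start, Instant.now());
    }
    // Intuition: the end-to-end delay between two attachment points grows with the
    // number of overlay hops separating their brokers, so repeating probe() over
    // many pairs yields a delay matrix that constrains the internal topology.
}
```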
Critical Infrastructures (CIs), such as smart power grids, transport systems, and financial infrastructures, are increasingly vulnerable to cyber threats due to the adoption of commodity computing facilities. Despite the use of several monitoring tools, recent attacks have proven that current defensive mechanisms for CIs are not effective enough against the most advanced threats. In this paper we explore the idea of a framework that leverages multiple data sources to improve the protection capabilities of CIs. Challenges and opportunities are discussed along three main research directions: i) the use of distinct and heterogeneous data sources, ii) monitoring with adaptive granularity, and iii) attack modeling and the runtime combination of multiple data analysis techniques.
Proceedings of the 2nd International Workshop on Dependability Issues in Cloud Computing - DISCCO '13, 2013
Data Centers are evolving to adapt to emerging IT trends such as Big Data and Cloud Computing, which push for increased scalability and improved service availability. Among the side effects of this evolution, the proliferation of new security breaches is a major issue that usually goes unaddressed, since the focus tends to remain on developing innovative, high-performance technology rather than on making it secure. Consequently, new distributed applications deployed on Data Centers turn out to be vulnerable to malicious attacks. This paper analyzes the vulnerabilities of the gossip-based membership protocol used by Cassandra, a well-known distributed NoSQL database. Cassandra is widely employed as a storage service in applications where very large data volumes have to be managed. An attack exploiting these weaknesses is presented that impacts Cassandra's availability by affecting both the latency and the successful outcome of requests. A lightweight solution is also proposed that prevents this threat from succeeding at the cost of negligible overhead.
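As a rough intuition for why an unauthenticated gossip-based membership protocol is attackable, consider the toy model below. It is illustrative only: these are not Cassandra's actual classes, and the paper's attack details may differ. A node adopts whichever endpoint state carries the highest version number, so forged state with an inflated version can displace legitimate membership information.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

final class GossipNode {
    // Simplified per-endpoint membership state (hypothetical model, not Cassandra code).
    record EndpointState(String address, long version, boolean alive) {}

    private final Map<String, EndpointState> members = new ConcurrentHashMap<>();

    // Called for every incoming gossip message; note there is no sender authentication.
    void onGossip(EndpointState claimed) {
        // Keep whichever state has the higher version: a forged message with an
        // inflated version number therefore overwrites the legitimate entry.
        members.merge(claimed.address(), claimed,
                (old, incoming) -> incoming.version() > old.version() ? incoming : old);
    }
}
// An attacker who can reach the gossip port can thus feed onGossip() fabricated
// state (e.g., marking live replicas dead), degrading request latency and availability.
```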
Organizations must protect their information systems from a variety of threats. Usually they employ isolated defenses such as firewalls, intrusion detection, and fraud monitoring systems, without cooperating with the external world. Organizations belonging to the same markets (e.g., financial organizations, telco providers) typically suffer from the same cyber crimes. Sharing and correlating information could help them detect those crimes early and mitigate the damage.
Proceedings of the 13th European Workshop on Dependable Computing - EWDC '11, 2011
Large enterprises are nowadays complex interconnected software systems spanning several domains. This new dimension makes it difficult for enterprises to put efficient security defenses in place. This paper addresses the problem of detecting inter-domain stealthy port scans and proposes an Intrusion Detection System architecture that uses, for this purpose, an open-source Complex Event Processing engine named Esper. Esper provides low cost of ownership and high flexibility. The architecture consists of software sensors deployed at different enterprise domains. Each sensor sends events to the Esper event processor for correlation. We implemented an algorithm for detecting inter-domain SYN port scans, the Rank-based SYN (R-SYN) port scan detection algorithm. It combines and adapts three detection techniques in order to obtain a unique global statement about the malicious behavior of host activities. An evaluation of the accuracy of our approach has been carried out using several traces, some including original traffic dumps, others altered by injecting packets that simulate port scan activities. Accuracy results show that our algorithm is able to produce a list of scanners characterized by high detection and low false positive rates.
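As a flavor of how such detection logic looks in Esper (a minimal sketch against the Esper 5.x API, not the R-SYN algorithm: the event type, window, threshold, and EPL query below are our own illustrative assumptions), a single statement can flag sources that probe many distinct ports within a sliding time window:

```java
import com.espertech.esper.client.Configuration;
import com.espertech.esper.client.EPServiceProvider;
import com.espertech.esper.client.EPServiceProviderManager;
import com.espertech.esper.client.EPStatement;

public class SynScanCep {
    // Event class that the domain sensors would publish (illustrative).
    public static class TcpSyn {
        private final String srcIp;
        private final String dstIp;
        private final int dstPort;
        public TcpSyn(String srcIp, String dstIp, int dstPort) {
            this.srcIp = srcIp; this.dstIp = dstIp; this.dstPort = dstPort;
        }
        public String getSrcIp() { return srcIp; }
        public String getDstIp() { return dstIp; }
        public int getDstPort() { return dstPort; }
    }

    public static void main(String[] args) {
        Configuration cfg = new Configuration();
        cfg.addEventType("TcpSyn", TcpSyn.class);
        EPServiceProvider engine = EPServiceProviderManager.getDefaultProvider(cfg);

        // Flag sources probing many distinct ports within a 60-second sliding window.
        EPStatement stmt = engine.getEPAdministrator().createEPL(
            "select srcIp, count(distinct dstPort) as ports " +
            "from TcpSyn.win:time(60 sec) " +
            "group by srcIp having count(distinct dstPort) > 50");
        stmt.addListener((newData, oldData) ->
            System.out.println("suspected scanner: " + newData[0].get("srcIp")));

        // Each sensor would call sendEvent for every observed SYN packet.
        engine.getEPRuntime().sendEvent(new TcpSyn("10.0.0.9", "10.0.0.1", 22));
    }
}
```

The actual R-SYN algorithm combines and ranks three detection techniques, as the abstract states; this single statement only illustrates the CEP style of correlation the architecture is built on.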
Lecture Notes in Computer Science, 2014
Collaborative Financial Infrastructure Protection, 2012
We introduce Agilis, a lightweight collaborative event processing platform that can be deployed in a Semantic Room to facilitate sharing and correlating event data generated in real time by multiple widely distributed sources. Agilis aims to balance simplicity of use and robustness on the one hand, and scalable performance in large-scale settings on the other. To this end, Agilis is built upon the open-source Hadoop MapReduce infrastructure, augmented with a RAM-based data store and several locality-oriented optimizations to ...
Proceedings of the 8th ACM International Conference on Distributed Event-Based Systems - DEBS '14, 2014
In this tutorial we present the results of recent research on the cloud enablement of data streaming systems. We illustrate, based on both industrial and academic prototypes, newly emerging use cases and research trends. Specifically, we focus on novel approaches for scalability and fault tolerance in large-scale distributed streaming systems. In general, new fault tolerance mechanisms strive to be more robust while introducing less overhead. Novel load balancing approaches focus on elastic scaling over hundreds of instances based on the data and query workload. Finally, we present open challenges for the next generation of cloud-based data stream processing engines.
As cyber attacks become increasingly distributed and sophisticated, so must our defenses. Collaborative processing of data produced by independent sources is advantageous for the early and accurate detection of Internet-based threats, and instrumental for identifying complex attack patterns that target multiple administratively and geographically disjoint entities. In this paper we introduce Agilis, a lightweight collaborative event processing platform for sharing and correlating event data generated in real time by multiple widely distributed sources. The primary goal of the Agilis design is to strike a balance between simplicity of use, robustness, and scalability on the one hand, and reasonable performance in large-scale settings on the other. To this end, Agilis is built upon the open-source Hadoop MapReduce infrastructure, which we augmented with a RAM-based data store and various locality-oriented optimizations to improve responsiveness and reduce overheads. The processing logic is specified in a flexible high-level language, called Jaql, which supports data flows and SQL-like query constructs. We demonstrate the utility of the Agilis framework by showing how it facilitates the collaborative detection of two different exploits: stealthy inter-domain port scans used by hackers for reconnaissance, and a botnet-driven HTTP session hijacking attack. We evaluate the performance of Agilis in both scenarios and, in the case of inter-domain port scans, compare it to a centralized high-end event processing system called Esper. Our results show that while Agilis is slower than Esper in a local area network, its relative performance improves substantially as we move towards larger-scale distributed deployments.
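The paper expresses its processing logic in Jaql, which compiles down to MapReduce jobs. As a rough Java equivalent of one such aggregation (an illustration under our own assumptions about the record layout, not Agilis code), the job below counts SYN events per source IP across the event logs contributed by the participating sites:

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class EventsPerSource {
    public static class EventMapper extends Mapper<LongWritable, Text, Text, LongWritable> {
        private static final LongWritable ONE = new LongWritable(1);
        @Override
        protected void map(LongWritable key, Text line, Context ctx)
                throws IOException, InterruptedException {
            // Assumed record layout: "timestamp srcIp eventType ..." (hypothetical).
            String[] f = line.toString().split("\\s+");
            if (f.length >= 3 && "SYN".equals(f[2])) ctx.write(new Text(f[1]), ONE);
        }
    }

    public static class SumReducer extends Reducer<Text, LongWritable, Text, LongWritable> {
        @Override
        protected void reduce(Text srcIp, Iterable<LongWritable> counts, Context ctx)
                throws IOException, InterruptedException {
            long total = 0;
            for (LongWritable c : counts) total += c.get();
            ctx.write(srcIp, new LongWritable(total));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "events-per-source");
        job.setJarByClass(EventsPerSource.class);
        job.setMapperClass(EventMapper.class);
        job.setCombinerClass(SumReducer.class); // local pre-aggregation per site
        job.setReducerClass(SumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(LongWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

Per the abstract, Agilis runs this kind of logic over a RAM-based data store with locality-oriented optimizations rather than plain HDFS, which is what keeps latency reasonable in distributed deployments.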
Collaborative Financial Infrastructure Protection: Tools, Abstractions, and Middleware, Jan 10, 2012
This chapter focuses on attack strategies that can be (and have been) used against financial IT infrastructures. The first section presents an overview and a classification of the different kinds of frauds and attacks carried out against financial institutions and their IT infrastructures. We then restrict our focus by analyzing in detail five attack scenarios, selected from the ones presented in the previous section. These attack scenarios are: Man in the Middle (and its variant, Man in the Browser), distributed denial of service ( ...
Collaborative Financial Infrastructure Protection: Tools, Abstractions, and Middleware, Jan 10, 2012
This chapter describes a specific instance of a Semantic Room that makes use of the well-known centralized complex event processing engine Esper to effectively detect inter-domain malicious port scan activities. The Esper engine is deployed by the SR administrator and correlates a massive amount of network traffic data exhibiting the evidence of distributed port scans. The chapter presents two inter-domain SYN scan detection algorithms that have been designed and implemented in Esper and then ...
Computer Safety, Reliability, and Security, 2011
We describe an Internet-based collaborative environment that protects geographically dispersed organizations of a critical infrastructure (e.g., financial institutions, telco providers) from coordinated cyber attacks. A specific instance of a collaborative environment for detecting malicious inter-domain port scans is introduced. This instance uses the open-source Complex Event Processing (CEP) engine Esper to correlate massive amounts of network traffic data exhibiting the evidence of those scans. The paper presents two inter- ...