Nedeljko Vasic - Academia.edu
Papers by Nedeljko Vasic
The power consumption of the Internet and datacenter networks is already significant due to a large degree of redundancy and high idle power consumption of network elements. Therefore, dynamically matching network resources to the load is highly desirable. However, this is difficult because the traffic changes faster than the minimal network subset needed to carry the traffic demand can be computed. We achieve responsiveness by decoupling the decisions taken by routing and online traffic engineering (TE), and propose Energy-Proportional Networks (EPN) -- networks which use the minimum amount of energy to carry the required traffic. EPN computes three sets of routing tables: i) always-on, ii) on-demand, and iii) failover. A simple energy-aware TE algorithm activates and deactivates network elements to achieve the goal of energy-proportionality. Our evaluation on ISP and datacenter topologies shows that EPN achieves the goal of energy-proportionality and saves up to 42% of power, without sacrificing responsiveness. Further, using a Click testbed we show that it is possible to 1) quickly and efficiently use the EPN paths at runtime for energy saving, and 2) quickly tolerate faults. Finally, two representative applications running over EPN-chosen paths demonstrate EPN's marginal impact on application-level throughput and latency.
Proceedings of the second ACM SIGCOMM workshop on Hot topics in software defined networking - HotSDN '13, 2013
Tolerating and recovering from link and switch failures are fundamental requirements of most networks, including Software-Defined Networks (SDNs). However, instead of traditional behaviors such as network-wide routing reconvergence, failure recovery in an SDN is determined by the specific software logic running at the controller. While this admits more freedom to respond to a failure event, it ultimately means that each controller application must include its own recovery logic, which makes the code more difficult to write and potentially more error-prone. In this paper, we propose a runtime system that automates failure recovery and enables network developers to write simpler, failure-agnostic code. To this end, upon detecting a failure, our approach first spawns a new controller instance that runs in an emulated environment consisting of the network topology excluding the failed elements. Then, it quickly replays inputs observed by the controller before the failure occurred, leading the emulated network into the forwarding state that accounts for the failed elements. Finally, it recovers the network by installing the difference ruleset between emulated and current forwarding states.
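The last step of that abstract, installing the difference between the emulated and current forwarding states, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: forwarding state is modeled as a plain dictionary from (switch, match) to an action string, and the names `diff_ruleset` and `apply_ops` are hypothetical, not taken from the paper.

```python
from typing import Dict, List, Tuple

# Hypothetical model: a rule is identified by (switch id, match expression)
# and maps to an action string. This is not the paper's data model.
Rule = Tuple[str, str]
Table = Dict[Rule, str]
Op = Tuple[str, Rule, str]


def diff_ruleset(current: Table, emulated: Table) -> List[Op]:
    """Return the operations that turn `current` into `emulated`."""
    ops: List[Op] = []
    # Rules present now but absent from the emulated state must be deleted.
    for rule in sorted(current.keys() - emulated.keys()):
        ops.append(("delete", rule, current[rule]))
    # Rules in the emulated state are added if missing, modified if changed.
    for rule in sorted(emulated):
        if rule not in current:
            ops.append(("add", rule, emulated[rule]))
        elif current[rule] != emulated[rule]:
            ops.append(("modify", rule, emulated[rule]))
    return ops


def apply_ops(table: Table, ops: List[Op]) -> Table:
    """Apply the operations to a copy of `table` and return the result."""
    result = dict(table)
    for kind, rule, action in ops:
        if kind == "delete":
            result.pop(rule, None)
        else:  # "add" and "modify" both set the rule's action
            result[rule] = action
    return result
```

Applying the returned operations to the current table yields the emulated table, so only the rules that actually differ would need to be touched.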
Proceedings of the second ACM SIGCOMM workshop on Hot topics in software defined networking - HotSDN '13, 2013
This paper demonstrates a new class of bugs that is likely to occur in enterprise OpenFlow deployments. In particular, step-by-step, reactive establishment of paths can cause network-wide inconsistencies or performance- and space-related inefficiencies. The cause for this behavior is inconsistent packet processing: as the packets travel through the network they do not encounter consistent state at the OpenFlow controller. To mitigate this problem, we propose to use transactional semantics at the controller to achieve consistent packet processing. We detail the challenges in achieving this goal (including the inability to directly apply database techniques), as well as a potentially promising approach. In particular, we envision the use of multi-commit transactions that could provide the necessary serialization and isolation properties without excessively reducing network performance.
2009 First International Communication Systems and Networks and Workshops, 2009
In this paper we describe UNO, a framework for fine-grain explicit feedback congestion control protocols that uses only 1 or 2 existing ECN bits, thus making algorithms that use more than 2 bits for encoding the load factor and the RTT immediately deployable. UNO accomplishes this task by changing the way load and RTT information is encoded in packets in a way that is similar to some existing schemes for encoding bottleneck link load. UNO leverages the values present in the IP identification field and trades off a small amount of time (to send several packets) for space to emulate the existence of several extra bits within the IP header. The results from extensive ns2 simulations over various bandwidth and delay scenarios are encouraging. By using only one ECN bit we achieve substantially lower convergence times and better link utilization than the existing deployable protocols, with similar low queue size and negligible packet loss. With 2 ECN bits, we achieve very good fairness for flows with different RTTs, while keeping all the good characteristics of the 1-bit protocol and providing functionality that did not previously exist.
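The time-for-space trade described in that abstract can be illustrated with a toy sketch. The scheme below is an assumption, not UNO's actual encoding: it supposes that the IP identification value modulo the field width selects which bit of a multi-bit value a packet carries in its single feedback bit, and the names `encode_bit` and `decode` are hypothetical.

```python
from typing import Iterable, Optional, Tuple


def encode_bit(value: int, ip_id: int, width: int) -> int:
    """Pick the one bit of `value` that the packet with this IP ID carries.

    Assumed rule for illustration: bit position = ip_id mod width.
    """
    position = ip_id % width
    return (value >> position) & 1


def decode(packets: Iterable[Tuple[int, int]], width: int) -> Optional[int]:
    """Rebuild the value from (ip_id, bit) pairs; None until all bits are seen."""
    bits = {}
    for ip_id, bit in packets:
        bits[ip_id % width] = bit  # a later packet overwrites an older bit
    if len(bits) < width:
        return None
    return sum(bit << position for position, bit in bits.items())
```

With a 4-bit value, four packets with consecutive IP IDs are enough for the receiver to reassemble it, which is the "several packets for several extra bits" exchange in miniature.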
Proceeding of the 7th international conference on Autonomic computing - ICAC '10, 2010
Increasing heat dissipation density is becoming a limiting factor in air-cooled data centers. The main control objective in data center thermal management is to keep the temperature of all the data processing equipment below a certain threshold and at the same time ...
2010 Second International Conference on COMmunication Systems and NETworks (COMSNETS 2010), 2010
A major challenge for real-time streaming overlays is to distribute high bit-rate streams with uninterrupted playback. Hosts usually have sufficient inbound bandwidth to support streaming, but due to the prevalence of asymmetric links in broadband networks, the bottleneck is the aggregate, overlay-wide outbound bandwidth. If this bandwidth is less than what is required to forward the stream to the overlay members, then a large number of users potentially experience poor playback. We argue that for successful streaming in bandwidth-constrained situations overlays need to be able to adapt to the aggregate available bandwidth. We present four bandwidth adaptation policies for tree-based streaming overlays and evaluate their efficiency using a large-scale emulation testbed with realistic broadband link characteristics.
Proceedings of the 1st International Conference on Energy-Efficient Computing and Networking - e-Energy '10, 2010
Power consumption of the Information and Communication Technology (ICT) sector has recently become a key challenge. In particular, actions to improve the energy efficiency of Internet Service Providers (ISPs) are becoming imperative. To this end, in this paper we focus on reducing the power consumption of access nodes in an ISP network by controlling the amount of service capacity each network device has to offer to meet the actual traffic demand. More specifically, we propose a Green router (G-router) implementing a congestion control technique named Active Window Management (AWM) coupled with a new capacity scaling algorithm named Energy Aware service Rate Tuner Handling (EARTH). The AWM characteristics make it possible to detect whether energy is being wasted, whereas EARTH invokes power management primitives at the hardware level to precisely control the current capacity of access nodes and consequently their power consumption. We test the benefits of the AWM-EARTH mechanism on a realistic scenario. Results show that the capacity scaling technique can save up to 70% of power consumption, while guaranteeing Quality of Service and traffic demand constraints.
Proceedings of the seventeenth international conference on Architectural Support for Programming Languages and Operating Systems - ASPLOS '12, 2012
Effective resource management of virtualized environments is a challenging task. State-of-the-art management systems either rely on analytical models or evaluate resource allocations by running actual experiments. However, both approaches incur a significant overhead once the workload changes. The former needs to recalibrate and re-validate models, whereas the latter has to run a new set of experiments to select a new resource allocation. During the adaptation period, the system may run with an inefficient configuration. In this paper, we propose DejaVu, a framework that (1) minimizes the resource management overhead by identifying a small set of workload classes for which it needs to evaluate resource allocation decisions, (2) quickly adapts to workload changes by classifying workloads using signatures and caching their preferred resource allocations at runtime, and (3) deals with interference by estimating an "interference index". We evaluate DejaVu by running representative network services on Amazon EC2. DejaVu achieves more than 10x speedup in adaptation time for each workload change relative to the state-of-the-art. By enabling quick adaptation, DejaVu saves up to 60% of the service provisioning cost. Finally, DejaVu is easily deployable as it does not require any extensive instrumentation or human intervention.
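Point (2) of that abstract, classifying workloads by signature and caching the preferred allocation per class, can be sketched as a small lookup structure. This is a minimal illustration under stated assumptions, not DejaVu's implementation: signatures are assumed to be numeric vectors, classes are matched by Euclidean distance against a threshold, and `AllocationCache` and the `tune` callback are hypothetical names.

```python
import math
from typing import Callable, List, Sequence, Tuple

Signature = Tuple[float, ...]


class AllocationCache:
    """Cache one resource allocation per workload class (illustrative only)."""

    def __init__(self, tune: Callable[[Signature], str], threshold: float):
        self.tune = tune            # hypothetical expensive evaluation step
        self.threshold = threshold  # max distance to count as the same class
        self.classes: List[Tuple[Signature, str]] = []
        self.tune_calls = 0

    def lookup(self, signature: Sequence[float]) -> str:
        sig = tuple(signature)
        # Cache hit: the signature is close enough to a known class.
        for representative, allocation in self.classes:
            if math.dist(representative, sig) <= self.threshold:
                return allocation
        # Cache miss: evaluate once, then remember the result for this class.
        self.tune_calls += 1
        allocation = self.tune(sig)
        self.classes.append((sig, allocation))
        return allocation
```

In this sketch the expensive evaluation runs only the first time a workload class is seen; later workloads with a nearby signature reuse the cached allocation.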
Proceedings of the Seventh COnference on emerging Networking EXperiments and Technologies on - CoNEXT '11, 2011
The power consumption of the Internet and datacenter networks is already significant, and threatens to shortly hit the power delivery limits while the hardware is trying to sustain ever-increasing traffic requirements. Existing energy-reduction approaches in this domain advocate recomputing network configuration with each substantial change in demand. Unfortunately, computing the minimum network subset is computationally hard and does not scale. Thus, the network is forced to operate with diminished performance during the recomputation periods. In this paper, we propose REsPoNse, a framework which overcomes the optimality-scalability trade-off. The insight in REsPoNse is to identify a few energy-critical paths off-line, install them into network elements, and use a simple online element to redirect the traffic in a way that enables large parts of the network to enter a low-power state. We evaluate REsPoNse with real network data and demonstrate that it achieves the same energy savings as the existing approaches, with marginal impact on network scalability and application performance.
Distributed systems are difficult to design and develop. The difficulties arise both in basic safety correctness properties, as well as in achieving high performance. As a result of this complexity, the implementation of a distributed system often contains the basic algorithm coupled with an embedded strategy for making choices, such as the choice of a node to interact with. This paper proposes a programming model for distributed systems where 1) the application explicitly exposes the choices (decisions) that it needs to make as well as the objectives that it needs to maximize; 2) the application and the runtime system cooperate to maintain a predictive model of the distributed system and its environment; and 3) the runtime uses the predictive model to resolve the choices so as to maximize the objectives. We claim that this programming model results in simpler source code and lower development effort, and that it can lead to increased performance and robustness to various deployment settings. Our initial results of applying this model to a sample application are encouraging.
Proceedings of the 1st workshop on Automated control for datacenters and clouds - ACDC '09, 2009
Power consumption has become a critical issue in large-scale clusters. Existing solutions for addressing the servers' energy consumption suggest "shrinking" the set of active machines, at least until the more power-proportional hardware devices become available. This paper demonstrates that leveraging the sleeping state, however, may lead to unacceptably poor performance and low data availability if the distributed services are not aware of the power management's actions. Therefore, we present an architecture for cluster services in which the deployed services overcome this problem by actively participating in any action taken by the power management. We propose, implement, and evaluate modifications for the Hadoop Distributed File System and the MapReduce clone that make them capable of operating efficiently under limited power budgets.