Neural adaptive admission control framework: SLA-driven action termination for real-time application service management

Neural Adaptive Control in Application Service Management Environment

We introduce a learning controller framework for adaptive control in application service management environments and explore its potential. Run-time metrics, collected by observing the enterprise system during normal operation and load tests, are persisted to create a knowledge base of real system states. Equipped with this knowledge, the proposed framework associates system states and high/low service level agreement (SLA) values with successful/unsuccessful control actions. These associations are used to induce decision rules, which help generate training sets for a neural network-based control decision module that operates at application run time. Control actions are executed in the context of the current system state, which is then monitored and stored again, extending the system state repository/knowledge base and allowing the correctness of the control actions to be evaluated frequently. This incremental learning leads to evolving controller behavior that takes into account the consequences of earlier actions in the same or similar situations. Our tests demonstrate that the controller adapts to changing run-time conditions and workloads based on SLA definitions and effectively controls the instrumented system under overload.
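The monitor, store, decide loop described in this abstract can be illustrated with a minimal sketch. It is not the authors' controller: the neural network decision module is replaced here by a plain k-nearest-neighbour lookup over stored states, and the metric layout (CPU load, queue length), the action names `admit`/`reject`, and the class names are all hypothetical.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Record:
    state: tuple   # observed run-time metrics (assumed: cpu load, queue length)
    action: str    # control action that was applied in that state
    sla: float     # SLA value measured afterwards (assumed: higher is better)


@dataclass
class KnowledgeBase:
    records: list = field(default_factory=list)

    def store(self, state, action, sla):
        """Persist one observed (state, action, outcome) association."""
        self.records.append(Record(tuple(state), action, sla))

    def choose_action(self, state, actions, k=3, default="admit"):
        """Pick the action with the best mean SLA among the k nearest stored states."""
        if not self.records:
            return default
        point = tuple(state)
        nearest = sorted(self.records, key=lambda r: math.dist(r.state, point))[:k]
        outcomes = {}
        for rec in nearest:
            if rec.action in actions:
                outcomes.setdefault(rec.action, []).append(rec.sla)
        if not outcomes:
            return default
        means = {a: sum(v) / len(v) for a, v in outcomes.items()}
        return max(means, key=means.get)


kb = KnowledgeBase()
kb.store((0.9, 50), "reject", 0.95)  # overloaded, rejecting kept the SLA high
kb.store((0.9, 48), "admit", 0.40)   # overloaded, admitting hurt the SLA
kb.store((0.2, 5), "admit", 0.99)    # lightly loaded, admitting was fine
print(kb.choose_action((0.85, 49), ["admit", "reject"], k=2))
```

Each executed action would be stored back with its measured SLA value, so later decisions in similar states reflect the consequences of earlier ones. The metrics are left unscaled for brevity; a real distance measure would normalize them first.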

Control strategies for adaptive resource allocation in cloud computing

IFAC-PapersOnLine, 2020

Using a compute infrastructure efficiently to execute jobs while respecting Service Level Agreements (SLAs), and thereby guaranteeing Quality of Service (QoS), poses a number of challenges. One such challenge is that SLAs are set prior to the execution of a job, but the execution environment is subject to a number of possible disturbances, such as poor knowledge of actual resource needs, demand peaks and hardware malfunctions, among others. Thus, by using a fixed resource allocation, the manager of a shared computing environment risks violating user SLAs. Furthermore, the complexity of managing several workload executions increases with the number of workloads, implying the need for an automatic method to manage and control the execution of workloads. The execution time SLA is especially important in streaming scenarios such as web applications and continuous video processing, and is the focus of this paper. A method based on adaptive model predictive control (aMP...

Self-adaptive Resource Management System in IaaS Clouds

Resource management in cloud infrastructures is one of the most challenging problems due to the heterogeneity of resources, the variability of the workload and the scale of data centers. Efficient management of physical and virtual resources can be achieved by considering the performance requirements of hosted applications and infrastructure costs. In this paper, we present a self-adaptive resource management system based on a hierarchical multi-agent architecture. The system uses a novel adaptive utilization threshold mechanism and benefits from a reinforcement learning technique to dynamically adjust CPU and memory thresholds for each Physical Machine (PM). It periodically runs a Virtual Machine (VM) placement optimization algorithm to keep the total resource utilization of each PM within given thresholds, improving Service Level Agreement (SLA) compliance. Moreover, the algorithm consolidates VMs onto the minimum number of active PMs in order to reduce energy consumption. Experimental results on real workload traces show that our resource management system provides substantial improvement over other approaches in terms of performance requirements, energy consumption and the number of VM migrations.
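To make the threshold-constrained consolidation idea concrete, here is a minimal sketch that packs VMs onto as few PMs as a simple first-fit-decreasing heuristic allows while keeping CPU and memory use at or below fixed thresholds. It is not the placement algorithm from the paper: the thresholds are constants rather than values adjusted by reinforcement learning, all PMs are assumed identical, and the function name and the percentage-based demand format are hypothetical.

```python
def consolidate(vms, cpu_threshold=80, mem_threshold=80):
    """Place VMs on PMs with a first-fit-decreasing heuristic.

    vms maps a VM name to (cpu, mem) demand, each an integer percentage
    of one PM's capacity (an assumption made for this sketch).
    Returns a list of PMs, each given as the list of VM names placed on it.
    """
    pms = []  # each entry: [cpu_used, mem_used, [vm names]]
    # Handle the most demanding VMs first (largest of the two demands).
    ordered = sorted(vms.items(), key=lambda kv: max(kv[1]), reverse=True)
    for name, (cpu, mem) in ordered:
        if cpu > cpu_threshold or mem > mem_threshold:
            raise ValueError(f"VM {name} cannot fit on any PM within the thresholds")
        for pm in pms:
            # Reuse the first active PM that stays within both thresholds.
            if pm[0] + cpu <= cpu_threshold and pm[1] + mem <= mem_threshold:
                pm[0] += cpu
                pm[1] += mem
                pm[2].append(name)
                break
        else:
            # No active PM has room, so switch on another one.
            pms.append([cpu, mem, [name]])
    return [pm[2] for pm in pms]


demands = {"a": (50, 30), "b": (40, 20), "c": (30, 30), "d": (20, 10)}
placement = consolidate(demands)
print(placement)  # [['a', 'c'], ['b', 'd']]
```

Lowering the thresholds leaves more headroom on each PM for SLA compliance at the cost of more active PMs, which is the trade-off an adaptive threshold mechanism would tune.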