Dariusz Rafal Augustyn | Silesian University of Technology
Papers by Dariusz Rafal Augustyn
Studia Informatica, 2010
Journal of Computational Science
Scientific Data, Feb 7, 2023
The ability to uncover characteristics based on empirical measurement is an important step in understanding the underlying system that gives rise to an observed time series. This is especially important for biological signals, whose characteristics contribute to the underlying dynamics of physiological processes. Therefore, by studying such signals, the physiological systems that generate them can be better understood. The datasets presented consist of 33,000 time series of 15 dynamical systems (five chaotic and ten non-chaotic) of the first, second, or third order. Here, the order of a dynamical system means its dimension. The non-chaotic systems were divided into the following classes: periodic, quasi-periodic, and non-periodic. The aim is to propose datasets for machine learning methods, in particular deep learning techniques, to analyze unknown dynamical system characteristics based on obtained time series. In technical validation, three classification experiments were conducted using two types of neural networks with long short-term memory modules and convolutional layers.
Lecture Notes in Computer Science, 2022
Applied Sciences
In this paper, the authors use a case study of the Polish healthcare IT system being deployed to the cloud to show how the computing resource consumption of rarely used services can be limited. The architecture of today's application systems is often based on the microservices architectural style, where individual groups of services are deployed independently of each other. This is also the case with the system under discussion. Most often, the nature of the workload of each group of services is different, which creates some challenges but also provides opportunities to optimize the consumption of computing resources, thus lowering the environmental footprint and at the same time gaining measurable financial benefits. Unlike other scaling methods, such as those based on MDP and reinforcement learning in particular, which focus on system load prediction, in this paper, the authors propose a reactive approach in which any, even unpredictable,...
Studia Informatica, 2013
Selectivity is a parameter determined by a database query optimizer for early estimation of the size of the data satisfying a query condition. This step is necessary for finding the optimal query execution plan. Selectivity is usually estimated from histograms, which are non-parametric estimators of attribute value distributions. Determining selectivity for queries whose selection condition is based on several attributes requires a multidimensional histogram that estimates the joint distribution of attribute values. The accuracy of multidimensional histograms decreases as the number of dimensions grows, which is commonly known as the curse of dimensionality. One-dimensional histograms built for single attributes, which characterize the marginal distributions, describe those one-dimensional distributions more accurately, but of course do not describe dependencies between attributes. This paper proposes a selectivity estimation method based on histograms describing both the joint distribution and the marginal distributions. The proposed method (named M2HSE) applies to a class of queries in which the range selection condition is based on many attributes. For such queries, the method may yield more accurate approximations of selectivity than classical methods that use histograms describing only the joint distribution or only the marginal distributions (where the attribute independence assumption is applied).
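The contrast drawn in this abstract between marginal histograms with the independence assumption and a joint histogram can be illustrated with a small sketch. It is not the M2HSE method itself, which the abstract does not specify in detail; the function names, the equi-width bucketing, and the toy data are all assumptions made for illustration.

```python
def hist2d(rows, lo, hi, buckets):
    """Equi-width 2-D histogram (bucket counts) over [lo, hi) x [lo, hi)."""
    width = (hi - lo) / buckets
    counts = [[0] * buckets for _ in range(buckets)]
    for x, y in rows:
        ix = min(int((x - lo) / width), buckets - 1)
        iy = min(int((y - lo) / width), buckets - 1)
        counts[ix][iy] += 1
    return counts


def overlap_share(lo, width, i, a, b):
    """Share of bucket i covered by the range [a, b), assuming uniformity inside the bucket."""
    left = lo + i * width
    return max(0.0, min(b, left + width) - max(a, left)) / width


def sel_1d(counts, lo, hi, rng):
    """Selectivity of a range condition estimated from a 1-D histogram."""
    width = (hi - lo) / len(counts)
    covered = sum(c * overlap_share(lo, width, i, *rng) for i, c in enumerate(counts))
    return covered / sum(counts)


def sel_joint(counts, lo, hi, xr, yr):
    """Selectivity of (x in xr AND y in yr) estimated from the joint histogram."""
    n = len(counts)
    width = (hi - lo) / n
    total = sum(map(sum, counts))
    est = 0.0
    for ix in range(n):
        for iy in range(n):
            est += (counts[ix][iy]
                    * overlap_share(lo, width, ix, *xr)
                    * overlap_share(lo, width, iy, *yr))
    return est / total


def sel_independent(counts, lo, hi, xr, yr):
    """Product of marginal selectivities (attribute independence assumption)."""
    marg_x = [sum(row) for row in counts]
    marg_y = [sum(col) for col in zip(*counts)]
    return sel_1d(marg_x, lo, hi, xr) * sel_1d(marg_y, lo, hi, yr)


# Toy data with perfectly correlated attributes (y == x), an assumption for illustration.
rows = [(i, i) for i in range(100)]
joint = hist2d(rows, 0, 100, 4)
x_range, y_range = (0, 50), (50, 100)
true_sel = sum(1 for x, y in rows if 0 <= x < 50 and 50 <= y < 100) / len(rows)
indep_sel = sel_independent(joint, 0, 100, x_range, y_range)
joint_sel = sel_joint(joint, 0, 100, x_range, y_range)
```

On this toy data no row satisfies the condition, the joint histogram reflects that, and the independence-based estimate is far off because it ignores the dependency between the two attributes.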
Entropy, 2019
Analysis of eye movement has attracted a lot of attention recently in terms of exploring areas of people's interest, cognitive ability, and skills. The basis for eye movement usage in these applications is the detection of its main components, namely fixations and saccades, which facilitate understanding of the spatiotemporal processing of a visual scene. In the presented research, a novel approach for the detection of eye movement events is proposed, based on the concept of approximate entropy. By using the multiresolution time-domain scheme, a structure entitled the Multilevel Entropy Map was developed for this purpose. The dataset was collected during an experiment utilizing the "jumping point" paradigm. Eye positions were registered with a 1000 Hz sampling rate. For event detection, the kNN classifier was applied. The best classification efficiency in recognizing the saccadic period ranged from 83% to 94%, depending on the sample size used. These promising outcomes suggest that ...
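For readers unfamiliar with approximate entropy, the following is a minimal sketch of the standard definition (Chebyshev distance between embedded vectors, self-matches included). It does not reproduce the Multilevel Entropy Map or the paper's parameter choices; the defaults `m=2` and `r=0.2` and the test signals are assumptions.

```python
import math
import random


def _phi(series, m, r):
    """Mean log-frequency of length-m patterns that repeat within tolerance r."""
    n = len(series) - m + 1
    vectors = [series[i:i + m] for i in range(n)]
    total = 0.0
    for v in vectors:
        # Count vectors within Chebyshev distance r of v (v itself always matches).
        matches = sum(
            1 for w in vectors
            if max(abs(a - b) for a, b in zip(v, w)) <= r
        )
        total += math.log(matches / n)
    return total / n


def approximate_entropy(series, m=2, r=0.2):
    """ApEn(m, r) = phi_m(r) - phi_{m+1}(r); lower values mean a more regular signal."""
    return _phi(series, m, r) - _phi(series, m + 1, r)


# Hypothetical signals: a strictly periodic one and a pseudo-random one.
periodic = [0.0, 1.0] * 20
rng = random.Random(0)
noisy = [rng.random() for _ in range(200)]
apen_periodic = approximate_entropy(periodic)
apen_noisy = approximate_entropy(noisy)
```

A regular signal yields a value near zero, while an irregular one yields a noticeably larger value, which is the property an entropy-based event detector can exploit.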
Bulletin of the Polish Academy of Sciences Technical Sciences, 2014
Obtaining the optimal query execution plan requires selectivity estimation. The selectivity value makes it possible to predict the size of a query result, and therefore to choose the best method of query execution. There are many methods of obtaining selectivity that are based on different types of estimators of the distribution of attribute values (commonly, they are based on histograms). The adaptive method proposed in this paper uses the distribution of attribute values as well as the distribution of range query condition boundaries. A new type of histogram, the Query-Conditional-Aware V-optimal histogram (QCA-V-optimal), is proposed as a non-parametric estimator of the probability density function of the attribute value distribution. This histogram also takes into account information about already processed queries. This information is represented by the 1-dimensional Query Condition Distribution histogram (HQCD), which is an estimator of the include function PI, also introduced in this paper. PI describes so-called regions of...
Lecture Notes in Computer Science, 2014
Selectivity is a parameter used by a query optimizer for early estimation of the size of the data that satisfies a query condition. Selectivity is calculated using an estimator of the distribution of values of the attribute involved in the processed query condition. Histograms built on attribute values from a database may serve as such a representation of the distribution. The paper introduces a new query-distribution-aware V-optimal histogram, which is useful in selectivity estimation for a range query. It takes into account both a 1-D distribution of attribute values and a 2-D distribution of boundaries of already processed queries. The advantages of the qda-V-optimal histogram appear when it is applied to selectivity estimation of range query conditions that form so-called hot regions. To obtain the proposed error-optimal histogram, we use a dynamic programming method and Fuzzy C-Means clustering of a set of range boundaries.
Communications in Computer and Information Science, 2014
The paper considers the problem of predicting a probability distribution. We take into account an extrapolation model based on the evolution of quantiles. Any concrete model may be used that makes it possible to track and extrapolate the boundaries of buckets of an equi-height histogram. Such a histogram with p + 1 boundaries is equivalent to p-quantiles. Using such a baseline extrapolation model, we may obtain lines of locations of bucket boundaries that may intersect in the future. To avoid intersections and to extend (in time) the correctness of the results, we propose to use a model of a continuous dynamical system with viscous resistance forces for obtaining improved lines of locations. The proposed model yields lines with unchanged or very similar shapes (compared to the results from the baseline extrapolation model) but without any intersections. This approach will be helpful when a previously used baseline extrapolation model is valid for too short a time. The work was inspired by the problem of predicting an attribute value distribution used for query selectivity estimation. However, the proposed method may be applied not only in query optimization problem
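The intersection problem mentioned in this abstract can be sketched as follows. The example builds the p + 1 boundaries of an equi-height histogram and applies a naive linear extrapolation to two snapshots, then reports the first step at which adjacent boundaries cross. The linear baseline, the snapshot values, and all function names are assumptions; the viscous-resistance model proposed in the paper is not reproduced here.

```python
def equi_height_boundaries(values, p):
    """Return p + 1 boundaries that split the sorted values into p equal-count buckets."""
    s = sorted(values)
    n = len(s)
    # Boundary k sits at the k/p quantile, taken by index into the sorted sample.
    return [s[min(n - 1, (k * n) // p)] for k in range(p + 1)]


def extrapolate(prev, curr, steps):
    """Linear extrapolation of each boundary 'steps' snapshots into the future."""
    return [c + (c - q) * steps for q, c in zip(prev, curr)]


def first_crossing(prev, curr, horizon):
    """First future step at which extrapolated boundaries stop being ordered, or None."""
    for t in range(1, horizon + 1):
        b = extrapolate(prev, curr, t)
        if any(left > right for left, right in zip(b, b[1:])):
            return t
    return None


# Hypothetical snapshots of the boundaries at two consecutive times.
snapshot_prev = equi_height_boundaries(range(100), 4)
snapshot_curr = [0, 30, 52, 75, 99]
crossing_step = first_crossing(snapshot_prev, snapshot_curr, horizon=20)
```

Once boundaries cross, the extrapolated histogram no longer describes a valid distribution, which is the time limitation of the baseline model that the paper sets out to remove.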
This survey article describes methods for estimating the selectivity of a class of queries with complex selection conditions. Simple methods, used commercially, assume a simplification: independence of the values of table attributes. Other, more accurate methods are based on estimating the multidimensional distribution of attribute values. The advanced methods presented use the cosine and wavelet transforms to compute selectivity efficiently, relying on a lossily compressed frequency spectrum of attribute values.
The article presents the results of a performance analysis of programs for simulating continuous dynamical systems, built with the Parallel Extensions to .NET Framework module. The subject considered is modeling the motion of systems of bodies in a gravitational field. The work shows the advantages of parallelized programs built on .NET technology. It presents performance comparisons between the proposed solution and sequential solutions: MATLAB scripts and single-threaded .NET programs, run on computers with multi-core processors. The scalability of the proposed solution is also considered.
Advances in Intelligent Systems and Computing, 2017
Selectivity is a parameter used by a query optimizer for estimating the size of the data that satisfies a query condition. Calculating selectivity requires some representation of the distribution of attribute values. Commonly, one-dimensional histograms that describe distributions of single attributes are used in DBMSes. A multidimensional (m-d) representation is required for complex queries with a range selection condition based on many attributes. Storing an m-d representation directly (e.g. an m-d histogram) is very space consuming for high dimensions, hence the copula-based approach is proposed, where only a few parameters need to be stored. By using very few copula parameters, we obtain a method that is more accurate in selectivity estimation than the method based on attribute value independence, which is commonly used by database management systems. The paper presents a software module which provides the copula-based method of selectivity estimation for an m-d range query. The presented solution is based on R Serve and is integrated with Oracle DBMS. Some additional advantages of the module, resulting from caching selectivity values for similar conditions, are shown.
Computer Networks, 2017
Elements of cloud infrastructure such as load balancers, instances of virtual servers (service nodes), and storage services are used in the architecture of modern cloud-enabled systems. Auto scaling is a mechanism that adapts the efficiency of a system on-line to the current load. It does so by increasing or decreasing the number of running instances. An auto scaling model uses statistics based on standard metrics like CPU Utilization or custom metrics like the execution time of a selected business service. By horizontal scaling, the model should satisfy Quality of Service (QoS) requirements. QoS requirements are determined by criteria based on statistics defined on metrics. The auto scaling model should minimize the cost (mainly measured by the number of instances used) subject to the assumed QoS requirements. There are many reactive (based on current load) and predictive (based on future load) approaches to auto scaling. In this paper we propose some extensions to a concrete reactive auto scaling model to improve its sensitivity to load changes. We introduce an extension that varies the threshold of CPU Utilization in the scaling-out policy. We also extend the model by introducing a randomized method in the scaling-in policy.
Advances in Intelligent Systems and Computing, 2015
The selectivity factor is obtained by a database query optimizer for estimating the size of the data that satisfies a query condition. This makes it possible to choose the optimal query execution plan. In this paper we consider the problem of selectivity estimation for inequality predicates based on two attributes; the proposed solution therefore makes it possible to estimate the size of the data that satisfies theta-join conditions. The proposed method is based on the Discrete Fourier Transform and the convolution theorem. DFT spectra are used as representations of the distribution of attribute values. We compute selectivity either by performing the Inverse DFT (for an inequality condition based on two attributes) or by avoiding it (for a single-attribute range condition). Selectivity calculation is a time-critical operation performed during the on-line query preparation phase. We show that by applying the parallel processing capabilities of a Graphics Processing Unit, the implementation of the method satisfies the assumed time constraint.
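The role of the convolution theorem for an inequality predicate can be sketched as follows. The example computes the share of pairs satisfying X < Y from the value frequencies of two attributes by multiplying DFT spectra and applying the inverse DFT. It is only an illustration of the underlying identity: the naive O(L^2) transform, the integer domain 0..n-1, and the function names are assumptions, and the paper's GPU implementation and its single-attribute shortcut are not reproduced.

```python
import cmath


def dft(x, inverse=False):
    """Naive O(L^2) discrete Fourier transform, or its inverse when inverse=True."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = sum(
            x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n)
            for t in range(n)
        )
        out.append(acc / n if inverse else acc)
    return out


def selectivity_x_less_than_y(fx, fy):
    """Share of (X, Y) pairs with X < Y, given relative frequencies over values 0..n-1."""
    n = len(fx)
    size = 2 * n - 1                      # zero padding turns circular into linear convolution
    a = list(fx) + [0.0] * (size - n)
    b = list(reversed(fy)) + [0.0] * (size - n)
    spectrum = [p * q for p, q in zip(dft(a), dft(b))]
    h = [v.real for v in dft(spectrum, inverse=True)]
    # h[m] is the share of pairs with X - Y == m - (n - 1), so X < Y means m < n - 1.
    return sum(h[: n - 1])


def brute_force(fx, fy):
    """Direct double sum, used only to check the DFT-based result."""
    return sum(fx[i] * fy[j] for i in range(len(fx)) for j in range(len(fy)) if i < j)


# Hypothetical frequency vectors of two attributes over the domain {0, 1, 2, 3}.
fx = [0.7, 0.2, 0.1, 0.0]
fy = [0.1, 0.1, 0.3, 0.5]
sel_dft = selectivity_x_less_than_y(fx, fy)
sel_direct = brute_force(fx, fy)
```

In practice a fast transform would replace the naive one; the sketch only shows why multiplying spectra and inverting gives the distribution of X - Y needed for the inequality.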
One of the key benefits of moving an application to the cloud is the ability to easily scale horizontally when the workload increases. Many cloud providers offer an auto scaling mechanism which dynamically adjusts the number of virtual server instances on which a given system is running, according to some basic resource-based metrics like CPU utilization. In this work, we propose a model of auto scaling based on timing statistics, a high-order quantile and a mean value, which are calculated from custom metrics, like the execution time of a user request, gathered at the application level. Inputs to the model are user-defined values of those custom metrics. We developed a software module that controls the number of virtual server instances according to both auto scaling models, and conducted experiments showing that our model based on custom metrics can perform better: it uses fewer instances and still maintains the assumed time constraints.
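A reactive decision rule driven by a high-order quantile and a mean of request execution times could look roughly like the sketch below. The abstract does not give the actual policy, so the nearest-rank quantile, the single-instance step, the `slack` factor used for scaling in, and all names and limits are assumptions.

```python
import math


def quantile(samples, q):
    """Nearest-rank quantile of a non-empty list of samples."""
    s = sorted(samples)
    rank = max(1, math.ceil(q * len(s)))
    return s[rank - 1]


def decide(samples, instances, q_limit, mean_limit, q=0.99, slack=0.5, min_instances=1):
    """Return the new instance count for one evaluation period.

    samples    -- request execution times gathered at the application level
    q_limit    -- user-defined limit for the high-order quantile
    mean_limit -- user-defined limit for the mean execution time
    slack      -- hypothetical factor: scale in only when both statistics
                  are below slack * limit
    """
    hi_q = quantile(samples, q)
    mean = sum(samples) / len(samples)
    if hi_q > q_limit or mean > mean_limit:
        return instances + 1                      # scale out: a time constraint is violated
    if hi_q < slack * q_limit and mean < slack * mean_limit and instances > min_instances:
        return instances - 1                      # scale in: ample headroom on both statistics
    return instances                              # otherwise keep the current size


# Hypothetical measurement windows (seconds) and limits.
calm = [0.2] * 100
busy = [0.2] * 50 + [3.0] * 50
after_calm = decide(calm, instances=3, q_limit=1.0, mean_limit=0.5)
after_busy = decide(busy, instances=3, q_limit=1.0, mean_limit=0.5)
```

Keeping a gap between the scale-out and scale-in conditions is one simple way to avoid oscillating between instance counts; whether the original model does this is not stated in the abstract.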
Advances in Intelligent Systems and Computing, 2019
In query optimization theory, a selectivity parameter is used by a cost-based query optimizer for early estimation of the size of the data that satisfies a query condition. It requires some representation of the distribution of attribute values. There are many approximate representations of an m-d distribution, of which the copula-based one is new. This approach makes it possible to take into account a varying m-d distribution by predicting both a copula and the 1-d marginal distributions. In this paper we propose a method of forecasting trajectories of both copula parameters and marginals' quantiles using time series prediction models. This method is mainly intended for predicting an outdated distribution representation, which may improve the accuracy of selectivity estimation based on such a representation. It may also be used for predicting a varying query workload in order to forecast important regions of the data domain. Having detected such regions, we may improve the resolution of the distribution representation there.