Mariza Ferro - Academia.edu
Papers by Mariza Ferro
Grid computing systems are extremely large and complex, so dealing with their failures manually is impractical. Recently, it has been proposed that the systems themselves should manage their own failures or malfunctions, a capability referred to as self-healing. Meeting this challenge requires predicting and controlling the process through automated learning and proactive actions. In this paper, we propose inductive logic programming, a relational machine learning method, for prediction and root-cause analysis, which makes it possible to develop a self-healing component.
Concurrency and Computation: Practice and Experience, Jun 1, 2023
Summary: Much has been discussed about the negative environmental impacts of artificial intelligence, driven by its power-hungry Machine Learning algorithms and the emissions linked to them. This work discusses three direct impacts of AI on the energy consumption associated with computation: the software, the hardware, and the carbon intensity of the energy source. We present an up-to-date review of the literature and assess it through experiments. For hardware, we evaluate the use of ARM-based single-board computers for training Machine Learning algorithms. An experimental setup was developed to train the XGBoost algorithm, and its cost-effectiveness (energy consumption, acquisition cost, and execution time) was compared with x86-64 and GPU architectures and with other algorithms. In addition, emissions are estimated for these experiments and compared across three energy sources. The results show that this type of architecture can become a viable and greener alternative, not only for inference but also for training these algorithms. Finally, for the software aspect, we evaluated low precision for training Random Forest algorithms with different datasets. The results show that energy reduction is possible with no decrease in accuracy.
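As a rough illustration of how energy consumption and the carbon intensity of an energy source combine, the sketch below multiplies measured energy by a carbon-intensity factor. The function names, the source labels, and the intensity values are hypothetical placeholders, not figures or methods taken from the paper.

```python
def estimate_emissions_g(energy_kwh: float, carbon_intensity_g_per_kwh: float) -> float:
    """Estimate emissions in grams of CO2-equivalent.

    Uses the common approximation: emissions = energy * carbon intensity.
    """
    if energy_kwh < 0 or carbon_intensity_g_per_kwh < 0:
        raise ValueError("energy and carbon intensity must be non-negative")
    return energy_kwh * carbon_intensity_g_per_kwh


# Placeholder carbon intensities (gCO2e per kWh); illustrative values only.
EXAMPLE_INTENSITIES = {"source_a": 20.0, "source_b": 400.0, "source_c": 800.0}


def compare_sources(energy_kwh: float, intensities: dict) -> dict:
    """Return the estimated emissions of one workload under each energy source."""
    return {
        name: estimate_emissions_g(energy_kwh, intensity)
        for name, intensity in intensities.items()
    }


if __name__ == "__main__":
    # A hypothetical training run that consumed 0.5 kWh.
    for name, grams in compare_sources(0.5, EXAMPLE_INTENSITIES).items():
        print(f"{name}: {grams:.1f} gCO2e")
```

The same workload can therefore have very different estimated emissions depending only on where its electricity comes from, which is why the energy source is treated as a separate factor from software and hardware.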
Simpósio de Pesquisa Operacional e Logística da Marinha - Publicação Online, May 1, 2020
As high-performance computing becomes ever more critical for scientific research across various fields, increasing performance without raising energy consumption becomes essential to ensure the financial viability of exascale systems. This work presents a first step towards understanding, through a machine learning model, how the computational requirements of benchmark applications relate to overall runtime, and how that knowledge can be used to develop an autonomous framework capable of scaling applications for an optimal trade-off between performance and energy consumption.
Bioinformatics could greatly benefit from the increased computational resources delivered by High Performance Computing. However, deciding which architecture best delivers good performance for a set of Bioinformatics applications is a hard task. The traditional approach is to look for the architecture with the highest theoretical peak performance, obtained with benchmark tests. This is not a reliable basis for the decision, because each Bioinformatics application has different computational requirements, which are frequently very different from those of the usual benchmarks. We developed a methodology that helps researchers, even when their specialty is not high performance computing, to define the best computational infrastructure for the requirements of their set of scientific applications. For this purpose, the methodology makes it possible to define representative evaluation tests, including a model for choosing the correct benchmark when the tests endorsed by the methodology cannot be fully used. Further, a Gain Function allows reliable decision-making based on the performance of a set of applications and architectures. It is also possible to consider the relative importance of each application, as well as the relative importance of cost and performance.
Springer eBooks, 2015
The increased use of virtualized environments has led to numerous research efforts on the possibilities and restrictions of using these environments in cloud computing or for resource consolidation. However, most of these studies are limited to performance analysis and do not address the effects of concurrency among the various virtual environments or how to mitigate these effects. This study proposes the concept of affinity, based on the correct combination of certain application classes that are able to share the same environment at the same time with less loss of performance. The results show that some combinations of applications can share the same environment with minimal loss, while other combinations must be avoided. This study also shows the influence of the type of parallel library used to implement these applications.
In this work, the use of ARM-based single-board computers is evaluated for training Machine Learning (ML) algorithms. For this, an experimental setup was developed, training the XGBoost algorithm with 36 hyperparameter configurations on four different architectures. Furthermore, its efficiency (energy consumption, acquisition cost and execution time) was compared with that of the main architectures used for training ML algorithms (x86 and GPU). The results show that this type of architecture can become a viable and greener alternative, not only for inference but also for the training phase of these algorithms.
Performance and energy efficiency are now critical concerns in high performance scientific computing, and achieving high performance with energy efficiency has become a major challenge. To address it, the requirements of the scientific problem itself are expected to guide the orchestration of different energy-saving techniques, in order to improve the balance between energy consumption and application performance. To enable this balance, we propose the development of an autonomic framework to perform this orchestration, and we present the ongoing research towards it, focusing specifically on the characterization of scientific applications and on performance and energy-consumption modeling using Machine Learning techniques.
Zenodo (CERN European Organization for Nuclear Research), Jul 21, 2017
GPUs have been widely used in scientific computing because they offer both exceptional performance and power-efficient hardware. Their established position in the high-performance and scientific computing communities has increased the urgency of understanding the power cost of GPU usage through accurate measurements. For this, the use of internal sensors is extremely important. In this work, we employ the GPU sensors to obtain high-resolution power profiles of real and benchmark applications. We wrote our own tools to query the sensors of two NVIDIA GPUs from different generations and compare their accuracy. We also compare the power profile of the GPU with that of the CPU using IPMItool.
The content of the articles and their data, in terms of form, correctness and reliability, is the sole responsibility of the authors. 2019. Downloading and sharing the work is permitted provided the authors are credited, but it may not be altered in any way or used for commercial purposes.
In this work, motivated by the growing demand for computational resources and by energy limitations, a performance evaluation methodology is proposed based on theoretical and practical parameters of the Roofline model, using the two-dimensional graph provided by the tools that implement this model. This methodology makes it possible to identify performance patterns in applications, their main computational requirements, and the factors that limit performance, and to suggest the best architecture on which to run an application. Experiments were carried out, focusing on Machine Learning algorithms, in which the proposed methodology is evaluated and shown to be effective.
The main objective of this work is to evaluate two nature-inspired metaheuristics, Genetic Algorithms and Ant Colony Optimization, for the development of an application that can generate optimized routes for aircraft, meeting the requirements of the Brazilian Navy (Marinha do Brasil). This work presents the methods developed, which obey two main constraints: the mobility of the checkpoints and the limited flight autonomy of the aircraft. It also presents the results of the tests carried out with the developed methods and an evaluation of their performance.
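For readers unfamiliar with the Roofline model mentioned in the abstract above, a minimal sketch of its standard bound may help: attainable performance is the minimum of the peak compute rate and the product of memory bandwidth and arithmetic intensity. The function names and the machine numbers below are hypothetical and are not taken from the paper or from any specific tool.

```python
def attainable_gflops(peak_gflops: float, peak_bandwidth_gbs: float,
                      arithmetic_intensity: float) -> float:
    """Standard Roofline bound.

    arithmetic_intensity is in FLOPs per byte moved to or from memory.
    """
    return min(peak_gflops, peak_bandwidth_gbs * arithmetic_intensity)


def ridge_point(peak_gflops: float, peak_bandwidth_gbs: float) -> float:
    """Arithmetic intensity at which the memory and compute roofs meet."""
    return peak_gflops / peak_bandwidth_gbs


def limiting_factor(peak_gflops: float, peak_bandwidth_gbs: float,
                    arithmetic_intensity: float) -> str:
    """Label a kernel as memory-bound or compute-bound under the model."""
    if arithmetic_intensity < ridge_point(peak_gflops, peak_bandwidth_gbs):
        return "memory-bound"
    return "compute-bound"


if __name__ == "__main__":
    # Hypothetical machine: 1000 GFLOP/s peak, 100 GB/s memory bandwidth.
    for ai in (0.5, 2.0, 50.0):
        bound = attainable_gflops(1000.0, 100.0, ai)
        kind = limiting_factor(1000.0, 100.0, ai)
        print(f"AI={ai}: at most {bound} GFLOP/s ({kind})")
```

Plotting this bound against arithmetic intensity on log-log axes gives the familiar two-dimensional Roofline graph, on which a measured kernel can be placed to see which roof limits it.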
Understanding, through runtime, the computational impact of scientific applications on computational architectures should guide the use of computational resources in high-performance computing systems. In this work, we propose an analysis of Machine Learning (ML) algorithms to gather knowledge about the performance of these applications from hardware events and derived performance metrics. Nine NAS benchmarks were executed and their hardware events were collected. These experimental results were used to train a Neural Network, a Decision Tree Regressor and a Linear Regression model to predict the runtime of scientific applications from the performance metrics.
Performance evaluation in HPC, that is, understanding the computational requirements of scientific applications and their relation to power consumption, is a fundamental task for overcoming current barriers and achieving exascale computing. However, it poses some challenges, such as monitoring a wide range of parameters in heterogeneous environments, enabling fine-grained profiling of performance and power consumption across different components, being language independent, and avoiding code instrumentation. Considering these challenges, this work proposes SMCis, an application monitoring tool developed to collect all these aspects effectively and accurately, and to correlate the data graphically in an environment for analysis and visualization.
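To make the idea of predicting runtime from a performance metric concrete, here is a minimal sketch of ordinary least squares with a single feature, written in plain Python. It is only an illustration: the data points are synthetic, the single-feature setup is a simplification, and none of it reproduces the models or measurements from the paper.

```python
def fit_line(xs: list, ys: list) -> tuple:
    """Fit y = slope * x + intercept by ordinary least squares."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(slope: float, intercept: float, x: float) -> float:
    """Predict the target (here, runtime) for a new metric value."""
    return slope * x + intercept


if __name__ == "__main__":
    # Synthetic data: a hypothetical metric (x) against runtime in seconds (y).
    metric = [1.0, 2.0, 3.0, 4.0]
    runtime = [3.0, 5.0, 7.0, 9.0]
    slope, intercept = fit_line(metric, runtime)
    print(f"runtime ~ {slope:.2f} * metric + {intercept:.2f}")
    print(f"predicted runtime at metric=5: {predict(slope, intercept, 5.0):.2f} s")
```

In practice several hardware-event metrics would be used as features at once, and a library implementation would normally replace this hand-written fit.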
Lecture Notes in Computer Science, 2017
This short paper proposes two novel methodologies for analyzing scientific applications in distributed environments, using workload requirements. The first explores the impact of features such as problem size and programming language across different computational architectures. The second explores the impact of mapping virtual cluster resources on the performance of parallel applications.
arXiv (Cornell University), Feb 1, 2016
Scientific Computing typically has large computational needs, which have been addressed with High Performance Distributed Computing. It is essential to deploy efficiently a number of complex scientific applications that have different characteristics and therefore require distinct computational resources. However, in many research laboratories the high performance architecture is not dedicated, so it must be shared to execute a set of scientific applications with widely varying execution times and relative importance to the research. The high performance architectures themselves also differ in characteristics and cost. When a new infrastructure has to be acquired to meet the needs of this scenario, the decision is hard and complex. In this work, we present a Gain Function, modeled as a utility function, that supports confident decision-making. With this function it is possible to evaluate the best architectural option taking into account aspects of both applications and architectures, including execution times, architecture cost, the relative importance of each application, and the relative importance of performance and cost in the final evaluation. This paper presents the Gain Function, examples, and a real case showing its applicability.
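The abstract above does not state the form of the Gain Function, so the following is only a hypothetical sketch of how the listed ingredients (execution times, architecture cost, per-application importance, and a performance-versus-cost weight) could be combined into a single score. The formula, the function name `score_architectures`, and all numbers are assumptions made for illustration, not the authors' definition.

```python
def score_architectures(times: dict, costs: dict, app_weights: dict,
                        perf_weight: float) -> dict:
    """Score each architecture in [0, 1]; higher is better.

    times:       {architecture: {application: execution time in seconds}}
    costs:       {architecture: acquisition cost}
    app_weights: {application: relative importance}
    perf_weight: weight of performance against cost, between 0 and 1
    """
    if not 0.0 <= perf_weight <= 1.0:
        raise ValueError("perf_weight must be between 0 and 1")
    total_weight = sum(app_weights.values())
    min_cost = min(costs.values())
    # Fastest observed time per application, used to normalize performance.
    best_time = {app: min(times[arch][app] for arch in times) for app in app_weights}
    scores = {}
    for arch in times:
        perf = sum(
            app_weights[app] * best_time[app] / times[arch][app]
            for app in app_weights
        ) / total_weight
        cost_score = min_cost / costs[arch]
        scores[arch] = perf_weight * perf + (1.0 - perf_weight) * cost_score
    return scores


if __name__ == "__main__":
    # Hypothetical measurements for two architectures and two applications.
    times = {"A": {"x": 10.0, "y": 20.0}, "B": {"x": 20.0, "y": 10.0}}
    costs = {"A": 100.0, "B": 200.0}
    scores = score_architectures(times, costs, {"x": 1.0, "y": 1.0}, perf_weight=0.5)
    print(scores, "best:", max(scores, key=scores.get))
```

Changing `perf_weight` or the application weights shifts the ranking, which is the kind of trade-off between application importance, performance and cost that the abstract describes.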
arXiv (Cornell University), Dec 3, 2014
High Performance Distributed Computing is essential to boost scientific progress in many areas of science and to efficiently deploy a number of complex scientific applications. These applications have different characteristics and therefore require distinct computational resources. In this work we propose a systematic performance evaluation methodology. Our methodology starts from the characteristics of the scientific application and then considers how these characteristics interact with the problem size, with the programming language and finally with a specific computational architecture. The computational experiments illustrate this evaluation model and indicate that optimal performance is found when we evaluate a combination of application class, programming language, problem size and architecture model.

1. Introduction

Scientific computing involves the construction of mathematical models and numerical solution techniques to solve complex scientific and engineering problems. These models often require a huge processing capacity to perform large-scale experiments within a reasonable time frame. These needs have been addressed with High Performance Parallel and Distributed Computing (HPDC), which allows many scientific domains to make progress. However, it is very difficult for many research groups to evaluate these HPDC infrastructures and arrive at the best configuration to run their scientific applications. Usually, optimal configurations are sought by executing one of the existing benchmark suites widely used for performance evaluation. Benchmarks are good for comparing computational architectures, but they are not the best approach for evaluating whether an architecture is adequate for a set of scientific applications.

Evaluations using traditional benchmarks return a single performance number that corresponds to, for example, the maximum number of floating-point operations per second. Unlike such benchmarks, which are typically floating-point intensive, many scientific applications do not fit this model, and neither do the workloads they are commonly run with. In other words, traditional benchmark evaluations generally do not consider the actual set of applications that will be used. However, each application has different system requirements (e.g., memory bound, I/O bound or CPU bound) and so requires different computational resources. Thus, in the performance evaluation methodology proposed in this work, an adequate evaluation first requires considering the characteristics of the scientific application that will be used on the HPDC architecture, under conditions as realistic as possible. The parameters evaluated therefore differ from those usually evaluated when the focus is performance optimization. The parameters evaluated, which we refer to here as Essential Elements of Analysis (EEA), are the application's class, execution time, programming language, problem size/workload, average memory time and percentage of memory, CPU and I/O usage, in contrast with Flops/s, cache miss rate and cache hit rate.

The methodology under development comprises several phases and dozens of steps that enable researchers to evaluate which HPDC configuration is best for their set of scientific applications. Its development is rooted in two concepts. The first is Operational Analysis (OA) [1], which is the foundation of our methodology. OA involves a sequence of phases and steps that aim to determine the performance of a system under the most realistic operational conditions. The second is the Dwarfs of scientific computation, developed by Colella [2] and the Berkeley team [3], which enable the characterization of application requirements. Each Dwarf class characterizes applications by common requirements in terms of computation and data movement.

In this work we briefly describe the overall methodology and describe in detail one of its steps, together with the experimental setup that enabled its development. The experimental results highlight how different interactions among the EEA, such as the application's class and the computational architecture, can deliver completely different performance results. The proposed methodology is supported by these results and is presented in the next sections of this work.

The remainder of the paper is organized as follows. In Section 2 we discuss the scientific landscape and how applications can be categorized into classes using the Dwarf taxonomy. In Section 3 we discuss related work. In Section 4 we discuss the traditional performance evaluation paradigm, followed by our proposal for performance evaluation. Section 5 outlines our experimental setup and results. Section 6 concludes the paper and briefly discusses future work.

2. Applications and Dwarfs

With the aim of categorizing the styles of computation seen in scientific computing, Colella [2] identified seven numerical methods that he believed to be important for science and engineering and introduced the "Seven Dwarfs" of scientific computing. These Dwarfs are defined at a high level of abstraction to explain their behavior across different HPDC applications, and each class of Dwarfs shows similarities in computation and communication. According to his definition, applications of a particular class can be implemented differently as numerical methods change over time, but the underlying patterns have remained the same over generations of change and will remain the same in future implementations. These Dwarfs were neither particular software applications nor small benchmark kernels. Instead, they represented entire families of computation with common computational properties. The Berkeley team working on parallel computation extended this classification to thirteen Dwarfs after examining important application domains. They were interested in applying Dwarfs to a broader range of computational methods and in investigating how well the Dwarfs could capture computation and communication patterns for a large range of applications.
This work aims to analyze aspects related to performance without losing focus on the energy efficiency of the applications. To this end, we evaluated a representative set of experiments with three renowned benchmarks and two real-world applications used by the energy industry. These experiments used a range of environments that included a medium-scale HPC system with Xeon and CUDA cores and a mobile-based Jetson TX2 development board composed of ARMv8 and CUDA cores. Our results make it possible to analyze the performance and power consumption of the selected applications and help improve energy efficiency in HPC systems.
ABSTRACT Grid computing systems are extremely large and complex so, manually dealing with its fai... more ABSTRACT Grid computing systems are extremely large and complex so, manually dealing with its failures becomes impractical. Recently, it has been proposed that the systems themselves should manage their own failures or malfunctions. This is referred as self-healing. To deal with this challenging, is required to predict and control the process through a number of automated learning and proactive actions. In this paper, we proposed inductive logic programming, a relational machine learning method, for prediction and root causal analysis that makes it possible the development of a self-healing component.
Concurrency and Computation: Practice and Experience, Jun 1, 2023
SummaryMuch has been discussed about artificial intelligence's negative environmental impacts... more SummaryMuch has been discussed about artificial intelligence's negative environmental impacts due to its power‐hungry Machine Learning algorithms and emissions linked to this. This work discusses three direct impacts of AI on energy consumption associated with computation: the software, the hardware, and the energy source's carbon intensity. We present an up‐to‐date revision of the literature and assess it through experiments. For hardware, we evaluate the use of ARM‐based single‐board computers for training Machine Learning algorithms. An experimental setup was developed training the algorithm XGBoost and its cost‐effectiveness (energy consumption, acquisition cost, and execution time) compared with the X86‐64 and GPU architectures and other algorithms. In addition, the is estimated for these experiments and compared for three energy sources. The results show that this type of architecture can become a viable and greener alternative, not only for inference but also for training these algorithms. Finally, we evaluated low precision for training Random Forest algorithms with different datasets for the software aspect. Results show that is possible energy reduction with no decrease in accuracy.
Simpósio de Pesquisa Operacional e Logística da Marinha - Publicação Online, May 1, 2020
As the high processing computing becomes even more critical for scientific research across variou... more As the high processing computing becomes even more critical for scientific research across various fields, increasing performance without raising the energy consumption levels becomes an essential task in order to warrant the financial viability of exascale systems. This work presents the first step towards understanding how the many computational requirements of benchmark applications relate to the overall runtime through a machine learning model and how that can be used for the development of an autonomous framework capable of scaling applications to have an optimal trade-off between performance and energy consumption.
Bioinformatics could greatly benefit from increased computational resources delivered by High Per... more Bioinformatics could greatly benefit from increased computational resources delivered by High Performance Computing. However, the decision-making about which is the best architecture to deliver good performance for a set of Bioinformatics applications is a hard task. The traditional way is finding the architecture with a high theoretical peak of performance, obtained with benchmark tests. But, this is not an assured way for this decision, because each application of Bioinformatics has different computational requirements, which frequently are much different from usual benchmarks. We developed a methodology that assists researchers, even when their specialty is not high performance computing, to define the best computational infrastructure focused on their set of scientific application requirements. For this purpose, the methodology enables to define representative evaluation tests, including a model to define the correct benchmark, when the tests endorsed by the methodology could not be fully used. Further, a Gain Function allows a reliable decision-making based on the performances of a set of applications and architectures. It is also possible to consider the relative importance between applications and also between cost and performance.
Springer eBooks, 2015
ABSTRACT The increased use of virtualized environments has led to numerous research efforts about... more ABSTRACT The increased use of virtualized environments has led to numerous research efforts about the possibilities and restrictions of the use of these virtualized environments in cloud computing or for resource consolidation. However, most of these studies are limited to a level of performance analysis, that does not address the effects of concurrency among the various virtual environments, and how to mitigate these effects. The study presented below proposes the concept of affinity, based on the correct combination of certain applications classes, that are able to share the same environment, at the same time, causing less loss of performance. The results show that there are combinations of applications that could share the same environment with minimum loss, but there are combinations that must be avoided. This study also shows the influence of the type of parallel library used for the implementation of these applications
In this work, the use of ARM-based single-board computers is evaluated for training Machine Learn... more In this work, the use of ARM-based single-board computers is evaluated for training Machine Learning (ML) algorithms. For this, an experimental setup was developed, training the algorithm XGBoost with 36 hyperparameter configurations in four different architectures. Furthermore, its efficiency (energy consumption, acquisition cost and execution time) was compared with the main architectures used for the training of ML (x86 and GPU). The results show that this type of architecture can become a viable and greener alternative, not only for inference but also for the training phase of these algorithms. Resumo. Neste trabalhoé avaliado o uso de placas single-board computers baseadas em ARM para o treinamento de algoritmos de Aprendizado de Máquina (AM). Foi desenvolvido um conjunto experimental treinando o algoritmo XGBoost com 36 configurações de hiperparâmetros em quatro arquiteturas diferentes. Além disso, foi comparado a sua eficiência (consumo energético, custo de aquisição e tempo de execução) com as principais arquiteturas usadas no treinamento de algoritmos de AM (x86 e GPU). Os resultados mostram que este tipo de arquitetura pode se tornar uma alternativa viável e mais verde, não apenas para a inferência, mas também para a fase de treinamento desses algoritmos.
Performance and energy efficiency are now critical concerns in high performance scientific comput... more Performance and energy efficiency are now critical concerns in high performance scientific computing. It is expected that requirements of the scientific problem should guide the orchestration of different techniques of energy saving, in order to improve the balance between energy consumption and application performance. To enable this balance, we propose the development of an autonomous framework to make this orchestration and present the ongoing research to this development, more specifically, focusing in the characterization of the scientific applications and the performance modeling tasks using Machine Learning. Resumo. Alcançar altos níveis de desempenho com eficiência energética se tornou um grande desafio para a computação científica de alto desempenho. Para contornar esse desafio, espera-se que os próprios requisitos do problema científico orientem a orquestração de diferentes mecanismos de economia de energia, a fim de melhorar o equilíbrio entre o consumo de energia e o desempenho das aplicações. Para isso,é proposto o desenvolvimento de um framework autonômico para fazer essa orquestração. Neste trabalho são apresentadas as pesquisas em andamento para este desenvolvimento, mais especificamente, com foco na caracterização das aplicações científicas e das tarefas de modelagem de desempenho e consumo de energia utilizando técnicas de Aprendizado de Máquina.
Zenodo (CERN European Organization for Nuclear Research), Jul 21, 2017
GPUs has been widely used in scientific computing, as by offering exceptional performance as by p... more GPUs has been widely used in scientific computing, as by offering exceptional performance as by power-efficient hardware. Its position established in high-performance and scientific computing communities has increased the urgency of understanding the power cost of GPU usage in accurate measurements. For this, the use of internal sensors are extremely important. In this work, we employ the GPU sensors to obtain high-resolution power profiles of real and benchmark applications. We wrote our own tools to query the sensors of two NVIDIA GPUs from different generations and compare the accuracy of them. Also, we compare the power profile of GPU with CPU using IPMItool.
O conteúdo dos artigos e seus dados em sua forma, correção e confiabilidade são de responsabilida... more O conteúdo dos artigos e seus dados em sua forma, correção e confiabilidade são de responsabilidade exclusiva dos autores. 2019 Permitido o download da obra e o compartilhamento desde que sejam atribuídos créditos aos autores, mas sem a possibilidade de alterá-la de nenhuma forma ou utilizá-la para fins comerciais.
In this work, due to the growing demand for computational resources and energy limitations, a per... more In this work, due to the growing demand for computational resources and energy limitations, a performance evaluation methodology is proposed based on theoretical and practical parameters of the Roofline model and using its graph that is part of tools that implement this model. This methodology allows identifying performance patterns in applications, their main computational requirements, factors that limit performance and suggesting the best architecture to run an application. Experiments were developed, focusing on the evaluation of Machine Learning algorithms, where the proposed methodology is evaluated and shown to be effective. Resumo. Neste trabalho, devido a crescente demanda por recursos computacionais e limitações energéticas,é proposta uma metodologia de avaliação de desempenho com base em parâmetros teóricos e práticos do modelo Roofline e usando o gráfico bidimensional que faz parte de ferramentas que implementam este modelo. Essa metodologia permite identificar padrões de desempenho nas aplicações, seus principais requisitos computacionais, fatores que limitam o desempenho e sugerir a melhor arquitetura para executar uma aplicação. Foram desenvolvidos experimentos, com foco na avaliação de algoritmos de Aprendizado de Máquina, onde a metodologia propostaé avaliada se mostrando efetiva.
The main objective of this work is to evaluate two nature-inspired metaheuristics, Genetic Algorithms and Ant Colony, for developing an application that can generate optimized routes for aircraft, meeting the requirements of the Brazilian Navy. This work presents the methods developed, which obey two main constraints: the mobility of the checkpoints and the limited range of the aircraft. It also presents the results of the tests carried out with these methods and an evaluation of their performance.
Understanding the computational impact of scientific applications on computational architectures through their runtime should guide the use of computational resources in high-performance computing systems. In this work, we propose an analysis using Machine Learning (ML) algorithms to gather knowledge about the performance of these applications through hardware events and derived performance metrics. Nine NAS benchmarks were executed and their hardware events were collected. These experimental results were used to train a Neural Network, a Decision Tree Regressor, and a Linear Regression model to predict the runtime of scientific applications from the performance metrics.
Performance evaluation in HPC, that is, understanding the computational requirements of scientific applications and their relation to power consumption, is a fundamental task for overcoming current barriers and achieving exascale computing. However, it imposes some challenging requirements, such as monitoring a wide range of parameters in heterogeneous environments, enabling fine-grained profiling of performance and of the power consumed across different components, being language independent, and avoiding code instrumentation. Considering these challenges, this work proposes SMCis, an application monitoring tool developed to collect all these aspects effectively and accurately, and to correlate the data graphically in an analysis and visualization environment.
Lecture Notes in Computer Science, 2017
This short paper proposes two novel methodologies for analyzing scientific applications in distributed environments using workload requirements. The first explores the impact of features such as problem size and programming language on different computational architectures. The second explores the impact of mapping virtual cluster resources on the performance of parallel applications.
arXiv (Cornell University), Feb 1, 2016
Scientific computing typically has large computational needs, which have been addressed with High Performance Distributed Computing. It is essential to efficiently deploy a number of complex scientific applications, which have different characteristics and therefore require distinct computational resources. However, in many research laboratories this high-performance architecture is not dedicated, so it must be shared among a set of scientific applications with very different execution times and relative importance to research. In addition, high-performance architectures differ in their characteristics and costs. When a new infrastructure has to be acquired to meet the needs of this scenario, decision-making is hard and complex. In this work, we present a Gain Function as a model of a utility function that supports confident decision-making. With this function it is possible to evaluate the best architectural option, taking into account aspects of both applications and architectures, including execution times, the cost of the architecture, the relative importance of each application, and the relative importance of performance and cost in the final evaluation. This paper presents the Gain Function, examples, and a real case showing its applicability.
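The abstract does not give the form of the Gain Function, so the sketch below is not the authors' formula. It is a hypothetical weighted-utility score, assumed purely for illustration, that combines the four ingredients the abstract lists: execution times, architecture cost, the relative importance of each application, and the relative importance of performance versus cost. All names and the normalization scheme are assumptions.

```python
from typing import Dict


def illustrative_gain(exec_times: Dict[str, Dict[str, float]],
                      costs: Dict[str, float],
                      app_weights: Dict[str, float],
                      perf_weight: float) -> Dict[str, float]:
    """Hypothetical utility score per architecture (higher is better).

    exec_times[arch][app]: execution time in seconds.
    costs[arch]: acquisition cost in any consistent unit.
    app_weights[app]: relative importance of each application.
    perf_weight: importance of performance in [0, 1]; cost gets the rest.
    """
    if not 0.0 <= perf_weight <= 1.0:
        raise ValueError("perf_weight must be between 0 and 1")
    total_weight = sum(app_weights.values())
    # Fastest time observed for each application, used as the reference.
    best_time = {app: min(times[app] for times in exec_times.values())
                 for app in app_weights}
    min_cost = min(costs.values())
    scores = {}
    for arch, times in exec_times.items():
        # Weighted relative speed: 1.0 means fastest on every application.
        perf = sum(w * best_time[app] / times[app]
                   for app, w in app_weights.items()) / total_weight
        # Relative cheapness: 1.0 means the cheapest architecture.
        cheapness = min_cost / costs[arch]
        scores[arch] = perf_weight * perf + (1.0 - perf_weight) * cheapness
    return scores
```

With this assumed form, raising `perf_weight` favors the faster architecture and lowering it favors the cheaper one, which mirrors the trade-off between performance and cost that the abstract describes.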
arXiv (Cornell University), Dec 3, 2014
High Performance Distributed Computing is essential to boost scientific progress in many areas of science and to efficiently deploy a number of complex scientific applications. These applications have different characteristics and therefore require distinct computational resources. In this work we propose a systematic performance evaluation methodology. Our methodology starts from the characteristics of the scientific application and then considers how these characteristics interact with the problem size, with the programming language, and finally with a specific computational architecture. The computational experiments developed highlight this model of evaluation and indicate that optimal performance is found when we evaluate a combination of application class, programming language, problem size, and architecture model.
1. Introduction
Scientific computing involves the construction of mathematical models and numerical solution techniques to solve complex scientific and engineering problems. These models often require a huge processing capacity to perform large-scale experiments within a reasonable time frame. These needs have been addressed with High Performance Parallel and Distributed Computing (HPDC), which allows many scientific domains to make progress. However, it is very difficult for many research groups to evaluate these HPDC infrastructures and arrive at the best configuration to run their scientific applications. Usually, optimal configurations are sought by executing one of the existing benchmark suites, which are widely used for performance evaluation. Benchmarks are good for comparing computational architectures, but they are not the best approach for evaluating whether an architecture is adequate for a set of scientific applications. 
Evaluations using traditional benchmarks return a single performance number that corresponds to, for example, the maximum number of floating-point operations per second. In contrast with benchmark applications, which are typically floating-point intensive, many scientific applications do not correspond to this model, or even to the workload the benchmarks often use. In other words, traditional benchmark evaluations generally do not consider the actual set of applications that will be used. However, each application has different system requirements (e.g., memory bound, I/O bound, or CPU bound) and so requires different computational resources. Thus, within the performance evaluation methodology proposed in this work, achieving an adequate performance evaluation requires first considering the characteristics of the scientific application that will be used on the HPDC architecture, under conditions as real as possible. In this way, the parameters evaluated differ from those usually evaluated when the focus is performance optimization. The parameters evaluated, which we refer to here as Essential Elements of Analysis (EEA), are the application's class, execution time, programming language, problem size/workload, average memory time, and percentage of memory, CPU, and I/O usage, in contrast with Flops/s, cache miss rate, and cache hit rate. The methodology under development comprises several phases and dozens of steps that enable researchers to evaluate which HPDC configuration is best for their set of scientific applications. The development of our methodology is rooted in two concepts. The first is Operational Analysis (OA) [1], which is the foundational basis for our methodology. OA involves a sequence of phases and steps that aim to determine the performance of a system under the most realistic operational conditions. 
The second is the Dwarfs of scientific computation, developed by Colella [2] and the Berkeley team [3], which enable the characterization of application requirements. Each Dwarf class characterizes applications by common requirements in terms of computation and data movement. Although the methodology under development comprises several phases and steps, in this work we briefly describe the overall methodology and describe in detail one of these steps, together with the experimental setup that enabled its development. The experimental results highlight how different interactions among the EEA, such as the application's class and the computational architecture, can deliver completely different performance results. The proposed methodology is supported by these results and is presented in the next sections of this work. The remainder of the paper is organized as follows. In Section 2 we discuss the scientific landscape and how applications can be categorized into classes using the Dwarf taxonomy. In Section 3 we discuss related work. In Section 4 we discuss the traditional performance evaluation paradigm, followed by our proposal for performance evaluation. Section 5 outlines our experimental setup and results. Section 6 concludes the paper and briefly discusses future work.
2. Applications and Dwarfs
With the aim of categorizing the styles of computation seen in scientific computing, the work of Colella [2] identified seven numerical methods that he believed to be important for science and engineering and introduced the "Seven Dwarfs" of scientific computing. These Dwarfs are defined at a high level of abstraction to explain their behavior across different HPDC applications, and each class of Dwarfs shows similarities in computation and communication. 
According to his definition, applications of a particular class can be implemented differently with the change in numerical methods over time, but the underlying patterns have remained the same over generations of change and will remain the same in future implementations. These dwarfs were neither particular software applications nor were they small benchmark kernels. Instead, they represented entire families of computation with common computational properties. The Berkeley team in parallel computation extended this classification to thirteen Dwarfs after they examined important application domains. They were interested in applying Dwarfs to a broader number of computational methods and investigating how well the Dwarfs could capture computation and communication patterns for a large range of applications.
This work aims to analyze aspects related to performance without losing focus on the energy efficiency of the applications. To this end, we evaluated a representative set of experiments with three renowned benchmarks and two real-world applications used by the energy industry. These experiments used a range of environments that included a medium-scale HPC system with Xeon and CUDA cores and a mobile-based Jetson TX2 development board composed of ARMv8 and CUDA cores. Our results enable the analysis of the performance and power consumption of the selected applications and help improve energy efficiency in HPC systems.