Parallel Hybrid Testing Tool for Applications Developed by Using MPI + OpenACC Dual-Programming Model
Related papers
ACC_TEST: Hybrid Testing Approach for OpenACC Based Programs
IEEE Access
In recent years, OpenACC has been used in many supercomputers and has attracted many non-computer-science specialists who want to parallelize their programs in different scientific fields, including weather forecasting and simulation. OpenACC is a high-level programming model that supports parallelism and is easy to learn and use: parallelism is expressed by adding high-level directives without considering too many low-level details. Testing parallel programs is a difficult task, and it becomes even harder when programming models are used, especially if they are used incorrectly. In that case it is challenging to detect runtime errors and their causes, since an error may originate in the user's source code or in the programming model directives. Even when these errors are detected and the source code is modified, there is no guarantee that they have been corrected rather than merely hidden. Many tools and studies have investigated several programming models for identifying and detecting related errors. However, no testing tool or previous study has clearly targeted OpenACC, even though OpenACC has many benefits and features that could lead to its increasing use for building parallel systems with less effort. In this paper, we enhance ACC_TEST with the ability to test OpenACC-based programs and detect runtime errors by using hybrid testing techniques that improve coverage of errors occurring in OpenACC programs while reducing overhead and testing time. INDEX TERMS OpenACC, OpenACC testing tool, Hybrid-testing techniques, Parallel Programming, ACC_TEST.
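Below is a minimal sketch, in C, of the kind of OpenACC runtime defect such a hybrid testing tool is meant to catch; the program and variable names are illustrative and not taken from ACC_TEST. The scalar accumulator is updated concurrently because the directive lacks a reduction clause, so the race compiles cleanly but yields nondeterministic results.

```c
/* Minimal OpenACC example with a data race: "sum" is updated by every
 * iteration of a parallelized loop without a reduction clause, so the
 * result is nondeterministic.  Adding "reduction(+:sum)" fixes it. */
#include <stdio.h>

#define N 1000000

int main(void) {
    static float a[N];
    float sum = 0.0f;

    for (int i = 0; i < N; i++)
        a[i] = 1.0f;

    /* BUG: concurrent read-modify-write on "sum" (data race). */
    #pragma acc parallel loop copyin(a[0:N]) copy(sum)
    for (int i = 0; i < N; i++)
        sum += a[i];   /* correct form: ... parallel loop reduction(+:sum) */

    printf("sum = %f (expected %d)\n", sum, N);
    return 0;
}
```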
IEEE Access
Recently, incorporating more than one programming model into a system designed for high-performance computing (HPC) has become a popular way to implement parallel systems. Since traditional programming languages such as C, C++, and Fortran do not support parallelism directly, many programmers add one or more programming models to achieve parallelism and accelerate computation efficiently. These models include Open Accelerators (OpenACC) and Open Multi-Processing (OpenMP), which have recently been used together with other models, including the Message Passing Interface (MPI) and the Compute Unified Device Architecture (CUDA). Because the behavior of threads is difficult to predict, runtime errors cannot be anticipated, and the compiler cannot identify errors such as data races, race conditions, deadlocks, or livelocks. Many studies have developed testing tools to detect runtime errors when programming models are used, such as the combinations of OpenACC with MPI and OpenMP with MPI. Although more and more applications use OpenACC and OpenMP together, no testing tools have been developed to date for such applications. This paper presents a testing tool for detecting runtime errors using a static testing technique. The tool can detect actual and potential runtime errors arising from the integration of the OpenACC and OpenMP models in systems developed in C++. The tool implements error dependency graphs, which are proposed in this paper. Additionally, a dependency graph of the errors is provided, along with a classification of the runtime errors that result from combining the two programming models mentioned earlier.
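As a hedged illustration of the kind of OpenACC/OpenMP interaction the abstract describes (shown in C for brevity; names are illustrative and not from the proposed tool), the sketch below launches an asynchronous OpenACC kernel and then reads its output from OpenMP host threads without an `acc wait`, a host-device race that a static checker could report.

```c
/* Hypothetical OpenMP + OpenACC interaction error: an asynchronous
 * OpenACC kernel writes "out" while OpenMP host threads read it
 * before the matching "acc wait". */
#include <stdio.h>
#include <omp.h>

#define N 1024

int main(void) {
    static double in[N], out[N];
    for (int i = 0; i < N; i++) in[i] = i;

    /* Device kernel launched asynchronously; the host does not wait. */
    #pragma acc parallel loop async(1) copyin(in[0:N]) copyout(out[0:N])
    for (int i = 0; i < N; i++)
        out[i] = 2.0 * in[i];

    /* BUG: host threads may read "out" before the device kernel and
     * its copyout have completed; "#pragma acc wait(1)" is missing. */
    double sum = 0.0;
    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < N; i++)
        sum += out[i];

    printf("sum = %f\n", sum);
    return 0;
}
```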
Parallel Hybrid Testing Techniques for the Dual-Programming Models-Based Programs
Symmetry, 2020
The importance of high-performance computing is increasing, and Exascale systems will become feasible within a few years. Such systems can be achieved by enhancing hardware capabilities as well as application parallelism by integrating more than one programming model. One such dual-programming-model combination is Message Passing Interface (MPI) + OpenACC, which offers several advantages, including increased system parallelism, support for different platforms with higher performance, better productivity, and less programming effort. Several testing tools target parallel applications built with programming models, but more effort is needed, especially for high-level Graphics Processing Unit (GPU)-related programming models. Owing to the integration of different programming models, errors will be more frequent and unpredictable. Testing techniques are required to detect these errors, especially runtime errors resulting from the integration of MPI and OpenACC; studying their behavi...
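For orientation, here is a minimal MPI + OpenACC sketch in C of the dual-programming-model style the abstract refers to: each MPI rank offloads its share of a reduction to an accelerator, and the partial sums are combined with MPI_Reduce. Sizes and names are illustrative; the example assumes the problem size divides evenly among the ranks.

```c
/* Minimal MPI + OpenACC sketch: each rank sums its chunk of a vector
 * on the accelerator, then the partial results are combined. */
#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>

#define N 1000000

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int chunk = N / size;                  /* assume N divisible by size */
    double *x = malloc(chunk * sizeof *x);
    for (int i = 0; i < chunk; i++)
        x[i] = 1.0;

    /* Each rank reduces its chunk on the accelerator. */
    double local = 0.0;
    #pragma acc parallel loop reduction(+:local) copyin(x[0:chunk])
    for (int i = 0; i < chunk; i++)
        local += x[i];

    double total = 0.0;
    MPI_Reduce(&local, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("total = %f\n", total);

    free(x);
    MPI_Finalize();
    return 0;
}
```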
ACC_TEST: Hybrid Testing Techniques for MPI-Based Programs
IEEE Access
Recently, MPI has become widely used for parallelizing applications in many scientific fields, including fields outside computer science. The MPI programming model supports parallelism in several programming languages, including C, C++, and Fortran. MPI also integrates with other programming models and has several implementations from different vendors, including open-source and commercial ones. However, testing parallel programs is a difficult task, especially when using programming models whose behaviors and error types depend on the model. In addition, the increasing use of these programming models by non-computer-science specialists can introduce errors due to lack of programming experience, which must be considered by any testing tool. We observed that dynamic testing techniques have been used to test the majority of MPI programs. Dynamic techniques detect errors by analyzing the code during execution, which introduces overhead and affects program performance, especially in massively parallel applications generating thousands or millions of threads. In this paper, we enhance ACC_TEST with the ability to test MPI-based programs and detect runtime errors occurring in different types of MPI communication. We use hybrid testing techniques that combine static and dynamic testing to gain the benefits of each and reduce the cost. INDEX TERMS MPI, MPI testing tool, hybrid testing techniques, parallel programming, ACC_TEST.
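The sketch below (illustrative, not part of ACC_TEST) shows an MPI point-to-point defect that a static phase can flag without executing the program: the send and receive disagree on both datatype and tag, so the receive would block forever at runtime.

```c
/* Illustrative MPI defect detectable by static analysis: the sender
 * transmits MPI_DOUBLE with tag 7 while the receiver expects MPI_INT
 * with tag 9, so the messages never match and the receive blocks. */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        double payload = 3.14;
        MPI_Send(&payload, 1, MPI_DOUBLE, 1, /*tag=*/7, MPI_COMM_WORLD);
    } else if (rank == 1) {
        int received = 0;
        /* BUG: datatype and tag do not match the send above. */
        MPI_Recv(&received, 1, MPI_INT, 0, /*tag=*/9, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        printf("received %d\n", received);
    }

    MPI_Finalize();
    return 0;
}
```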
OpenACC Errors Classification and Static Detection Techniques
IEEE Access
With the continued growth of High-Performance Computing (HPC) in scientific fields, programming models that target heterogeneous architectures with less programming effort have become important for scientific applications. OpenACC is a high-level parallel programming model used with the FORTRAN, C, and C++ programming languages to accelerate code with few changes and little effort, which reduces programmer workload and makes it easy to use and learn. OpenACC is also increasingly used in many of the top supercomputers around the world, and three of the top five HPC applications in Intersect360 Research currently use OpenACC. However, when programmers parallelize their code with OpenACC without correctly understanding the directives and their usage, or without following the OpenACC specification, they can cause run-time errors ranging from wrong results to performance issues and other undefined behaviors. In addition, building parallel systems with a higher-level programming model increases the possibility of introducing errors, and the resulting parallel applications exhibit non-deterministic behavior, which makes testing and detecting their run-time errors a challenging task. Although many testing tools detect run-time errors, they remain inadequate for errors that occur in applications implemented with high-level parallel programming models, especially OpenACC-related applications. As a result, OpenACC errors that cannot be detected by compilers should be identified, and their causes should be explained. In this paper, our contributions are new static techniques for detecting OpenACC errors and, for the first time, a classification of the errors that can occur in OpenACC programs. To the best of our knowledge, there is no published work to date that identifies or classifies OpenACC-related errors, nor is there a testing tool designed to test OpenACC applications and detect their run-time errors. INDEX TERMS OpenACC, OpenACC run-time errors, OpenACC error classifications, OpenACC testing tool, Static approach for OpenACC application.
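A hedged example of the kind of OpenACC misuse discussed above: the loop below carries a dependency, yet the `independent` clause asserts otherwise, so the compiler parallelizes it and the program silently computes wrong results. This is an illustrative sketch, not one of the paper's classified test cases.

```c
/* OpenACC misuse that compiles cleanly but gives wrong results: the
 * loop has a carried dependency (a[i] depends on a[i-1]), yet
 * "independent" overrides the compiler's dependency analysis. */
#include <stdio.h>

#define N 1024

int main(void) {
    static double a[N];
    for (int i = 0; i < N; i++) a[i] = 1.0;

    /* BUG: the programmer's "independent" assertion is false. */
    #pragma acc kernels loop independent copy(a[0:N])
    for (int i = 1; i < N; i++)
        a[i] = a[i - 1] + 1.0;   /* loop-carried dependency */

    printf("a[N-1] = %f (sequential result would be %d)\n", a[N - 1], N);
    return 0;
}
```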
A Validation Testsuite for OpenACC 1.0
2014 IEEE International Parallel & Distributed Processing Symposium Workshops, 2014
Directive-based programming models provide a high level of abstraction, hiding complex low-level details of the underlying hardware from the programmer. One such model is OpenACC, a portable programming model that allows programmers to write applications that offload portions of work from a host CPU to an attached accelerator (a GPU or similar device). The model is gaining popularity and is being used to accelerate many types of applications, ranging from molecular dynamics codes to particle physics models. It is critical to evaluate the correctness of OpenACC implementations and determine their conformance to the specification. In this paper, we present a robust and scalable testing infrastructure that serves this purpose. We worked very closely with three main vendors that offer compiler support for OpenACC and assisted them in identifying and resolving compiler bugs, helping them improve the quality of their compilers. The testsuite also aims to identify and resolve ambiguities in the OpenACC specification. This testsuite has been integrated into the harness infrastructure of the TITAN machine at Oak Ridge National Lab and is used in production. The testsuite consists of test cases for all the directives and clauses of OpenACC, for both the C and Fortran languages. The testsuite discussed in this paper focuses on the OpenACC 1.0 feature set. The framework of the testsuite is robust enough to create test cases for 2.0 and future releases. This work is in progress.
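The following is a sketch of what a conformance-style test case in this spirit might look like; it is not taken from the actual testsuite. The same computation is run once sequentially as a reference and once through the directive under test, and the results are compared to decide pass or fail.

```c
/* Conformance-style test sketch: compare a directive's output against
 * a sequential reference and report PASS/FAIL. */
#include <stdio.h>
#include <math.h>

#define N 4096

int main(void) {
    static float ref[N], dev[N];

    for (int i = 0; i < N; i++)
        ref[i] = (float)i * 0.5f;          /* sequential reference */

    #pragma acc parallel loop copyout(dev[0:N])
    for (int i = 0; i < N; i++)
        dev[i] = (float)i * 0.5f;          /* directive under test */

    int failures = 0;
    for (int i = 0; i < N; i++)
        if (fabsf(ref[i] - dev[i]) > 1e-6f)
            failures++;

    printf("%s (%d mismatches)\n", failures ? "FAIL" : "PASS", failures);
    return failures != 0;
}
```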
UPC-CHECK: a scalable tool for detecting run-time errors in Unified Parallel C
Computer Science - Research and Development, 2013
Unified Parallel C (UPC) [23] is a language used to write parallel programs for distributed-memory parallel computers. UPC-CHECK is a scalable tool developed to automatically detect argument errors in UPC functions and deadlocks in UPC programs at runtime and to issue high-quality error messages that help programmers quickly fix those errors. The run-time complexity of all detection techniques used is optimal, i.e., O(1), except for deadlocks involving locks, where it is theoretically known to be linear in the number of threads. The tool is easy to use and involves merely replacing the compiler command with upc-check. The error messages issued by UPC-CHECK were evaluated using the UPC RTED test suite [6] for argument errors in UPC functions and deadlocks. The results of these tests show that the error messages issued by UPC-CHECK are excellent.
Structural testing criteria for message-passing parallel programs
Concurrency and Computation: Practice and Experience, 2008
Parallel programs have features such as concurrency, communication, and synchronization that make testing a challenging activity. Because of these characteristics, the direct application of traditional testing is not always possible, and adequate testing criteria and tools are necessary. In this paper we investigate the challenges of validating message-passing parallel programs and present a set of specific testing criteria. We introduce a family of structural testing criteria based on a test model. The model captures the control and data flow of message-passing programs by considering both their sequential and parallel aspects. The criteria provide a coverage measure that can be used to evaluate the progress of the testing activity and also provide guidelines for the generation of test data. We also describe a tool, called ValiPar, which supports the application of the proposed testing criteria. Currently, ValiPar is configured for Parallel Virtual Machine (PVM) and the Message Passing Interface (MPI). Results of applying the proposed criteria to MPI programs are also presented and analyzed.
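To make the coverage idea concrete, the toy MPI program below (illustrative only, not from ValiPar) contains a wildcard receive that can be matched by either of two senders; a structural criterion in this spirit would ask the test set to exercise both communication edges (1 -> 0 and 2 -> 0).

```c
/* Toy MPI program with two possible communication edges into rank 0:
 * the wildcard receive matches whichever sender arrives first. */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (size < 3) {
        if (rank == 0) fprintf(stderr, "run with at least 3 ranks\n");
        MPI_Finalize();
        return 1;
    }

    if (rank == 0) {
        int value;
        for (int k = 0; k < 2; k++) {
            MPI_Status st;
            MPI_Recv(&value, 1, MPI_INT, MPI_ANY_SOURCE, 0,
                     MPI_COMM_WORLD, &st);
            printf("received %d from rank %d\n", value, st.MPI_SOURCE);
        }
    } else if (rank == 1 || rank == 2) {
        int value = rank * 10;
        MPI_Send(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD);
    }

    MPI_Finalize();
    return 0;
}
```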
Integrating Static and Dynamic Analysis Techniques for Detecting Dynamic Errors in MPI Programs
IJCSMC, 2018
The Message Passing Interface (MPI) is the de facto standard for programming high-performance computing applications and is ready to scale to extreme-scale systems with millions of nodes and billions of cores. With this huge number of components, MPI programs become error prone; many types of errors, such as deadlock, can occur in MPI applications. Testing tools can assist application developers in detecting such errors. This work integrates static and dynamic analysis to enable precise testing of MPI applications. The paper presents a new algorithm that detects deadlocks in point-to-point communication in two phases; the static phase limits the overhead incurred in the dynamic phase. The experimental results show that our verification tool verifies several benchmarks and finds deadlocks.
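As an illustration of the target error class (not the paper's algorithm itself), the C program below exhibits a classic point-to-point deadlock: both ranks post a blocking receive before either send, producing a circular wait that a two-phase static/dynamic approach would report.

```c
/* Classic point-to-point deadlock: both ranks block in MPI_Recv and
 * never reach the matching sends.  Reordering the calls on one rank
 * or using MPI_Sendrecv removes the circular wait. */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    int send_val = rank, recv_val = -1;
    int peer = (rank == 0) ? 1 : 0;

    /* BUG: both ranks wait here forever (run with 2 ranks). */
    MPI_Recv(&recv_val, 1, MPI_INT, peer, 0, MPI_COMM_WORLD,
             MPI_STATUS_IGNORE);
    MPI_Send(&send_val, 1, MPI_INT, peer, 0, MPI_COMM_WORLD);

    printf("rank %d got %d\n", rank, recv_val);
    MPI_Finalize();
    return 0;
}
```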