Clemens Grelck - Profile on Academia.edu (original) (raw)

Papers by Clemens Grelck

Proceedings of the 3rd ACM SIGPLAN International Workshop on Libraries, Languages, and Compilers for Array Programming

International Journal of Parallel Programming, Apr 11, 2016

As processor and system manufacturers increase the amount of both inter-and intra-chip parallelis... more As processor and system manufacturers increase the amount of both inter-and intra-chip parallelism it becomes crucial to provide the software industry with high-level, abstract, well defined, efficient and effective tools for engineering highquality software for ubiquitous parallel systems. Parallel and distributed programming methodologies are still dominated by low-level techniques such as send/receive message passing or similarly unstructured shared memory mechanisms. High-level parallel programming and its applications offer many potential advantages to this effect and have a key role to play in the scalable exploitation of ubiquitous parallelism. Since 2001 the series of initially Workshops and more recently Symposia on Highlevel Parallel Programming and Applications (HLPP) has been a forum for researchers developing state-of-the-art concepts, tools and applications for high-level parallel programming. The general emphasis is on software quality, programming productivity and high-level performance models. The 7th International Symposium on High-Level Parallel Programming and Applications was held July 3 and 4, 2014, in the historic Doelenzaal of the University of Amsterdam, right in the historic heart of the city. HLPP 2014 received 34 submissions, out of which the international programme committee selected 15 contributions for presentation at the symposium and for inclusion in this special issue. The symposium was opened with a keynote by Frank Schlimbach, Intel Corp, entitled Parallelism Through CnC: More Flexibility, Less Pain, that was well received by the 30 participants of the symposium. Our special thanks go to -Frank Schlimbach for his great keynote, -the University of Amsterdam for providing a fantastic venue,

Lecture Notes in Computer Science, 2020

Coordination is a well established computing paradigm with a plethora of languages, abstractions ... more Coordination is a well established computing paradigm with a plethora of languages, abstractions and approaches. Yet, we are not aware of any adoption of the principle of coordination in the broad domain of cyber-physical systems, where non-functional properties, such as execution/response time, energy consumption and security are as crucial as functional correctness. We propose a coordination approach, including a functional coordination language and its associated tool flow, that considers time, energy and security as first-class citizens in application design and development. We primarily target cyber-physical systems running on off-the-shelf heterogeneous multi-core platforms. We illustrate our approach by means of a real-world use case, an unmanned aerial vehicle for autonomous reconnaissance mission, which we develop in close collaboration with industry.

Lecture Notes in Computer Science, 2001

Sac is a functional array processing language particularly designed with numerical applications i... more Sac is a functional array processing language particularly designed with numerical applications in mind. In this field the runtime performance of programs critically depends on the efficient utilization of the memory hierarchy. Cache conflicts due to limited set associativity are one relevant source of inefficiency. This paper describes the realization of an optimization technique which aims at eliminating cache conflicts by adjusting the data layout of arrays to specific access patterns and cache configurations. Its effect on cache utilization and runtime performance is demonstrated by investigations on the PDE1 benchmark.

SE-WS 2015 : Software Engineering Workshops 2015 : Gemeinsamer Tagungsband der Workshops der Tagung Software Engineering 2015 : Dresden, 17.-18. März 2015

CEUR workshop proceedings, 2015

Q-learning for Statically Scheduling DAGs

Data parallel frameworks (e.g. Hive, Spark or Tez) can be used to execute complex data analyses c... more Data parallel frameworks (e.g. Hive, Spark or Tez) can be used to execute complex data analyses consisting of many dependent tasks represented by a Directed Acylical Graph (DAG). Minimising the job completion time (i.e. makespan) is still an open problem for large graphs.We propose a novel deep Q-learning (DQN) approach to statically scheduling DAGs and minimising the makespan. Our approach learns to schedule DAGs from scratch instead of learning how to imitate some heuristic. We show that our current approach learns fast and steadily. Furthermore, our approach can schedule DAGs almost 15 times faster than a Forward List Scheduling (FLS) heuristic.

Proceedings of the 2nd ACM SIGPLAN workshop on Functional high-performance computing

Functional High-Performance Computing, Sep 23, 2013

It is our great pleasure to welcome you to the 2nd ACM SIGPLAN Workshop on Functional High-Perfor... more It is our great pleasure to welcome you to the 2nd ACM SIGPLAN Workshop on Functional High-Performance Computing. FHPC 2013 brings together researchers who explore declarative highlevel programming technology in application domains where large-scale computations arise naturally and high performance is essential. The workshop is in its second year. Our goal is to establish FHPC as a regular annual forum for researchers interested in applying functional programming techniques in the area of high-performance computing. Functional programming is increasingly recognized as presenting a nice sweet spot between expressiveness and efficiency for parallel programming, reconciling execution performance with programming productivity. Making FHPC'13 happen depended on a number of people and organizations, which we would like to acknowledge here. We thank the authors and panelists for providing the content of the program. We would like to express our gratitude to the program committee and the additional reviewers, who worked very hard in reviewing papers and providing suggestions for their improvements. Special thanks go to ACM SIGPLAN and the ICFP workshop chairs for accepting our workshop nomination and being flexible with organizational matters. The call for papers attracted 14 submissions from Asia, the Americas, and Europe. An international program committee selected 8 contributions for publication. These papers cover a variety of topics. Some touch upon optimizing compilation techniques and programming techniques for GPU applications. Others propose novel parallel programming models, libraries, and bespoke runtime management, which take advantage of declarative constructs for better performance and productivity. In addition to the refereed contributions, FHPC'13 features two invited talks. Matthew Fluet from Rochester Institute of Technology will provide an overview of the Manticore project, with focus on programming models and runtime techniques. Manuel Chakravarty from the University of New South Wales will present different strands of work in data-parallel computing, discussing results and issues in Data-Parallel Haskell and Accelerate. The topic of data-parallelism and GPU computing will be further deepened in a panel discussion. We hope to have put together an interesting program, looking forward to stimulating discussions during the second FHPC workshop, and a successful follow-up FHPC workshop at ICFP 2014.

Proceedings of the 3rd ACM SIGPLAN International Workshop on Libraries, Languages, and Compilers for Array Programming

S-Net is a declarative component coordination language aimed at radically facilitating software e... more S-Net is a declarative component coordination language aimed at radically facilitating software engineering for modern parallel compute systems by near-complete separation of concerns between application (component) engineering and concurrency orchestration. S-Net builds on the concept of stream processing to structure networks of communicating asynchronous components implemented in a conventional (sequential) language. In this paper we present the design, implementation and evaluation of a new and innovative runtime system for S-Net streaming networks. The Front runtime system outperforms the existing implementations of S-Net by orders of magnitude for stress-test benchmarks, significantly reduces runtimes of fully-fledged parallel applications with compute-intensive components and achieves good scalability on our 48-core test system.

We propose a method called user-defined constraints specifically for shape-generic multi-dimensio... more We propose a method called user-defined constraints specifically for shape-generic multi-dimensional array programming. Our proposed technique allows programmers to make implicit constraints in the domain and codomain of functions explicit. This method can help compilers to generate more reliable code, improve performance through better optimization and improve software documentation. We propose and motivate a syntax extension for the functional array language SAC and describe steps to systematically transform source-level constraints into existing intermediate code representations. We discuss ways of statically resolving constraints through aggressive partial evaluation and propose some form of syntactic sugar that blurs the line between user-defined constraints and fullyfledged dependent types.

SAC Goes Cluster: Fully Implicit Distributed Computing

SAC (Single Assignment C) is a purely functional, data-parallel array programming language that p... more SAC (Single Assignment C) is a purely functional, data-parallel array programming language that predominantly targets compute-intensive applications. Thus, clusters of workstations, or distributed memory architectures in general, form highly relevant compilation targets. Notwithstanding, SAC as of today only supports shared-memory architectures, graphics accelerators and heterogeneous combinations thereof. In our current work we aim at closing this gap. At the same time, we are determined to uphold SAC's promise of entirely compiler-directed exploitation of concurrency, no matter what the target architecture is. Distributed memory architectures are going to make this promise a particular challenge. Despite SAC's functional semantics, it is generally far from straightforward to infer exact communication patterns from architecture-agnostic code. Therefore, we intend to capitalise on recent advances in network technology, namely the closing of the gap between memory bandwidth and network bandwidth. We aim at a solution based on a custom-designed software distributed shared memory (S-DSM) and large per-node software-managed cache memories. To this effect the functional nature of SAC with its write-once/read-only arrays provides a strategic advantage that we thoroughly exploit. Throughout the paper we further motivate our approach, sketch out our implementation strategy, show preliminary results and discuss the pros and cons of our approach.

We discuss the aspect of synchronisation in the language design of the asynchronous data flow lan... more We discuss the aspect of synchronisation in the language design of the asynchronous data flow language S-Net. Synchronisation is a crucial aspect of any coordination approach. S-Net provides a particularly simple construct, the synchrocell. The synchrocell is actually two simple to meet regular synchronisation demands itself. We show that in conjunction with other language feature, S-Net synchrocells can effectively do the job. Moreover, we argue that their simplistic design in fact is a necessary prerequisite to implement even more interesting scenarios, for which we outline ways of efficient implementation.

Applied Informatics, 2003

Springer eBooks, 1999

Sac (Single Assignment C) is a strict, purely functional programming language primarily designed ... more Sac (Single Assignment C) is a strict, purely functional programming language primarily designed with numerical applications in mind. Particular emphasis is on efficient support for arrays both in terms of language expressiveness and in terms of runtime performance. Array operations in Sac are based on elementwise specifications using so-called With-loops. These language constructs are also well-suited for concurrent execution on multiprocessor systems. This paper outlines an implicit approach to compile Sac programs for multi-threaded execution on shared memory architectures. Besides the basic compilation scheme, a brief overview of the runtime system is given. Finally, preliminary performance figures demonstrate that this approach is well-suited to achieve almost linear speedups.

Automating Library Migrations with Error Prone and Refaster

Applied computing review, Mar 1, 2023

Array Padding in the Functional Language SAC

Parallel and Distributed Processing Techniques and Applications, 2000

ABSTRACT

Xeon Phi is the common brand name of Intel's Many Integrated Core (MIC) architecture. The first c... more Xeon Phi is the common brand name of Intel's Many Integrated Core (MIC) architecture. The first commercially available generation Knights Corner and the second generation Knights Landing form a middle ground between modestly parallel desktop and standard server processor architectures and the massively parallel GPGPU architectures. In this paper we explore various compilation strategies for the purely functional data-parallel array language SaC (Single Assignment C) to support both MIC architectures in the presence of entirely resource-and target-agnostic source code. Our particular interest lies in doing so with limited, or entirely without, user knowledge about the target architecture. We report on a series of experiments involving two classical benchmarks, Matrix Multiplication and Gaussian Blur, that demonstrate the level of performance that can be expected from compilation of abstract, purely functional source code to the Xeon Phi family of architectures.

Parallel Computing, 2014

Booleans are the most basic values in computing. Machines, however, store Booleans in larger comp... more Booleans are the most basic values in computing. Machines, however, store Booleans in larger compounds such as bytes or integers due to limitations in addressing memory locations. For individual values the relative waste of memory capacity is huge, but the absolute waste is negligible. The latter radically changes if large numbers of Boolean values are processed in (multidimensional) arrays. Most programming languages, however, only provide sparse implementations of Boolean arrays, thus wasting large quantities of memory and potentially making poor use of cache hierarchies. In the context of the functional data-parallel array programming language SAC we investigate dense implementations of Boolean arrays and compare their performance with traditional sparse implementations. A particular challenge arises in data-parallel execution on today's shared memory multi-core architectures: scheduling of loops over Boolean arrays is unaware of the non-standard addressing of dense Boolean arrays. We discuss our proposed solution and report on experiments analysing the impact of the runtime representation of Boolean arrays both on sequential performance as well as on scalability using up to 32 cores of a large ccNUMA multi-core system.

SAC on a Niagara T3-4 Server: Lessons and Experiences

Parallel Computing, 2012

ABSTRACT The Sparc T3-4 server provides up to 512 concurrent hardware threads, a degree of concur... more ABSTRACT The Sparc T3-4 server provides up to 512 concurrent hardware threads, a degree of concurrency that is unprecedented in a single server system. This paper reports on how the automatically parallelising compiler of the data-parallel func-tional array language SAC copes with up to 512 execution units. We investigate three different numerical kernels that are representative for a wide range of appli-cations: matrix multiplication, convolution and 3-dimensional FFT. We show both the high-level declarative coding style of SAC and the performance achieved on the T3-4 server. Last not least, we draw conclusions for improving our compiler technology in the future.

Proceedings of the 3rd ACM SIGPLAN International Workshop on Libraries, Languages, and Compilers for Array Programming

International Journal of Parallel Programming, Apr 11, 2016

Lecture Notes in Computer Science, 2020

Lecture Notes in Computer Science, 2001

SE-WS 2015 : Software Engineering Workshops 2015 : Gemeinsamer Tagungsband der Workshops der Tagung Software Engineering 2015 : Dresden, 17.-18. März 2015

CEUR workshop proceedings, 2015

Q-learning for Statically Scheduling DAGs

Proceedings of the 2nd ACM SIGPLAN workshop on Functional high-performance computing

Functional High-Performance Computing, Sep 23, 2013

Proceedings of the 3rd ACM SIGPLAN International Workshop on Libraries, Languages, and Compilers for Array Programming

SAC Goes Cluster: Fully Implicit Distributed Computing

Applied Informatics, 2003

Springer eBooks, 1999

Automating Library Migrations with Error Prone and Refaster

Applied computing review, Mar 1, 2023

Array Padding in the Functional Language SAC

Parallel and Distributed Processing Techniques and Applications, 2000

ABSTRACT

Parallel Computing, 2014

SAC on a Niagara T3-4 Server: Lessons and Experiences

Parallel Computing, 2012