Costin Iancu - Academia.edu

Costin Iancu

Papers by Costin Iancu

Program Correctness, Verification and Testing for Exascale (Corvette)

An evaluation of search tree techniques in the presence of caches

2001 IEEE International Symposium on Performance Analysis of Systems and Software. ISPASS.

Report of the HPC Correctness Summit, January 25-26, 2017, Washington, DC

Time-Sharing Redux for Large-Scale HPC Systems

2016 IEEE 18th International Conference on High Performance Computing and Communications; IEEE 14th International Conference on Smart City; IEEE 2nd International Conference on Data Science and Systems (HPCC/SmartCity/DSS), 2016

Scaling Spark on Lustre

Lecture Notes in Computer Science, 2016

Scaling Spark on HPC Systems

Proceedings of the 25th ACM International Symposium on High-Performance Parallel and Distributed Computing - HPDC '16, 2016

Load balancing on speed

Proceedings of the 15th ACM SIGPLAN Symposium, Jan 9, 2010

SReplay

Proceedings of the 2016 International Conference on Supercomputing - ICS '16, 2016

Floating-point precision tuning using blame analysis

Proceedings of the 38th International Conference on Software Engineering - ICSE '16, 2016

Exploiting variability for energy optimization of parallel programs

Proceedings of the Eleventh European Conference on Computer Systems - EuroSys '16, 2016

Scalable data race detection for partitioned global address space programs

Proceedings of the 18th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, Feb 23, 2013

Validation and Analysis

To fully exploit multicore processors, applications are expected to provide a large degree of thread-level parallelism. While adequate for low core counts and their typical workloads, the current load balancing support in operating systems may not be able to achieve efficient hardware utilization for parallel workloads. Balancing run queue length globally ignores the needs of parallel applications where threads are required to make equal progress. In this paper we present a load balancing technique designed specifically for parallel applications running on multicore systems. Instead of balancing run queue length, our algorithm balances the time a thread has executed on “faster” and “slower” cores. We provide a user-level implementation of speed balancing on UMA and NUMA multisocket architectures running Linux and discuss behavior across a variety of workloads, usage scenarios and programming models. Our results indicate that speed balancing when compared to the native Linux load b...

Exploiting communication concurrency on high performance computing systems

Proceedings of the Sixth International Workshop on Programming Models and Applications for Multicores and Manycores - PMAM '15, 2015

Scaling data race detection for partitioned global address space programs

Proceedings of the 27th international ACM conference on International conference on supercomputing - ICS '13, 2013

Message Strip-Mining Heuristics for High Speed Networks

Lecture Notes in Computer Science, 2005

Runtime optimization of vector operations on large scale SMP clusters

Proceedings of the 17th international conference on Parallel architectures and compilation techniques - PACT '08, 2008

Optimizing communication overlap for high-speed networks

Proceedings of the 12th ACM SIGPLAN symposium on Principles and practice of parallel programming - PPoPP '07, 2007

... disregarded the variation of network performance parameters with system workload, scale, application ... information about network performance variability with system scale and workload into a ... a complete exploration of the optimization space or characterization of network ...

Scalable data race detection for partitioned global address space programs

ACM SIGPLAN Notices, 2013

Hybrid PGAS runtime support for multicore nodes

Proceedings of the Fourth Conference on Partitioned Global Address Space Programming Model - PGAS '10, 2010

Oversubscription on multicore processors

2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS), 2010