Costin Iancu - Academia.edu
Papers by Costin Iancu
2001 IEEE International Symposium on Performance Analysis of Systems and Software. ISPASS.
2016 IEEE 18th International Conference on High Performance Computing and Communications; IEEE 14th International Conference on Smart City; IEEE 2nd International Conference on Data Science and Systems (HPCC/SmartCity/DSS), 2016
Lecture Notes in Computer Science, 2016
Proceedings of the 25th ACM International Symposium on High-Performance Parallel and Distributed Computing - HPDC '16, 2016
Proceedings of the 15th ACM SIGPLAN Symposium, Jan 9, 2010
Proceedings of the 2016 International Conference on Supercomputing - ICS '16, 2016
Proceedings of the 38th International Conference on Software Engineering - ICSE '16, 2016
Proceedings of the Eleventh European Conference on Computer Systems - EuroSys '16, 2016
Proceedings of the 18th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, Feb 23, 2013
To fully exploit multicore processors, applications are expected to provide a large degree of thread-level parallelism. While adequate for low core counts and their typical workloads, the current load balancing support in operating systems may not be able to achieve efficient hardware utilization for parallel workloads. Balancing run queue length globally ignores the needs of parallel applications where threads are required to make equal progress. In this paper we present a load balancing technique designed specifically for parallel applications running on multicore systems. Instead of balancing run queue length, our algorithm balances the time a thread has executed on “faster” and “slower” cores. We provide a user-level implementation of speed balancing on UMA and NUMA multisocket architectures running Linux and discuss behavior across a variety of workloads, usage scenarios and programming models. Our results indicate that speed balancing when compared to the native Linux load b...
Proceedings of the Sixth International Workshop on Programming Models and Applications for Multicores and Manycores - PMAM '15, 2015
Proceedings of the 27th International ACM Conference on Supercomputing - ICS '13, 2013
Lecture Notes in Computer Science, 2005
Proceedings of the 17th International Conference on Parallel Architectures and Compilation Techniques - PACT '08, 2008
Proceedings of the 12th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming - PPoPP '07, 2007
... disregarded the variation of network performance parameters with system workload, scale, application ... information about network performance variability with system scale and workload into a ... a complete exploration of the optimization space or characterization of network ...
ACM SIGPLAN Notices, 2013
Proceedings of the Fourth Conference on Partitioned Global Address Space Programming Model - PGAS '10, 2010
2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS), 2010