Camilo A. Celis Guzman | Seoul National University
Papers by Camilo A. Celis Guzman
2017 26th International Conference on Parallel Architectures and Compilation Techniques (PACT)
This work proposes a co-scheduling technique for co-located parallel applications on Non-Uniform Memory Access (NUMA) multi-socket multi-core platforms. The technique allocates core resources to running parallel applications such that the utilization of both the memory controllers and the CPU cores is maximized. Utilization is predicted using an online performance prediction model based on queuing systems. At runtime, the core allocation is periodically re-evaluated and cores are reassigned to executing applications. Experimental results show that the proposed co-scheduling technique executes co-located parallel applications in significantly less total execution time than the default Linux scheduler and a conventional scalability-based scheduler.
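The abstract does not spell out the prediction model, so the following is only a rough sketch of the general idea and not the paper's method. It assumes a single shared memory controller whose offered load is estimated as ρ = λ/μ (total request rate over service rate, as for a single-server queue) and a greedy rule that hands out cores one at a time. The names `offered_load`, `allocate_cores`, `req_rate`, and `service_rate`, and all numbers, are hypothetical.

```python
def offered_load(alloc, req_rate, service_rate):
    """Estimated controller load rho = lambda / mu; values above 1.0 mean saturation."""
    arrival = sum(alloc[app] * req_rate[app] for app in alloc)
    return arrival / service_rate


def allocate_cores(req_rate, total_cores, service_rate):
    """Assign every core, filling the memory controller first (hypothetical policy)."""
    alloc = {app: 0 for app in req_rate}
    demand = 0.0  # total memory requests per second issued by allocated cores
    for _ in range(total_cores):
        # Applications whose next core still fits under the controller's capacity.
        fits = [a for a in req_rate if demand + req_rate[a] <= service_rate]
        if fits:
            # Prefer the most memory-intensive application that still fits.
            chosen = max(fits, key=lambda a: req_rate[a])
        else:
            # Controller is saturated: give the core to the least memory-intensive one.
            chosen = min(req_rate, key=lambda a: req_rate[a])
        alloc[chosen] += 1
        demand += req_rate[chosen]
    return alloc


# Assumed per-core request rates (requests/s) for two made-up applications.
req_rate = {"mem_bound": 30.0, "cpu_bound": 5.0}
alloc = allocate_cores(req_rate, total_cores=8, service_rate=100.0)
print(alloc, offered_load(alloc, req_rate, 100.0))
```

A runtime system in the spirit of the abstract would rerun such an allocation periodically with measured rates rather than fixed constants.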
Proceedings of the 27th International Conference on Parallel Architectures and Compilation Techniques, 2018
With an increasing number of cores and memory controllers in multiprocessor platforms, co-location of parallel applications is gaining in importance. Key to achieving good performance is allocating the proper number of threads to co-located applications. This paper presents NuPoCo, a framework for automatically managing the parallelism of co-located parallel applications on NUMA multi-socket multi-core systems. NuPoCo maximizes the utilization of CPU cores and memory controllers by dynamically adjusting the number of threads for co-located parallel applications. Evaluated with various scenarios of co-located OpenMP applications on a 64-core AMD and a 72-core Intel machine, NuPoCo reduces the total turnaround time by 10-20% compared to the default Linux scheduler and an existing parallelism management policy that focuses on CPU utilization only.
2017 26th International Conference on Parallel Architectures and Compilation Techniques (PACT), 2017
Traditional approaches for cache-coherent shared-memory architectures running symmetric multiprocessing (SMP) operating systems are not adequate for future many-core chips, where power management presents one of the most important challenges. In this work, we present a power management framework for many-core systems that does not require coherent shared memory and supports multiple-voltage/multiple-frequency (MVMF) architectures. A hierarchical NUMA-aware power management technique combines dynamic voltage and frequency scaling (DVFS) with workload migration. The conflicting goals of grouping workloads with similar utilization patterns and placing workloads as close as possible to their data are balanced by a greedy placement algorithm. Implemented in software and evaluated on existing hardware, the proposed technique achieves a 30 and 8 percent improvement in performance-per-watt compared to DVFS-only and NUMA-unaware power management, respectively.
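The abstract names a greedy placement algorithm but gives no details, so the sketch below is only an illustration of how the two conflicting goals could be traded off and should not be read as the paper's algorithm. It assumes each workload has a utilization value and a home domain holding its data, and it scores each candidate domain by a weighted sum of utilization mismatch and distance to the data. The function `place`, the weights `alpha` and `beta`, and the input data are all hypothetical.

```python
def place(workloads, domains, capacity, distance, alpha=1.0, beta=0.5):
    """Greedy placement; workloads are (name, utilization, home_domain) tuples."""
    members = {d: [] for d in domains}  # utilizations already placed per domain
    placement = {}
    # Place the most heavily utilized workloads first.
    for name, util, home in sorted(workloads, key=lambda w: -w[1]):

        def cost(d):
            utils = members[d]
            # Mismatch against the mean utilization of the domain (0 if empty).
            mismatch = abs(util - sum(utils) / len(utils)) if utils else 0.0
            return alpha * mismatch + beta * distance[home][d]

        # Only domains with a free slot are candidates.
        free = [d for d in domains if len(members[d]) < capacity]
        best = min(free, key=cost)
        members[best].append(util)
        placement[name] = best
    return placement


# Two domains with two slots each; distance[i][j] is an assumed hop count.
distance = [[0, 1], [1, 0]]
workloads = [("a", 0.9, 0), ("b", 0.8, 0), ("c", 0.2, 1), ("d", 0.1, 1)]
placement = place(workloads, domains=[0, 1], capacity=2, distance=distance)
print(placement)
```

Grouping similar utilizations in one domain matters under DVFS because a shared voltage/frequency setting suits all members; the `beta` term keeps that grouping from pulling workloads far from their data.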
In this paper, we introduce SnuMAP, an open-source trace profiler for multi/many-core systems. The proposed profiler is lightweight, requires no source-code instrumentation, and does not degrade the performance of the target parallel application, while providing useful information and insights for application developers and multi/many-core resource managers. SnuMAP collects the execution trace of every thread of a multi-threaded application inside the Linux kernel and queries the trace information to visualize it in user space. Since the trace information is collected in the Linux kernel, SnuMAP can provide insights for multi/many-core resource management. For example, the resource manager can understand how parallel applications execute and behave on a platform, especially when other workloads are executed simultaneously. This feature is gaining importance as today's multi/many-core systems co-schedule multiple parallel workloads to increase system utilization. In this paper, we suggest several use cases of the profiler and present results obtained with SnuMAP on our two multi-core platforms, a 32-core AMD Opteron server and a 36-core Tile-Gx36 processor. SnuMAP is an open-source project; more information is available at