Fumiyoshi Shoji - Academia.edu

Papers by Fumiyoshi Shoji

Multi-block/multi-core SSOR preconditioner for the QCD quark solver for K computer

arXiv (Cornell University), Oct 28, 2012

Extremely scalable algorithm for 10^8-atom quantum material simulation on the full system of the K computer

IEEE Conference Proceedings, 2016

Extremely scalable algorithm for 10^8-atom quantum material simulation on the full system of the K computer

arXiv (Cornell University), Sep 27, 2016

In-situ Measurement of the Internal Stress in Sputter-Deposited Silicon Nitride/Oxide Thin Films

MCRG study and renormalized coupling constants

Nuclear Physics B - Proceedings Supplements, 1995

We report our analysis of an MCRG study in which we employ an improved blocking scheme, and we present our first attempt to determine the effective coupling constants obtained on the blocked lattices.

23a-E-3 Gauge dependence of monopole dynamics in lattice QCD II

Meeting Abstracts of the Physical Society of Japan (Nihon Butsuri Gakkai koen gaiyoshu), 1997

On gauge dependence in Abelian projection

Degree obtained: Doctor of Science; degree number: 博乙第204号; date conferred: March 22, 2000 (Heisei 12); year conferred: 200

Extremely scalable algorithm for 10^8-atom quantum material simulation on the full system of the K computer

An extremely scalable linear-algebraic algorithm was developed for quantum material simulation (electronic state calculation) with 10^8 atoms, or 100-nm-scale materials. The mathematical foundation is the generalized shifted linear equation (zB - A) x = b, rather than the conventional generalized eigenvalue equation. The method has a highly parallelizable mathematical structure. The underlying theory is mathematical and is also applicable to other scientific fields. The benchmark shows extreme strong scaling and a qualified time-to-solution on the full system of the K computer. The method was demonstrated in real materials research on ultra-flexible (organic) devices, which are key devices for next-generation IoT products. The present paper shows that an innovative, scalable algorithm for real research can emerge from co-design among application, algorithm, and architecture.
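As a rough illustration of the shifted linear equation (zB - A) x = b mentioned in the abstract above, the following sketch solves it for several complex shifts z on a tiny 3x3 problem. The matrices `A` and `B`, the right-hand side `b`, the shift values, and all function names are hypothetical, and dense Gaussian elimination is used purely for clarity; it is not the solver used in the paper. The point is only that each shift yields an independent linear system, which is what makes the formulation easy to parallelize.

```python
from typing import Dict, List

Matrix = List[List[complex]]
Vector = List[complex]


def solve_linear(m: Matrix, rhs: Vector) -> Vector:
    """Solve m x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    # Work on an augmented copy so the inputs are left untouched.
    aug = [list(row) + [rhs[i]] for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) == 0:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    # Back substitution on the upper-triangular system.
    x: Vector = [0j] * n
    for i in reversed(range(n)):
        s = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - s) / aug[i][i]
    return x


def solve_shifted(a: Matrix, b_mat: Matrix, rhs: Vector, z: complex) -> Vector:
    """Form zB - A for one shift z and solve (zB - A) x = rhs."""
    n = len(rhs)
    shifted = [[z * b_mat[i][j] - a[i][j] for j in range(n)] for i in range(n)]
    return solve_linear(shifted, rhs)


def residual(a: Matrix, b_mat: Matrix, rhs: Vector, z: complex, x: Vector) -> float:
    """Largest absolute entry of (zB - A) x - rhs."""
    n = len(rhs)
    return max(
        abs(sum((z * b_mat[i][j] - a[i][j]) * x[j] for j in range(n)) - rhs[i])
        for i in range(n)
    )


# Hypothetical toy data: A symmetric, B symmetric positive definite.
A: Matrix = [[2.0, -1.0, 0.0], [-1.0, 2.0, -1.0], [0.0, -1.0, 2.0]]
B: Matrix = [[1.0, 0.1, 0.0], [0.1, 1.0, 0.1], [0.0, 0.1, 1.0]]
b: Vector = [1.0, 0.0, 0.0]
shifts = [0.5 + 0.1j, 1.0 + 0.1j, 1.5 + 0.1j]

# Each shift is solved independently of the others.
solutions: Dict[complex, Vector] = {z: solve_shifted(A, B, b, z) for z in shifts}
for z in shifts:
    print(z, residual(A, B, b, z, solutions[z]))
```

Because the loop over `shifts` has no data dependence between iterations, the shifts could be distributed across workers without communication in this toy setting.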

Visualization Tool for Development of Communication Algorithms and a Case Study Using the K Computer

In this paper, we introduce our visualization tool, the Communication Log Viewer (CLV), which assists the development of collective communication algorithms. We also present visualization results as a case study. CLV visualizes information on node events and network statistics in multiple linked views. CLV can also analyze results obtained from network simulators and from actual machines in the same framework, which is useful when developers repeatedly test their algorithms on a simulator and an actual system. As a case study, we visually evaluated two all-to-all algorithms on the full system of the K computer, which has 82,944 nodes. We confirmed that an optimized all-to-all algorithm implemented for the K computer performed better than the all-to-all implemented in Open MPI. We also confirmed that the barrier operation used in the K computer's Message Passing Interface (MPI) functions keeps link utilization high. However, there is also a trade-...

Current status of the K computer

Multipurpose Independent-Study Environment for Information Technology Based Education and Training

We have investigated a multipurpose independent-study environment intended to give all students a higher level of education in information technology and to reduce computer anxiety. As a trial case, we considered students who are interested in international communication over the network and who want to study computer science, multimedia technology, and foreign languages. The multipurpose environment, featuring a variety of computer systems, opened in June 2000 at Hiroshima University. Support staff are always ready to help the students. The environment consists of four kinds of independent-study rooms, terminal rooms, a VOD (Video-on-Demand) corner, and separated booths. Since the environment opened, almost all of the computers have been occupied by university students. It encourages students to acquire practical knowledge of information technology.

Design and Evaluation of K Computer

The IEICE transactions on information and systems, 2013

Design of a Flexible In Situ Framework with a Temporal Buffer for Data Processing and Visualization of Time-Varying Datasets

Lecture Notes in Computer Science

This paper presents an in situ framework that focuses on time-varying simulations and uses a novel temporal buffer to store simulation results sampled at user-defined intervals. The framework has been designed to provide flexible data processing and visualization capabilities in modern HPC operational environments composed of powerful front-end systems, used for pre- and post-processing, along with traditional back-end HPC systems. The temporal buffer is implemented using the functionality provided by the Open Address Space (OpAS) library, which enables asynchronous one-sided communication from outside processes to any exposed memory region on the simulator side. The buffer can store time-varying simulation results, which can then be processed via in situ approaches with different proximities. We present a prototype of our framework and the process of integrating it with a target simulation code. The proposed in situ framework uses separate files, in the form of Python scripts, to describe the initialization and execution codes. The framework also allows these Python-based files to be modified at runtime, giving users greater flexibility not only for data processing, such as visualization and analysis, but also for simulation steering.

Gauge independence of Abelian and monopole dominance

We formulate a stochastic gauge fixing method to study the gauge dependence of Abelian projection. In this method, one can change the gauge continuously from the maximally Abelian gauge to no gauge fixing. We have found that the linear part of the heavy quark potential from the Abelian contribution depends little on the gauge parameter. Similar results have been obtained for the monopole contribution. We also investigate the gauge dependence of the monopole density and of the length of the monopole loop, which is known to be important for confinement. These results suggest that the picture in which monopoles play an important role in the confinement of QCD does not depend on the choice of gauge.

Monopole condensation and confinement

An In-Situ Visualization Approach for the K Computer Using Mesa 3D and KVS

Although the K computer has been operational for more than five years, it is still ranked in the top 10 of the Top500 list and is in active use, especially in Japan. One peculiarity of this system is its use of the SPARC64fx CPU, which has no instruction set compatibility with other traditional CPU architectures, together with a two-staged parallel file system, in which the necessary data is moved from the user-accessible GFS (Global File System) to a faster LFS (Local File System) to enable high-performance I/O during the simulation run. Since users have no access to the data during the simulation run, the tightly coupled (co-processing) in-situ visualization approach seems to be the most suitable for this HPC system. For visualization purposes, the hardware developer (Fujitsu) did not provide or support the traditional Mesa 3D graphics library on its SPARC64fx CPU; instead, it provided a non-OSS (Open Source Software) and non-OpenGL visualization library with Pa...

A use of PC terminals as PC cluster

Workload Classification and Performance Analysis using Job Metrics in the K computer

In the K computer, the job manager and peripheral tools collect various metrics and store them in databases. Some of the metrics are provided directly to users by the job manager, and some are summarized and reported by administrators. However, most of the data are not fully exploited for analysis that could inform our operations, because the amount of data stored in the databases grows continuously and has become too large to handle easily. In this study, to get a picture of workload behavior in terms of arithmetic, memory access, and I/O intensity, we attempt to classify the workloads using modern statistical methods. First, as a preliminary study before classification, we analyze the behavior of the metrics by PCA and select the features to be used in classification. We then partition the workloads into several groups by the k-means and DBSCAN clustering methods, using 10,000 workload records sampled from nearly one million records i...
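The k-means step named in the abstract above can be sketched as follows. This is a minimal, standard-library version of Lloyd's algorithm with fixed initial centroids, not the authors' pipeline: the PCA feature selection and the DBSCAN comparison are omitted, and the three-component job records (fractions of arithmetic, memory-access, and I/O activity) are invented toy values, not K computer metrics.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def squared_distance(p: Point, q: Point) -> float:
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(
    points: Sequence[Point], initial: Sequence[Point], iterations: int = 20
) -> Tuple[List[int], List[Point]]:
    """Lloyd's algorithm: alternate assignment and centroid update."""
    centroids = list(initial)
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda k: squared_distance(p, centroids[k]))
            for p in points
        ]
        # Update step: each centroid becomes the mean of its members.
        updated: List[Point] = []
        for k, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                updated.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                updated.append(old)  # keep an empty cluster's centroid in place
        if updated == centroids:
            break  # converged: centroids no longer move
        centroids = updated
    return labels, centroids


# Hypothetical job records: (arithmetic, memory access, I/O) intensity.
jobs: List[Point] = [
    (0.9, 0.1, 0.0),
    (0.8, 0.2, 0.1),
    (0.1, 0.9, 0.1),
    (0.2, 0.8, 0.0),
    (0.1, 0.1, 0.9),
    (0.0, 0.2, 0.8),
]

# Fixed starting centroids keep the toy run deterministic.
labels, centroids = kmeans(jobs, initial=[jobs[0], jobs[2], jobs[4]])
print(labels)
print(centroids)
```

On real job metrics, the features would normally be standardized first, and the number of clusters would need to be chosen by some validation criterion; both are left out of this sketch.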

Implementation and Evaluation of MPI Allreduce on the K Computer

This paper reports a method of speeding up MPI collective communication on the K computer, which consists of 82,944 computing nodes connected by a 6D direct network named the Tofu interconnect. Existing MPI libraries, however, do not have topology-aware algorithms that perform well on such a direct network. Thus, an Allreduce collective algorithm, named Trinaryx3, was designed and implemented in the MPI library for the K computer. The algorithm is optimized for a torus network and can utilize multiple RDMA engines, one of the strengths of the K computer. The evaluation results show that the new implementation achieves five times higher bandwidth than the existing one.

A Study on Open Source Software for Large-Scale Data Visualization on SPARC64fx based HPC Systems

Proceedings of the International Conference on High Performance Computing in Asia-Pacific Region, 2018

In this paper, we present a study of the available open-source software (OSS) for large-scale data visualization on SPARC64fx-based HPC systems, such as the K computer and the Fujitsu PRIMEHPC FX family of supercomputers (FX10 and FX100), which are commonly available throughout Japan. It is widely known that these HPC systems have been generating a vast amount of simulation results in a wide range of science and engineering fields. However, there has not been much information on large-scale data visualization software and approaches for such HPC infrastructure. In this work, we focused on visualization approaches in which the HPC hardware resources are used directly for the visualization processing, which can help minimize the problem of transferring large data for visualization and analysis. This study includes both OpenGL (Open Graphics Library) and non-OpenGL based visualization approaches, and also the availability of the GLSL (OpenGL Shading Language)...
