Rogério Iope | Universidade Estadual Paulista "Júlio de Mesquita Filho"
Papers by Rogério Iope
Lecture Notes in Computer Science, 2023
Journal of physics, Oct 1, 2016
The recent emergence of hardware architectures characterized by many-core or accelerated processors has opened new opportunities for concurrent programming models that take advantage of both SIMD and SIMT architectures. GeantV, a next-generation detector simulation, has been designed to exploit both the vector capability of mainstream CPUs and the multi-threading capabilities of coprocessors, including NVIDIA GPUs and Intel Xeon Phi. These architectures differ considerably in the vectorization depth and type of parallelization needed to achieve optimal performance. In this paper we describe the implementation of electromagnetic physics models developed for parallel computing architectures as part of the GeantV project. Results of a preliminary performance evaluation and physics validation are presented as well. Content from this work may be used under the terms of the Creative Commons Attribution 3.0 licence. Any further distribution of this work must maintain attribution to the author(s) and the title of the work, journal citation and DOI.
Journal of physics, Dec 23, 2015
Detector simulation consumes at least half of HEP computing cycles, and even so, experiments have to make hard decisions on what to simulate, as their needs greatly surpass the available computing resources. New experiments still in the design phase, such as FCC, CLIC and ILC, as well as upgraded versions of the existing LHC detectors, will push the simulation requirements further. Since the increase in computing resources is not likely to keep pace with these needs, it is necessary to explore innovative ways of speeding up simulation in order to sustain the progress of High Energy Physics. The GeantV project aims at developing a high-performance detector simulation system that integrates fast and full simulation and can be ported to different computing architectures, including CPU accelerators. After more than two years of R&D the project has produced a prototype capable of transporting particles in complex geometries, exploiting micro-parallelism, SIMD and multithreading. Portability is obtained via C++ template techniques that allow the development of machine-independent computational kernels. A set of tables derived from Geant4 for cross sections and final states provides a realistic shower development and, having been ported into a Geant4 physics list, can be used as a basis for a direct performance comparison.
The Pulsar IIb is a general-purpose FPGA-based processor board designed for full-mesh ATCA backplanes. This hardware was originally designed to support Level-1 silicon track trigger R&D projects at the LHC. Each ATCA carrier board is required to support the Intelligent Platform Management Interface (IPMI) protocol, which is responsible for coordinating hot swap operations and for exchanging sensor data with the shelf manager. This work describes the development of the microcontroller software that supports the IPMI protocol as well as additional non-IPMI services used to remotely program the Pulsar IIb FPGA.
Journal of Physics: Conference Series, 2016
The GeantV project aims to research and develop the next-generation simulation software describing the passage of particles through matter. While modern CPU architectures are being targeted first, resources such as GPGPU, Intel Xeon Phi, Atom or ARM can no longer be ignored by CPU-bound HEP applications. The proof-of-concept GeantV prototype has been engineered mainly for CPUs with vector units, but from the early stages we have foreseen a bridge to arbitrary accelerators. A software layer consisting of architecture- and technology-specific backends currently supports this concept. This approach allows us to abstract out basic types such as scalar and vector, and also to formalize generic computation kernels that transparently use library- or device-specific constructs based on Vc, CUDA, Cilk+ or Intel intrinsics. While the main goal of this approach is portable performance, as a bonus it insulates the core application and algorithms from the technology layer. This makes the application maintainable in the long term and adaptable to changes on the backend side. The paper presents the first results of basket-based GeantV geometry navigation on the Intel Xeon Phi KNC architecture. We present the scalability and vectorization study, conducted using Intel performance tools, as well as our preliminary conclusions on the use of accelerators for GeantV transport. We also describe the current work and preliminary results for using the GeantV transport kernel on GPUs.
Symposium on Virtual and Augmented Reality, 2021
There is great interest in developing virtual reality and augmented reality applications for use in neuromotor rehabilitation treatments. These applications provide the benefits of traditional rehabilitation without becoming tiresome. Furthermore, the way the users interact with these applications makes it possible to retrieve and analyze the data generated by their movements during treatment. However, it is hard to find datasets of body movements, and those available do not include specific movements in the rehabilitation context. We present a development framework for virtual and augmented reality applications to support neuromotor rehabilitation.
Symposium on Virtual and Augmented Reality, 2021
Augmented and virtual reality can be used in motor or neuromotor rehabilitation clinics to make patients more motivated and engaged with the treatment. Interaction with the applications stimulates the patient to exercise the impaired limb while enjoying the experience. This work takes the real-time tracking data generated by optical and wearable motion capture devices and uses it to feed machine learning algorithms. The data processing makes movements of different durations consistent and enables the models to converge. In addition, the data format is independent of the camera position and the user. One of the experiments presented recognizes eight movements being executed in the system.
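One common way to make movements of different durations consistent is to resample every recording to a fixed number of frames before feeding it to a model. The sketch below uses plain linear interpolation; it is an illustration only, not the processing used in the paper, and the function name `resample` and the frame layout (one list of feature values per frame) are assumptions.

```python
from typing import List, Sequence


def resample(frames: Sequence[Sequence[float]], target_len: int) -> List[List[float]]:
    """Linearly interpolate a recording to exactly target_len frames.

    frames: one sequence of feature values (e.g. joint coordinates) per frame.
    This layout is a hypothetical choice made for the example.
    """
    if not frames:
        raise ValueError("need at least one frame")
    if target_len < 1:
        raise ValueError("target_len must be positive")
    n = len(frames)
    if n == 1 or target_len == 1:
        # Nothing to interpolate between: repeat the first frame.
        return [list(frames[0]) for _ in range(target_len)]
    out = []
    for i in range(target_len):
        # Position of output frame i on the input time axis, in [0, n - 1].
        pos = i * (n - 1) / (target_len - 1)
        lo = int(pos)
        hi = min(lo + 1, n - 1)
        w = pos - lo
        out.append([(1 - w) * a + w * b for a, b in zip(frames[lo], frames[hi])])
    return out
```

With this, a two-second and a five-second execution of the same exercise both become sequences of `target_len` frames, so a classifier with a fixed input size can consume either one.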
One of the fundamental premises of photonic networks based on wavelength-division multiplexing (WDM) technology is the control of optical paths. An optical path, or lightpath, is a purely optical connection established between two network nodes, which may traverse several intermediate nodes. To efficiently support the demand placed on a WDM network that carries high-performance applications, lightpaths must be established and torn down dynamically, in such a way that the chosen routes and wavelengths minimize the probability of connection blocking due to a lack of available resources. The central element of a WDM network is the system that controls the optical switches, determines the routes, allocates the wavelengths, and establishes, maintains and tears down the optical connections between the network nodes. The goal of this work is to present routing and wavelength assignment strategies for photonic networks using...
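To make the notion of connection blocking concrete, the sketch below shows first-fit wavelength assignment on a fixed route, a textbook heuristic. It is not necessarily the strategy proposed in this work, whose abstract is truncated. It assumes no wavelength conversion, so the same wavelength must be free on every link of the route; the names `first_fit` and `release` and the `used` bookkeeping dictionary are hypothetical.

```python
from typing import Dict, List, Optional, Set, Tuple

Link = Tuple[str, ...]  # an undirected link, stored as a sorted pair of node names


def _links(route: List[str]) -> List[Link]:
    """Split a route (list of nodes) into its undirected links."""
    return [tuple(sorted(pair)) for pair in zip(route, route[1:])]


def first_fit(route: List[str], used: Dict[Link, Set[int]], num_wavelengths: int) -> Optional[int]:
    """Reserve the lowest wavelength index that is free on every link of the route.

    Returns the wavelength index, or None if the request is blocked.
    """
    links = _links(route)
    for w in range(num_wavelengths):
        if all(w not in used.get(link, set()) for link in links):
            for link in links:
                used.setdefault(link, set()).add(w)
            return w
    return None  # no common free wavelength: the connection is blocked


def release(route: List[str], used: Dict[Link, Set[int]], w: int) -> None:
    """Tear down a lightpath, freeing wavelength w on every link of the route."""
    for link in _links(route):
        used.get(link, set()).discard(w)
```

A request is blocked as soon as one link on its route has no wavelength in common with the others, which is why the choice of both route and wavelength affects the blocking probability.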
Journal on Interactive Systems
Technological evolution has allowed the use of a single camera for precise and effective body tracking, reducing the cost and increasing the accessibility of applications in places where depth cameras and wearable sensors are not available. This paper describes and implements a supervised machine learning process consisting of a mobile application used as a motion capture device, which also transforms the data into input for a machine learning model that classifies upper- and lower-limb movements (24 types of human movements). The user performs movements in front of the camera, and the trained model classifies them. We designed the system to work in a motor-rehabilitation context to assist the professional while the patient does physical exercises. The implementation can summarize the movements made during the rehabilitation sessions by counting the repetitions and classifying them when they are done completely or reach a specific range of motion.
Abstract. Scheduling in Grid computing environments is a complex task because of the unique characteristics and demands exhibited by those systems. This work proposes the use of classes of service to make Grid resource discovery and allocation more efficient. By using ants that continuously wander the network probing for Grid resources, we designed a system capable of jointly optimizing lightpath routing and scheduling in Lambda Grid systems, without the need for a resource broker, while accounting for heterogeneous resources and multiple users competing for them. The results show that the proposed ant-based system can be used effectively as a Grid scheduler over optical networks. Resumo (translated from Portuguese). Resource scheduling in computational Grid environments is a complex task because of the dynamic characteristics and demands of such systems. This work proposes the use of classes of service in order to obtain a more efficient means for the discovery and allocation of...
We present an evaluation of the heterogeneous memory system of the Intel Xeon Phi KNL architecture, using applications with different characteristics. Applications that perform many data transfers to and from main memory, when combined with efficient use of cache memory, are the best candidates for performance gains when mapping data structures to the high-bandwidth memory unit. Resumo (translated from Portuguese). This article presents an evaluation of the heterogeneous memory system of the Intel Xeon Phi KNL architecture, using applications with different characteristics. Applications that perform many data transfers to and from main memory, when combined with efficient use of cache memory, are strong candidates for performance gains when mapping data structures to the high-bandwidth memory unit.
UNESP, the Sao Paulo State University, is in the final stages of setting up one of the largest Campus Grid initiatives in Latin America, with computing resources widely dispersed across seven cities in the State of Sao Paulo, Brazil. GridUNESP, as the project is known, will empower University research groups in several areas of scientific investigation, mainly genetic sequencing, weather forecasting, molecular and cellular modeling, medical image reconstruction, development of new materials, quantum chemistry, large-scale numerical simulations, and high-energy physics, giving them access to state-of-the-art data processing and storage systems. The central cluster, which is installed at the new UNESP campus in Barra Funda, Sao Paulo, has 2,048 processing cores, reaching a theoretical peak performance of about 23.2 teraflops (trillions of calculations per second).
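Theoretical peak performance is conventionally computed as cores × clock frequency × floating-point operations per cycle. The sketch below works backwards from the quoted figures to the implied per-core peak, then checks one combination of clock and operations per cycle that would reproduce it. The clock frequency and operations-per-cycle values are illustrative assumptions; the abstract does not state them.

```python
def peak_flops(cores: int, clock_hz: float, flops_per_cycle: int) -> float:
    """Theoretical peak: every core retires flops_per_cycle operations each cycle."""
    return cores * clock_hz * flops_per_cycle


CORES = 2048            # from the abstract
QUOTED_PEAK = 23.2e12   # from the abstract, in FLOPS

# Implied peak per core: about 11.3 GFLOPS.
per_core = QUOTED_PEAK / CORES

# Assumed values, NOT given in the abstract: a 2.83 GHz clock and
# 4 floating-point operations per cycle per core.
ASSUMED_CLOCK_HZ = 2.83e9
ASSUMED_FLOPS_PER_CYCLE = 4

estimate = peak_flops(CORES, ASSUMED_CLOCK_HZ, ASSUMED_FLOPS_PER_CYCLE)
print(f"per-core peak: {per_core / 1e9:.2f} GFLOPS")
print(f"estimate under assumptions: {estimate / 1e12:.2f} TFLOPS")
```

Being a theoretical peak, the figure is an upper bound; sustained performance on real applications is lower.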
Scheduling in Grid computing environments is a complex task because of the unique characteristics and demands exhibited by those systems. This work proposes the use of classes of service to make Grid resource discovery and allocation more efficient. By using ants that continuously wander the network probing for Grid resources, we designed a system capable of jointly optimizing lightpath routing and scheduling in Lambda Grid systems, without the need for a resource broker, while accounting for heterogeneous resources and multiple users competing for them. The results show that the proposed ant-based system can be used effectively as a Grid scheduler over optical networks. Resumo (translated from Portuguese). Resource scheduling in computational Grid environments is a complex task because of the dynamic characteristics and demands of such systems. This work proposes the use of classes of service in order to obtain a more efficient means for the discovery and allocation of resource...
Performance prediction of applications has always been a great challenge, even for homogeneous architectures. However, today's trend is to design clusters with heterogeneous architectures, which makes it harder to devise strategies that predict an application's behavior and run time. In this paper we present a strategy that predicts the performance of an application on different architectures and ranks them according to the performance that the application can achieve on each one. The proposed strategy correctly ranked three of the four applications tested, without overhead implications. Our next step is to extend the metrics in order to increase the accuracy.
Deep neural networks provide the canvas to create models with millions of parameters to fit distributions involving an equally large number of random variables. The contribution of this study is twofold. First, we introduce a diffraction dataset containing computer-based simulations of a Young's interference experiment. Then, we demonstrate the ability of variational autoencoders to learn diffraction patterns and extract a latent feature that correlates with the physical wavelength.