Emilio Luque | University Autonoma of Barcelona
Papers by Emilio Luque
In the last decade, energy consumption has driven the design of all computing systems, from mobile devices, which are asked to deliver ever more features while running on a small battery, to High Performance Computing (HPC) systems, the focus of our interest, which consume enormous amounts of energy. This high energy demand has serious financial, environmental, and in many cases social consequences. Improvements in the energy efficiency of HPC systems come not only from new hardware architectures but also from software, which must manage and configure the system to maintain a given balance between execution time, energy efficiency, and productivity. This situation has motivated us to set up a collaboration among three universities to study different topics related to green computing. Our work focuses...
The Journal of Supercomputing
The analysis of parallel scientific applications allows us to understand their computational and communication behavior. One way of obtaining performance information is through performance tools. One such tool is parallel application signatures for performance prediction (PAS2P), based on parallel application repeatability, focusing on performance analysis and prediction. The same resources that execute the parallel application are used to perform its analysis, creating a machine-independent model of the application and identifying its common patterns. However, the analysis is costly in terms of execution time due to the high number of synchronization communications performed by PAS2P, degrading performance as the number of processes increases. To solve this problem, we propose a model that reduces data dependency between processes, reducing the number of communications performed by PAS2P in the analysis stage and taking advantage of the characteristics of single program, multiple s...
The modeling of large-scale stochastic systems of heterogeneous individuals and their interactions, where multiple behaviors exist, requires a large number of scenarios and repetitions of simulation experiments. In these areas, agent-based simulation (ABM) is the common tool, and High-Performance Computing can provide an adequate infrastructure for this type of simulation. The present work shows the methodology and the tools developed to execute multiple simulation scenarios based on NetLogo ABM models in an HPC environment. The goal is to provide a user-friendly environment for ABM simulation to non-technical users.
Lecture Notes in Computer Science, 2018
Continuously growing High-Performance Computing requirements increase the number of components and, at the same time, the probability of failure. Long-running parallel applications are directly affected by this phenomenon, which disrupts their execution when failures occur. MPI, a well-known standard for parallel applications, follows fail-stop semantics, requiring application owners to restart the whole execution when hard failures appear, losing time and computation data. Fault Tolerance (FT) techniques address this issue by providing high availability to the users' application executions, though at significant resource and time costs. In this paper, we present a Fault Tolerance Manager (FTM) framework based on the RADIC architecture, which provides FT protection to parallel applications implemented with MPI so that they complete their executions successfully despite failures. The solution is implemented in the application layer, following the uncoordinated and semi-coordinated rollback recovery protocols. It uses a sender-based message logger to store the messages exchanged between the application processes, and it checkpoints only the process data required to restart them in case of failure. The solution uses the concepts of ULFM for failure detection and recovery. Furthermore, a dynamic resource controller is added to the proposal, which monitors the message logger buffers and performs actions to maintain an acceptable level of protection. Experimental validation verifies the FTM functionality using two private cluster infrastructures.
The “Hospital de Clínicas” is one of the busiest hospitals in Paraguay, with an average of 1150 outpatients per day. Usually, patients have to wait a very long time to be treated, causing anger and discomfort. This paper presents an agent-based model of the outpatient consultation process for the Department of Internal Medicine. The goal is to gain a better understanding of the process and to evaluate different solutions for reducing patient waiting time. Keywords: agent-based model; simulation; consultations; patient flow; outpatient.
The performance of message-passing applications varies depending on the parallel system, causing potential inefficiencies when the number of processes increases. For this reason, it is critical to predict the application behavior before executing it, in order to use the system efficiently. We propose a methodology that allows us to predict the application's scalability behavior in a specific system, providing information to select the most appropriate resources to run the application. The methodology strives to use a bounded analysis time and a reduced set of resources. This paper presents the general methodology, focusing on validating the step concerning the generation of the scalability communication model. We can predict the evolution of the communication pattern using a reduced set of resources. Analyzing from 16 to 256 processes, we can predict the communication pattern for 4,096 processes.
Computer-simulation-based methods have enjoyed widespread use in healthcare system investigation and improvement in recent years. Healthcare systems are based on human interactions, and Emergency Departments (EDs) are one of the key components of the healthcare system. The efficiency and quality of service in EDs have a great influence on the whole healthcare system. The first step in studying the emergency department intensively, whether to find its underlying problems or to provide the best service with a limited budget, should be to create a realistic computational model of the ED. Agent-Based Modeling and Simulation (ABMS) is an excellent tool for dealing with complex systems such as EDs. This research introduces a generalized ABMS-based computational model of the ED. The model has been implemented and verified in the NetLogo modeling environment and can be used to simulate different EDs through a tuning process. Keywords: Emergency Department; Healthcare; Agent-Based Modeling and Simulation; Complex system.
When a message-passing application is executed many times over a long period of time, using a large number of resources, it is critical to predict its behavior before executing it. We propose a methodology to predict the strong scalability behavior of message-passing applications in specific systems. It focuses on characterizing and analyzing the application's communication and computation patterns, from a set of small-scale executions, to project their behavior when the number of processes increases. The methodology strives to use a reduced number of resources. This paper presents the general methodology, focusing on validating the computational time model, which is a regression-based approach. This model allows us to predict the computation time with high accuracy for a large number of processes. We executed from 16 to 256 processes and predicted the computation time for up to 4,096 processes. For the applications tested, we obtained an error of less than 9%.
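As a rough illustration of what a regression-based extrapolation of computation time can look like, the sketch below fits a power law to small-scale runs and projects it to 4,096 processes. The abstract does not state the regression form the authors use, so the log-log linear model, the function names `fit_power_law` and `predict_time`, and the timing values (synthetic, ideal strong scaling) are all assumptions made for this example.

```python
import math

def fit_power_law(procs, times):
    """Least-squares fit of log(t) = a + b*log(p); returns (a, b).

    Assumed model for illustration only, not the paper's actual model.
    """
    xs = [math.log(p) for p in procs]
    ys = [math.log(t) for t in times]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx              # slope: how time scales with process count
    a = mean_y - b * mean_x    # intercept in log space
    return a, b

def predict_time(a, b, p):
    """Evaluate the fitted model at process count p."""
    return math.exp(a + b * math.log(p))

# Synthetic measurements at 16..256 processes (hypothetical, ideal scaling).
procs = [16, 32, 64, 128, 256]
times = [1000.0 / p for p in procs]

a, b = fit_power_law(procs, times)
predicted = predict_time(a, b, 4096)
print(f"fitted exponent: {b:.3f}, predicted time at 4096 processes: {predicted:.4f}")
```

With real measurements the fit would not be exact, and the prediction error would be checked against a held-out larger run, in the spirit of the less-than-9% figure reported above.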
IEEE Access, 2020
In this paper, a new method is presented to study the impacts of telemedicine on the performance of an emergency department in Spain. Spain's demographics indicate that the country is experiencing population aging, resulting in overcrowding of emergency departments and significant demand on the healthcare system. However, it has been reported that most patients visiting emergency departments are not in an urgent clinical condition, thus causing hospital overcrowding, high medical expenses, delays in clinical service delivery, and low service efficiency for urgent patients who truly need emergency care. Telemedicine and e-health are considered solutions for the remote delivery of health services to care seekers, in order to decrease hospital visits by patients whose condition is less of an emergency. In this study, using detailed computational modeling and clinical data, we have investigated the impacts of telemedicine on the performance of an emergency department through estimations of Length of Stay as a quantitative index of quality of service in the emergency department. Specifically, an agent-based modeling and simulation system was developed and used to study the behavior of the emergency department, taking detailed modeling parameters as model inputs, including a varying number of non-urgent arrivals as a result of telemedicine. The inputs were provided through the collection and analysis of clinical data, which enabled us to predict how telemedicine changes emergency department visits. Our results indicated that emergency departments would experience a decrease of 41.14% in total Length of Stay if all non-urgent visits were eliminated, and decreases of up to 10.48% if non-urgent visits were restricted. The computational tool developed in this study and the corresponding results can provide decision makers and health care providers with objective information on the impacts of e-health services on the efficiency of emergency departments, and they can also have implications for care delivery, resource optimization, planning, and improving the quality of care. Index terms: emergency department, length of stay, telemedicine and e-health, agent-based modeling and simulation, clinical data collection and analysis, non-urgent visits.
Procedia Computer Science, 2017
A clinical trial is a study designed to demonstrate the efficacy and safety of a drug, procedure, medical device, or diagnostic test. Since clinical trials involve research in humans, they must be carefully designed and must comply strictly with a set of ethical conditions. Logistical disadvantages, ethical constraints, costs and high execution times could have a negative impact on the execution of the clinical trial. This article proposes the use of a simulation tool, the MRSA-T-Simulator, to design and perform "virtual clinical trials" for the purpose of studying MRSA contact transmission among hospitalized patients. The main advantage of the simulator is its flexibility when it comes to configuring the patient population, healthcare staff and the simulation environment.
2015 Winter Simulation Conference (WSC), 2015
International Journal of Computational Science and Engineering, 2016
The evaluation of a program's behavior in the presence of transient faults is often a very time-consuming task. To obtain significant data, thousands of executions are normally required, and each execution carries the significant overhead of the fault injection environment. Our previously published methodology significantly reduced the time needed to evaluate the robustness of a program execution by exhaustively analyzing its basic-block trace instead of using fault injection. In this paper we present a further improvement in the time needed to evaluate the robustness of parallel programs against transient faults, obtained by combining our methodology with PAS2P, a method that strives to describe an application based on its message-passing activity. The combination of our approach and PAS2P allowed us to predict the robustness of larger parallel programs, reducing the time needed to calculate the robustness by a factor of more than 20 in some cases while obtaining a robustness prediction error of less than 4%.
Journal of Computer Science Technology, Apr 1, 2015
Currently, there is growing interest in cloud platforms within the High Performance Computing (HPC) community, and Parallel I/O for High Performance Systems is no exception. In cloud platforms, the user takes into account not only the execution time but also the cost, because cost can be one of the most important issues. In this paper, we propose a methodology to quickly evaluate the performance and cost of Virtual Clusters for parallel scientific applications that use parallel I/O. From the parallel application I/O model automatically extracted with our tool PAS2P-IO, we obtain the I/O requirements, and the user can then select the Virtual Cluster that meets the application requirements. The application I/O model does not depend on the underlying I/O system. One of the main benefits of applying our methodology is that it is not necessary to execute the application in order to select the Virtual Cluster in the cloud. Finally, costs and the performance-cost ratio for the Virtual Clusters are provided to facilitate decision making on the selection of resources on a cloud platform.
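To make the idea of choosing a Virtual Cluster by requirements and performance-cost ratio concrete, here is a minimal sketch. It is not the PAS2P-IO tool or the authors' methodology: the `VirtualCluster` fields, the cluster names, the bandwidth and price figures, and the selection rule (highest bandwidth per unit cost among clusters that meet the requirement) are all hypothetical choices for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class VirtualCluster:
    name: str                 # hypothetical label
    io_bandwidth_mbps: float  # hypothetical sustained I/O bandwidth
    cost_per_hour: float      # hypothetical price

def performance_cost_ratio(cluster: VirtualCluster) -> float:
    """Bandwidth delivered per unit of cost (assumed metric)."""
    return cluster.io_bandwidth_mbps / cluster.cost_per_hour

def select_cluster(clusters: List[VirtualCluster],
                   required_bandwidth_mbps: float) -> Optional[VirtualCluster]:
    """Return the feasible cluster with the best ratio, or None if none qualifies."""
    feasible = [c for c in clusters
                if c.io_bandwidth_mbps >= required_bandwidth_mbps]
    if not feasible:
        return None
    return max(feasible, key=performance_cost_ratio)

candidates = [
    VirtualCluster("small", 100.0, 0.5),
    VirtualCluster("medium", 400.0, 1.0),
    VirtualCluster("large", 800.0, 4.0),
]

# Requirement as it might be derived from an application I/O model (made up here).
choice = select_cluster(candidates, required_bandwidth_mbps=300.0)
print(choice.name if choice else "no cluster meets the requirement")
```

In this toy data the cheapest cluster is excluded because it cannot meet the requirement, and the mid-sized one wins on ratio, which mirrors the point that the decision can be made from the model without running the application on each candidate.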
Abstract: The current demand for large computing capacity has brought significant progress in parallel systems. At the same time, various studies confirm the low utilization of the computing resources available in a network of computers. Considering both situations, the scientific community has worked on developing environments that allow these networks of computers, also called clusters, to be used with a dual functionality: executing parallel applications by exploiting the idle resources present across the cluster, without disturbing the performance of the applications run by local users. This work presents an environment, called CISNE, that falls within this line of work. CISNE addresses both the spatial and the temporal scheduling of distributed applications while guaranteeing that the local user on each node of the cluster does not perceive a slowdown in the performance of their applications. The results obtained, measured with tools developed for this purpose, show the viability of our proposals.
Providing Quality of Service (QoS) in Video on Demand (VoD) systems is a challenging problem. In this paper, we analyse fault tolerance in a P2P multicast delivery scheme called Patch Collaboration Manager / Multicast Channel Distributed Branching (PCM/MCDB) [01]. This scheme decentralizes the delivery process between clients and scales the VoD server performance. PCM/MCDB synchronizes a group of clients in order to create local network channels that replace ongoing multicast channels from the VoD server. Using the P2P paradigm means facing the challenge of how often peers connect to and disconnect from the system. To address this problem, a centralized mechanism is able to replace the failed client. We evaluate the failure management process of the centralized scheme in terms of the overhead injected into the network and analyse the applicability of a distributed approach to managing the process. Analytical models are developed for the centralized and distributed approaches. Their behaviours are compared in order to evaluate whether the distributed scheme can improve the fault management process, in terms of reducing server load and achieving better scalability.
The goal of this work is to execute SPMD applications efficiently in heterogeneous environments. The applications used to test our work communicate through the message-passing interface and were developed to be executed on a single-core cluster. We create a methodology to execute these SPMD applications efficiently on heterogeneous architectures. SPMD applications were selected because they present a high level of synchronism and communication; both elements pose challenges to our objective, which is to improve the execution time while maintaining the efficiency level above a threshold defined by the programmer, taking into consideration the communication heterogeneities present in a multicore cluster. This objective is achieved using the mapping and scheduling strategies included in our methodology. Finally, the results obtained show an improvement of around 40% in efficiency in the best case for the SPMD applications tested when our methodology is applied.
The efficient use of high performance computing is usually focused on the use of computational resources. However, scientific applications currently produce a large volume of information. Therefore, the Input/Output (I/O) subsystem should also be used efficiently. In order to do so, it is necessary to know the application's I/O patterns and establish a relationship between these patterns and the I/O subsystem configuration. To analyze the I/O behavior of applications, we propose using a library of the PAS2P (Application Signature for Performance Prediction) tool. Parallel applications typically have repetitive behavior, and the I/O patterns of parallel applications also have that behavior. We propose to identify the portions (I/O phases) where the application does I/O. From these I/O phases, we extract an application model that can be used to evaluate the application on different I/O subsystems, considering the I/O phases and the compute-communication phases. In this paper, we present the concepts used in the PAS2P methodology, which have been adapted for MPI-IO applications. We have extracted the I/O model of applications. This approach was used to estimate the I/O time of an application on different subsystems. The results show a relative estimation error lower than 10%.
En la ultima decada, el consumo energetico ha dirigido el diseno de todos los sistemas de computo... more En la ultima decada, el consumo energetico ha dirigido el diseno de todos los sistemas de computo, desde dispositivos moviles a los cuales cada vez se le piden mas prestaciones que deben ser soportadas por una pequena bateria, hasta los sistemas de Computo de Altas Prestaciones (HPC, de High Performance Computing), objeto de nuestro interes, los cuales consumen enormes cantidades de energia. Esta alta demanda energetica tiene serias consecuencias nancieras, medioambientales, y en muchos casos tambien sociales. El aumento de la e ciencia energetica de los sistemas de HPC no solo proviene de las nuevas arquitecturas hardware, tambien esta involucrado el software, quien debe gestionar y con gurar el sistema para mantener un determinado equilibrio entre tiempo de ejecucion, e ciencia energetica y productividad. Esta situacion nos ha motivado a realizar una colaboracion entre tres universidades para estudiar diferentes temas relacionados a la computacion ecologica. Nuestro trabajo se cen...
The Journal of Supercomputing
The analysis of parallel scientific applications allows us to understand their computational and ... more The analysis of parallel scientific applications allows us to understand their computational and communication behavior. One way of obtaining performance information is through performance tools. One such tool is parallel application signatures for performance prediction (PAS2P), based on parallel application repeatability, focusing on performance analysis and prediction. The same resources that execute the parallel application are used to perform its analysis, creating a machine independent model of the application and identifying its common patterns. However, the analysis is costly in terms of execution time due to the high number of synchronization communications performed by PAS2P, degrading performance as the number of processes increases. To solve this problem, we propose a model that reduces data dependency between processes, reducing the number of communications performed by PAS2P in the analysis stage and taking advantage of the characteristics of single program, multiple s...
The modeling of large-scale stochastic systems of heterogeneous individuals and their interaction... more The modeling of large-scale stochastic systems of heterogeneous individuals and their interactions, where multiple behaviors exist, requires a large number of scenarios and repetitions of simulation experiments. In these areas, the agent-based simulation (ABM) is the common tool and the High-Performance Computing can provide an adequate infrastructure for this type of simulations. The present work shows the methodology and the tools developed to allow the execution of multiple simulation scenarios based on ABM Netlogo model simulation in an HPC environment. The goal is to provide a user-friendly environment for ABM simulation to non-technological users.
Lecture Notes in Computer Science, 2018
The continuously growing High-Performance Computing requirements increments the number of compone... more The continuously growing High-Performance Computing requirements increments the number of components and at the same time failure probabilities. Long-running parallel applications are directly affected by this phenomena, disrupting its executions on failure occurrences. MPI, a well-known standard for parallel applications follows a fail-stop semantic, requiring the application owners restart the whole execution when hard failures appear losing time and computation data. Fault Tolerance (FT) techniques approach this issue by providing high availability to the users' applications execution, though adding significant resource and time costs. In this paper, we present a Fault Tolerance Manager (FTM) framework based on RADIC architecture, which provides FT protection to parallel applications implemented with MPI, in order to successfully complete executions despite failures. The solution is implemented in the application-layer following the uncoordinated and semi-coordinated rollback recovery protocols. It uses a sender-based message logger to store exchanged messages between the application processes; and checkpoints only the processes data required to restart them in case of failures. The solution uses the concepts of ULFM for failure detection and recovery. Furthermore, a dynamic resource controller is added to the proposal, which monitors the message logger buffers and performs actions to maintain an acceptable level of protection. Experimental validation verifies the FTM functionality using two private clusters infrastructures.
The “Hospital de Clínicas” is one of the busiest hospitals in Paraguay, with an average of 1150 o... more The “Hospital de Clínicas” is one of the busiest hospitals in Paraguay, with an average of 1150 outpatient per day. Usually, patients have to wait a very long time to be treated, causing anger and discomfort. This paper presents an agent-based model of the process of outpatient consultation for the Department of Internal Medicine. The goal is to have a better understanding of the process and evaluating different solutions to reduce the patient waiting time. Keywordsagent-based model; Simulation; consultations; patient flow; outpatient.
The performance of message-passing applications varies depending on the parallel system, causing ... more The performance of message-passing applications varies depending on the parallel system, causing potential inefficiencies when its number of processes increases. By this reason, it is critical to predict the application behavior before executing it, in order to use the system efficiently. We propose a methodology that allows us to predict the application scalability behavior in a specific system, providing information to select the most appropriate resources to run the application. The methodology strives to use a bounded analysis time, and a reduced set of resources. This paper presents the general methodology, focusing on validating the step of the methodology concerning to the generation of scalability communication model. We can predict the evolution of the communication pattern using a reduced set of resources. Analyzing from 16 to 256 processes, we can predict the communication pattern for 4,096 processes.
Computer simulation based methods have enjoyed widespread use in healthcare system investigation ... more Computer simulation based methods have enjoyed widespread use in healthcare system investigation and improvement in recent years. Healthcare systems are based on human interactions and Emergency Departments (ED) are one of the key components of the healthcare system. The efficiency and quality of service in ED have a great influence on the whole healthcare system. The first step to intensively study the emergency department, to find its underlying problem or to provide the best service with limited budget, should be to create a realistic computational model of the ED. Agent-Based Modeling and Simulation (ABMS) is an excellent tool to deal with complex system like ED. This research introduces a generalized ABMS-based computational model of ED. The model has been implemented and verified in a Netlogo modeling environment and can be used to simulate different EDs through a tuning process. Keywords–Emergency Department; Healthcare; Agent-Based Modeling and Simulating; Complex system.
When a message-passing application is executed many times over a long period of time, using an el... more When a message-passing application is executed many times over a long period of time, using an elevated number of resources, it is critical to predict its behavior before executing it. We propose a methodology to predict the strong scalability behavior for message-passing applications in specific systems. It is focused on characterizing and analyzing the communication and computational application patterns, from a set of executions in small scale, to project their behavior when the number of processes increases. The methodology strives to use a reduced number of resources. This paper presents the general methodology, focusing on validating the computational time model, which is a regression based approach. This model allows us to predict the computation time with high accuracy for a large number of processes. We executed from 16 to 256 processes and we predicted the computation time up until 4,096 processes. For the applications tested, we obtained an error of less than 9%.
IEEE Access, 2020
In this paper, a new method is presented to study the impacts of telemedicine on the performance ... more In this paper, a new method is presented to study the impacts of telemedicine on the performance of an emergency department in Spain. Spain's Demographics indicate that this country is experiencing population aging, resulting in overcrowding of emergency departments and significant demand on the healthcare system. However, it has been reported that most patients visiting emergency departments are not in an urgent clinical condition, thus they causing hospital overcrowding, high medical expenses, delays in clinical service delivery and low service efficiency for urgent patients who truly need emergency care. Telemedicine and e-health are considered as solutions for remote delivery of health services to care seekers in order to decrease hospital visits for patients who are in less of an emergency condition. In this study, by using detailed computational modeling and clinical data, we have investigated the impacts of telemedicine on the performance of an emergency department through estimations of Length of Stay as a quantitative index for evaluation of quality of service in the emergency department. Specifically, an agent-based modeling and simulation system was developed and used to study the behavior of the emergency department by taking detailed modeling parameters, including varying the number of non-urgent arrivals as a result of telemedicine, into account as inputs of the model. The inputs were provided through collection and analysis of clinical data that enabled us to predict how telemedicine changes emergency department visits. Our results indicated that emergency departments would experience decreases equal to 41.14% in total Length of Stay if eliminating all non-urgent visits and decreases of up to 10.48% if restricting the non-urgent visits. 
The developed computational tool in this study and the corresponding results obtained can provide decision makers and health care providers with objective information on the impacts of e-health services on the efficiency of emergency department and they can have also implications for care delivery, optimizing resources, planning, and improving the quality of care. INDEX TERMS Emergency department, length of stay, telemedicine and e-health, agent-based modeling and simulation, clinical data collection and analysis, non-urgent visits.
Procedia Computer Science, 2017
A clinical trial is a study designed to demonstrate the efficacy and safety of a drug, procedure,... more A clinical trial is a study designed to demonstrate the efficacy and safety of a drug, procedure, medical device, or diagnostic test. Since clinical trials involve research in humans, they must be carefully designed and must comply strictly with a set of ethical conditions. Logistical disadvantages, ethical constraints, costs and high execution times could have a negative impact on the execution of the clinical trial. This article proposes the use of a simulation tool, the MRSA-T-Simulator, to design and perform "virtual clinical trials" for the purpose of studying MRSA contact transmission among hospitalized patients. The main advantage of the simulator is its flexibility when it comes to configuring the patient population, healthcare staff and the simulation environment.
2015 Winter Simulation Conference (WSC), 2015
International Journal of Computational Science and Engineering, 2016
The evaluation of a program's behavior in the presence of transient faults is often a very time c... more The evaluation of a program's behavior in the presence of transient faults is often a very time consuming work. In order to achieve significant data, thousands of executions are normally required and each execution will have the significant overhead of the fault injection environment. Our previously published methodology reduced significantly the time needed to evaluate the robustness of a program execution by exhaustively analyzing its basic blocks trace instead of using fault injection. In this paper we present an even forward improvement in the evaluation time of parallel programs robustness against transient faults by combining our methodology with PAS2P-a method that strives to describe an application based on its messagepassing activity. The combination of our approach and PAS2P allowed us to predict the robustness of larger parallel programs, reducing in some cases in more than 20 times the time needed to calculate the robustness while obtaining a robustness prediction error of less than 4%.
Journal of Computer Science Technology, Apr 1, 2015
There is currently growing interest in cloud platforms within the High Performance Computing (HPC) community, and parallel I/O for high performance systems is no exception. In cloud platforms, the user takes into account not only the execution time but also the cost, because cost can be one of the most important issues. In this paper, we propose a methodology to quickly evaluate the performance and cost of Virtual Clusters for parallel scientific applications that use parallel I/O. From the parallel application I/O model automatically extracted with our tool PAS2P-IO, we obtain the I/O requirements, and the user can then select the Virtual Cluster that meets the application requirements. The application I/O model does not depend on the underlying I/O system. One of the main benefits of applying our methodology is that it is not necessary to execute the application in order to select the Virtual Cluster in the cloud. Finally, costs and the performance-cost ratio for the Virtual Clusters are provided to facilitate decision making when selecting resources on a cloud platform.
Abstract. The current demand for large computing capacity has driven significant progress in parallel systems. At the same time, several studies confirm the low utilization of the computing resources available in a network of computers. Considering both situations, the scientific community has worked on developing environments that allow these networks of computers, also called clusters, to be used with a dual purpose: executing parallel applications by exploiting the idle resources present throughout the cluster, without disturbing the performance of the applications run by local users. This work presents an environment, called CISNE, that belongs to this line of work. CISNE addresses both the spatial and the temporal scheduling of distributed applications while guaranteeing that the local user at each of the nodes that make up the cluster does not perceive a slowdown in the performance of their applications. The results obtained, measured with tools developed for that purpose, show the feasibility of our proposals.
Providing Quality of Service (QoS) in Video on Demand (VoD) systems is a challenging problem. In this paper, we analyse the fault tolerance of a P2P multicast delivery scheme called Patch Collaboration Manager / Multicast Channel Distributed Branching (PCM/MCDB) [01]. This scheme decentralizes the delivery process between clients and scales the VoD server performance. PCM/MCDB synchronizes a group of clients in order to create local network channels to replace ongoing multicast channels from the VoD server. Using the P2P paradigm means coping with how often peers connect to and disconnect from the system. To address this problem, a centralized mechanism is able to replace the failed client. We evaluate the failure management process of the centralized scheme in terms of the overhead injected into the network and analyse the applicability of a distributed approach to managing the process. Analytical models are developed for the centralized and distributed approaches. Their behaviours are compared in order to evaluate whether the distributed scheme can improve the fault management process in terms of reducing server load and providing better scalability.
The goal of this work is to execute SPMD applications efficiently on heterogeneous environments. The applications used to test our work communicate through a message-passing interface and were developed to be executed on a single-core cluster. We have created a methodology to execute these SPMD applications efficiently on heterogeneous architectures. SPMD applications were selected because they present a high level of synchronism and communication; both can make it difficult to reach our objective, which is to improve the execution time while keeping the efficiency above a threshold defined by the programmer, taking into consideration the communication heterogeneities present in a multicore cluster. This objective is achieved using the mapping and scheduling strategies included in our methodology. Finally, the results obtained show an improvement of around 40% in efficiency in the best case for the SPMD applications tested when our methodology is applied.
The efficient use of high performance computing is usually focused on the use of computational resources. However, scientific applications currently produce a large volume of information. Therefore, the Input/Output (I/O) subsystem should also be used efficiently. In order to do so, it is necessary to know the application I/O patterns and establish a relationship between these patterns and the I/O subsystem configuration. To analyze the I/O behavior of applications, we propose using a library of the PAS2P (Application Signature for Performance Prediction) tool. Parallel applications typically have repetitive behavior, and the I/O patterns of parallel applications share that behavior. We propose to identify the portions (I/O phases) where the application does I/O. From these I/O phases, we extract an application model that can be used to evaluate the application on different I/O subsystems, considering the I/O phases and the compute-communication phases. In this paper, we present the concepts used in the PAS2P methodology, which have been adapted for MPI-IO applications. We have extracted the I/O model of applications. This approach was used to estimate the I/O time of an application on different subsystems. The results show a relative estimation error lower than 10%.