Filipe Araujo - Profile on Academia.edu

Papers by Filipe Araujo

We experimentally evaluate the performance overhead of the virtual environments VMware Player, QEMU, VirtualPC and VirtualBox on a dual-core machine. First, we assess the performance of a Linux guest OS running on a virtual machine by separately benchmarking the CPU, file I/O and network bandwidth. These values are compared to the performance achieved when applications run on a Linux OS directly on the physical machine. Second, we measure the impact that a virtual machine running a volunteer @home project worker has on the host OS. Results show that the performance attainable on virtual machines depends on both the virtual machine software and the application type, with CPU-bound applications much less impacted than I/O-bound ones. Additionally, the performance impact on the host OS caused by a virtual machine using all the virtual CPU ranges from 10% to 35%, depending on the virtual environment.

In this paper we present Chkpt2Chkpt, a desktop grid system that aims to reduce turnaround times of applications by replicating checkpoints. We target desktop computing projects with applications composed of long-running independent tasks, executed on hundreds or thousands of computers spread over the Internet. While these applications typically do local checkpointing to deal with failures, we propose to replicate those checkpoints in remote places to make them available to other worker nodes.

Checkpoints that store intermediate results of computation have a fundamental impact on the computing throughput of Desktop Grid systems, like BOINC. Currently, BOINC workers store their checkpoints locally. A major limitation of this approach is that whenever a worker leaves unfinished computation, no other worker can proceed from the last stable checkpoint. This forces tasks to be restarted from scratch when the original machine is no longer available. To overcome this limitation, we propose to share checkpoints between nodes. To organize this mechanism, we arrange nodes to form complete graphs (cliques), where nodes share all the checkpoints they compute. Cliques function as survivable units, where checkpoints and tasks are not lost as long as one of the nodes of the clique remains alive. To simplify construction and maintenance of the cliques, we take advantage of the central supervisor of BOINC. To evaluate our solution, we combine simulation with some real data to answer the most fundamental question: what do we need to pay for increased throughput?
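The survivability property of cliques described above can be illustrated with a minimal sketch. The function name, the dictionary layout and the node identifiers are hypothetical and are not taken from the paper or from BOINC; the sketch only encodes the stated rule that a clique's checkpoints remain available while at least one of its members is alive.

```python
def surviving_cliques(cliques, alive):
    """Return the ids of cliques whose shared checkpoints are still available.

    cliques: dict mapping a clique id to the set of node ids in that clique
             (hypothetical layout, assumed for illustration).
    alive:   set of node ids that are currently reachable.
    A clique survives as long as at least one of its members is alive.
    """
    return {cid for cid, members in cliques.items() if members & alive}


# Two hypothetical cliques; only node "b" is still alive.
cliques = {"c1": {"a", "b", "c"}, "c2": {"d", "e"}}
print(surviving_cliques(cliques, alive={"b"}))
```

In this toy case the checkpoints of clique "c1" can still be resumed by another worker, while the tasks of "c2" would have to restart from scratch.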

Journal of Grid Computing, 2009

Desktop Grids, such as XtremWeb and BOINC, and Service Grids, such as EGEE, are two different approaches for science communities to gather computing power from a large number of computing resources. Nevertheless, little work has been done to combine these two Grid technologies in order to establish a seamless and vast Grid resource pool. In this paper we present the EGEE Service Grid and the BOINC and XtremWeb Desktop Grids. Then, we present the EDGeS solution to bridge the EGEE Service Grid with the BOINC and XtremWeb Desktop Grids.

We present a scheme based on the comparison of intermediate checkpoints that accelerates the detection of computing errors in bag-of-tasks applications executed on volunteer desktop grids. In the current state of the art, replicated task execution is used for result validation. Our method also uses replication, but instead of only comparing results at the end of the replicated computations, we validate ongoing executions by comparing checkpoints of their intermediate execution points. This scheme significantly reduces the time to detect a computational error, which we show with both theoretical analysis and simulation results. In particular, we develop a model that gives the benefit of intermediate checkpointing as a function of checkpoint frequency and error rate, and we confirm this model with simulation experiments. We find that with an error rate of 5% and a checkpoint frequency of 20 times per task, the gain is as high as 35% compared to the case where error detection is done only at the end of task execution; for higher checkpoint frequencies or higher error rates, the benefits are even greater. In addition, when an erroneous computation is detected at an intermediate execution point, we propose the immediate replacement of that computation with a correct replica from another worker. In this way, useful execution and further validation can continue from that point onward instead of being delayed.
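The idea of validating replicas at intermediate checkpoints rather than only at the end can be sketched as follows. This is an illustrative sketch, not the authors' implementation: the use of SHA-256 digests, the function names and the sample checkpoint contents are all assumptions made for the example.

```python
import hashlib


def digest(checkpoint: bytes) -> str:
    """Hash a checkpoint so replicas can be compared without shipping full state.

    SHA-256 is an assumption for this sketch; the abstract does not name a hash.
    """
    return hashlib.sha256(checkpoint).hexdigest()


def first_divergence(replica_a, replica_b):
    """Return the index of the first checkpoint where two replicas disagree.

    Returns None when all compared checkpoints match.
    """
    for index, (a, b) in enumerate(zip(replica_a, replica_b)):
        if digest(a) != digest(b):
            return index
    return None


# Hypothetical checkpoint sequences from two replicas of the same task.
good = [b"state-0", b"state-1", b"state-2", b"state-3"]
bad = [b"state-0", b"state-1", b"corrupt", b"corrupt-3"]

print(first_divergence(good, bad))   # disagreement is found at checkpoint 2
print(first_divergence(good, good))  # identical replicas never diverge
```

With end-of-task validation the disagreement in this example would only surface after the final checkpoint, whereas the intermediate comparison flags it as soon as the third checkpoint is produced.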

Journal of Grid Computing, 2009

Desktop Grid systems reached a preeminent place among the most powerful computing platforms on the planet. Unfortunately, they are extremely vulnerable to mischief, because computing projects exert no administrative or technical control over volunteers. Volunteers can very easily output bad results, due to software or hardware glitches (resulting from over-clocking, for instance), to get unfair computational credit, or simply to ruin the project. To mitigate this problem, Desktop Grid servers replicate work units and apply majority voting, typically on 2 or 3 results. In this paper, we observe that simple majority voting is powerless against malicious volunteers that collude to attack the project. We argue that to identify this type of attack and to spot colluding nodes, each work unit needs at least 3 voters. In addition, we propose to post-process the voting pools in two steps: i) in the first step, we use a statistical approach to identify nodes that were not colluding, but submitted bad results; ii) then, we use a rather simple principle to go after malicious nodes which acted together: they might have won conflicting voting pools against nodes that were not identified in step i. We use simulation to show that our heuristic can be quite effective against colluding nodes, in scenarios where honest nodes form a majority.
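A minimal sketch of majority voting over a replicated work unit shows why collusion defeats it. The function name, node labels and result values are hypothetical; the sketch covers only plain majority voting, not the two-step post-processing heuristic proposed in the paper.

```python
from collections import Counter


def majority_vote(results):
    """Return the result reported by a strict majority of voters, or None.

    results: dict mapping a node id to the result that node returned
             (hypothetical layout, assumed for illustration).
    """
    if not results:
        return None
    value, count = Counter(results.values()).most_common(1)[0]
    return value if count > len(results) / 2 else None


# Honest majority: the correct result (42) wins the pool.
print(majority_vote({"h1": 42, "h2": 42, "m1": 13}))

# Two colluding nodes outvote one honest node: the bad result (13) wins.
print(majority_vote({"h1": 42, "m1": 13, "m2": 13}))

# A pool of two voters with a conflict has no majority at all.
print(majority_vote({"h1": 42, "m1": 13}))
```

The second pool is the case the abstract warns about: the vote itself looks valid, so the conflict with the honest node is the only trace left for later analysis.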

Desktop grids use the free resources in Intranet and Internet environments for large-scale computation and storage. While desktop grids offer tremendous computational power and a high return on investment, one critical issue is the validation of results returned by participating hosts that are volatile, anonymous, and potentially malicious. We conduct a benefit analysis of a mechanism for result validation that we proposed recently for the detection of errors in long-running applications. This mechanism is based on using the digest of intermediate checkpoints, and we show in theory and simulation that the relative benefit of this method compared to the state of the art is as high as 45%.

Desktop grid systems reached a preeminent place among the most powerful computing platforms on the planet. Unfortunately, they are extremely vulnerable to mischief because volunteers can output bad results, for reasons ranging from faulty hardware (like over-clocked CPUs) to intentional sabotage. To mitigate this problem, desktop grid projects replicate work units and apply majority voting, typically on 2 or 3 results. In this paper, we observe that this form of replication is powerless against malicious volunteers that have the intention and the (simple) means to ruin the project using some form of collusion. We argue that each work unit needs at least 3 voters and that voting pools with conflicts enable the master to spot colluding malicious nodes. Hence, we post-process the voting pools in two steps: i) we use a statistical approach to identify nodes that were not colluding, but submitted bad results; ii) we use a rather simple principle to go after malicious nodes which acted together: they might have won conflicting voting pools against nodes that were not identified in step i. We use simulation to show that our heuristic can be quite effective against colluding nodes, in scenarios where honest nodes form a majority.

Desktop grids use the free resources in Intranet and Internet environments for large-scale computation and storage. While desktop grids offer a high return on investment, one critical issue is the validation of results returned by participating hosts. Several mechanisms for result validation have been previously proposed. However, the characterization of errors is poorly understood. To study error rates, we implemented and deployed a desktop grid application across several thousand hosts distributed over the Internet. We then analyzed the results to give a quantitative and empirical characterization of errors stemming from input or output (I/O) failures. We find that in practice, errors are widespread across hosts but occur relatively infrequently. Moreover, we find that error rates tend to be neither stationary over time nor correlated between hosts. In light of these characterization results, we evaluated state-of-the-art error detection mechanisms and describe the trade-offs of using each mechanism.

Service grids and desktop grids are both promoted by their supportive communities as great solutions for solving the available compute power problem and helping to balance loads across network systems. Little work, however, has been undertaken to blend these two technologies together in an effort to create one vast and seamless pool of resources. In this paper we introduce a new FP7 infrastructures project, entitled Enabling Desktop Grids for e-Science (EDGeS), that is building technological bridges to facilitate service and desktop grid interoperability. We provide a taxonomy for existing state-of-the-art grid systems and background on service grids, such as EGEE, and volunteer computing platforms, such as BOINC and XtremWeb. We then describe our approach within three themes for identifying translation technologies for porting applications between service grids and desktop grids and vice versa. The individual themes discuss the actual bridging technologies employed, the distributed data issues surrounding deployment, and application development and user access issues.

Parallel Processing Letters, 2008

Service grids and desktop grids are both promoted by their supportive communities as great solutions for solving the available compute power problem and helping to balance loads across network systems. Little work, however, has been undertaken to blend these two technologies together. In this paper we introduce a new EU project that is building technological bridges to facilitate service and desktop grid interoperability. We provide a taxonomy and background on service grids, such as EGEE, and desktop grids or volunteer computing platforms, such as BOINC and XtremWeb. We then describe our approach for identifying translation technologies between service and desktop grids. The individual themes discuss the actual bridging technologies employed and the distributed data issues surrounding deployment.

Journal of Grid Computing, 2009

Desktop grids use the free resources in Intranet and Internet environments for large-scale computation and storage. While desktop grids offer a high return on investment, one critical issue is the validation of results returned by participating hosts. Several mechanisms for result validation have been previously proposed. However, the characterization of errors is poorly understood. To study error rates, we implemented and deployed a desktop grid application across several thousand hosts distributed over the Internet. We then analyzed the results to give a quantitative and empirical characterization of errors stemming from input or output (I/O) failures. We find that in practice, errors are widespread across hosts but occur relatively infrequently. Moreover, we find that error rates tend to be neither stationary over time nor correlated between hosts. In light of these characterization results, we evaluated state-of-the-art error detection mechanisms and describe the trade-offs of using each mechanism.
