On Characterising and Identifying Mismatches in Scientific Workflows
Related papers
A taxonomy for the analysis of scientific workflow faults
2010 13th IEEE International Conference on Computational Science and Engineering, 2010
Scientific workflows generally involve the distribution of tasks to distributed resources, which may exist in different administrative domains. The use of distributed resources in this way may lead to faults, and detecting, identifying, and subsequently correcting them remains an important research challenge. We introduce a fault taxonomy for scientific workflows that may help in conducting a systematic analysis of faults, so that the potential faults that may arise at execution time can be corrected (recovered from). The presented taxonomy is motivated by previous work [4], but has a particular focus on workflow environments (compared to previous work, which focused on Grid-based resource management), and is demonstrated through its use in Weka4WS.
Failure analysis of distributed scientific workflows executing in the cloud
Conference on Network and Service Management, 2012
This work presents models characterizing failures observed during the execution of large scientific applications on Amazon EC2. Scientific workflows are used as the underlying abstraction for application representations. As scientific workflows scale to hundreds of thousands of distinct tasks, failures due to software and hardware faults become increasingly common. We study job failure models for data collected by our Stampede framework from four scientific applications. In particular, we show that a Naive Bayes classifier can accurately predict the failure probability of jobs. The models allow us to predict job failures for a given execution resource and then use these failure predictions for two higher-level goals: (1) to suggest a better job assignment, and (2) to provide quantitative feedback to the workflow component developer about the robustness of their application codes.
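The core idea of the abstract above, predicting a job's failure probability from categorical execution features with Naive Bayes, can be illustrated with a minimal stdlib-only sketch. The feature names (`execution resource`, `task type`) and the sample records are hypothetical, not drawn from the Stampede dataset:

```python
from collections import defaultdict

# Hypothetical training records: (execution_resource, task_type, failed).
# Values are illustrative only, not data from the paper.
records = [
    ("ec2-large", "align", True),
    ("ec2-large", "align", False),
    ("ec2-large", "merge", False),
    ("ec2-small", "align", True),
    ("ec2-small", "merge", True),
    ("ec2-small", "merge", False),
]

def train(records):
    """Count class priors and per-feature value counts."""
    class_counts = defaultdict(int)                       # failed? -> count
    feat_counts = defaultdict(lambda: defaultdict(int))   # (feature_idx, failed?) -> value -> count
    for *features, failed in records:
        class_counts[failed] += 1
        for i, v in enumerate(features):
            feat_counts[(i, failed)][v] += 1
    return class_counts, feat_counts

def p_failure(model, features):
    """P(failed | features) under the Naive Bayes independence assumption,
    with Laplace smoothing of the per-feature likelihoods."""
    class_counts, feat_counts = model
    total = sum(class_counts.values())
    scores = {}
    for c in (True, False):
        score = class_counts[c] / total                   # class prior
        for i, v in enumerate(features):
            # Smooth over the set of values observed for this feature.
            vocab = len({val for cc in (True, False) for val in feat_counts[(i, cc)]})
            score *= (feat_counts[(i, c)][v] + 1) / (class_counts[c] + vocab)
        scores[c] = score
    return scores[True] / (scores[True] + scores[False])

model = train(records)
print(round(p_failure(model, ("ec2-small", "align")), 3))  # → 0.692
```

A prediction like this, computed per execution resource, is what would drive the paper's two goals: steering job assignment toward resources with low predicted failure probability, and reporting robustness numbers back to component developers.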
Characterization of scientific workflows
2008
Researchers working on the planning, scheduling and execution of scientific workflows need access to a wide variety of scientific workflows to evaluate the performance of their implementations. We describe basic workflow structures that are composed into complex workflows by scientific communities. We provide a characterization of workflows from five diverse scientific applications, describing their composition and data and computational requirements. We also describe the effect of the size of the input datasets on the structure and execution profiles of these workflows. Finally, we describe a workflow generator that produces synthetic, parameterizable workflows that closely resemble the workflows that we characterize. We make these workflows available to the community to be used as benchmarks for evaluating various workflow systems and scheduling algorithms.
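The abstract above describes composing basic structures (pipelines, data distribution, data aggregation) into synthetic, parameterizable workflows. A toy sketch of that idea is below; the parameter names and the DAG shape are illustrative assumptions, and the paper's actual generator models far richer task, data, and runtime attributes:

```python
def synthetic_workflow(width=4, depth=2):
    """Build a toy fan-out / fan-in workflow DAG as an edge list.

    'width' and 'depth' are hypothetical parameters: each of the 'depth'
    stages holds 'width' tasks, an entry task distributes data to the
    first stage, and an exit task aggregates results from the last.
    """
    edges = []
    prev = ["entry"]
    for level in range(depth):
        stage = [f"task_{level}_{i}" for i in range(width)]
        for parent in prev:
            for child in stage:       # data distribution (fan-out)
                edges.append((parent, child))
        prev = stage
    for parent in prev:               # data aggregation (fan-in)
        edges.append((parent, "exit"))
    return edges

# A 2-stage, 4-wide workflow: 4 + 16 + 4 = 24 edges.
print(len(synthetic_workflow(width=4, depth=2)))  # → 24
```

Generators in this spirit let scheduling researchers sweep structural parameters (width, depth, fan-out degree) instead of being limited to the fixed set of real application traces.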
A science-gateway workload archive application to the self-healing of workflow incidents
… mésocentres et France …, 2012
Information about the execution of distributed workloads is important for studies in computer science and engineering, but workloads acquired at the infrastructure level reputedly lack information about users and application-level middleware. Meanwhile, workloads acquired at the science-gateway level contain detailed information about users, pilot jobs, task sub-steps, bags of tasks, and workflow executions. In this work, we present a science-gateway archive, we illustrate its possibilities on a few case studies, and we use it for the autonomic handling of workflow incidents.
Examining the Challenges of Scientific Workflows
IEEE Computer, 2007
Workflows have recently emerged as a paradigm for representing and managing complex distributed scientific computations, thereby accelerating the pace of scientific progress. A recent workshop on the Challenges of Scientific Workflows, sponsored by the National Science Foundation and held on May 1-2, 2006, brought together domain scientists, computer scientists, and social scientists to discuss requirements of future scientific applications and the challenges that they present to current workflow technologies. This paper reports on the discussions and recommendations of the workshop; the full report can be found at
Workflows Community Summit: Bringing the Scientific Workflows Community Together
ArXiv, 2021
Scientific workflows have been used almost universally across scientific domains, and have underpinned some of the most significant discoveries of the past several decades. Many of these workflows have high computational, storage, and/or communication demands, and thus must execute on a wide range of large-scale platforms, from large clouds to upcoming exascale high-performance computing (HPC) platforms. These executions must be managed using some software infrastructure. Due to the popularity of workflows, workflow management systems (WMSs) have been developed to provide abstractions for creating and executing workflows conveniently, efficiently, and portably. While these efforts are all worthwhile, there are now hundreds of independent WMSs, many of which are moribund. As a result, the WMS landscape is segmented and presents significant barriers to entry due to the hundreds of seemingly comparable, yet incompatible, systems that exist. Consequently, many teams, small and large, sti...