An Empirical Evaluation of Process Mining Algorithms based on Structural and Behavioral Similarities

A framework for comparing process mining algorithms

2011 IEEE GCC Conference and Exhibition (GCC), 2011

There are many process mining algorithms with different theoretical foundations and aims, raising the question of how to choose the best for a particular situation.

Towards an evaluation framework for process mining algorithms

Reactivity of Solids, 2007

Although there has been a lot of progress in developing process mining algorithms in recent years, no effort has been put in developing a common means of assessing the quality of the models discovered by these algorithms. In this paper, we outline elements of an evaluation framework that is intended to enable (a) process mining researchers to compare

Mining Reference Process Models and their Configurations

Reference process models are templates for common processes run by many corporations. However, the individual needs among organizations on the execution of these processes usually vary. A process model can address these variations through control-flow choices. Thus, it can integrate the different process variants into one model. Through configuration parameters, a configurable reference model enables corporations to derive their individual process variant from such an integrated model. While this simplifies the adaptation process for the reference model user, the construction of a configurable model integrating several process variants is far more complex than the creation of a traditional reference model depicting a single best-practice variant. In this paper we therefore recommend the use of process mining techniques on log files of existing, well-running IT systems to help the reference model provider in creating such integrated process models. Afterwards, the same log files are used to derive suggestions for common configurations that can serve as starting points for individual configurations.

Towards an Evaluation Framework for Process Mining Systems

Although there has been a lot of progress in developing process mining algorithms in recent years, no effort has been put in developing a common means of assessing the quality of the models discovered by these algorithms. In this paper, we outline elements of an evaluation framework that is intended to enable (a) process mining researchers to compare the performance of their algorithms, and (b) end users to evaluate the validity of their process mining results. Furthermore, we describe two possible approaches to evaluate a discovered model: (i) using existing comparison metrics that have been developed by the process mining research community, and (ii) based on the so-called k-fold cross-validation known from the machine learning community. To illustrate the application of these two approaches, we compared a set of models discovered by different algorithms based on a simple example log.
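The k-fold cross-validation idea mentioned in this abstract can be sketched as follows. This is a minimal illustration, not the procedure from the paper: the "discovered model" is assumed to be nothing more than the set of directly-follows pairs seen in the training traces, and a test trace is assumed to "fit" when all of its consecutive pairs appear in that set. The function names are hypothetical.

```python
from typing import List, Sequence, Set, Tuple

Trace = Tuple[str, ...]
Relation = Tuple[str, str]


def k_fold_splits(log: Sequence[Trace], k: int) -> List[Tuple[List[Trace], List[Trace]]]:
    """Partition the log into k folds and return one (train, test) pair per fold."""
    if k < 2 or k > len(log):
        raise ValueError("k must be between 2 and the number of traces")
    folds = [list(log[i::k]) for i in range(k)]
    splits = []
    for i in range(k):
        train = [t for j, fold in enumerate(folds) if j != i for t in fold]
        splits.append((train, folds[i]))
    return splits


def directly_follows(log: Sequence[Trace]) -> Set[Relation]:
    """Stand-in for a discovery algorithm: collect all observed (a, b) successions."""
    return {(a, b) for trace in log for a, b in zip(trace, trace[1:])}


def fits(trace: Trace, model: Set[Relation]) -> bool:
    """A trace fits if every consecutive pair of activities is allowed by the model."""
    return all(pair in model for pair in zip(trace, trace[1:]))


def cross_validated_fitness(log: Sequence[Trace], k: int) -> float:
    """Average fraction of held-out traces that fit the model built from the rest."""
    scores = []
    for train, test in k_fold_splits(log, k):
        model = directly_follows(train)
        scores.append(sum(fits(t, model) for t in test) / len(test))
    return sum(scores) / len(scores)
```

With a real discovery algorithm and a real fitness measure substituted for `directly_follows` and `fits`, the same loop estimates how well a discovered model generalizes to traces it was not built from.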

Development of the Process Mining Discipline

It is exciting to see the spectacular developments in process mining since I started to work on this in the late 1990s. Many of the techniques we developed 15-20 years ago have become standard functionality in today's process mining tools. Therefore, it is good to view current and future developments in this historical context. This chapter starts with a brief summary of the history of process mining showing how ideas from academia got adopted in commercial tools. This provides the basis to talk about the expanding scope of process mining, both in terms of applications and in terms of functionalities supported. Despite the rapid development of the process mining discipline, there are still several challenges. Some of these challenges are new, but there are also several challenges that have been around for a while and still need to be addressed urgently. This requires the concerted action of process mining users, technology providers, and scientists.

Adoption of traditional process mining techniques

Process mining started in the late nineties when I had a sabbatical and was working for one year at the University of Colorado in Boulder (USA). Before, I was mostly focusing on concurrency theory, discrete event simulation, and workflow management. We had built our own simulation engines (e.g., ExSpect) and workflow management systems. Although our research was well-received and influential, I was disappointed by the average quality of process models and the impact process models had on reality. In both simulation studies and workflow implementations, the real processes often turned out to be very different from what was modeled by the people involved. As a result, workflow and simulation projects often failed. Therefore, I decided to focus on the analysis of processes through event data [1]. Around the turn of the century, we developed the first process discovery algorithms [2].
The Alpha algorithm was the first algorithm able to learn concurrent process models from event data and still provide formal guarantees. However, at the time, little event data were available and the assumptions made by the first algorithms were unrealistic. People working on data mining and machine learning were (and perhaps still are) not interested in process analysis. Therefore, it was not easy to convince other researchers to work on this. Nevertheless, for me, it was crystal clear that process mining would become a crucial ingredient of any process management or process improvement initiative. In the period that followed, I stopped working on the traditional business process management topics and fully focused on process mining. It is interesting to see that concepts such as conformance checking, organizational process mining, decision mining, token animation, time prediction, etc. were already developed and implemented 15 years ago [2]. These capabilities are still considered to be cutting-edge and not supported by most of the commercial process mining tools.

Automated Discovery of Process Models from Event Logs: Review and Benchmark

IEEE Transactions on Knowledge and Data Engineering, 2019

Process mining allows analysts to exploit logs of historical executions of business processes to extract insights regarding the actual performance of these processes. One of the most widely studied process mining operations is automated process discovery. An automated process discovery method takes as input an event log, and produces as output a business process model that captures the control-flow relations between tasks that are observed in or implied by the event log. Various automated process discovery methods have been proposed in the past two decades, striking different tradeoffs between scalability, accuracy and complexity of the resulting models. However, these methods have been evaluated in an ad-hoc manner, employing different datasets, experimental setups, evaluation measures and baselines, often leading to incomparable conclusions and sometimes unreproducible results due to the use of closed datasets. This article provides a systematic review and comparative evaluation of automated process discovery methods, using an open-source benchmark and covering twelve publicly-available real-life event logs, twelve proprietary real-life event logs, and nine quality metrics. The results highlight gaps and unexplored tradeoffs in the field, including the lack of scalability of some methods and a strong divergence in their performance with respect to the different quality metrics used.
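As a rough illustration of the input/output contract described above (event log in, control-flow relations out), the sketch below builds a directly-follows graph with frequencies. This is a simple intermediate representation, not any of the discovery methods benchmarked in the article; the function name and the activity labels in the usage note are made up for the example.

```python
from collections import Counter
from typing import Iterable, Sequence, Tuple


def discover_dfg(log: Iterable[Sequence[str]]) -> Tuple[Counter, Counter, Counter]:
    """Count directly-follows edges, start activities and end activities in a log.

    Each trace is the ordered sequence of activity names for one process instance.
    """
    edges: Counter = Counter()
    starts: Counter = Counter()
    ends: Counter = Counter()
    for trace in log:
        if not trace:
            continue  # skip empty traces, they carry no control-flow information
        starts[trace[0]] += 1
        ends[trace[-1]] += 1
        for a, b in zip(trace, trace[1:]):
            edges[(a, b)] += 1
    return edges, starts, ends
```

For a toy log such as `[("register", "check", "pay"), ("register", "pay")]`, the returned `edges` counter records that `register` is followed once by `check` and once by `pay`, which is the kind of observed control-flow relation a discovery method then generalizes into a process model.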

An Integrated Framework for Process Discovery Algorithm Evaluation

Process mining offers techniques to exploit event data by providing insights and recommendations to improve business processes. The growing number of algorithms for process discovery has raised the question of which algorithms perform best on a given event log. Current evaluation frameworks for empirically evaluating discovery techniques depend on the notation used (behaviorally identical models may give different results) and cannot provide more general statements about populations of models. Therefore, this paper proposes a new integrated evaluation framework that uses a classification approach to make it independent of the modeling notation. Furthermore, it is founded on experimental design to ensure the generalization of results. It supports two main evaluation objectives: benchmarking process discovery algorithms and sensitivity analysis, i.e. studying the effect of model and log characteristics on a discovery algorithm's accuracy. The framework is designed as a scientific workflow which enables automated, extendable and shareable evaluation experiments. An extensive experiment including four discovery algorithms and six control-flow characteristics validates the relevance and flexibility of the framework. Ultimately, the paper aims to advance the state-of-the-art for evaluating process discovery techniques.

Process Mining Functional and Structural Validation

The current study proposes solutions for the functional and structural validation of business process models extracted by mining an event log dataset with several process mining algorithms. Structural validation (verification) assesses the quality of the business processes using conformance analysis techniques and computed statistical results. Cross-validation is also presented as a methodology for the structural validation of business processes. Furthermore, we propose extending the verification of process models with functional validation, with the aim of aligning business processes with business objectives. Functional validation starts with defining the process requirements, splitting them into clear use cases, and generating event log data that captures the use case functionality. Functional validation is applied to real event log data generated during one software release in the tools development area of the automotive industry. Structural and functional validation technique...

A multi-dimensional quality assessment of state-of-the-art process discovery algorithms using real-life event logs

Information Systems, 2012

Process mining is the research domain that is dedicated to the a posteriori analysis of business process executions. The techniques developed within this research area are specifically designed to provide profound insight by exploiting the untapped reservoir of knowledge that resides within event logs of information systems. Process discovery is one specific subdomain of process mining that entails the discovery of control-flow models from such event logs. Assessing the quality of discovered process models is an essential element, both for conducting process mining research and for the use of process mining in practice. In this paper, a multi-dimensional quality assessment is presented in order to comprehensively evaluate process discovery techniques. In contrast to previous studies, the major contribution of this paper is the use of eight real-life event logs. For instance, we show that evaluation based on real-life event logs significantly differs from the traditional approach to assess process discovery techniques using artificial event logs. In addition, we provide an extensive overview of available process discovery techniques and we describe how discovered process models can be assessed regarding both accuracy and comprehensibility. The results of our study indicate that the HeuristicsMiner algorithm is especially suited in a real-life setting.

A Comprehensive Process Similarity Measure Based on Models and Logs

IEEE Access, 2019

Process similarity measurement plays an important role in business process management and is usually considered a versatile means of making effective use of process models. Although many studies have worked on different notions of process similarity, most of them are not precise enough, as they simply compare processes with respect to the model structure features or the model behavior features separately. To address this problem, in this paper, we propose to measure business process similarity by considering both process models and process logs. The process models are pre-defined descriptions of business processes, and the process logs can be considered an objective observation of the actual process execution behavior. The combination of both can help to better characterize business processes. More specifically, two effective frameworks together with four novel approaches are presented. The former first constructs a weighted business process graph (WBPG) from the process model and the process log, and then computes the similarity of two corresponding WBPGs by using the weighted graph edit distance measure and the weighted node adjacent relation similarity measure. The latter first measures the similarity of process logs and the similarity of process models separately, and then merges the results. Finally, by experimental evaluation, we demonstrate the effectiveness and the applicability of the proposed approaches by comparing them with the state of the art.
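The second framework in this abstract (score models and logs separately, then merge) might be sketched roughly as below. The concrete measures are assumptions made for illustration only: Jaccard overlap of model edges stands in for the structural measure, Jaccard overlap of distinct trace variants stands in for the behavioral measure, and the weight `alpha` is a hypothetical merge parameter. None of these are the paper's own measures.

```python
from typing import Iterable, Sequence, Tuple

Edge = Tuple[str, str]


def jaccard(left: Iterable, right: Iterable) -> float:
    """Jaccard similarity of two collections treated as sets (1.0 if both are empty)."""
    a, b = set(left), set(right)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def combined_similarity(
    model_edges_1: Iterable[Edge],
    model_edges_2: Iterable[Edge],
    log_1: Iterable[Sequence[str]],
    log_2: Iterable[Sequence[str]],
    alpha: float = 0.5,
) -> float:
    """Merge a structural score (model edges) with a behavioral score (trace variants)."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    model_sim = jaccard(model_edges_1, model_edges_2)
    # Convert traces to tuples so that distinct variants can be compared as set members.
    log_sim = jaccard((tuple(t) for t in log_1), (tuple(t) for t in log_2))
    return alpha * model_sim + (1.0 - alpha) * log_sim
```

Setting `alpha` to 1.0 or 0.0 reduces the score to the purely structural or purely behavioral comparison, which is the separation the abstract criticizes in earlier work.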