Using Meta-learning to Recommend Process Discovery Methods

Recommendation of Process Discovery Algorithms Through Event Log Classification

Lecture Notes in Computer Science, 2015

Process mining is concerned with the extraction of knowledge about business processes from information system logs. Process discovery algorithms are process mining techniques focused on discovering process models from event logs. The applicability and effectiveness of process discovery algorithms depend on features of event logs and process characteristics. Selecting a suitable algorithm for an event log is a difficult task because of the many variables involved. Traditional approaches rely on empirical assessment to recommend a suitable discovery algorithm, which is time-consuming and computationally expensive. The present paper evaluates the usefulness of a classification-based approach to recommending discovery algorithms. A knowledge base was constructed from features of event logs and process characteristics in order to train the classifiers. Experimental results obtained with the classifiers demonstrate the usefulness of the proposal for recommending discovery algorithms.
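The classification-based recommendation described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's method: the three log features (number of traces, number of distinct activities, mean trace length), the tiny knowledge base, the placeholder labels alg_A and alg_B, and the choice of a k-nearest-neighbour classifier are all hypothetical.

```python
import math
from collections import Counter

# Hypothetical knowledge base: (features, best algorithm) pairs.
# Features: (number of traces, distinct activities, mean trace length).
# The labels are placeholders, not results reported in the paper.
KNOWLEDGE_BASE = [
    ((1000.0, 10.0, 8.0), "alg_A"),
    ((50000.0, 40.0, 25.0), "alg_B"),
    ((200.0, 5.0, 4.0), "alg_A"),
    ((30000.0, 60.0, 30.0), "alg_B"),
]


def extract_features(log):
    """Compute hypothetical features of a log given as a list of traces."""
    n_traces = len(log)
    activities = {a for trace in log for a in trace}
    mean_len = sum(len(t) for t in log) / n_traces if n_traces else 0.0
    return (float(n_traces), float(len(activities)), mean_len)


def recommend(features, kb=KNOWLEDGE_BASE, k=3):
    """Recommend an algorithm by majority vote of the k nearest known logs."""
    # Log-scale each feature so large counts (e.g. number of traces)
    # do not swamp the other features in the Euclidean distance.
    def scale(v):
        return tuple(math.log1p(x) for x in v)

    query = scale(features)
    ranked = sorted(kb, key=lambda item: math.dist(query, scale(item[0])))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


print(recommend(extract_features([["a", "b", "c"]] * 300)))  # alg_A
```

In the approach the abstract describes, the knowledge base would be filled by actually running discovery algorithms on many logs and recording which one performed best; any trained classifier could replace the nearest-neighbour vote used here.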

A Recommender System for Process Discovery

Lecture Notes in Computer Science, 2014

Over the last decade, several algorithms for process discovery and process conformance have been proposed. Still, it is well accepted that there is no dominant algorithm in either of these two disciplines, and it is therefore often difficult to apply them successfully. Most of these algorithms require near-expert knowledge to be applied satisfactorily. In this paper, we present a recommender system that uses portfolio-based algorithm selection strategies to address two problems: finding the best discovery algorithm for the data at hand, and bridging the gap between general users and process mining algorithms. Experiments performed with the developed tool demonstrate the usefulness of the approach for a variety of instances.

A multi-dimensional quality assessment of state-of-the-art process discovery algorithms using real-life event logs

Information Systems, 2012

Process mining is the research domain that is dedicated to the a posteriori analysis of business process executions. The techniques developed within this research area are specifically designed to provide profound insight by exploiting the untapped reservoir of knowledge that resides within event logs of information systems. Process discovery is one specific subdomain of process mining that entails the discovery of control-flow models from such event logs. Assessing the quality of discovered process models is an essential element, both for conducting process mining research and for using process mining in practice. In this paper, a multi-dimensional quality assessment is presented in order to comprehensively evaluate process discovery techniques. In contrast to previous studies, the major contribution of this paper is the use of eight real-life event logs. In particular, we show that evaluation based on real-life event logs differs significantly from the traditional approach of assessing process discovery techniques using artificial event logs. In addition, we provide an extensive overview of available process discovery techniques and describe how discovered process models can be assessed regarding both accuracy and comprehensibility. The results of our study indicate that the HeuristicsMiner algorithm is especially suited to a real-life setting.

An Integrated Framework for Process Discovery Algorithm Evaluation

Process mining offers techniques to exploit event data by providing insights and recommendations to improve business processes. The growing number of process discovery algorithms has raised the question of which algorithms perform best on a given event log. Current frameworks for empirically evaluating discovery techniques depend on the notation used (behaviorally identical models may give different results) and cannot provide more general statements about populations of models. Therefore, this paper proposes a new integrated evaluation framework that uses a classification approach to make it independent of the modeling notation. Furthermore, it is founded on experimental design to ensure the generalization of results. It supports two main evaluation objectives: benchmarking process discovery algorithms and sensitivity analysis, i.e., studying the effect of model and log characteristics on a discovery algorithm's accuracy. The framework is designed as a scientific workflow, which enables automated, extendable, and shareable evaluation experiments. An extensive experiment including four discovery algorithms and six control-flow characteristics validates the relevance and flexibility of the framework. Ultimately, the paper aims to advance the state of the art in evaluating process discovery techniques.

Automated Discovery of Process Models from Event Logs: Review and Benchmark

IEEE Transactions on Knowledge and Data Engineering, 2019

Process mining allows analysts to exploit logs of historical executions of business processes to extract insights regarding the actual performance of these processes. One of the most widely studied process mining operations is automated process discovery. An automated process discovery method takes as input an event log, and produces as output a business process model that captures the control-flow relations between tasks that are observed in or implied by the event log. Various automated process discovery methods have been proposed in the past two decades, striking different tradeoffs between scalability, accuracy and complexity of the resulting models. However, these methods have been evaluated in an ad-hoc manner, employing different datasets, experimental setups, evaluation measures and baselines, often leading to incomparable conclusions and sometimes unreproducible results due to the use of closed datasets. This article provides a systematic review and comparative evaluation of automated process discovery methods, using an open-source benchmark and covering twelve publicly-available real-life event logs, twelve proprietary real-life event logs, and nine quality metrics. The results highlight gaps and unexplored tradeoffs in the field, including the lack of scalability of some methods and a strong divergence in their performance with respect to the different quality metrics used.
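As a concrete picture of the input and output described above, several well-known discovery methods (for example the Alpha algorithm and HeuristicsMiner) start from the directly-follows relation between activities in the log. The sketch below computes that relation, plus start and end activities, assuming a log represented as a list of traces, each a list of activity names. It illustrates a common first step only; it is not one of the benchmarked methods, and the activity names are made up.

```python
from collections import Counter


def directly_follows(log):
    """Return (dfg, starts, ends) for a log given as a list of traces.

    dfg counts how often activity a is immediately followed by activity b;
    starts and ends count the first and last activity of each non-empty trace.
    """
    dfg, starts, ends = Counter(), Counter(), Counter()
    for trace in log:
        if not trace:
            continue  # empty traces contribute nothing
        starts[trace[0]] += 1
        ends[trace[-1]] += 1
        for a, b in zip(trace, trace[1:]):
            dfg[(a, b)] += 1
    return dfg, starts, ends


log = [["register", "check", "pay"], ["register", "pay"], ["register", "check", "pay"]]
dfg, starts, ends = directly_follows(log)
print(dict(dfg))
# {('register', 'check'): 2, ('check', 'pay'): 2, ('register', 'pay'): 1}
```

A discovery method would then turn these counts into a model (for instance by filtering infrequent pairs and inferring choice or concurrency), which is where the tradeoffs between accuracy, complexity, and scalability discussed in the review arise.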

Towards an Evaluation Framework for Process Mining Systems

Although there has been a lot of progress in developing process mining algorithms in recent years, no effort has been put into developing a common means of assessing the quality of the models discovered by these algorithms. In this paper, we outline elements of an evaluation framework that is intended to enable (a) process mining researchers to compare the performance of their algorithms, and (b) end users to evaluate the validity of their process mining results. Furthermore, we describe two possible approaches to evaluating a discovered model: (i) using existing comparison metrics that have been developed by the process mining research community, and (ii) using the so-called k-fold cross-validation known from the machine learning community. To illustrate the application of these two approaches, we compared a set of models discovered by different algorithms on a simple example log.
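The k-fold cross-validation idea mentioned above can be sketched by splitting the traces of a log into k folds, discovering a model on k-1 folds, and evaluating it on the held-out fold. The sketch only produces the splits; the discovery and evaluation steps are left out, and shuffling at the trace level with a fixed seed is an assumption rather than the paper's exact protocol.

```python
import random


def k_fold_splits(log, k, seed=0):
    """Yield (train, test) pairs of trace lists for k-fold cross-validation.

    Every trace ends up in exactly one test fold. Traces are shuffled with a
    fixed seed so that the split is reproducible.
    """
    if not 2 <= k <= len(log):
        raise ValueError("k must be between 2 and the number of traces")
    order = list(range(len(log)))
    random.Random(seed).shuffle(order)
    folds = [order[i::k] for i in range(k)]
    for test_idx in folds:
        held_out = set(test_idx)
        train = [log[j] for j in order if j not in held_out]
        test = [log[j] for j in test_idx]
        yield train, test


log = [["a", "b"], ["a", "c"], ["a", "b", "c"], ["a"], ["b", "c"], ["c"]]
for train, test in k_fold_splits(log, k=3):
    # A discovery algorithm would be run on `train` and the resulting model
    # evaluated (e.g. for fitness) on `test`; both steps are omitted here.
    print(len(train), len(test))  # prints "4 2" for each of the 3 folds
```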

Feature Recommendation for Structural Equation Model Discovery in Process Mining

Process mining techniques can help organizations improve their operational processes. Organizations can benefit from process mining techniques in finding and amending the root causes of performance or compliance problems. Given the volume of data and the number of features captured by the information systems of today's companies, discovering the set of features that should be considered in root cause analysis can be quite involved. In this paper, we propose a method for finding the set of (aggregated) features with a possible effect on the problem. Root cause analysis is usually done by applying a machine learning technique to the data gathered from the information system supporting the processes. To avoid confusing correlation with causation, which can happen when the findings of machine learning techniques are interpreted as causal, we propose a method for discovering the structural equation model of the process, which can be used for root cause analysis. We have implemented the proposed method as a plugin in ProM and evaluated it using two real and synthetic event logs. These experiments show the validity and effectiveness of the proposed method.

All That Glitters Is Not Gold: Four Maturity Stages of Process Discovery Algorithms

Information Systems, 2023

A process discovery algorithm aims to construct a process model that represents the real-world process stored in event data well, that is, a model that is precise, generalizes the data correctly, and is simple. At the same time, it is reasonable to expect that better-quality input event data should lead to constructed process models of better quality. However, existing process discovery algorithms omit any discussion of this relationship between inputs and outputs and, as it turns out, often do not guarantee it. We demonstrate the latter claim using several quality measures for event data and discovered process models. Consequently, this paper calls for more rigor in the design of process discovery algorithms, including properties that relate the qualities of the inputs and outputs of these algorithms. We present four incremental maturity stages for process discovery algorithms, along with concrete guidelines for formulating relevant properties and validating them experimentally. We then use these stages to review several state-of-the-art process discovery algorithms, confirming the need to reflect on how we perform algorithmic process discovery.

A novel approach to process mining: Intentional process models discovery

2014 IEEE Eighth International Conference on Research Challenges in Information Science (RCIS), 2014

So far, process mining techniques have modeled processes in terms of the tasks that occur during the enactment of a process. However, research on method engineering and guidance has shown that many issues, such as lack of flexibility or adaptation, are addressed more effectively when intentions are specified explicitly. This paper presents a novel approach to process mining, called the Map Miner Method (MMM). This method is designed to automate the construction of intentional process models from process logs. MMM uses Hidden Markov Models to model the relationship between users' activity logs and the strategies they use to fulfill their intentions. The method also includes two specific algorithms, developed to infer users' intentions and to construct the intentional process model (Map), respectively. MMM can construct Map process models at different levels of abstraction (fine-grained and coarse-grained process models) in accordance with the Map metamodel formalism (i.e., a metamodel that specifies the intentions and strategies of process actors). This paper presents all the steps toward constructing the topology of Map process models. The entire method is applied to a large-scale case study (Eclipse UDC) to mine the associated intentional process. The likelihood of the obtained process model indicates satisfactory efficiency for the proposed method.

Increasing Efficiency of Process Discovery Algorithms and Process Model Discovery from Unlabeled Event Logs: A Review

2020

Business processes leave behind trails of their execution histories, and present-day information systems record these trails in event logs. Process mining helps analysts gain better insight into these processes by exploiting these event logs. Among the various process mining operations, process discovery is the most prominent and most widely researched. A process discovery method produces a business process model by correlating the events available in an event log. Numerous process discovery techniques have been presented in recent years, focusing on issues such as the complexity of the generated model, accuracy, and scalability. However, most of these methods incorporate algorithms that are computationally complex and rely on case identifiers to establish the correlation between the events in an event log in order to produce a process model. Hence, it becomes important to explore the availability of methods that (i) increase the execution efficiency of the computationally complex proces...
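The correlation step this review refers to, in which events are grouped into cases through their case identifiers, can be sketched as below. The (case_id, activity, timestamp) tuple layout is an assumption for illustration. The sketch does not address unlabeled logs, where no case identifier is available and the correlation itself has to be inferred.

```python
from collections import defaultdict


def build_traces(events):
    """Group (case_id, activity, timestamp) events into per-case traces.

    Events of each case are ordered by timestamp; the sort is stable, so
    events with equal timestamps keep the order in which they were recorded.
    """
    by_case = defaultdict(list)
    for case_id, activity, ts in events:
        by_case[case_id].append((ts, activity))
    return {
        case: [activity for _, activity in sorted(evts, key=lambda e: e[0])]
        for case, evts in by_case.items()
    }


events = [("c1", "b", 2), ("c2", "a", 1), ("c1", "a", 1), ("c2", "c", 3), ("c1", "c", 3)]
print(build_traces(events))  # {'c1': ['a', 'b', 'c'], 'c2': ['a', 'c']}
```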