A Data-Driven Knowledge Acquisition System: An End-to-End Knowledge Engineering Process for Generating Production Rules
Data-driven knowledge acquisition is one of the key research fields in data mining. Handling large amounts of data has recently received considerable attention in the field, and a number of methodologies have been proposed to extract insights from data in an automated or semi-automated manner. However, these methodologies generally target a single aspect of the data mining process, such as data acquisition, data preprocessing, or data classification, whereas a comprehensive knowledge acquisition method is crucial for supporting the end-to-end knowledge engineering process. In this paper, we introduce a knowledge acquisition system that covers all major phases of the Cross-Industry Standard Process for Data Mining (CRISP-DM). Acknowledging the importance of an end-to-end knowledge engineering process, we designed and developed an easy-to-use data-driven knowledge acquisition tool (DDKAT). The major features of the DDKAT are: (1) a novel unified features scoring approach for data selection; (2) a user-friendly data processing interface to improve the quality of the raw data; (3) an approach for selecting an appropriate decision tree algorithm to build a classification model; and (4) automated generation of production rules from various decision tree classification models. Furthermore, two diabetes studies were performed to assess the value of the DDKAT in terms of user experience. The first study involved 19 experts, and the second involved 102 students in the artificial intelligence domain. The results showed that the overall user experience of the DDKAT was positive in terms of its attractiveness, as well as its pragmatic and hedonic quality factors.
INDEX TERMS Knowledge engineering, data mining, features ranking, algorithm selection, decision tree, production rule, user experience.
I. INTRODUCTION
Knowledge systems have come a long way, from manual knowledge curation to automatic data-driven knowledge generation.
The major drivers of this transition were the size and complexity of data. Because large datasets cannot be analyzed efficiently by hand, automation is essential [2]. In the early stages of knowledge automation, knowledge engineers followed ad hoc procedures [3]. Later, more systematic methodologies were devised, which can be referred to as data-driven knowledge acquisition systems. Knowledge extraction from structured sources such as databases is an active area of research in the information