Toward an integrated knowledge discovery and data mining process model | The Knowledge Engineering Review | Cambridge Core

Abstract

Knowledge discovery and data mining (KDDM) process models describe the phases of the KDDM process (e.g. business understanding, data understanding, data preparation, modeling, evaluation and deployment). They act as a roadmap for implementing the process by presenting a list of tasks for executing each phase. This checklist approach, however, is not adequately supported by tools that specify ‘how’ a particular task can be implemented, which may result in tasks not being implemented. Another disadvantage is that the long checklist does not capture or leverage the dependencies that exist among tasks within and across phases. This not only makes the process cumbersome to implement, but also hinders the semi-automation of certain tasks. Given that each task in the process model serves an important goal and, through these dependencies, affects the execution of related tasks, these limitations are likely to reduce the efficiency and effectiveness of KDDM projects. This paper proposes an improved KDDM process model that overcomes these shortcomings by prescribing tools to support each task and by identifying and leveraging dependencies among tasks to semi-automate them wherever possible.
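As a rough illustration of what capturing dependencies among tasks could look like in practice, the sketch below records a handful of tasks as a dependency graph instead of a flat checklist. It is not the model proposed in the paper: the task names, the `TASKS` structure, and the helper functions `execution_order` and `downstream_of` are all hypothetical, and only the phase names come from the abstract.

```python
from graphlib import TopologicalSorter

# Hypothetical tasks (names are illustrative, not taken from the paper).
# Each task is tagged with its phase and the tasks it depends on.
TASKS = {
    "determine_business_objectives": {
        "phase": "business understanding", "needs": []},
    "collect_initial_data": {
        "phase": "data understanding",
        "needs": ["determine_business_objectives"]},
    "clean_data": {
        "phase": "data preparation", "needs": ["collect_initial_data"]},
    "build_model": {
        "phase": "modeling", "needs": ["clean_data"]},
    "evaluate_results": {
        "phase": "evaluation",
        "needs": ["build_model", "determine_business_objectives"]},
    "plan_deployment": {
        "phase": "deployment", "needs": ["evaluate_results"]},
}


def execution_order(tasks):
    """Return the tasks in an order that respects every dependency."""
    graph = {name: set(spec["needs"]) for name, spec in tasks.items()}
    return list(TopologicalSorter(graph).static_order())


def downstream_of(tasks, changed):
    """Return every task that directly or indirectly depends on `changed`."""
    affected = set()
    frontier = [changed]
    while frontier:
        current = frontier.pop()
        for name, spec in tasks.items():
            if current in spec["needs"] and name not in affected:
                affected.add(name)
                frontier.append(name)
    return affected


if __name__ == "__main__":
    print(execution_order(TASKS))
    print(sorted(downstream_of(TASKS, "clean_data")))
```

With the dependencies made explicit, a tool could derive a valid execution order and flag which later tasks need revisiting when an earlier one is redone, which is one simple form of the semi-automation the abstract refers to.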
