Data-Driven Constructive Induction

The presented methodology concerns constructive induction, viewed generally as a process combining two intertwined searches: the first for the "best" representation space, and the second for the "best" hypothesis in that space. The first search employs a range of operators for improving the initial representation space, such as operators for generating new attributes, for selecting the best attributes among the given ones, and for abstracting attributes. In the methodology presented, these operators are chosen on the basis of an analysis of the training data, hence the term data-driven. The second search applies AQ-type rule learning to the examples, which are projected at each iteration onto the newly modified representation space. The aim of this search is to determine a generalized description of the examples that optimizes a task-oriented multicriterion evaluation function. The two searches are intertwined, as they are executed in a loop in which each feeds into the other. Experimental applications of the methodology to text categorization and natural scene interpretation demonstrate its significant practical utility.
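The loop of two intertwined searches can be illustrated with a minimal Python sketch. It is not the authors' implementation, and everything named in it is an assumption made for illustration: a single-attribute threshold rule stands in for AQ-type rule learning, pairwise products stand in for the attribute-generation operators, a top-k filter stands in for attribute selection, the abstraction operator is omitted, and the evaluation function is a toy trade-off between accuracy and attribute complexity. The only data-driven decision shown is that the representation space is modified when the current hypothesis falls short of a target accuracy on the training data.

```python
from itertools import combinations


def generate_attributes(rows, names):
    """Representation operator (assumed): add pairwise products of the attributes."""
    new_names = list(names)
    new_rows = [list(r) for r in rows]
    for i, j in combinations(range(len(names)), 2):
        new_names.append(f"{names[i]}*{names[j]}")
        for new_row, row in zip(new_rows, rows):
            new_row.append(row[i] * row[j])
    return new_rows, new_names


def best_threshold_rule(rows, labels, col):
    """Stand-in learner (not AQ): best rule 'attr > t' or 'attr <= t' on one column.

    Returns (training accuracy, threshold, positive_above).
    """
    values = sorted({r[col] for r in rows})
    best = (0.0, None, True)
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2
        for positive_above in (True, False):
            correct = sum(
                ((r[col] > t) == positive_above) == y for r, y in zip(rows, labels)
            )
            acc = correct / len(rows)
            if acc > best[0]:
                best = (acc, t, positive_above)
    return best


def select_attributes(rows, labels, names, keep):
    """Representation operator (assumed): keep the `keep` best-scoring attributes."""
    ranked = sorted(
        range(len(names)), key=lambda c: -best_threshold_rule(rows, labels, c)[0]
    )
    cols = sorted(ranked[:keep])
    return [[r[c] for c in cols] for r in rows], [names[c] for c in cols]


def evaluate(accuracy, name, complexity_weight=0.01):
    """Toy multicriterion score: accuracy minus a penalty per constructed product."""
    return accuracy - complexity_weight * name.count("*")


def constructive_induction(rows, labels, names, max_iters=3, target=1.0, keep=4):
    best = None
    for _ in range(max_iters):
        # Second search: learn a hypothesis in the current representation space.
        for col, name in enumerate(names):
            acc, t, above = best_threshold_rule(rows, labels, col)
            score = evaluate(acc, name)
            if best is None or score > best["score"]:
                best = {"score": score, "accuracy": acc, "attribute": name,
                        "threshold": t, "positive_above": above}
        if best["accuracy"] >= target:
            break
        # First search: the data say the space is inadequate, so modify it
        # and project the examples onto the new space.
        rows, names = generate_attributes(rows, names)
        rows, names = select_attributes(rows, labels, names, keep)
    return best


# XOR-like toy data: the class is positive exactly when x and y share a sign.
rows = [[1, 1], [-1, -1], [1, -1], [-1, 1]]
labels = [True, True, False, False]
result = constructive_induction(rows, labels, ["x", "y"])
print(result["attribute"], result["accuracy"])
```

On this toy data neither original attribute separates the classes, so the first pass triggers attribute generation, and the second pass finds a perfect rule on the constructed attribute `x*y`. The example shows why the two searches feed each other: the hypothesis search reveals that the space is inadequate, and the modified space makes a simple hypothesis sufficient.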