Bridging logic and kernel machines (original) (raw)

Multitask kernel-based learning with logic constraints

2010

This paper presents a general framework to integrate prior knowledge in the form of logic constraints among a set of task functions into kernel machines. The logic propositions provide a partial representation of the environment, in which the learner operates, that is exploited by the learning algorithm together with the information available in the supervised examples. In particular, we consider a multi-task learning scheme, where multiple unary predicates on the feature space are to be learned by kernel machines and a higher level abstract representation consists of logic clauses on these predicates, known to hold for any input. A general approach is presented to convert the logic clauses into a continuous implementation, that processes the outputs computed by the kernel-based predicates. The learning task is formulated as a primal optimization problem of a loss function that combines a term measuring the fitting of the supervised examples, a regularization term, and a penalty term that enforces the constraints on both supervised and unsupervised examples. The proposed semi-supervised learning framework is particularly suited for learning in high dimensionality feature spaces, where the supervised training examples tend to be sparse and generalization difficult. Unlike for standard kernel machines, the cost function to optimize is not generally guaranteed to be convex. However, the experimental results show that it is still possible to find good solutions using a two stage learning schema, in which first the supervised examples are learned until convergence and then the logic constraints are forced. Some promising experimental results on artificial multi-task learning tasks are reported, showing how the classification accuracy can be effectively improved by exploiting the a priori rules and the unsupervised examples.

Multitask kernel-based learning with first-order logic constraints

2010

In this paper we propose a general framework to integrate supervised and unsupervised examples with background knowledge expressed by a collection of first-order logic clauses into kernel machines. In particular, we consider a multi-task learning scheme where multiple predicates defined on a set of objects are to be jointly learned from examples, enforcing a set of FOL constraints on the admissible configurations of their values. The predicates are defined on the feature spaces, in which the input objects are represented, and can be either known a priori or approximated by an appropriate kernel-based learner. A general approach is presented to convert the FOL clauses into a continuous implementation that can deal with the outputs computed by the kernel-based predicates. The learning problem is formulated as a semi-supervised task that requires the optimization in the primal of a loss function that combines a fitting loss measure on the supervised examples, a regularization term, and a penalty term that enforces the constraints on both the supervised and unsupervised examples. Unfortunately, the penalty term is not convex and it can hinder the optimization process. However, it is possible to avoid poor solutions by using a two stage learning schema, in which the supervised examples are learned first and then the constraints are enforced.

Learning with kernels in description logics

2008

We tackle the problem of statistical learning in the standard knowledge base representations for the Semantic Web which are ultimately expressed in description Logics. Specifically, in our method a kernel functions for the ALCN\ mathcal {ALCN} logic integrates with a support vector machine which enables the usage of statistical learning with reference representations. Experiments where performed in which kernel classification is applied to the tasks of resource retrieval and query answering on OWL ontologies.

LivingKnowledge: Kernel Methods for Relational Learning and Semantic Modeling

Lecture Notes in Computer Science, 2010

Latest results of statistical learning theory have provided techniques such us pattern analysis and relational learning, which help in modeling system behavior, e.g. the semantics expressed in text, images, speech for information search applications (e.g. as carried out by Google, Yahoo,..) or the semantics encoded in DNA sequences studied in Bioinformatics. These represent distinguished cases of successful use of statistical machine learning. The reason of this success relies on the ability of the latter to overcome the critical limitations of logic/rule-based approaches to semantic modeling: although, from a knowledge engineer perspective, hand-crafted rules are natural methods to encode system semantics, noise, ambiguity and errors, affecting dynamic systems, prevent them from being effective.

Kernel learning at the first level of inference

Neural Networks, 2014

Kernel learning methods, whether Bayesian or frequentist, typically involve multiple levels of inference, with the coefficients of the kernel expansion being determined at the first level and the kernel and regularisation parameters carefully tuned at the second level, a process known as model selection. Model selection for kernel machines is commonly performed via optimisation of a suitable model selection criterion, often based on cross-validation or theoretical performance bounds. However, if there are a large number of kernel parameters, as for instance in the case of automatic relevance determination (ARD), there is a substantial risk of over-fitting the model selection criterion, resulting in poor generalisation performance. In this paper we investigate the possibility of learning the kernel, for the Least-Squares Support Vector Machine (LS-SVM) classifier, at the first level of inference, i.e. parameter optimisation. The kernel parameters and the coefficients of the kernel expansion are jointly optimised at the first level of inference, minimising a training criterion with an additional regularisation term acting on the kernel parameters. The key advantage of this approach is that the values of only two regularisation parameters need be determined in model selection, substantially alleviating the problem of over-fitting the model selection criterion. The benefits of this approach are demonstrated using a suite of synthetic and real-world binary classification benchmark problems, where kernel learning at the first level of inference is shown to be statistically superior to the conventional approach, improves on our previous work (Cawley and Talbot, 2007) and is competitive with Multiple Kernel Learning approaches, but with reduced computational expense.

Structure and semantics for expressive text kernels

Proceedings of the sixteenth ACM conference on Conference on information and knowledge management - CIKM '07, 2007

Several problems in text categorization are too hard to be solved by standard bag-of-words representations. Work in kernel-based learning has approached this problem by (i) considering information about the syntactic structure of the input or by (ii) incorporating knowledge about the semantic similarity of term features. In this paper, we propose a generalized framework consisting of a family of kernels that jointly incorporates syntax and semantics. We show that both components can be flexibly adapted and tuned towards the particular application domain. We demonstrate the power of this approach in a series of experiments on two diverse datasets, each of which presents a non-standard text categorization problem: one for the classification of natural language questions from a TREC question answering dataset and the other for the automated assignment of ICT-9 categories to short textual fragments of medical diagnoses.

Learning multiple tasks with kernel methods

Journal of Machine Learning Research, 2006

We study the problem of learning many related tasks simultaneously using kernel methods and regularization. The standard single-task kernel methods, such as support vector machines and regularization networks, are extended to the case of multi-task learning. Our analysis shows that the problem of estimating many task functions with regularization can be cast as a single task learning problem if a family of multi-task kernel functions we define is used. These kernels model relations among the tasks and are ...

Kernel-Based Machines for Abstract and Easy Modeling of Automatic Learning

Lecture Notes in Computer Science, 2011

The modeling of system semantics (in several ICT domains) by means of pattern analysis or relational learning is a product of latest results in statistical learning theory. For example, the modeling of natural language semantics expressed by text, images, speech in information search (e.g. Google, Yahoo,..) or DNA sequence labeling in Bioinformatics represent distinguished cases of successful use of statistical machine learning. The reason of this success is due to the ability to overcome the concrete limitations of logic/rule-based approaches to semantic modeling: although, from a knowledge engineer perspective, rules are natural methods to encode system semantics, noise, ambiguity and errors affecting dynamic systems, prevent such approached from being effective, e.g. they are not flexible enough.

Relational Kernel Machines for Learning from Graph-Structured RDF Data

2011

Despite the increased awareness that exploiting the large amount of semantic data requires statistics-based inference capabilities, only little work can be found on this direction in the Semantic Web research. On semantic data, supervised approaches, particularly kernel-based Support Vector Machines (SVM), are promising. However, obtaining the right features to be used in kernels is an open problem because the amount of features that can be extracted from the complex structure of semantic data might be very large. Further, combining several kernels can help to deal with efficiency and data sparsity but creates the additional challenge of identifying and joining different subsets of features or kernels, respectively. In this work, we solve these two problems by employing the strategy of dynamic feature construction to compute a hypothesis, representing the relevant features for a set of examples. Then, a composite kernel is obtained from a set of clause kernels derived from components of the hypothesis. The learning of the hypothesis and kernel(s) is performed in an interleaving fashion. Based on experiments on real-world datasets, we show that the resulting relational kernel machine improves the SVM baseline.

Learning with Semantic Kernels for Clausal Knowledge Bases

Lecture Notes in Computer Science, 2011

Many applicative domains require complex multi-relational representations. We propose a family of kernels for relational representations to produce statistical classifiers that can be effectively employed in a variety of such tasks. The kernel functions are defined over the set of objects in a knowledge base parameterized on a notion of context, represented by a committee of concepts expressed through logic clauses. A preliminary feature construction phase based on genetic programming allows for the selection of optimized contexts. An experimental session on the task of similarity search proves the practical effectiveness of the method.