Pasquale Minervini | Università degli Studi di Bari

Papers by Pasquale Minervini

Towards Numeric Prediction on OWL Knowledge Bases through Terminological Regression Trees

In the context of semantic knowledge bases, the problems that may be tackled by data-driven inductive strategies include predicting the unknown values of existing numeric features and defining new features derived from the data model. These problems can be cast as regression problems, so suitable solutions can be devised from those developed for multi-relational databases. In this paper, a new framework for the induction of logical regression trees is presented. Unlike classic logical regression trees and the more recent terminological classification trees, the novel terminological regression trees predict continuous values, while the tests at the tree nodes are expressed as Description Logic concepts. They are intended for multiple uses with knowledge bases expressed in the standard ontology languages of the Semantic Web. A top-down method for growing such trees is proposed, together with algorithms for making predictions with the trees and for deriving rules from them. The system implementing these methods is evaluated experimentally on ontologies selected from popular repositories.
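
As a rough illustration of top-down growth with concept tests, the sketch below picks, at each node, the concept whose membership split most reduces the sum of squared errors of the numeric target, and stores the mean target at each leaf. It is a simplified sketch, not the paper's system: concept extensions are assumed to be precomputed sets of individuals (a closed-world shortcut standing in for reasoner-based instance checks), and the concept names, `max_depth` and `min_size` are hypothetical.

```python
from statistics import mean
from typing import Dict, List, Set, Union

# A tree is either a leaf (a numeric prediction) or a tuple
# (concept_name, subtree_for_members, subtree_for_non_members).
Tree = Union[float, tuple]


def sse(values: List[float]) -> float:
    """Sum of squared errors around the mean; 0 for an empty list."""
    if not values:
        return 0.0
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def grow(inds: List[str], targets: Dict[str, float],
         concepts: Dict[str, Set[str]], depth: int = 0,
         max_depth: int = 3, min_size: int = 2) -> Tree:
    """Top-down induction: choose the concept test that most reduces SSE."""
    values = [targets[i] for i in inds]
    if depth >= max_depth or len(inds) < 2 * min_size:
        return mean(values)
    best = None
    for name, members in concepts.items():
        # Hypothetical closed-world test: "member" means "in the precomputed extension".
        pos = [i for i in inds if i in members]
        neg = [i for i in inds if i not in members]
        if len(pos) < min_size or len(neg) < min_size:
            continue
        score = sse([targets[i] for i in pos]) + sse([targets[i] for i in neg])
        if best is None or score < best[0]:
            best = (score, name, pos, neg)
    if best is None or best[0] >= sse(values):
        return mean(values)  # no concept test improves the fit: make a leaf
    _, name, pos, neg = best
    return (name,
            grow(pos, targets, concepts, depth + 1, max_depth, min_size),
            grow(neg, targets, concepts, depth + 1, max_depth, min_size))


def predict(tree: Tree, ind: str, concepts: Dict[str, Set[str]]) -> float:
    """Follow the membership tests from the root down to a leaf."""
    while isinstance(tree, tuple):
        name, yes, no = tree
        tree = yes if ind in concepts[name] else no
    return tree


# Toy usage with hypothetical concepts and individuals.
concepts = {"ExpensiveItem": {"c", "d"}, "RedItem": {"a", "c"}}
targets = {"a": 1.0, "b": 1.0, "c": 5.0, "d": 5.0}
tree = grow(list(targets), targets, concepts)
```

Choosing splits by squared-error reduction is the usual regression-tree criterion; the paper may use a different heuristic and a refinement operator to generate candidate concepts rather than a fixed list.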

TPGR

Learning Probabilistic Description Logic Concepts Under Alternative Assumptions on Incompleteness

Lecture Notes in Computer Science, 2014

Real-world knowledge often involves various degrees of uncertainty. For this reason, in the Semantic Web context, difficulties arise when modeling real-world domains using purely logical formalisms alone. Alternative approaches almost always assume the availability of probabilistically enriched knowledge, which is rarely known in advance. In addition, purely deductive exact inference may be infeasible for Web-scale ontological knowledge bases, and it does not exploit statistical regularities in the data. Approximate deductive and inductive inference methods have been proposed to alleviate such problems. This article proposes casting the concept-membership prediction problem (predicting whether an individual in a Description Logic knowledge base is a member of a concept) as estimating a conditional probability distribution, which models the posterior probability of the individual's concept-membership given the knowledge about that individual that can be entailed from the knowledge base. Specifically, we model this posterior probability distribution as a generative, discriminatively structured Bayesian network, using the individual's concept-memberships w.r.t. a set of feature concepts that represent the available knowledge about the individual.

Graph-Based Regularization for Transductive Class-Membership Prediction

Lecture Notes in Computer Science, 2014

Given the increasing availability of structured, machine-processable knowledge in the context of the Semantic Web, relying only on purely deductive inference may be limiting. This work proposes a new method for similarity-based class-membership prediction in Description Logic knowledge bases. The underlying idea is to propagate class-membership information among similar individuals; the method is non-parametric in nature and has interesting complexity properties, making it a potential candidate for large-scale transductive inference. We also evaluate its effectiveness with respect to other approaches based on inductive inference in the Semantic Web (SW) literature.
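
The idea of propagating class-membership information among similar individuals can be illustrated with the well-known label-propagation iteration of Zhou et al., F ← αSF + (1 − α)Y with S = D^(-1/2) W D^(-1/2). This is a swapped-in, generic technique offered as a sketch, not necessarily the paper's exact regularizer; the similarity matrix `W` is assumed to come from some DL similarity measure, and `alpha` and `iters` are hypothetical settings.

```python
import math
from typing import List


def propagate(W: List[List[float]], y: List[float],
              alpha: float = 0.9, iters: int = 200) -> List[float]:
    """Iterate F <- alpha * S F + (1 - alpha) * Y with S = D^-1/2 W D^-1/2.

    W: symmetric, non-negative similarity matrix between individuals
       (hypothetically produced by a DL similarity measure).
    y: +1 for known members, -1 for known non-members, 0 if unknown.
    """
    n = len(W)
    d = [sum(row) for row in W]  # degree of each individual
    S = [[W[i][j] / math.sqrt(d[i] * d[j]) if d[i] > 0 and d[j] > 0 else 0.0
          for j in range(n)] for i in range(n)]
    f = [float(v) for v in y]
    for _ in range(iters):
        f = [alpha * sum(S[i][j] * f[j] for j in range(n)) + (1 - alpha) * y[i]
             for i in range(n)]
    return f


# Chain a - b - c - d: a is a known member, d a known non-member.
W = [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 0]]
scores = propagate(W, [1, 0, 0, -1])
```

The sign of each score gives the predicted membership of the unlabelled individuals (here `b` leans positive and `c` negative). Since the eigenvalues of S lie in [-1, 1], the iteration converges for 0 < alpha < 1, and it never needs a parametric model of the concept.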

A Graph Regularization Approach to Transductive Class-Membership Learning

Learning Terminological Bayesian Classifiers - A Comparison of Alternative Approaches to Dealing with Unknown Concept-Memberships

Knowledge available through Semantic Web representation formalisms can be missing, i.e. it is not always possible to infer the truth value of an assertion (due to the Open World Assumption). We propose a method for incrementally inducing terminological (tree-augmented) naïve Bayesian classifiers, which aim to estimate the probability that an individual belongs to a target concept given its membership in a learned set of Description Logic concepts. We then evaluate the impact of different methods for handling assertions whose truth value is unknown, each consistent with a different assumption about the ignorance model.
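
To make the role of the ignorance model concrete, the sketch below implements a plain (not tree-augmented) naïve Bayes over concept-membership features whose values are `True`, `False`, or `None` (membership neither entailed nor refuted). It contrasts only two simple assumptions, chosen for illustration: ignoring unknown values (marginalising them out) versus treating "unknown" as an informative third value. Feature names and data are hypothetical, and the paper's own set of assumptions and its incremental feature-concept induction are not reproduced.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Optional, Tuple

# Each example: ({feature_concept: True/False/None}, target_membership).
# None encodes "membership neither entailed nor refuted" (Open World Assumption).
Example = Tuple[Dict[str, Optional[bool]], bool]


def train(examples: List[Example], features: List[str], unknown_as_value: bool = False):
    """Count class priors and per-class feature-value frequencies."""
    priors = Counter(label for _, label in examples)
    counts: Dict[Tuple[str, bool], Counter] = defaultdict(Counter)
    for feats, label in examples:
        for f in features:
            v = feats.get(f)
            if v is None and not unknown_as_value:
                continue  # assumption A: unknown values are simply ignored
            counts[(f, label)][v] += 1  # known value, or None counted as its own value
    return priors, counts


def posterior(model, feats: Dict[str, Optional[bool]], features: List[str],
              unknown_as_value: bool = False) -> Dict[bool, float]:
    """P(target | observed memberships) with Laplace smoothing."""
    priors, counts = model
    total = sum(priors.values())
    n_values = 3 if unknown_as_value else 2
    log_scores = {}
    for label, c in priors.items():
        s = math.log(c / total)
        for f in features:
            v = feats.get(f)
            if v is None and not unknown_as_value:
                continue  # marginalise the unknown feature out
            obs = counts[(f, label)]
            s += math.log((obs[v] + 1) / (sum(obs.values()) + n_values))
        log_scores[label] = s
    top = max(log_scores.values())  # log-sum-exp normalisation
    z = sum(math.exp(s - top) for s in log_scores.values())
    return {label: math.exp(s - top) / z for label, s in log_scores.items()}


features = ["A", "B"]  # hypothetical feature concepts
data = [({"A": True, "B": None}, True), ({"A": True, "B": True}, True),
        ({"A": False, "B": False}, False), ({"A": False, "B": None}, False)]
p_ignore = posterior(train(data, features), {"A": True, "B": None}, features)
p_value = posterior(train(data, features, True), {"A": True, "B": None}, features, True)
```

Which assumption is appropriate depends on why memberships are unknown: ignoring them is reasonable when missingness is unrelated to the target, while a third value lets the classifier exploit missingness that is itself informative.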

Rank prediction for semantically annotated resources

Proceedings of the 28th Annual ACM Symposium on Applied Computing - SAC '13, 2013

In the context of semantic knowledge bases, we tackle the problem of ranking resources w.r.t. some criterion. The proposed solution is a method for learning functions that approximately predict the correct ranking. Unlike related methods that assume the ranking criteria to be explicitly expressed (e.g. as a query or a function), our approach is data-driven: it produces a predictor that detects the implicit underlying criteria from the assertions regarding the resources in the knowledge base. Specific kernel functions encoding the similarity between individuals in knowledge bases allow the method to be applied to ontologies in the standard Semantic Web representations. The method is based on a kernelized version of the Perceptron Ranking algorithm, which is suitable for both batch and online settings. Moreover, unlike approaches based on regression, the method takes advantage of the underlying ordering of the ranking labels. The reported empirical evaluation demonstrates the effectiveness of the method at predicting the rankings of single users in the Linked User Feedback dataset, integrating knowledge from the Linked Open Data cloud during the learning process.
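
A kernelized Perceptron Ranking (PRank) learner keeps the weight vector implicitly as a weighted sum of training points in feature space, plus an ordered set of thresholds b_1 ≤ … ≤ b_(k−1); a point gets the smallest rank r whose threshold exceeds its score. The sketch below follows the standard mistake-driven PRank update in kernel form. It is an illustration under assumptions, not the paper's code: a linear kernel on scalars stands in for the knowledge-base kernels between individuals, and the toy data are hypothetical.

```python
from typing import Callable, List

Kernel = Callable[[object, object], float]


class KernelPRank:
    """Kernelized PRank: w is kept implicitly as sum_i alpha_i * phi(x_i)."""

    def __init__(self, kernel: Kernel, n_ranks: int):
        self.kernel = kernel
        self.k = n_ranks
        self.support: List[object] = []  # training points that caused an update
        self.alphas: List[float] = []    # accumulated coefficient of each point
        self.b = [0.0] * (n_ranks - 1)   # thresholds b_1..b_{k-1}; b_k = +inf

    def score(self, x) -> float:
        return sum(a * self.kernel(s, x) for s, a in zip(self.support, self.alphas))

    def predict(self, x) -> int:
        """Smallest rank r with score(x) < b_r (rank k if there is none)."""
        s = self.score(x)
        for r, b_r in enumerate(self.b, start=1):
            if s < b_r:
                return r
        return self.k

    def update(self, x, y: int) -> None:
        """One online PRank step on example x with true rank y in 1..k."""
        if self.predict(x) == y:
            return  # mistake-driven: no update on correct predictions
        s = self.score(x)
        tau_sum = 0.0
        for r in range(1, self.k):
            y_r = -1.0 if y <= r else 1.0
            if y_r * (s - self.b[r - 1]) <= 0:  # threshold r is on the wrong side
                self.b[r - 1] -= y_r
                tau_sum += y_r
        if tau_sum != 0.0:
            self.support.append(x)
            self.alphas.append(tau_sum)


def linear(a: float, b: float) -> float:
    """Hypothetical placeholder kernel on scalar 'resources'."""
    return a * b


model = KernelPRank(linear, n_ranks=3)
data = [(1.0, 1), (2.0, 2), (3.0, 3)]
for _ in range(100):  # batch use: repeated online passes over the data
    for x, y in data:
        model.update(x, y)
preds = [model.predict(x) for x, _ in data]
```

Because the thresholds are shared and kept ordered, the learner exploits the ordering of the rank labels, which is the advantage over plain regression mentioned in the abstract; the same `update` method serves the online setting one example at a time.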

Learning Terminological Bayesian Classifiers

Knowledge available through Semantic Web representation formalisms can be missing, i.e. it is not always possible to infer the truth value of an assertion (due to the Open World Assumption). We propose a method for incrementally inducing terminological (tree-augmented) naïve Bayesian classifiers, which aim to estimate the probability that an individual belongs to a target concept given its membership in a learned set of Description Logic concepts. We then evaluate the impact of employing different methods of handling ...

A Graph Regularization Based Approach to Transductive Class-Membership Prediction

8th International Workshop on Uncertainty Reasoning for the Semantic Web, Nov 1, 2012

Given the increasing availability of structured, machine-processable knowledge in the context of the Semantic Web, relying only on purely deductive inference may be limiting. This work proposes a new method for similarity-based class-membership prediction in Description Logic knowledge bases. The underlying idea is based on the concept of propagating class-membership information among similar individuals; it is non-parametric in nature and characterised by interesting complexity properties, making it a ...

A Gaussian Process Model for Knowledge Propagation in Web Ontologies

2014 IEEE International Conference on Data Mining, 2014

We consider the problem of predicting missing class-memberships and property values of individual resources in Web ontologies. We first identify which relations tend to link similar individuals by means of a finite-set Gaussian Process regression model, and then efficiently propagate knowledge about individuals across their relations. Our experimental evaluation demonstrates the effectiveness of the proposed method.
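
Once a covariance matrix over a finite set of individuals is available, propagating known values to the other individuals reduces to the standard zero-mean Gaussian Process posterior mean, μ = K[:, L] (K[L, L] + σ²I)⁻¹ y_L, where L indexes the individuals with known values. The sketch below computes only that step in plain Python; it assumes `K` is given and symmetric positive semi-definite, and does not reproduce how the paper builds or learns the covariance from the ontology's relations. The matrix, labels and `noise` value are hypothetical.

```python
from typing import List


def solve(A: List[List[float]], b: List[float]) -> List[float]:
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [[float(v) for v in A[i]] + [float(b[i])] for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


def gp_posterior_mean(K: List[List[float]], labelled: List[int],
                      y: List[float], noise: float = 0.1) -> List[float]:
    """Zero-mean GP on a finite set: mu = K[:, L] (K[L, L] + noise * I)^-1 y_L."""
    K_ll = [[K[i][j] + (noise if a == c else 0.0) for c, j in enumerate(labelled)]
            for a, i in enumerate(labelled)]
    weights = solve(K_ll, y)
    return [sum(K[u][l] * w for l, w in zip(labelled, weights)) for u in range(len(K))]


# Hypothetical covariance: individuals 0 and 1 are strongly related, 2 is not.
K = [[1.0, 0.9, 0.0],
     [0.9, 1.0, 0.0],
     [0.0, 0.0, 1.0]]
mu = gp_posterior_mean(K, labelled=[0], y=[1.0])
```

The known value of individual 0 propagates to its strongly correlated neighbour 1, while the unrelated individual 2 stays at the prior mean of zero.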

Learning Terminological Naïve Bayesian Classifiers

Knowledge available through Semantic Web standards can easily be missing, generally because of the adoption of the Open World Assumption (i.e. the truth value of an assertion is not necessarily known). However, the rich relational structure that characterizes ontologies can be exploited to handle such missing knowledge explicitly. We present a Statistical Relational Learning system designed for learning terminological naïve Bayesian classifiers, which estimate the probability that a generic individual belongs to the target concept given its membership in a set of Description Logic concepts. During the learning process, we consistently handle the lack of knowledge that may be introduced by the adoption of the Open World Assumption, depending on the varying nature of the missing knowledge itself.

Can Real-Time Machine Translation Overcome Language Barriers in Distributed Requirements Engineering?

… Engineering (ICGSE), 2010 …, Jan 1, 2010

In global software projects, work takes place over long distances, meaning that communication will often involve distant cultures with different languages and communication styles, which in turn exacerbate communication problems. However, being aware of cultural distance is not sufficient to overcome many of the barriers that language differences put in the way of global project success. In this paper, we investigate the adoption of machine translation (MT) services in synchronous text-based chat in order to overcome language barriers among groups of stakeholders who are remotely negotiating software requirements. We report our findings from a simulated study that compares the efficiency and the effectiveness of two MT services, Google Translate and apertium-service, in translating the messages exchanged during four distributed requirements engineering workshops. The results show that (a) Google Translate produces significantly more adequate translations than Apertium from English to Italian; and (b) both services can be used in text-based chat without disrupting real-time interaction.

Apertium goes SOA: an efficient and scalable service based on the Apertium rule-based machine translation platform

Service Oriented Architecture (SOA) is a paradigm for organising and using distributed services that may be under the control of different ownership domains and implemented using various technology stacks. In some contexts, an organisation whose IT infrastructure implements the SOA paradigm can benefit greatly from integrating efficient machine translation (MT) services into its business processes to overcome language barriers. This paper describes the architecture and the design patterns used to develop an MT service that is efficient, scalable and easy to integrate into new and existing business processes. The service is based on Apertium, a free/open-source rule-based machine translation platform.
