Pierre Andrews | University of Trento

Books by Pierre Andrews

Several Networks of Excellence have been set up in the framework of the European FP5 research program. Among these Networks of Excellence, the NEMIS project focuses on the field of Text Mining.

Within this field, document processing and visualization was identified as one of the key topics and the WG1 working group was created in the NEMIS project, to carry out a detailed survey of techniques associated with the text mining process and to identify the relevant research topics in related research areas.

In this document we present the results of this comprehensive survey. The report includes a description of the current state-of-the-art and practice, a roadmap for follow-up research in the identified areas, and recommendations for anticipated technological development in the domain of text mining.

In the part dedicated to document processing, the discussion focuses on research topics in natural language processing and information retrieval. More precisely, the work covers the tasks related to data selection, filtering and cleaning, morphological normalization and parsing, document representation and similarity computation, and various aspects of data analysis that have all been developed and successfully used in data mining.

In the part dedicated to the visualization, the study essentially focuses on the issue of high dimensionality for document representation. Indeed, the high dimensional representations that are produced in the various stages of the text mining process are usually not well suited for a simple and easily exploitable presentation of text mining results which require specific interpretation techniques, tightly connected to the task of document summarization. In addition, the study has identified a clear need for the development of a unified methodology in the field of visualization.

Papers by Pierre Andrews

Controlled vocabularies that power semantic applications allow them to operate with high precision, which comes at the price of having to disambiguate between senses of terms. Fully automatic disambiguation is a largely unsolved problem and semi-automatic approaches are preferred. These approaches rely on users to perform the disambiguation and require an adequate user interface.

The idea of the European project GLOCAL is to use events as the central concept for search, organization and combination of multimedia content from various sources. For this purpose, methods for event detection and event matching as well as media analysis are developed. Considered events range from private, through local, to global events.

Corporate portals are an integral part of the enterprise infrastructure, facilitating the creation, sharing, discovery and consumption of enterprise assets through blogs, news, forums, documents and information in general. However, as the amount of data grows, it becomes much more difficult to access the right asset at the precise moment when it is needed.

There is currently a trend in media management and the semantic web to develop new media processing methods and knowledge representation techniques to organise and structure media around events. While this increased interest in events as the central aggregator when organising media is supported by strong research in the fields of knowledge representation and computer vision, it is not yet clear how digital-era users use events when sharing their personal media collections.

Folksonomies, often known as tagging systems, such as the ones used on the popular Delicious or Flickr websites, use a very simple knowledge organisation system.
Users are thus quick to adopt this system and create extensive knowledge annotations on the Web.
However, because of the simplicity of the folksonomy model, the semantics of the tags used is not explicit and can only be inferred from the context of use of the tags.
This is a barrier to the automatic use of such knowledge organisation systems by computers, and new techniques have to be developed to extract the semantics of the tags used.
In this paper we discuss an algorithm to detect new senses of terms in a folksonomy; we also propose a formal evaluation methodology that enables results to be compared between different approaches in the field.

This work has been partially supported by INSEMTIVES project (FP7-231181, see http://www.insemtives.eu).

In this paper we introduce an application that gives its users explicit control over the meaning of the tags they use when uploading photos to Flickr.
This application provides users with an improved interface with which they can add concepts to photos instead of simple free-text tags.
They can thus directly provide semantic tags for their photos that can then be used to improve services such as search.

Social annotation systems such as del.icio.us, Flickr and others have gained tremendous popularity among Web 2.0 users. One of the factors of success was the simplicity of the underlying model, which consists of a resource (e.g., a web page), a tag (e.g., a text string), and a user who annotates the resource with the tag. However, due to the syntactic nature of the underlying model, these systems have been criticised for not being able to take into account the explicit semantics implicitly encoded by the users in each tag.
In this article we: a) provide a formalisation of an annotation model in which tags are based on concepts instead of being free text strings; b) describe how an existing annotation system can be converted to the proposed model; c) report on the results of such a conversion on the example of a del.icio.us dataset; and d) show how the quality of search can be improved by the semantics in the converted dataset.
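
The triple described above (a user, a resource, a tag) and its concept-based variant can be sketched as two small data structures. This is only an illustration, not the formalisation given in the article: the class names, the `concept_id` field and the toy `SENSES` lookup are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FreeTextAnnotation:
    """Syntactic model: a user annotates a resource with a text string."""
    user: str
    resource: str
    tag: str


@dataclass(frozen=True)
class ConceptAnnotation:
    """Concept-based model: the tag refers to a concept, not to a string."""
    user: str
    resource: str
    concept_id: str  # hypothetical identifier of a concept in a vocabulary


# Hypothetical lookup from a tag string to one chosen concept identifier.
# Choosing the right sense is the hard (disambiguation) step in practice.
SENSES = {
    "java": "programming_language/java",
    "apple": "company/apple",
}


def convert(annotation: FreeTextAnnotation) -> Optional[ConceptAnnotation]:
    """Convert a free-text annotation, or return None if no concept is known."""
    concept_id = SENSES.get(annotation.tag.lower())
    if concept_id is None:
        return None
    return ConceptAnnotation(annotation.user, annotation.resource, concept_id)
```

Once annotations carry concept identifiers, a search for one concept no longer depends on the spelling or capitalisation chosen by each user, which is one plausible reading of how search quality could benefit.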

Understanding metadata written in natural language is a prerequisite for the successful automated integration of large-scale, language-rich classifications such as the ones used in digital libraries. We analyze the natural language labels within classifications by exploring their syntactic structure; we then show how this structure can be used to detect patterns of language that can be processed by a lightweight parser with an average accuracy of 96.82%. This allows for a deeper understanding of natural language metadata semantics, which we show can improve by almost 18% the accuracy of the automatic translation of classifications into lightweight ontologies required by semantic matching, search and classification algorithms.

In the field of natural argumentation and computer persuasion, there has not been any clear definition of the persuasiveness of a system trying to influence the user. In this paper, we describe a general evaluation task that can be instantiated on a number of domains to evaluate the belief change of participants. Through the use of a ranking task, we can measure the participant's change of beliefs related to a behaviour or an attitude. This general metric allows a better comparison of state-of-the-art persuasive systems.
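
One way to turn a ranking task into a number is to compare the ranking a participant gives before and after the interaction. The sketch below uses the Kendall tau distance (the number of item pairs whose relative order changed). The abstract does not say which distance is used, so the choice of measure, the function name and the example items are assumptions made for illustration.

```python
from itertools import combinations
from typing import List


def kendall_tau_distance(before: List[str], after: List[str]) -> int:
    """Count the item pairs that are ordered differently in the two rankings."""
    if len(set(before)) != len(before) or set(before) != set(after):
        raise ValueError("both rankings must contain the same distinct items")
    position_after = {item: index for index, item in enumerate(after)}
    # combinations() yields pairs (x, y) with x ranked above y in `before`;
    # the pair is discordant when x ends up below y in `after`.
    return sum(
        1
        for x, y in combinations(before, 2)
        if position_after[x] > position_after[y]
    )


# Hypothetical rankings of four items before and after a dialogue.
before = ["exercise", "diet", "sleep", "smoking"]
after = ["diet", "exercise", "sleep", "smoking"]
swapped_pairs = kendall_tau_distance(before, after)
```

A distance of zero means the ranking did not change at all; the maximum, n(n-1)/2 for n items, means the ranking was fully reversed.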

Events have been recognised as important metadata to fill the semantic gap between our experience of the world represented in media and its conceptualization. In this paper, we argue that, once event metadata can be extracted, there remains a gap between different users' conceptualizations. We then show how a compositional event model can mitigate such a social semantic gap through higher level descriptions of events where an agreement can be reached. In turn, this enables semantic services which improve event-centric search and navigation of shared media.

We present the role of conversational agents in two task-oriented human-computer dialogue applications: Interactive Question Answering and Persuasive Dialogue.

We show that conversational agents can be effectively deployed for interaction that goes beyond user entertainment and can be successfully used as a means to achieve complex tasks.

Conversational agents are a winning solution in Persuasive Dialogue because, combined with a planning infrastructure, they can help manage the parts of the dialogue that cannot be planned a priori and are essential to keeping the system persuasive. In Interactive Question Answering, conversational approaches lead users to the explicit formulation of queries, allow for the submission of further queries and accommodate related queries thanks to their ability to handle context.

This article describes a method for classifying dialogue utterances and detecting the interlocutor's agreement or disagreement. This labelling can help improve dialogue management by providing additional information on the utterance's content without deep parsing. The proposed technique improves upon state-of-the-art approaches by using a Support Vector Machine cascade. A combination of three binary support vector machines in a cascade is employed to filter out utterances that are easy to classify, thus reducing the noise in the learning of labels for more ambiguous utterances. The approach achieves higher accuracy (by 2.47%) than the state of the art while using a simpler approach which relies only on shallow local features of the utterances.
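
The control flow of such a cascade can be sketched as follows. This is an illustration of the filtering idea at prediction time only: each stage below is a toy keyword test standing in for a trained binary SVM, and the label names, keywords and function names are hypothetical rather than taken from the article.

```python
from typing import Callable, List, Set, Tuple

# A binary classifier answers one yes/no question about an utterance.
BinaryClassifier = Callable[[str], bool]


def classify_with_cascade(
    utterance: str,
    stages: List[Tuple[str, BinaryClassifier]],
    default_label: str,
) -> str:
    """Return the label of the first stage that accepts the utterance."""
    for label, accepts in stages:
        if accepts(utterance):
            return label  # easy case: later stages never see this utterance
    return default_label


def contains_any(words: Set[str]) -> BinaryClassifier:
    """Build a toy keyword classifier (a stand-in for a trained binary SVM)."""
    return lambda utterance: bool(words & set(utterance.lower().split()))


# Three hypothetical binary stages, applied in order.
STAGES = [
    ("backchannel", contains_any({"uh-huh", "mm-hmm"})),
    ("agreement", contains_any({"yes", "right", "exactly"})),
    ("disagreement", contains_any({"no", "wrong"})),
]

label = classify_with_cascade("yes exactly", STAGES, default_label="other")
```

Because each stage removes the utterances it accepts, a later stage only has to separate the remaining, more ambiguous cases; in a trained cascade the same filtering would also apply to the training data of each stage.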

In the current state of the art in ontology matching, diverse golden standards are used to evaluate the algorithms. In this paper we show that by following appropriate rules in their construction and use, the quality of the evaluations can be significantly improved, particularly in the accuracy of the precision and recall measures obtained.

Understanding metadata written in natural language is a prerequisite for the successful automated integration of large-scale, language-rich datasets, such as digital libraries. In this paper we describe an analysis of the part-of-speech structure of two different datasets of metadata and show how this structure can be used to detect structural patterns that can be parsed by lightweight grammars with an accuracy ranging from 95.3% to 99.8%. This allows deeper understanding of metadata semantics, important for such tasks as translating classifications into lightweight ontologies for use in semantic matching.
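
To give a feel for what a lightweight grammar over part-of-speech structure might look like, the sketch below checks a label's tag sequence against a single pattern: noun phrases, optionally joined by conjunctions. The pattern, the function name and the use of Penn Treebank style tags (`JJ`, `NN`, `NNS`, `CC`) are assumptions for illustration; the grammars used in the paper are not reproduced here.

```python
import re
from typing import List

# One noun phrase: optional adjectives or nouns followed by a head noun.
NOUN_PHRASE = r"(?:JJ |NN |NNS )*(?:NN|NNS)"
# A label: noun phrases joined by coordinating conjunctions (CC).
LABEL_PATTERN = re.compile(rf"^{NOUN_PHRASE}(?: CC {NOUN_PHRASE})*$")


def matches_label_grammar(pos_tags: List[str]) -> bool:
    """Return True if the tag sequence fits the hypothetical label pattern."""
    return LABEL_PATTERN.match(" ".join(pos_tags)) is not None


# Tags one might assign to a label such as "Adult and Continuing Education";
# the tagging itself is assumed, not computed here.
fits = matches_label_grammar(["NN", "CC", "JJ", "NN"])
```

Labels whose tag sequence fits such a pattern can be handled by simple rules, while the rest would be passed on to a more general parser or to manual inspection.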

Evaluating and comparing different ontology matching techniques is a complex multifaceted problem. Currently, diverse golden standards and various practices are used for evaluations. In this paper we show that, by following certain rules, the quality of the evaluations can be significantly improved, particularly in regard to the accuracy of precision and recall measures obtained.
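
For reference, precision and recall of a matcher against a golden standard follow the usual definitions: precision is the share of returned correspondences that are correct, and recall is the share of reference correspondences that were found. The sketch below assumes that correspondences are represented as pairs of node names, which is a simplification made for the example.

```python
from typing import Iterable, Tuple

Correspondence = Tuple[str, str]


def precision_recall(
    found: Iterable[Correspondence],
    gold: Iterable[Correspondence],
) -> Tuple[float, float]:
    """Compute precision and recall of `found` against the reference `gold`."""
    found_set, gold_set = set(found), set(gold)
    correct = len(found_set & gold_set)
    precision = correct / len(found_set) if found_set else 0.0
    recall = correct / len(gold_set) if gold_set else 0.0
    return precision, recall


# Hypothetical correspondences between nodes of two classifications.
found = [("a", "x"), ("b", "y"), ("c", "z")]
gold = [("a", "x"), ("b", "y"), ("d", "w"), ("e", "v")]
precision, recall = precision_recall(found, gold)
```

Both values depend directly on the reference set: if the golden standard is incomplete, correct correspondences are counted as errors, which lowers the measured precision even when the matcher is right.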

Argumentation is an emerging topic in the field of human computer dialogue. In this paper we describe a novel approach to dialogue management that has been developed to achieve persuasion using a textual argumentation dialogue system. The paper introduces a layered management architecture that mixes task oriented dialogue techniques with chatbot techniques to achieve better persuasiveness in the dialogue.

Human computer dialogue systems, despite being the subject of long-standing research, are limited to a few restricted domains and are still considered austere by their users. There is evidence that humans act differently when engaged in computer dialogue than during human to human dialogue [Shechtman03Media]. This is because dialogue systems do not take into account aspects contributing to the natural effect of human to human conversation, such as emotions and social cues.

Our current research focuses on using human-computer dialogue for health-care counselling. In particular, we are developing a dialogue system that should be capable of changing the user's health behaviour based on techniques of persuasion and argumentation.

In our opinion, natural argumentation, especially persuasive argumentation, needs to show empathy and use social cues to be effective [andrews06persuasive]. We describe here the design of a multi-layer framework to separate the persuasion planning and the management of surface-level dialogue cues.

Keywords: natural argumentation, rhetorics, dialogue, persuasion, natural language processing

In the field of natural language dialogue, a new trend is exploring persuasive argumentation theories. Applying these theories to human-computer dialogue management could lead to a more comfortable experience for the user and give way to new applications.

In this paper, we study the different aspects of persuasive communication needed for health-care advising and how to implement them to produce efficient, computer directed persuasion. Our opinion is that a persuasive dialogue will have to combine the current logical approach to persuasion with novel emotional cues to render the dialogue more comfortable to the user.

Keywords: natural argumentation, rhetorics, dialogue, persuasion, health-care counselling, natural language processing

Automatic indexing is one of the important technologies used for Textual Data Analysis applications. Standard document indexing techniques usually identify the most relevant keywords in the documents. This paper presents an alternative approach that aims at performing document indexing by associating concepts with the document to index instead of extracting keywords out of it. The concepts are extracted out of the EDR Electronic Dictionary that provides a concept hierarchy based on hyponym/hypernym relations. An experimental evaluation based on a probabilistic model was performed on a sample of the INSPEC bibliographic database and we present the promising results that were obtained during the evaluation experiments.

Introduction to the Insemtives.eu project and the need for creating better interfaces and incentives to help the users create more semantic content on the web.

The presentation discusses the cases of Flickr.com and del.icio.us and looks at the tagging habits in these folksonomies. The study of two large datasets shows that the majority of users do not use the tagging system and the social features available on these Web 2.0 sites.