Davide Ceolin | Vrije Universiteit Amsterdam

Papers by Davide Ceolin

rrdf: Release 2.1.2

This release has the following changes: updated documentation; datatypes or properties now also include double, NOTATION, and QName.
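
rrdf is an R library, but purely as an illustration of what the newly supported datatypes look like in an RDF graph, here is a minimal sketch using Python's rdflib (not rrdf's own API); the resource names are made up.

```python
# Illustrative sketch (not the rrdf API itself): literals typed as xsd:double,
# xsd:NOTATION, and xsd:QName in an RDF graph, shown with Python's rdflib.
from rdflib import Graph, Literal, Namespace

XSD = Namespace("http://www.w3.org/2001/XMLSchema#")
EX = Namespace("http://example.org/")

g = Graph()
g.add((EX.measurement, EX.value, Literal("3.14", datatype=XSD.double)))
g.add((EX.symbol, EX.code, Literal("jpeg", datatype=XSD.NOTATION)))
g.add((EX.element, EX.name, Literal("ex:Thing", datatype=XSD.QName)))

print(g.serialize(format="turtle"))
```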

Refining Software Quality Prediction with LOD

The complexity of software systems is growing, and the computation of several software quality metrics is challenging. Therefore, being able to use already-estimated quality metrics to predict their evolution is a crucial task. In this paper, we outline our idea of using Linked Open Data to enrich the information available for such prediction. We report on our experience so far and outline the preliminary results obtained.
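
The abstract only sketches the enrichment idea, so here is a hedged illustration of what pulling extra facts about a software project from a Linked Open Data endpoint could look like; the endpoint, resource, and use of DBpedia are illustrative assumptions, not the authors' actual pipeline.

```python
# Minimal sketch of the general idea: enrich a software project's record with
# facts retrieved from a Linked Open Data SPARQL endpoint. Endpoint, resource,
# and properties are illustrative placeholders.
from SPARQLWrapper import SPARQLWrapper, JSON

sparql = SPARQLWrapper("https://dbpedia.org/sparql")
sparql.setQuery("""
    SELECT ?property ?value WHERE {
        <http://dbpedia.org/resource/Apache_HTTP_Server> ?property ?value .
    } LIMIT 20
""")
sparql.setReturnFormat(JSON)
results = sparql.query().convert()

# The retrieved facts can then be appended to the feature vector used to
# predict the evolution of the quality metrics.
for binding in results["results"]["bindings"]:
    print(binding["property"]["value"], "->", binding["value"]["value"])
```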

The many dimensions of truthfulness: Crowdsourcing misinformation assessments on a multidimensional scale

Information Processing & Management, 2021

Recent work has demonstrated the viability of using crowdsourcing as a tool for evaluating the truthfulness of public statements. Under certain conditions, such as (1) having a balanced set of workers with different backgrounds and cognitive abilities, (2) using an adequate set of mechanisms to control the quality of the collected data, and (3) using a coarse-grained assessment scale, the crowd can provide reliable identification of fake news. However, fake news is a subtle matter: statements can be merely biased ("cherry-picked"), imprecise, wrong, etc., and the unidimensional truth scale used in existing work cannot account for such differences. In this paper, we propose a multidimensional notion of truthfulness and ask crowd workers to assess seven different dimensions of truthfulness selected on the basis of existing literature: Correctness, Neutrality, Comprehensibility, Precision, Completeness, Speaker's Trustworthiness, and Informativeness. We deploy a set of quality control mechanisms, including a custom search engine that crowd workers use to find web pages supporting their truthfulness assessments, to ensure that the thousands of assessments collected on 180 publicly available fact-checked statements distributed over two datasets are of adequate quality. A comprehensive analysis of crowdsourced judgments shows that: (1) the crowdsourced assessments are reliable when compared to an expert-provided gold standard; (2) the proposed dimensions of truthfulness capture independent pieces of information; (3) the crowdsourcing task can be easily learned by the workers; and (4) the resulting assessments provide a useful basis for a more complete estimation of statement truthfulness.
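
A hedged sketch of one analysis step of the kind described above: aggregate per-worker judgments on each of the seven dimensions per statement and correlate the aggregates with an expert gold standard. The data frame, scales, and column names are synthetic stand-ins, not the paper's dataset.

```python
# Aggregate per-dimension crowd judgments per statement and rank-correlate
# them with expert gold labels. All data here are synthetic placeholders.
import numpy as np
import pandas as pd
from scipy.stats import spearmanr

rng = np.random.default_rng(0)
DIMENSIONS = ["correctness", "neutrality", "comprehensibility", "precision",
              "completeness", "speaker_trustworthiness", "informativeness"]

# 10 statements x 5 workers, each dimension judged on a 1-5 scale.
rows = []
for statement in range(10):
    for _worker in range(5):
        rows.append({"statement": statement,
                     **{d: rng.integers(1, 6) for d in DIMENSIONS}})
judgments = pd.DataFrame(rows)
gold = pd.Series(rng.integers(1, 6, size=10))   # expert truthfulness per statement

aggregated = judgments.groupby("statement")[DIMENSIONS].mean()
for dim in DIMENSIONS:
    rho, p = spearmanr(aggregated[dim], gold)
    print(f"{dim}: Spearman rho={rho:.2f} (p={p:.2f})")
```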

Supporting Digital Humanities in Dealing with Quality of Web Documents

Transparent assessment of information quality of online reviews using formal argumentation theory

Nanopublication-Based Semantic Publishing and Reviewing: A Field Study with Formalization Papers

With the rapidly increasing amount of scientific literature, it is getting continuously more difficult for researchers in different disciplines to keep up to date with the recent findings in their field of study. Processing scientific articles in an automated fashion has been proposed as a solution to this problem, but the accuracy of such processing remains very poor for extraction tasks beyond the most basic ones (like locating and identifying entities and simple classification based on predefined categories). Few approaches have tried to change how we publish scientific results in the first place, such as by making articles machine-interpretable by expressing them with formal semantics from the start. In the work presented here, we set out to demonstrate that we can formally publish high-level scientific claims in formal logic, and publish the results in a special issue of an existing journal. We use the concept and technology of nanopublications for this endeavor, and represent not just the submissions and final papers in this RDF-based format, but also the whole process in between, including reviews, responses, and decisions. We do this by performing a field study with what we call formalization papers, which contribute a novel formalization of a previously published claim. We received 15 submissions from 18 authors, who then went through the whole publication process leading to the publication of their contributions in the special issue. Our evaluation shows the technical and practical feasibility of our approach. The participating authors mostly showed high levels of interest and confidence, and mostly experienced the process as not very difficult, despite the technical nature of the current user interfaces. We believe that these results indicate that it is possible to publish scientific results from different fields with machine-interpretable semantics from the start, which in turn opens countless possibilities to radically improve the effectiveness and efficiency of the scientific endeavor as a whole in the future.
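
For readers unfamiliar with the nanopublication format the study builds on, here is a minimal sketch of its general structure, an assertion graph bundled with provenance and publication-info graphs, built with Python's rdflib; all URIs and the example claim are illustrative placeholders, not the actual formalization papers.

```python
# Minimal sketch of a nanopublication-like bundle of named graphs:
# head + assertion + provenance + publication info. URIs are placeholders.
from rdflib import Dataset, Namespace, URIRef

NP = Namespace("http://www.nanopub.org/nschema#")
PROV = Namespace("http://www.w3.org/ns/prov#")
EX = Namespace("http://example.org/np1#")

ds = Dataset()
head = ds.graph(EX.Head)
assertion = ds.graph(EX.assertion)
provenance = ds.graph(EX.provenance)
pubinfo = ds.graph(EX.pubinfo)

head.add((EX.np1, NP.hasAssertion, EX.assertion))
head.add((EX.np1, NP.hasProvenance, EX.provenance))
head.add((EX.np1, NP.hasPublicationInfo, EX.pubinfo))

# The formalized claim itself (toy example), plus minimal provenance metadata.
assertion.add((EX.aspirin, EX.treats, EX.headache))
provenance.add((EX.assertion, PROV.wasDerivedFrom, URIRef("https://doi.org/10.1234/example")))
pubinfo.add((EX.np1, PROV.wasAttributedTo, EX.author))

print(ds.serialize(format="trig"))
```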

The Vaccination Debate: An Exploration of Long Term Concept and Perspective Mining

years of vaccination debates, by using a mix of distant and close reading, computational analyses, and NLP techniques. Our aim is to map the variations and changes in perspectives in the vaccination debate from both a synchronic and a diachronic perspective. For centuries, the debate was dominated by the dichotomy between a vast majority that was in favour of vaccination and a minority that was against the use of vaccinations. These debates showed remarkable consistency: the same arguments were repeated over and over again. Due to the internet revolution, the debate became larger and has a different dynamic, with group polarization and the formation of so-called ‘Echo Chambers’ where like-minded people easily find each other and confirm each other’s beliefs. Digital analysis can help us distinguish the multiple perspectives and see how they relate to the previous ‘offline’ debates. We will present our methods for historical and contemporary concept and perspective mining and aim to s...

Combining User Reputation and Provenance Analysis for Trust Assessment

Journal of Data and Information Quality, 2016

Trust is a broad concept that in many systems is often reduced to user reputation alone. However, user reputation is just one way to determine trust. The estimation of trust can be tackled from other perspectives as well, including by looking at provenance. Here, we present a complete pipeline for estimating the trustworthiness of artifacts given their provenance and a set of sample evaluations. The pipeline is composed of a series of algorithms for (1) extracting relevant provenance features, (2) generating stereotypes of user behavior from provenance features, (3) estimating the reputation of both stereotypes and users, (4) using a combination of user and stereotype reputations to estimate the trustworthiness of artifacts, and (5) selecting sets of artifacts to trust. These algorithms rely on the W3C PROV recommendations for provenance and on evidential reasoning by means of subjective logic. We evaluate the pipeline over two tagging datasets: tags and evaluations from the Netherl...
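
The pipeline's evidential reasoning rests on subjective logic. As a reference point, here is a small sketch of the textbook mapping from positive/negative evidence counts to a (belief, disbelief, uncertainty) opinion, and of cumulative fusion of two opinions such as a user's and a stereotype's reputation; this is the standard formulation, not necessarily the paper's exact parametrization.

```python
# Standard subjective-logic building blocks: evidence-to-opinion mapping and
# cumulative fusion of two opinions (e.g., user and stereotype reputations).
from dataclasses import dataclass

@dataclass
class Opinion:
    belief: float
    disbelief: float
    uncertainty: float
    base_rate: float = 0.5

    def expectation(self) -> float:
        return self.belief + self.base_rate * self.uncertainty

def opinion_from_evidence(positive: int, negative: int, prior_weight: float = 2.0) -> Opinion:
    total = positive + negative + prior_weight
    return Opinion(positive / total, negative / total, prior_weight / total)

def cumulative_fusion(a: Opinion, b: Opinion) -> Opinion:
    k = a.uncertainty + b.uncertainty - a.uncertainty * b.uncertainty
    return Opinion(
        (a.belief * b.uncertainty + b.belief * a.uncertainty) / k,
        (a.disbelief * b.uncertainty + b.disbelief * a.uncertainty) / k,
        (a.uncertainty * b.uncertainty) / k,
    )

user = opinion_from_evidence(positive=8, negative=2)          # user reputation
stereotype = opinion_from_evidence(positive=30, negative=10)  # stereotype reputation
print(cumulative_fusion(user, stereotype).expectation())
```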

Semi-automated assessment of annotation trustworthiness

2013 Eleventh Annual Conference on Privacy, Security and Trust, 2013

Cultural heritage institutions and multimedia archives often delegate the task of annotating their collections of artifacts to Web users. The use of crowdsourced annotations from the Web gives rise to trust issues. We propose an algorithm that, by making use of a combination of subjective logic, semantic relatedness measures and clustering, automates the process of evaluation for annotations represented by means of the Open Annotation ontology. The algorithm is evaluated over two different datasets coming from the cultural heritage domain.
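
One ingredient of such an algorithm is grouping annotations by semantic relatedness before evidence about them is combined. Below is a hedged sketch of that step using standard hierarchical clustering over a relatedness-derived distance matrix; the relatedness function is a stub, and the annotations and threshold are illustrative, not the paper's actual measure or data.

```python
# Group annotations by pairwise semantic relatedness using average-linkage
# hierarchical clustering. The relatedness() function is a stand-in for a
# real WordNet- or embedding-based measure.
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform

annotations = ["dog", "hound", "puppy", "church", "cathedral"]

def relatedness(a: str, b: str) -> float:
    """Stub: return a value in [0, 1]; 1 means strongly related."""
    related_pairs = {("dog", "hound"), ("dog", "puppy"), ("hound", "puppy"),
                     ("church", "cathedral")}
    return 1.0 if (a, b) in related_pairs or (b, a) in related_pairs else 0.1

n = len(annotations)
distance = np.zeros((n, n))
for i in range(n):
    for j in range(i + 1, n):
        d = 1.0 - relatedness(annotations[i], annotations[j])
        distance[i, j] = distance[j, i] = d

# Annotations closer than the threshold end up in the same cluster.
clusters = fcluster(linkage(squareform(distance), method="average"),
                    t=0.5, criterion="distance")
for ann, cluster_id in zip(annotations, clusters):
    print(cluster_id, ann)
```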

10th International Workshop on Uncertainty Reasoning for the Semantic Web (URSW 2014)

Two Procedures for Analyzing the Reliability of Open Government Data

Communications in Computer and Information Science, 2014

Open Government Data often contain information that, in more or less detail, regards private citizens. For this reason, before publishing them, public authorities manipulate the data to remove any sensitive information while trying to preserve their reliability. This paper addresses the lack of tools aimed at measuring the reliability of these data. We present two procedures for assessing the reliability of Open Government Data: one based on a comparison between open and closed data, and the other based on the analysis of open data only. We evaluate the procedures over data from the data.police.uk website and from the Hampshire Police Constabulary in the United Kingdom. The procedures effectively allow estimating the reliability of open data and show that their reliability is high even though the data are aggregated and smoothed.
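
A toy sketch of the intuition behind the comparison-based procedure: recompute counts from detailed (closed) records, compare them with the published aggregated open counts, and read the agreement as a reliability estimate. The data and the agreement measure below are illustrative, not the paper's exact method.

```python
# Compare aggregated open counts against counts recomputed from detailed
# closed records; use the normalized discrepancy as a reliability proxy.
closed_records = [                      # detailed, non-public incident records
    {"area": "A", "category": "burglary"}, {"area": "A", "category": "burglary"},
    {"area": "A", "category": "theft"},   {"area": "B", "category": "burglary"},
]
open_counts = {("A", "burglary"): 2, ("A", "theft"): 1, ("B", "burglary"): 2}

recomputed: dict = {}
for record in closed_records:
    key = (record["area"], record["category"])
    recomputed[key] = recomputed.get(key, 0) + 1

keys = set(open_counts) | set(recomputed)
discrepancy = sum(abs(open_counts.get(k, 0) - recomputed.get(k, 0)) for k in keys)
total = sum(recomputed.values())
print(f"estimated reliability: {1 - discrepancy / total:.2f}")
```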

The Effects of Crowd Worker Biases in Fact-Checking Tasks

2022 ACM Conference on Fairness, Accountability, and Transparency

Due to the increasing amount of information shared online every day, the need for sound and reliable ways of distinguishing between trustworthy and non-trustworthy information is as present as ever. One technique for performing fact-checking at scale is to employ human intelligence in the form of crowd workers. Although earlier work has suggested that crowd workers can reliably identify misinformation, their cognitive biases may reduce the quality of truthfulness judgments in this context. We performed a systematic exploratory analysis of publicly available crowdsourced data to identify a set of potential systematic biases that may occur when crowd workers perform fact-checking tasks. Following this exploratory study, we collected a novel dataset of crowdsourced truthfulness judgments to validate our hypotheses. Our findings suggest that workers generally overestimate the truthfulness of statements and that individual characteristics (e.g., their belief in science) and cognitive biases (e.g., the affect heuristic and overconfidence) can affect their annotations. Interestingly, we find that, depending on the general judgment tendencies of workers, their biases may sometimes lead to more accurate judgments.
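
As an illustration of the kind of analysis described above, the overestimation tendency can be quantified as the mean signed difference between crowd judgments and expert gold labels. The scores below are made-up placeholders, not the study's data.

```python
# Quantify over/underestimation of truthfulness as the mean signed error
# between crowd judgments and expert gold labels (illustrative scores).
import statistics

crowd = [4, 5, 3, 4, 2, 5, 3]   # crowd judgment per statement (0-5 scale)
gold  = [3, 4, 1, 4, 2, 3, 2]   # expert gold label per statement

signed_errors = [c - g for c, g in zip(crowd, gold)]
print("mean signed error:", statistics.mean(signed_errors))   # > 0 => overestimation
print("mean absolute error:", statistics.mean(abs(e) for e in signed_errors))
```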

Capturing the Ineffable: Collecting, Analysing, and Automating Web Document Quality Assessments

Lecture Notes in Computer Science, 2016

Automatic estimation of the quality of Web documents is a challenging task, especially because the definition of quality heavily depends on the individuals who define it, on the context where it applies, and on the nature of the tasks at hand. Our long-term goal is to allow automatic assessment of Web document quality tailored to specific user requirements and context. This process relies on the possibility to identify document characteristics that indicate their quality. In this paper, we investigate these characteristics as follows: (1) we define features of Web documents that may be indicators of quality; (2) we design a procedure for automatically extracting those features; (3) we develop a Web application to present these results to niche users to check the relevance of these features as quality indicators and collect quality assessments; (4) we analyse users' qualitative assessments of Web documents to refine our definition of the features that determine quality, and establish their relevant weight in the overall quality, i.e., in the summarizing score users attribute to a document, determining whether it meets their standards or not. Hence, our contribution is threefold: a Web application for nichesourcing quality assessments; a curated dataset of Web document assessments; and a thorough analysis of the quality assessments collected by means of two case studies involving experts (journalists and media scholars). The dataset obtained is limited in size but highly valuable because of the quality of the experts who provided it. Our analyses show that: (1) it is possible to automate the process of Web document quality estimation to a level of high accuracy; (2) document features shown in isolation are poorly informative to users; and (3) related to the tasks we propose (i.e., choosing Web documents to use as a source for writing an article on the vaccination debate), the most important quality dimensions are accuracy, trustworthiness, and precision.
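
As a hedged illustration of step (2), the sketch below extracts a few simple surface features of a Web document that might serve as quality indicators; the feature set, URL, and thresholds are illustrative assumptions, and the paper's actual feature list is richer.

```python
# Extract a handful of surface features from a Web page that could feed a
# quality model (illustrative feature set; URL is a placeholder).
import requests
from bs4 import BeautifulSoup

def extract_features(url: str) -> dict:
    html = requests.get(url, timeout=10).text
    soup = BeautifulSoup(html, "html.parser")
    text = soup.get_text(separator=" ", strip=True)
    words = text.split()
    sentences = [s for s in text.split(".") if s.strip()]
    return {
        "word_count": len(words),
        "avg_sentence_length": len(words) / max(len(sentences), 1),
        "link_count": len(soup.find_all("a")),
        "image_count": len(soup.find_all("img")),
        "has_author_meta": soup.find("meta", attrs={"name": "author"}) is not None,
    }

print(extract_features("https://example.org/article"))
```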

Predicting Quality of Crowdsourced Annotations using Graph Kernels

Annotations obtained by Cultural Heritage institutions from the crowd need to be automatically assessed for their quality. Machine learning using graph kernels is an effective technique to use structural information in datasets to make predictions. We employ the Weisfeiler-Lehman graph kernel for RDF to make predictions about the quality of crowdsourced annotations in the Steve.museum dataset, which is modelled and enriched as RDF. Our results indicate that we could predict the quality of crowdsourced annotations with an accuracy of 75%. We also employ the kernel to understand which features from the RDF graph are relevant to make predictions about different categories of quality.
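
The core of the Weisfeiler-Lehman kernel is iterative label refinement: each node repeatedly receives a new label derived from its own label plus the sorted labels of its neighbours, and graphs are compared via the resulting label histograms. Below is a compact from-scratch sketch on toy node-labelled graphs; the RDF variant used in the paper additionally accounts for edge direction and edge labels.

```python
# Minimal Weisfeiler-Lehman-style label refinement and histogram kernel on
# toy undirected, node-labelled graphs (not the RDF-specific variant).
from collections import Counter

def wl_histogram(adjacency: dict, labels: dict, iterations: int = 2) -> Counter:
    histogram = Counter(labels.values())
    for _ in range(iterations):
        new_labels = {}
        for node, neighbours in adjacency.items():
            # New label = own label + multiset of neighbour labels.
            signature = (labels[node], tuple(sorted(labels[n] for n in neighbours)))
            new_labels[node] = str(signature)
        labels = new_labels
        histogram.update(labels.values())
    return histogram

def wl_kernel(hist_a: Counter, hist_b: Counter) -> int:
    return sum(hist_a[label] * hist_b[label] for label in hist_a)

# Two toy annotation-like graphs with identical structure and labels.
g1 = {"a": ["b", "c"], "b": ["a"], "c": ["a"]}
l1 = {"a": "annotation", "b": "tag", "c": "creator"}
g2 = {"x": ["y", "z"], "y": ["x"], "z": ["x"]}
l2 = {"x": "annotation", "y": "tag", "z": "creator"}

print(wl_kernel(wl_histogram(g1, l1), wl_histogram(g2, l2)))
```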

Identifying and Classifying Uncertainty Layers in Web Document Quality Assessment

International Semantic Web Conference, 2016

Assessing the quality of Web documents is crucial, but challenging. In this paper, we outline the different uncertainty bottlenecks that such a task implies, and we propose a strategy to tackle them.

Overview of METHOD 2014: the 3rd International Workshop on Methods for Establishing Trust of (Open) Data

International Semantic Web Conference, 2014

Laying the Explorative Groundwork for Document Quality Assessment and Perspective Detection

In this abstract, we address the problem of information overload that web users face when researching controversial topics on the web. The ultimate goal of our project is to design a tool to aid users in reviewing and learning about the quality and content of large document collections online. We highlight the challenges as well as the previous and future steps taken to develop a document information quality visualization tool. This lays the explorative groundwork for developing a web browser tool which efficiently informs users about the quality of web documents according to 8 quality dimensions. We also address the challenges of identifying author and source perspectives in documents and collections of text related to controversial debates. We use Natural Language Processing (NLP) methods to explore these linguistic features and to extract important textual content with which to contextualize the document information quality. Finally, we motivate our planned crowd-sourced study which is...

Efficient semi-automated assessment of annotation trustworthiness

Cultural heritage institutions and multimedia archives often delegate the task of annotating their collections of artifacts to Web users. The use of crowdsourced annotations from the Web gives rise to trust issues. We propose an algorithm that, by making use of a combination of subjective logic, semantic relatedness measures and clustering, automates the process of evaluation for annotations represented by means of the Open Annotation ontology. The algorithm is evaluated over two different datasets coming from the cultural heritage domain.

Towards the definition of an ontology for trust in (web) data

This paper introduces an ontology for representing trust that extends existing ones by integrating them with recent trust theories. Then, we propose an extension of such an ontology, tailored for representing trust assessments of data, and we outline its specificity and its relevance.

Automated Evaluation of Annotators for Museum Collections using Subjective Logic

Museums are rapidly digitizing their collections, and face a huge challenge to annotate every digitized artifact in store. Therefore they are opening up their archives for receiving annotations from experts world-wide. This paper presents an architecture for choosing the most eligible set of annotators for a given artifact, based on semantic relatedness measures between the subject matter of the artifact and topics of expertise of the annotators. We also employ mechanisms for evaluating the quality of provided annotations, and constantly manage and update the trust, reputation and expertise information of registered annotators.
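
A hedged sketch of the selection idea described above: rank candidate annotators by how semantically related their expertise topics are to the artifact's subject. The relatedness function, names, and scores are illustrative stand-ins; the architecture in the paper would plug in a real semantic relatedness measure and combine the score with each annotator's trust and reputation.

```python
# Rank candidate annotators for an artifact by the best relatedness between
# the artifact's subject and their expertise topics (all data illustrative).
def relatedness(topic_a: str, topic_b: str) -> float:
    """Stub returning a semantic relatedness value in [0, 1]."""
    scores = {("17th-century painting", "baroque art"): 0.9,
              ("17th-century painting", "dutch golden age"): 0.8,
              ("17th-century painting", "modern sculpture"): 0.2}
    return scores.get((topic_a, topic_b), 0.1)

annotators = {
    "alice": ["baroque art", "dutch golden age"],
    "bob": ["modern sculpture"],
}
artifact_subject = "17th-century painting"

ranking = sorted(
    ((max(relatedness(artifact_subject, t) for t in topics), name)
     for name, topics in annotators.items()),
    reverse=True,
)
for score, name in ranking:
    print(f"{name}: {score:.2f}")
```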
