Diana Maynard | The University of Sheffield
Papers by Diana Maynard
Semantic Web databases allow efficient storage and access to RDF statements. Applications are able to use expressive query languages in order to retrieve relevant metadata to perform different tasks. However, access to metadata may not be public to just ...
Social Science Research Network, 2017
This paper presents a framework for collecting and analysing large-volume social media content. The real-time analytics framework comprises semantic annotation, Linked Open Data, semantic search, and dynamic result aggregation components. In addition, exploratory search and sense-making are supported through information visualisation interfaces, such as co-occurrence matrices, term clouds, treemaps, and choropleths. There is also an interactive semantic search interface (Prospector), where users can save, refine, and analyse the results of semantic search queries over time. Practical use of the framework is exemplified through three case studies: a general scenario analysing tweets from UK politicians and the public's response to them in the run-up to the 2015 UK general election; an investigation of attitudes towards climate change expressed by these politicians and the public, via their engagement with environmental topics; and an analysis of public tweets leading up to the UK's referendum on leaving the EU (Brexit) in 2016. The paper also presents a brief evaluation and discussion of some of the key text analysis components, which are specifically adapted to the domain and task, and demonstrates the scalability and efficiency of our toolkit in the case studies.
In this paper, we present a healthcare-oriented vision of dynamic ontology lifecycle that has been recently developed within Knowledge Web-EU Network of Excellence aimed at transition of Semantic Web technologies to industry. The core contribution of this paper is the proposal of methodologically and technically integral ontology lifecycle scenario, provided with extensive use case from the domain of translational medicine, showing the forthcoming impact on the biomedicine industry.
Proceedings of the International AAAI Conference on Web and Social Media, Jun 15, 2018
Concerns have reached the mainstream about how social media are affecting political outcomes. One trajectory for this is the exposure of politicians to online abuse. In this paper we use 1.4 million tweets from the months before the 2015 and 2017 UK general elections to explore the abuse directed at politicians. Results show that abuse increased substantially in 2017 compared with 2015. Abusive tweets show a strong relationship with total tweets received, indicating for the most part impersonality, but a second pathway targets less prominent individuals, suggesting different kinds of abuse. Accounts that send abuse are more likely to be throwaway. Economy and immigration were major foci of abusive tweets in 2015, whereas terrorism came to the fore in 2017.
Future Internet, Aug 13, 2014
In this paper, we describe a set of reusable text processing components for extracting opinionated information from social media, rating it for interestingness, and for detecting opinion events. We have developed applications in GATE to extract named entities, terms and events and to detect opinions about them, which are then used as the starting point for opinion event detection. The opinions are then aggregated over larger sections of text, to give some overall sentiment about topics and documents, and also some degree of information about interestingness based on opinion diversity. We go beyond traditional opinion mining techniques in a number of ways: by focusing on specific opinion-target extraction related to key terms and events, by examining and dealing with a number of specific linguistic phenomena, by analysing and visualising opinion dynamics over time, and by aggregating the opinions in different ways for a more flexible view of the information contained in the documents.
Springer eBooks, 2017
Whether you call it the Semantic Web, Linked Data, or Web 3.0, a new generation of Web technologies is offering major advances in the evolution of the World Wide Web. As the first generation of this technology transitions out of the laboratory, new research is exploring how the growing Web of Data will change our world. While topics such as ontology-building and logics remain vital, new areas such as the use of semantics in Web search, the linking and use of open data on the Web, and future applications that will be supported by these technologies are becoming important research areas in their own right. Whether they be scientists, engineers or practitioners, Web users increasingly need to understand not just the new technologies of the Semantic Web, but to understand the principles by which those technologies work, and the best practices for assembling systems that integrate the different languages, resources, and functionalities that will be important in keeping the Web the rapidly expanding, and constantly changing, information space that has changed our lives.
Abstract: This book introduces core natural language processing (NLP) technologies to non-experts in an easily accessible way, as a series of building blocks that lead the user to understand key technologies, why they are required, and how to integrate them into Semantic Web applications. Natural language processing and Semantic Web technologies have different, but complementary roles in data management. Combining these two technologies enables structured and unstructured data to merge seamlessly.
Semantic Web technologies aim to convert unstructured data to meaningful representations, which benefit enormously from the use of NLP technologies, thereby enabling applications such as connecting text to Linked Open Data, connecting texts to each other, semantic searching, information visualization, and modeling of user behavior in online networks. The first half of this book describes the basic NLP processing tools: tokenization, part-of-speech tagging, and morphological analysis, in addition to the main tools required for an information extraction system (named entity recognition and relation extraction) which build on these components. The second half of the book explains how Semantic Web and NLP technologies can enhance each other, for example via semantic annotation, ontology linking, and population. These chapters also discuss sentiment analysis, a key component in making sense of textual data, and the difficulties of performing NLP on social media, as well as some proposed solutions. The book finishes by investigating some applications of these tools, focusing on semantic search and visualization, modeling user behavior, and an outlook on the future.
Ubiquity Press eBooks, Dec 14, 2022
Scenic landscapes are a main attractor for local and international tourism, and in many cases have become designated as protected areas such as national parks or scenic areas that promote their aesthetic qualities to attract visitors. But what directs touristic attention to certain landscapes, and to specific places within such landscapes? We argue that in order to find out how touristic landscapes come into being, we need to turn our focus on how such landscapes become constructed as idealised landscapes.
Synthesis lectures on data, semantics and knowledge, 2017
Synthesis lectures on data, semantics and knowledge, 2017
PLOS ONE
The explosion of disinformation accompanying the COVID-19 pandemic has overloaded fact-checkers and media worldwide, and brought a new major challenge to government responses worldwide. Not only is disinformation creating confusion about medical science amongst citizens, but it is also amplifying distrust in policy makers and governments. To help tackle this, we developed computational methods to categorise COVID-19 disinformation. The COVID-19 disinformation categories could be used for a) focusing fact-checking efforts on the most damaging kinds of COVID-19 disinformation; b) guiding policy makers who are trying to deliver effective public health messages and counter effectively COVID-19 disinformation. This paper presents: 1) a corpus containing what is currently the largest available set of manually annotated COVID-19 disinformation categories; 2) a classification-aware neural topic model (CANTM) designed for COVID-19 disinformation category classification and topic discovery; 3...
Proceedings of the 2017 EMNLP Workshop: Natural Language Processing meets Journalism, 2017
News media typically present biased accounts of news stories, and different publications present different angles on the same event. In this research, we investigate how different publications differ in their approach to stories about climate change, by examining the sentiment and topics presented. To understand these attitudes, we find sentiment targets by combining Latent Dirichlet Allocation (LDA) with SentiWordNet, a general sentiment lexicon. Using LDA, we generate topics containing keywords which represent the sentiment targets, and then annotate the data using SentiWordNet before regrouping the articles based on topic similarity. Preliminary analysis identifies clearly different attitudes on the same issue presented in different news sources. Ongoing work is investigating how systematic these attitudes are between different publications, and how these may change over time.
EU-IST Network of Excellence (NoE) IST-2004-507482 KWEB, Deliverable D2.1.4 (WP2.1). This deliverable proposes a benchmarking framework to be used in the benchmarking activities that will be performed in Knowledge Web. This framework includes a benchmarking methodology, guidelines for building benchmark suites, a list of tools that can be useful when performing benchmarking, and specific considerations for benchmarking the different types of tools that will be considered in workpackage 2.1 (ontology development tools, ontology-based annotation tools, ontology-based reasoning tools, and semantic web service technology).