Emma Tonkin | University of Bristol
Publications by Emma Tonkin
Labelling user data is a central part of the design and evaluation of pervasive systems that aim to support the user through situation-aware reasoning. It is essential both in designing and training the system to recognise and reason about the situation, either through the definition of a suitable situation model in knowledge-driven applications, or through the preparation of training data for learning tasks in data-driven models. Hence, the quality of annotations can have a significant impact on the performance of the derived systems. Labelling is also vital for validating and quantifying the performance of applications. In particular, comparative evaluations require the production of benchmark datasets based on high-quality and consistent annotations. With pervasive systems relying increasingly on large datasets for designing and testing models of users' activities, the process of data labelling is becoming a major concern for the community. In this work we present a qualitative and quantitative analysis of the challenges associated with annotating user data, and of possible strategies for addressing these challenges. The analysis was based on data gathered during the 1st International Workshop on Annotation of useR Data for UbiquitOUs Systems (ARDUOUS), comprising brainstorming output as well as annotation and questionnaire data collected during the talks, poster session, live annotation session, and discussion session.
Delivering effortless interactions and appropriate interventions through pervasive systems requires making sense of multiple streams of sensor data. This is particularly challenging when these streams concern people’s natural behaviours in the real world. This paper takes a multidisciplinary perspective on annotation and draws on an exploratory study of 12 people who were encouraged to use a multi-modal annotation app while living in a prototype smart home. Analysis of the app usage data and of semi-structured interviews with the participants revealed strengths and limitations of self-annotation in a naturalistic context. Handing control of the annotation process to research participants enabled them to reason about their own data while generating accounts that were appropriate and acceptable to them. Self-annotation gave participants an opportunity to reflect on themselves and their routines, but it was also a means to express themselves freely and sometimes even a backchannel for communicating playfully with the researchers. However, self-annotation may not be an effective way to capture accurate start and finish times for activities, or the location associated with an activity. This paper offers new insights and recommendations for the design of self-annotation tools for deployment in the real world.
Ubiquitous eHealth systems based on sensor technologies are seen as key enablers in the effort to reduce the financial impact of an ageing society. At the heart of such systems sit activity recognition algorithms, which need sensor data to reason over and a ground truth of adequate quality for training and validation. The large set-up costs and complexity of such research projects limit rapid development in this area. Information sharing and reuse, especially of collected datasets, is therefore key to overcoming these barriers. One approach that facilitates this process by reducing ambiguity is the use of ontologies. This article presents a hierarchical ontology for activities of daily living (ADL), together with two use cases of ground truth acquisition in which this ontology has been successfully utilised. Requirements placed on the ontology by ongoing work are also discussed.
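To illustrate how a hierarchical ontology can reduce ambiguity between annotations made at different levels of detail, here is a minimal Python sketch. It assumes a simple tree (each label has exactly one parent) and uses invented activity labels such as `make_tea`; neither the labels nor the helper functions `ancestors` and `common_ancestor` are taken from the published ADL ontology.

```python
from typing import Dict, List, Optional

# Hypothetical fragment of an ADL hierarchy, stored as child -> parent.
# These labels are illustrative only; they are NOT from the published ontology.
PARENT: Dict[str, Optional[str]] = {
    "activity": None,  # root
    "kitchen_activity": "activity",
    "prepare_drink": "kitchen_activity",
    "make_tea": "prepare_drink",
    "make_coffee": "prepare_drink",
    "prepare_meal": "kitchen_activity",
    "personal_hygiene": "activity",
    "brush_teeth": "personal_hygiene",
}


def ancestors(label: str) -> List[str]:
    """Return the chain from `label` up to the root, starting with `label` itself."""
    path: List[str] = []
    node: Optional[str] = label
    while node is not None:
        path.append(node)
        node = PARENT[node]  # raises KeyError for labels outside the ontology
    return path


def common_ancestor(a: str, b: str) -> str:
    """Return the most specific label that subsumes both annotations."""
    seen = set(ancestors(a))
    for node in ancestors(b):  # walk upwards from b; first hit is most specific
        if node in seen:
            return node
    raise ValueError("labels do not share a root")


# Two annotators chose different fine-grained labels for the same episode.
print(common_ancestor("make_tea", "make_coffee"))
```

Here the two annotations still agree at the `prepare_drink` level. An ontology that allows multiple parents per label would need a graph search rather than the single parent chain assumed in this sketch.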
Introduction: Over 160,000 people with severe hip or knee pain caused by osteoarthritis undergo total hip (THR) or knee replacement (TKR) surgery each year in the UK within the National Health Service (NHS), and this number is expected to increase. Innovative approaches to evaluating surgical outcomes will be needed to respond to the increasing burden of joint replacement surgery. The Sensor Platform for Healthcare in a Residential Environment, Interdisciplinary Research Collaboration (SPHERE-IRC) has developed a system of sensors that can monitor the health-related behaviours of people living at home. The system includes sensors for the home environment (measuring temperature, humidity, room occupancy, and water and electricity usage), a wrist-band body-worn activity monitor, and silhouette (body outline) sensors. The aim of HEmiSPHERE (Hip and KnEe Study of a Sensor Platform of HEalthcare in a Residential Environment) is (a) to determine the accuracy and feasibility of the sensor data compared with conventional assessment of health outcomes after surgery using patient self-reported questionnaires, and (b) to explore how useful the SPHERE system is for everyday clinical decision making. Methods and Analysis: This feasibility study will recruit up to 30 adult NHS patients undergoing THR or TKR and install the SPHERE system in their homes. Using a mixed-methods design, the SPHERE system will monitor and record continuous measurements of daily behaviour. The main outcomes will assess the relationships between environmental, behavioural and movement data and the parameters of interest from standard clinical assessments measuring patient outcomes over time.
Patient interviews and focus groups with consultant orthopaedic surgeons will provide a deeper understanding of the acceptability, feasibility and accuracy of the data. Ethics and Dissemination: Ethical approval has been obtained through the South West – Central Bristol Research Ethics Committee (17/SW/0121). We aim to disseminate the findings through regional talks and seminars, international conferences, peer-reviewed journals and social media.
Healthcare professionals currently lack the means to gather unbiased and quantitative multi-modal data about the long-term behaviors of patients in their home environments. SPHERE is a multi-modal platform of non-medical sensors for behavior monitoring in residential environments that aims to overcome this major limitation of healthcare provision by using the inherently cost-efficient and scalable technologies of the Internet of Things (IoT). One of SPHERE’s key tasks is to help bring next-generation low-power wireless networking and sensing technologies from the lab to the field by applying them in real-world environments. In this article we describe the highlights of SPHERE’s system requirements, architecture and practical challenges, as well as the design and deployment lessons learned. By leveraging novel IoT technologies such as the IEEE 802.15.4 TSCH network protocol, SPHERE had achieved successful initial deployments in twelve volunteer houses at the time of writing.
The SPHERE project has developed a multi-modal sensor platform for health and behavior monitoring in residential environments. So far, the SPHERE platform has been deployed for data collection in approximately 50 homes, for durations of up to one year. This document describes the format of the SPHERE dataset, the known and potential data quality problems and their workarounds, and other information important to people working with the SPHERE data, software and hardware.
This paper describes HyperStream, a large-scale, flexible and robust software package, written in Python, for processing streaming data with workflow creation capabilities. HyperStream overcomes the limitations of other computational engines and provides high-level interfaces to execute complex nesting, fusion and prediction in both online and offline forms in streaming environments. HyperStream is a general-purpose tool that is well suited to the design, development and deployment of machine learning algorithms and predictive models across a wide range of sequential prediction problems. Source code, installation instructions, examples and documentation can be found at: https://github.com/IRC-SPHERE/HyperStream.
In this paper we present paperBase, a system that aids users in entering metadata for preprints by extracting metadata from the preprint itself. Using a Dublin Core-based REST API, third-party repository software populates a web form that the user can then proofread ...
With increasing public interest in historical climate change, and in models of climate change in general, comes a corresponding increase in the importance of maintaining open, accessible and usable research data repositories. In this paper, we introduce an e-Science data repository containing extensive research data from palaeoclimatology. The service was initially designed to support internal collaboration and organise data, but over several years of practical use, sharing research outputs became an increasingly significant part of its role. We report on a data preservation and interoperability assessment currently under way. Finally, we discuss the ongoing significance of open research data and capacity for analysis in climate research, with palaeoclimatology as a case study.
Ariadne, 2005
Software use cases are necessarily incomplete, a failing which seems to intensify in inverse proportion to the degree of simplicity in the software in question. Complex software responds to a given set of requirements; simple software serves as a partial solution to a much broader ...
Bulletin of the American …, 2012
Ariadne, 2008
Persistent identifiers (PIs) are simply maintainable identifiers that allow us to refer to a digital object: a file or set of files, such as an e-print (article, paper or report), an image or an installation file for a piece of software. The only interesting persistent identifiers are also persistently ...
In this paper we study how to provide metadata for a pre-print archive. Metadata includes, but is not limited to, title, authors, citations and keywords, and is used both to present data to the user in a meaningful way and to index and cross-reference the pre-prints. We are particularly interested in studying different methods of obtaining metadata for a pre-print. We have developed a system that automatically extracts metadata and allows the user to verify and correct it before it is accepted by the system.
International Conference on Dublin Core and …, 2009
As an emerging technology, the Web is full of unique challenges for developers, designers and engineers. Its use for increasingly complex applications such as e-commerce and banking, involving connections to many data sources, has highlighted a number of common ...
Knowledge management research spans multiple areas, including language development from a social networking perspective, but this perspective is often ignored in favor of a strictly structured ontological model. This poster brings together perspectives on the linkage between the social theory of language development and that of ontology development, sketching out an approach that links user contributions, historical data from the candidate domain, and information retrieved via automated machine learning and pattern recognition across the candidate domain in an iterative design process. To facilitate evaluation of this model, we describe its use in a case study of ontology development and evolution across a dataset of e-prints, examining how social and contextual backgrounds contribute to ontological development.
D-lib Magazine, 2006
A folksonomy is a type of distributed classification system. It is usually created by a group of individuals, typically the resource users. Users add tags to online items, such as images, videos, bookmarks and text. These tags are then shared and sometimes refined. In this ...
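The tagging process described above can be sketched as a small data structure. This minimal Python example assumes tagging events arrive as `(user, item, tag)` triples and uses lowercasing and whitespace trimming as a simple stand-in for "refinement"; the function name `build_folksonomy` and all sample data are hypothetical and not drawn from the paper.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple

# A tagging event: (user, item, tag). All sample data below is invented.
Tagging = Tuple[str, str, str]


def build_folksonomy(events: Iterable[Tagging]) -> Dict[str, Counter]:
    """Map each item to a Counter of tags, counting each user at most once per tag."""
    seen = set()
    folksonomy: Dict[str, Counter] = defaultdict(Counter)
    for user, item, tag in events:
        norm = tag.strip().lower()  # assumed normalisation ("refinement") step
        key = (user, item, norm)
        if key in seen:
            continue  # a user repeating the same tag on the same item adds nothing
        seen.add(key)
        folksonomy[item][norm] += 1
    return dict(folksonomy)


events = [
    ("alice", "photo1", "Bristol"),
    ("bob", "photo1", "bristol"),
    ("bob", "photo1", "bridge"),
    ("carol", "photo1", "bristol "),
    ("alice", "photo1", "bristol"),  # duplicate of alice's first tag after normalisation
    ("carol", "video7", "tutorial"),
]

fk = build_folksonomy(events)
print(fk["photo1"].most_common(2))  # prints [('bristol', 3), ('bridge', 1)]
```

Counting distinct users per tag, rather than raw tag events, is one way to let the shared vocabulary emerge from agreement between users; other weighting schemes are equally possible.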
This paper describes an investigation of user-centred design methodologies intended to apply to metadata or information architecture evaluation and deployment. The primary focus of this work is the investigation of user conceptual models and their comparison with formally architected models. We describe related work, primarily from the domain of information architecture, such as free-listing, contextual enquiry, card-sorting and evaluation, and then describe the design, initial evaluation and practical use of a multi-stage prototyping method designed to elicit users' knowledge and concepts of a domain, common conceptual models in that domain, and the objects, collections and relations between objects that users consider relevant. A simple approach to the analysis of results is described.
Video streaming and videoconferencing technology is now attainable using inexpensive and widely available equipment. This paper uses a set of case studies conducted at a recent conference in the UK to investigate the technical and organizational issues related to differing approaches to the technology. Two approaches, videoconferencing over the Access Grid with VRVS and a simple mono-directional video stream, were used back-to-back. The effectiveness, scalability and applicability of each approach are compared across various applications. In each case, a synchronous but asymmetric feedback channel was made available using a lower-bandwidth modality: a simple, moderated IRC chat system. Asynchronous feedback was also collected post factum using blogs and content distribution services such as Flickr. Feedback from users of each channel is analysed, and recommendations are given for future use of video streaming in conferences, workshops and interactive events. Relevant current research and opportunities for future work are identified.
arXiv (Cornell University), Aug 7, 2019
Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies
We present a case study that informs the creation of a 'companion guide' providing transparency to potential non-expert users of a ubiquitous machine learning (ML) platform during the initial onboarding. Ubiquitous platforms (e.g., smart home systems, including smart meters and conversational agents) are increasingly commonplace and increasingly apply complex ML methods. Understanding how non-ML experts comprehend these platforms is important in supporting participants in making an informed choice about if and how they adopt these platforms. To aid this decision-making process, we created a companion guide for a home health platform through an iterative user-centred-design process, seeking additional input from platform experts at all stages of the process to ensure the accuracy of explanations. This user-centred and expert informed design process highlights the need to present the platform's entire ecosystem at an appropriate level for those with differing backgrounds t...
2021 IEEE International Conference on Pervasive Computing and Communications Workshops and other Affiliated Events (PerCom Workshops)
This paper explores the challenges faced when producing a set of annotations from videos recorded during a pilot study evaluating 24 participants (12 with Parkinson's disease, each accompanied by a healthy volunteer control participant) who are free-living in a house embedded with a platform of sensors. We discuss the outcome measures chosen for annotation from the videos and the controlled vocabularies formulated for this task, the tools and processes used, how we intend to achieve standardisation and normalisation of the annotations, and how to improve the quality and reusability of the annotation dataset.
This paper presents an exploratory study, which uses dynamic social network analysis of posts from the Tumblr blogging site relating to the Tate galleries to observe user community change. The findings of this research were presented at the 1st Int. Workshop on Semantic Change & Evolving Semantics (SuCCESS'16) organised by PERICLES partners to explore emerging research in the areas of semantic change and evolving semantics.
The current deliverable summarises the work conducted within task T4.3 of WP4, focusing on the extraction and the subsequent analysis of semantic information from digital content, which is imperati ...
2020 IEEE International Conference on Pervasive Computing and Communications Workshops (PerCom Workshops), 2020
In hardware deployments, it is often necessary to test platforms for suitability for particular purposes. As lengthy data collection processes often outlive specific iterations of hardware and firmware, migration between platforms is likely to become necessary. In this short paper we describe a practical approach employed for acceptance testing, comparison and validation of two iterations of a wearable accelerometer and localisation platform, based on an annotated 15-task activity script. We present an analysis of the data generated for the different activities, and compare device performance using common machine learning algorithms for activity recognition.
The UK health service sees around 160,000 total hip or knee replacements every year and this number is expected to rise. Expectations of surgical outcome are changing alongside demographic trends, whilst aftercare may be fractured as a result of resource limitation or other factors. Conventional assessments of health outcomes must evolve to keep up with these changing trends. In practice, patients may visit a health care professional to discuss recovery and will provide survey feedback to clinicians using standardised instruments, such as the Oxford Hip & Knee score, in the months following surgery. To aid clinicians in providing accurate assessment of patient recovery, a continuous home health care monitoring system would be beneficial. In this paper the authors explore how the SPHERE sensor network can be used to automatically generate measures of recovery from arthroplasty to facilitate continuous monitoring of behaviour, including location, room transitions, movement and activity...
2020 IEEE International Conference on Pervasive Computing and Communications Workshops (PerCom Workshops), 2020
Creation and testing of methods, tools, vocabularies, taxonomies, and ontologies for annotation of user data have been reported in many places, including prior editions of this workshop [1]. How can we work together to increase the impact and visibility of these outcomes? In this panel, we discuss means to develop the sustainability, reusability and ultimately the impact of our work. In so doing, we draw insight from the approaches taken in other domains. For example, the fields of metadata and subject indexing publish and regularly update design artefacts such as taxonomies, ontologies and knowledge schemas. The sustainability of data publication is limited by the long-term availability of the platforms on which they are published and the technologies on which they depend. A prior study has found that many knowledge structures are not formally published, leading to low preservation of these artefacts [2]. Standardisation efforts may significantly increase the likelihood of uptake a...
The current deliverable summarises the work conducted within task T4.4 of WP4, presenting our proposed models for semantically representing digital content and its respective context – the latter r ...
ArXiv, 2018
The SPHERE project has developed a multi-modal sensor platform for health and behavior monitoring in residential environments. So far, the SPHERE platform has been deployed for data collection in approximately 50 homes for durations of up to one year. This technical document describes the format and the expected content of the SPHERE dataset(s) under preparation. It includes a list of some data quality problems (both those known to exist in the dataset(s) and potential ones), their workarounds, and other information important to people working with the SPHERE data, software, and hardware. This document does not aim to be an exhaustive description of the SPHERE dataset(s); nor does it aim to discuss or validate the potential scientific uses of the SPHERE data.
Abbreviations and acronyms:
BPMN: Business Process Model and Notation. A graphical language for describing processes.
Cassandra: A distributed database system which is part of the Apache Software Foundation.
CDMI: Cloud Data Management Interface, a protocol for accessing cloud storage.
CEPH: A distributed file system.
Content-based (or intellectual) appraisal: Acquisition and retention decisions, or assignment of value, based on the content of the digital entities themselves.
CQL: Query language used to access the Cassandra database.
DBA: Digital-born Archive.
Digital Ecosystem (DE): Network of technical systems, communities, digital objects, processes, policies, and the relations and interactions between them. This is the object of interest that is modelled with the Digital Ecosystem Model ontology.
Digital ecosystem management: Control layer to provide support and manage change in the digital ecosystem and its entities. In the scope of this task, the QA methods support the validation of changes in the digital ecosystem with respect to policies and high-value digital media.
Digital Ecosystem Model (DEM): Ontology developed by the PERICLES project for modelling digital ecosystems (technical systems, processes, digital objects, policies and users) in order to answer and simulate change-related questions.
Digital Object: "Digital objects (or digital materials) refer to any item that is available digitally." (JISC, "Definition of Digital Object")
DoW: Description of Work.
ERMR: Entity Repository Model Repository; this refers to the T5.1 component.
iRODS: The Integrated Rule-Oriented Data System (iRODS) is open-source data management software that virtualizes data storage resources. It can be used to build data management infrastructure.
It is a familiar observation that digital cultural heritage brings with it new challenges. One such challenge is the effect of age on digital objects held within heritage databases, and on the array of materials that surround and support access to these resources. In this position paper, we discuss the effects of long-term societal change on data preservation in digital cultural heritage, and present a means by which ongoing user modelling processes drawing on contemporary resources can support 'just-in-time' pre-emptive review of material to be presented to the public, as well as feeding into enhancement of data retrieval processes. We note that similar issues and principles apply in contemporary information access contexts: for example, the processes of information sharing between expert practitioners and non-expert members of the public may exhibit similar effects.
2017 IEEE International Conference on Pervasive Computing and Communications Workshops (PerCom Workshops), 2017
Reliably discerning human activity from sensor data is a nontrivial task in ubiquitous computing, which is central to enabling smart environments. Ground-truth acquisition techniques for such environments can be broadly divided into observational and self-reporting approaches. In this paper we explore one self-reporting approach, using speech-enabled logging to generate ground-truth data. We report the results of a user study in which participants (N=12) used both a smart-watch and a smart-phone app to record their activities of daily living using primarily voice, then answered questionnaires comprising the System Usability Scale (SUS) as well as open-ended questions about their experiences. Our findings indicate that even though user satisfaction with the voice-enabled activity logging apps was relatively high, this approach presented significant challenges regarding compliance, effectiveness, and privacy. We discuss the implications of these findings with a view to offering new insights and recommendations for designing systems for ground-truth acquisition 'in the wild'.
Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 2018
The SPHERE project is devoted to advancing eHealth in a smart-home context, and supports full-scale sensing and data analysis to enable a generic healthcare service. We describe, from a data-science perspective, our experience of taking the system out of the laboratory into more than thirty homes in Bristol, UK. We describe the infrastructure and processes that had to be developed along the way, describe how we train and deploy Machine Learning systems in this context, and give a realistic appraisal of the state of the deployed systems. CCS CONCEPTS • Computing methodologies → Machine learning; • Applied computing → Health informatics; • Hardware → Sensor applications and deployments; • Information systems → Sensor networks; Data streaming; • Social and professional topics → Remote medicine;
Springer Series on Cultural Computing, 2018
Accessible systems, in digital heritage as elsewhere, should 'speak the user's language'. However, over long time periods this language may change significantly, and the system must keep track of these changes. Conceptualising and tracking change in a population may be achieved using a functional and computable model based on representative datasets. Such a model must encompass relevant characteristics of that population and support pre-defined functionality, such as the ability to track current trends in language use. Individual published viewpoints on any given platform may be observed in aggregate by means of a large-scale text mining approach. We have made use of social media platforms such as Twitter and Tumblr to collect statistical information about anonymous users' perspectives on cultural heritage items and institutions. Through longitudinal studies, it is possible to identify indicators pointing to an evolution of discourse surrounding cultural heritage items, and to provide an estimate of trends relating to represented items and creators. We describe a functional approach to building useful models of shift in contemporary language use, using data collected across social networks. This approach is informed by existing theoretical approaches to modelling of semantic change. As a case study, we present a means by which such ongoing user modelling processes drawing on contemporary resources can support 'just-in-time' pre-emptive review of material to be presented to the public. We also show that this approach can feed into enhancement of the data retrieval processes.
Journal of Healthcare Informatics Research, 2020
The UK health service sees around 160,000 total hip or knee replacements every year and this number is expected to rise with an ageing population. Expectations of surgical outcomes are changing alongside demographic trends, whilst aftercare may be fractured as a result of resource limitations. Conventional assessments of health outcomes must evolve to keep up with these changing trends. Health outcomes may be assessed largely by self-report using Patient Reported Outcome Measures (PROMs), such as the Oxford Hip or Oxford Knee Score, in the months up to and following surgery. Though widely used, many PROMs have methodological limitations and there is debate about how to interpret results and definitions of clinically meaningful change. With the development of a home-monitoring system, there is opportunity to characterise the relationship between PROMs and behaviour in a natural setting and to develop methods of passive monitoring of outcome and recovery after surgery. In this paper, ...
Sensors, 2018
Ubiquitous eHealth systems based on sensor technologies are seen as key enablers in the effort to reduce the financial impact of an ageing society. At the heart of such systems sit activity recognition algorithms, which need sensor data to reason over, and a ground truth of adequate quality used for training and validation purposes. The large set-up costs of such research projects and their complexity limit rapid developments in this area. Therefore, information sharing and reuse, especially in the context of collected datasets, is key in overcoming these barriers. One approach that facilitates this process by reducing ambiguity is the use of ontologies. This article presents a hierarchical ontology for activities of daily living (ADL), together with two use cases of ground truth acquisition in which this ontology has been successfully utilised. Requirements placed on the ontology by ongoing work are discussed.
BMJ Open, 2018
Introduction: Over 160 000 people with severe hip or knee pain caused by osteoarthritis undergo total hip (THR) or knee replacement (TKR) surgery each year in the UK within the National Health Service (NHS), and this number is expected to increase. Innovative approaches to evaluating surgical outcomes will be needed to respond to the increasing burden of joint replacement surgery. The Sensor Platform for Healthcare in a Residential Environment, Interdisciplinary Research Collaboration (SPHERE-IRC) has developed a system of sensors that can monitor the health-related behaviours of people living at home. The system includes sensors for the home environment (measuring temperature, humidity, room occupancy, water and electricity usage), a wristband body-worn activity monitor and silhouette (body outline) sensors. The aim of HEmiSPHERE (Hip and knEe study of a Sensor Platform of HEalthcare in a Residential Environment) is to (1) determine the accuracy and feasibility of the sensory data a...
Motivated by the broader issues of open justice and access to justice, this paper explores the ethical application of judicial analytics through the lens of an assessment of the readability of written judicial decisions. To that end, the paper aims (1) to review and reproduce, for the UK context, previous work that assesses the readability of legal texts, and (2) to reflect critically on the ethical implications of applied judicial analytics. Focusing on the use case of assessing the readability of judicial Immigration and Asylum decisions in the UK, we put forward recommendations for ethical judicial analytics that aim to produce results that meet the needs of, and are accepted by, the stakeholders of the legal system.
It is often said that the 'S' in IoT stands for security. In a similar vein, the 'P' in the name might be said to stand for privacy-first design. There is a large and challenging gap between functional adequacy and best practice. In this talk, we describe the process of developing the 'home gateway' for the SPHERE 100-homes project, a Linux-based research data aggregator installed in participant homes around Bristol to act as an endpoint for healthcare data collection on human participants. We begin by briefly describing the regulatory landscape that applies to human-centred research data. We tested and used open-source packages and services designed to fill as many gaps in our service design as possible. We also had to find solutions for the further, specific challenges raised by the particular requirements of the project, such as data encryption at rest, robust behaviour in the face of unexpected input or events, and auditable data workflows. Finally, we look at the everyday challenges of safely and securely maintaining a sustainable platform in the face of the risks posed by real-world vulnerabilities: patching, system updates and responding to new flaws discovered in standards, hardware and firmware.
The ongoing accessibility of digital material is challenged by the constantly changing environment in which it exists. In particular, application profiles are threatened by a number of factors such as loss of context, social change and linguistic change. In this paper, we draw on observations taken from a number of application domains to build simple mathematical models for community growth and change, to explore the impact of community structure on the sustainability model required for application profiles over time. Finally, we discuss the use of similar models in evaluating application profile sustainability in general, and lessons to be drawn for DCMI.
Understanding user communities and identifying change within them is important for a range of organisations, including those concerned with cultural heritage. In this paper we present an exploratory study which uses dynamic social network analysis of posts from the Tumblr blogging site relating to the Tate galleries to observe user community change. In addition, we apply two versions of topic modeling to the text of the posts in order to examine user community concerns and changes within these over time. In general, the most noticeable changes in topics within the user communities tend to occur when there has been a major physical change in the social network, such as an increase in membership, with these new members bringing new concerns and interests. After summarising the findings of our approach in detail, we propose practical methods which could be incorporated into real-time monitoring of user community change by cultural heritage organisations.
Pervasive computing and, specifically, the Internet of Things aspire to deliver smart services and effortless interactions for their users. Achieving this requires making sense of multiple streams of sensor data, which becomes particularly challenging when these concern people’s activities in the real world. In this paper we describe the exploration of different approaches that allow users to self-annotate their activities in near real-time, which in turn can be used as ground-truth to develop algorithms for automated and accurate activity recognition. We offer the lessons we learnt during each design iteration of a smart-phone app and detail how we arrived at our current approach to acquiring ground-truth data ‘in the wild’. In doing so, we uncovered tensions between researchers’ data annotation requirements and users’ interaction requirements, which need equal consideration if an acceptable self-annotation solution is to be achieved. We present an ongoing user study of a hybrid approach, which supports activity logging that is appropriate to different individuals and contexts.
The development of smart home and ambient healthcare systems benefits from the availability of ground truth data: annotations describing participant actions, such as current state, location or activity. Such annotations are useful to support building and testing systems to automatically extract this information from the data. This paper describes a prototype annotation system based on a Google Wear OS watch, using tactile input via Graffiti-style on-screen handwriting, combined with unobtrusive haptic feedback to acknowledge the receipt of data.
The UK health service sees around 160,000 total hip or knee replacements every year and this number is expected to rise. Expectations of surgical outcome are changing alongside demographic trends, whilst aftercare may be fractured as a result of resource limitation or other factors. Conventional assessments of health outcomes must evolve to keep up with these changing trends. In practice, patients may visit a health care professional to discuss recovery and will provide survey feedback to clinicians using standardised instruments, such as the Oxford Hip & Knee score, in the months following surgery. To aid clinicians in providing accurate assessment of patient recovery, a continuous home health care monitoring system would be beneficial. In this paper the authors explore how the SPHERE sensor network can be used to automatically generate measures of recovery from arthroplasty to facilitate continuous monitoring of behaviour in a domestic environment, including location, room transitions, movement and activity, in terms of both frequency and duration. The authors present a case study of data collected from a home equipped with the SPHERE sensor network. Machine learning algorithms are applied to a week of continuous observational data to generate insights into the domestic routine of the occupant. Testing of models shows that location and activity are classified with 86% and 63% precision, respectively.
Activity and decision-making around nutrition are important aspects of naturalistic human behaviour. This paper describes a participant-centric free-text annotation process intended to facilitate activity recognition in the kitchen environment. The resulting annotations are characterised. Data from the study are reviewed to establish the extent to which this annotation dataset, and the sensor data it accompanies (including environmental, electrical current and water usage data), support estimation of meal complexity. Three metrics of meal complexity are identified. Finally, planned future work in this area is discussed.
The UK health service sees around 160,000 total hip or knee replacements every year and this number is expected to rise with an ageing population. Expectations of surgical outcomes are changing alongside demographic trends, whilst aftercare may be fractured as a result of resource limitations. Conventional assessments of health outcomes must evolve to keep up with these changing trends. Health outcomes may be assessed largely by self-report using Patient Reported Outcome Measures (PROMs), such as the Oxford Hip or Oxford Knee Score, in the months up to and following surgery. Though widely used, many PROMs have methodological limitations and there is debate about how to interpret results and definitions of clinically meaningful change. With the development of a home-monitoring system, there is opportunity to characterise the relationship between PROMs and behaviour in a natural setting and to develop methods of passive monitoring of outcome and recovery after surgery. In this paper we discuss the motivation and technology used in long-term continuous observation of movement, sleep and domestic routine for healthcare applications, such as the HEmiSPHERE project for hip and knee replacement patients. In this case study, we evaluate trends evident in data from two patients, collected over a three-month post-surgery observation period, by comparison to scores from PROMs for sleep and movement quality, and by comparison to a third control home. We find that accelerometer and indoor localisation data correctly highlight long-term trends in sleep and movement quality and can be used to predict sleep and wake times and measure sleep and wake routine variance over time, while indoor localisation provides context for the domestic routine and mobility of the patient. Finally, we discuss a visual method of sharing findings with healthcare professionals.
We present a case study that informs the creation of a 'companion guide' providing transparency to potential non-expert users of a ubiquitous machine learning (ML) platform during the initial onboarding. Ubiquitous platforms (e.g., smart home systems, including smart meters and conversational agents) are increasingly commonplace and increasingly apply complex ML methods. Understanding how non-ML experts comprehend these platforms is important in supporting participants in making an informed choice about if and how they adopt these platforms. To aid this decision-making process, we created a companion guide for a home health platform through an iterative user-centred-design process, seeking additional input from platform experts at all stages of the process to ensure the accuracy of explanations. This user-centred and expert informed design process highlights the need to present the platform's entire ecosystem at an appropriate level for those with differing backgrounds to understand, in order to support informed consent and decision making.
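Japanese translation of "Persistent Identifiers: Considering the options". A persistent identifier is simply an identifier that can refer indefinitely to a digital object, that is, an e-print (article, paper or report), an image file, a software installation file, and so on. The persistent identifiers of interest here are only those that are also persistently actionable (that is, "clickable"). However, unlike a simple hyperlink, a persistent identifier must continue to provide access to a resource even if the resource is moved to another server, or even to another organisation. Digital objects may be moved, deleted or renamed for a variety of reasons. This paper surveys the current state of persistent identifiers, describes several services currently available, and examines the theoretical background behind their structure and use. It also raises issues likely to be relevant to anyone considering adopting a standard for their own purposes.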
Over the decades, landmines have caused a tremendous number of casualties and have had negative effects on economic development. The estimated time required to eradicate existing landmines is over 700 years, and they remain active semi-permanently. Rapid clearance of landmines is therefore an urgent problem. In this project, we propose landmine detection methods using UAVs (Unmanned Aerial Vehicles) and machine learning to improve the speed of the operation. UAVs have advantages over manual landmine clearance in that they can conduct detection faster than manual operation, without the risk of explosion. They also allow us to explore a vast extent of land. Machine learning can make faster decisions, and can achieve higher accuracy than humans. The combination of UAVs and machine learning therefore has strong potential to increase the efficiency of the demining process. We adopted synthetic datasets, transfer learning, and fine-tuning to increase the machine learning model's performance. With the synthetic dataset approach, we achieved 90.07% mAP, 18.5% higher than existing research. In the experimental process, we generated a reusable dataset for landmine detection research and established a method for generating synthetic datasets. Fine-tuning produced a partial increase in performance when detecting the specific object. We also demonstrated a mobile application and a backend application, which allow us to detect the location of areas suspected to contain landmines and plot them on a map, using photos taken with a UAV as input. With the proposed method, we detected all of the scattered landmines over a 900 m² area in a 3-minute flight.
This deliverable, Basic Tools for Digital Ecosystem Management, describes the current state of the WP5 digital ecosystem (DE) research and tools developed for the PERICLES project, together with the concepts, models and software associated with them. The developed tools can be used independently or in combination with test scenarios, which will be part of the WP6 testbeds.
The PERICLES integration framework is designed for the flexible execution of varied and varying processing and control components in typical preservation workflows, while itself being controllable by abstract models of the overall preservation system. It is the project’s focal point for connecting tools, models and application use cases to demonstrate the potential of model-driven digital preservation. This final design for the integration framework has changed slightly from the initial version presented in PERICLES deliverable D6.1 [10]; we describe the changes and the reasons for them in the early chapters of this report. The integration framework is built from standard encapsulation technologies (Docker containers and RESTful web services) and controlled by a standard workflow environment (jBPM, controlled by the Jenkins continuous integration system). On this execution layer, arbitrary workflows representing digital preservation activities can be deployed, run and evaluated. Standard tools such as mediainfo, bagit and fido can be encapsulated and deployed, as can new preservation tools developed within the project. Two new subsystems have been designed to couple the workflow execution layer to the abstract models developed through the research activities of the project: the Process Compiler (PC) and the Entity Registry-Model Repository (ERMR). The ERMR also provides the key link to the Linked Resource Model Service, an external semantic reasoning service under development by partner XEROX Research. These two subsystems provide the means to couple powerful semantic reasoning and policy-driven models to a “live” digital preservation system. The API designs and technology choices for the test bed are now settled, and implementation of the underlying (standard) test bed infrastructure is complete.
The APIs and communication patterns are based on RESTful web services and JSON payloads and are described in detail in Section 5 and in the appendices. Implementation of the new ERMR and PC subsystems is well under way. Over the final stages of the project, the integrated test bed will focus on demonstrating the full end-to-end power of the model-driven preservation approach by implementing key application scenarios using models, tools and components drawn from across the PERICLES project. Examples of such scenarios are given in the appendices.
The current deliverable summarises the work conducted within task T4.3 of WP4, focusing on the extraction and subsequent analysis of semantic information from digital content, which is imperative for its preservability. More specifically, the deliverable defines content semantic information from a visual and a textual perspective, explains how this information can be exploited in long-term digital preservation, and proposes novel approaches for extracting it in a scalable manner. Additionally, the deliverable discusses novel techniques for retrieving and analysing the context of use of digital objects. Although this topic has not been extensively studied in the existing literature, we believe that use context is vital for augmenting semantic information and for maintaining the usability and preservability of digital objects, as well as their ability to be interpreted accurately as initially intended.
The current deliverable summarises the work conducted within task T4.4 of WP4, presenting our proposed models for semantically representing digital content and its context. The latter refers to any information from the environment of the digital object (DO) that offers better insight into the object’s status, its interrelationships with other content items, and its context of use. Within PERICLES, we refer to content semantics enriched with this contextual perspective as “contextualised semantics”. The deliverable presents two complementary modelling approaches, based respectively on (a) ontologies and logics, and (b) multivariate statistics. D4.4 also studies semantic change and discusses our proposed methodologies for its detection, measurement and interpretation, presenting a set of experiments on different aspects of partner data aimed at visualising semantic drift and finding solutions to it.
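As one hedged illustration of measuring semantic drift statistically, the Python sketch below uses a generic distributional technique, not necessarily the one used in D4.4: it builds a co-occurrence vector of the words around a target term in two corpora (e.g. from two time periods) and reports drift as one minus their cosine similarity. The window size, whitespace tokenisation and all function names are assumptions for illustration.

```python
import math
from collections import Counter


def context_vector(sentences, target, window=2):
    """Count the words appearing within `window` tokens of each occurrence of `target`."""
    vec = Counter()
    for sentence in sentences:
        tokens = sentence.lower().split()  # naive whitespace tokenisation (assumption)
        for i, tok in enumerate(tokens):
            if tok != target:
                continue
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            vec.update(tokens[j] for j in range(lo, hi) if j != i)
    return vec


def cosine(a, b):
    """Cosine similarity of two sparse count vectors; 0.0 if either is empty."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def semantic_drift(corpus_then, corpus_now, target, window=2):
    """0.0 = identical usage contexts; 1.0 = no shared context words (or target absent)."""
    return 1.0 - cosine(context_vector(corpus_then, target, window),
                        context_vector(corpus_now, target, window))


then = ["the mouse ate the cheese", "a mouse ran past the cat"]
now = ["click the mouse button", "the wireless mouse needs batteries"]
print(semantic_drift(then, now, "mouse"))
```

Raw counts let frequent function words such as "the" dominate the vectors; a fuller treatment would weight or filter them, which is one reason this should be read as a sketch only.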
The Project. For much of the later Middle Ages, south-western France (Aquitaine) was under English rule. Every year from 1273 a ‘Gascon roll’ was drawn up by the English royal administration, recording a wide range of business and mentioning many people and places. The rolls were continued until 1468, even though the English lost the area in 1453, and are held today in The National Archives at Kew in class C 61. The rolls are almost entirely written in Latin. Some entries are in Anglo-Norman (French), generally forming transcripts of documents recited in entries on the rolls, or copied verbatim in the form of confirmations of documents issued by previous rulers or by officers of the administration in Aquitaine. In 2009 a project began to produce an online calendar of the rolls. This was funded by the AHRC and led by Dr Malcolm Vale (Oxford), Dr Paul Booth (Liverpool) and Paul Spence (Department of Digital Humanities, King’s College London). The site currently contains full calendars for the last ten years of the reign of Edward II (1317-27), and for many years of the first half of the reign of Edward III, produced in this project. In 2012 funding was gained from the Laboratoire d’excellence LaScArBx, the Banque Numérique des Savoirs d’Aquitaine, the Château Ausone (Saint-Émilion) and Jonathan Sumption. This enabled the continuation of the project and the development of a parallel French site, coordinated by Professor Frédéric Boutoulle and Emeritus Professor Françoise Lainé (Bordeaux) and Paul Spence. The Leverhulme Trust awarded funding for a two-year project from 1 May 2013, led by Professor Anne Curry (Southampton), Dr Philip Morgan (Keele) and Paul Spence (King’s College London).
It also involved the active collaboration of the Université Bordeaux Montaigne (formerly Université Michel de Montaigne-Bordeaux 3) and UMR Ausonius. Research associates Dr Simon Harris and Dr Guilhem Pépin (the former helped voluntarily by Nigel Coulton, a skilled palaeographer and Latinist, and the latter by Françoise Lainé, emeritus professor of the Université Bordeaux-Montaigne), together with a research team based at the Department of Digital Humanities, King’s College London, all of whom were involved in the 2009 and 2012 projects, also worked on the Leverhulme project. The project also continued its association with the Ranulf Higden Society, a group of learned individuals whose work editing one of the earlier Gascon Rolls lay in part behind the ideas for the original projects, and whose full editions appear on the website. Resources on CKAN. The Gascon Rolls CKAN Dataset makes available the following resources: the set of rolls in TEI-XML format; an EATSML file (XML) containing the set of entity records (for persons, places, etc.) from which the indexes and search on the project site are generated; and the project guidelines for encoding the calendared versions of the rolls into TEI-XML. Project team: Frank Byrne, Paul Caton, Nigel Coulton, Jon Denton, Nathan Dobson, Elise Dudézert, Dilys Firn, Eric Foster, John Gowling, Nicholas A. Gribit, Simon Harris, Catherine Howarth, Neil Jakeman, Adrian Jobson, Maureen Jurkowski, Faith Lawrence, Margaret Lynch, Jonathan Mackman, Nelly Martin, Christa Mee, Eleonora Litta Modignani, Jamie Norrish, Guilhem Pépin, Elena Pierazzo, Nathalie Prévot, James Ross, Jean Sibers, Emma Tonkin, Paul Vetch, José Miguel Vieira, Raffaele Viglianti, Prue Vipond, Chris Watson.
When considering the range of legal and ethical issues that can arise from text/data mining practices in academic research, the comparative paucity of literature addressing those issues, as well as the apparent lack of any community- or discipline-generated ethical framework or initiative, is striking. It is suggested that, while technical expertise in this space may be developing apace and there is increasing recognition of its potential economic and commercial importance, academic data/text mining researchers would be remiss not to seize the opportunity, as other research communities have done, to ensure that the legal and ethical research paradigm within which their institutions want them to operate appropriately reflects the contexts and risks actually applicable to their work.
Accessible systems, in digital heritage as elsewhere, should ‘speak the user’s language’. However, over long time periods, that language may change significantly, and the system must keep track of the change. Change in a population can be conceptualised and tracked using a functional, computable model based on representative datasets. Such a model must encompass the relevant characteristics of that population and support predefined functionality, such as the ability to track current trends in language use. Individual viewpoints published on any given platform may be observed in aggregate through large-scale text mining. We have used social media platforms such as Twitter and Tumblr to collect statistical information about anonymous users’ perspectives on cultural heritage items and institutions. Longitudinal studies make it possible to identify indicators of an evolution in the discourse surrounding cultural heritage items, and to estimate trends relating to the items and creators represented. We describe a functional approach to building useful models of shift in contemporary language use, using data collected across social networks. This approach is informed by existing theoretical approaches to modelling semantic change. As a case study, we present a means by which such ongoing user-modelling processes, drawing on contemporary resources, can support ‘just-in-time’ pre-emptive review of material to be presented to the public. We also show that this approach can feed into the enhancement of data retrieval processes.
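A minimal Python sketch of the aggregate trend tracking described above, assuming posts have already been collected and labelled with a time period: it computes each period's relative frequency of a term and fits a least-squares slope across periods. The function names, the regular-expression tokeniser and the use of a linear slope as the trend indicator are all illustrative assumptions, not the method used in the study.

```python
import re
from collections import Counter, defaultdict


def term_share_by_period(posts, term):
    """posts: iterable of (period, text). Returns {period: share of tokens equal to term}.

    Periods are assumed to sort chronologically as strings (e.g. "2015", "2015-06").
    """
    counts = defaultdict(Counter)
    for period, text in posts:
        counts[period].update(re.findall(r"[a-z0-9']+", text.lower()))
    shares = {}
    for period in sorted(counts):
        total = sum(counts[period].values())
        if total:  # skip periods whose posts contained no tokens
            shares[period] = counts[period][term.lower()] / total
    return shares


def linear_trend(values):
    """Least-squares slope of `values` against their index (0, 1, 2, ...)."""
    n = len(values)
    if n < 2:
        return 0.0
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    den = sum((x - mean_x) ** 2 for x in range(n))
    return num / den


posts = [
    ("2015", "Visited the museum today!"),
    ("2015", "Museum, museum"),
    ("2016", "The gallery was great"),
]
shares = term_share_by_period(posts, "museum")
print(shares)                                # {'2015': 0.5, '2016': 0.0}
print(linear_trend(list(shares.values())))  # -0.5
```

Normalising by each period's token count keeps the indicator comparable when the volume of collected posts varies between periods.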
The aim of mapping IESR to UDDI has proven reasonable, and the available technologies are adequate for the task. However, a great deal of work remains, mostly in defining a set of tModels extensive enough to cover the task at hand. The technical side of the problem is essentially solved. Yet this is not the most significant question: the major issue is evaluating UDDI itself. The ability of a UDDI implementation to act as a repository for this data is not in doubt; rather, the usability of the result remains questionable, as does the appropriateness of the approach for IESR purposes.
A folksonomy is a type of distributed classification system, typically created by a group of individuals, namely the users of the system themselves. Users tag everything they find online, such as images, videos, bookmarks and texts. These tags are then shared, updated and revised. In this article we look at what folksonomies actually do.
A folksonomy is a type of distributed classification system. It is generally created by a group of individuals, typically the users of the resources. Users add tags (labels) to online items such as images, videos, bookmarks and text. These tags are then shared and sometimes refined. In this article we examine what makes folksonomies work.
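A folksonomy is commonly modelled as a set of (user, item, tag) assignments. The Python sketch below assumes that model; the `Folksonomy` class and its methods are hypothetical, tags are "refined" only by trimming and lower-casing, and an item's shared description is the ranked count of distinct users who applied each tag.

```python
from collections import Counter
from typing import Optional


class Folksonomy:
    """Hypothetical store of tag assignments as (user, item, tag) triples."""

    def __init__(self):
        self._assignments = set()  # a set, so repeated identical assignments count once

    def tag(self, user: str, item: str, tag: str) -> None:
        # Light normalisation so "Sunset" and "sunset " count as the same tag.
        normalised = tag.strip().lower()
        if normalised:
            self._assignments.add((user, item, normalised))

    def tags_for(self, item: str, top_n: Optional[int] = None):
        # Each count is the number of distinct users who applied that tag to the item.
        counts = Counter(t for (_, i, t) in self._assignments if i == item)
        return counts.most_common(top_n)


f = Folksonomy()
f.tag("alice", "photo-1", "Sunset")
f.tag("bob", "photo-1", "sunset ")
f.tag("bob", "photo-1", "beach")
f.tag("bob", "photo-1", "sunset")  # duplicate assignment by bob, ignored
print(f.tags_for("photo-1"))  # [('sunset', 2), ('beach', 1)]
```

No central vocabulary is involved: the ranking emerges purely from what individual users chose to apply, which is the distributed aspect the abstracts describe.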
We present preliminary experiments towards extracting error-of-law findings and outcomes from second-instance judicial decisions. The overall aim of the PhD is to use ML/NLP approaches to quantify error patterns over time in decisions of the United Kingdom's Upper Tribunal Immigration and Asylum Chamber, in order to (1) gain a better understanding of the corrective mechanism between first- and second-instance courts, and (2) identify possible patterns of training needs that could usefully inform first-instance judicial training. Running several simple binary classifiers, we find the best performance (average ROC AUC score of 0.82) with a k-Nearest Neighbor model under five-fold cross-validation. We discuss challenges in error extraction and plans for future work.
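To make the evaluation set-up concrete, the Python sketch below runs a k-nearest-neighbour classifier under stratified five-fold cross-validation and averages the per-fold ROC AUC, computed with the rank-based (Mann-Whitney) formulation. It uses only the standard library and assumes each decision has already been turned into a numeric feature vector; the feature representation, k = 5, Euclidean distance, stratification and all function names are assumptions, not details from the study.

```python
import math
import random


def knn_scores(train_X, train_y, test_X, k=5):
    """Score each test point by the fraction of its k nearest training points labelled 1."""
    scores = []
    for x in test_X:
        nearest = sorted(range(len(train_X)), key=lambda i: math.dist(x, train_X[i]))[:k]
        scores.append(sum(train_y[i] for i in nearest) / len(nearest))
    return scores


def roc_auc(y_true, scores):
    """P(random positive scores above random negative), counting ties as half."""
    pos = [s for s, y in zip(scores, y_true) if y == 1]
    neg = [s for s, y in zip(scores, y_true) if y == 0]
    if not pos or not neg:
        raise ValueError("ROC AUC needs both classes in the fold")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def stratified_folds(y, n_folds=5, seed=0):
    """Deal each class's shuffled indices round-robin so every fold sees both classes."""
    rng = random.Random(seed)
    folds = [[] for _ in range(n_folds)]
    for label in sorted(set(y)):
        idx = [i for i, v in enumerate(y) if v == label]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % n_folds].append(i)
    return folds


def cross_validated_auc(X, y, n_folds=5, k=5):
    """Mean ROC AUC of k-NN over stratified cross-validation folds."""
    aucs = []
    for fold in stratified_folds(y, n_folds):
        held_out = set(fold)
        train_idx = [i for i in range(len(X)) if i not in held_out]
        scores = knn_scores([X[i] for i in train_idx], [y[i] for i in train_idx],
                            [X[i] for i in fold], k)
        aucs.append(roc_auc([y[i] for i in fold], scores))
    return sum(aucs) / len(aucs)


# Toy, perfectly separable one-dimensional data: class 0 near 0, class 1 near 5.
X = [(0.1 * i,) for i in range(10)] + [(5.0 + 0.1 * i,) for i in range(10)]
y = [0] * 10 + [1] * 10
print(cross_validated_auc(X, y))  # 1.0
```

Stratifying the folds matters for this metric in particular: with few positive decisions, an unstratified split can leave a fold with only one class, where ROC AUC is undefined.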