FAIR environmental and health registry (FAIREHR)- supporting the science to policy interface and life science research, development and innovation

From Raw Data to FAIR Data: The FAIRification Workflow for Health Research

Methods of Information in Medicine, 2020

Background: The FAIR (findability, accessibility, interoperability, and reusability) guiding principles seek the reuse of data and other digital research inputs, outputs, and objects (algorithms, tools, and workflows that led to that data) by making them findable, accessible, interoperable, and reusable. GO FAIR, a bottom-up, stakeholder-driven and self-governed initiative, defined a seven-step FAIRification process focusing on data, but also indicating the required work for metadata. This FAIRification process aims at addressing the translation of raw datasets into FAIR datasets in a general way, without considering specific requirements and challenges that may arise when dealing with some particular types of data. Objectives: This scientific contribution addresses the architecture design of an open technological solution that builds upon the FAIRification process proposed by GO FAIR and addresses the gaps identified in that process when dealing with health datasets. Methods: A common FA...

FAIR4Health: Findable, Accessible, Interoperable and Reusable data to foster Health Research

Open Research Europe, 2022

Due to the nature of health data, its sharing and reuse for research are limited by ethical, legal and technical barriers. The FAIR4Health project facilitated and promoted the application of FAIR principles to health research data derived from publicly funded health research initiatives, to make them Findable, Accessible, Interoperable, and Reusable (FAIR). To confirm the feasibility of the FAIR4Health solution, we performed two pathfinder case studies that ran federated machine learning algorithms on FAIRified datasets from five health research organizations. The case studies demonstrated the potential impact of the developed FAIR4Health solution on health outcomes and social care research. Finally, we promoted the sharing and reuse of FAIRified data in the European Union health research community, defining an effective EU-wide strategy for the use of FAIR principles in health research and preparing the ground for a roadmap for health research institutions to offer access to ...

Sharing SRP data to reduce environmentally associated disease and promote transdisciplinary research

Reviews on Environmental Health

The National Institute of Environmental Health Sciences (NIEHS) Superfund Basic Research and Training Program (SRP) funds a wide range of projects that span biomedical, environmental sciences, and engineering research and generate a wealth of data resulting from hypothesis-driven research projects. Combining or integrating these diverse data offers an opportunity to uncover new scientific connections that can be used to gain a more comprehensive understanding of the interplay between exposures and health. Integrating and reusing data generated from individual research projects within the program requires harmonization of data workflows, ensuring consistent and robust practices in data stewardship, and embracing data sharing from the onset of data collection and analysis. We describe opportunities to leverage data within the SRP and current SRP efforts to advance data sharing and reuse, including by developing an SRP dataset library and fostering data integration through Data Managem...

Linking complex disease and exposure data—insights from an environmental and occupational health study

Journal of Exposure Science & Environmental Epidemiology, 2022

The disparate measurement protocols used to collect study data are an intrinsic barrier to combining information from environmental health studies. Using standardized measurement protocols and data standards for environmental exposures addresses this gap by improving data collection quality and consistency. To assess the prevalence of environmental exposures in National Institutes of Health (NIH) public data repositories and resources and to assess the commonality of the data elements, we analyzed clinical measures and exposure assays by comparing the Caribbean Consortium for Research in Environmental and Occupational Health study with selected NIH environmental health resources and studies. Our assessment revealed that (1) environmental assessments are widely collected in these resources, (2) biological assessments are less prevalent, and (3) NIH resources can help identify common data for meta-analysis. We highlight resources to help link environmental exposure data across studies...

FAIRness for HL7 FHIR: supporting interoperability of health datasets

2021

The FAIR (Findable, Accessible, Interoperable, Reusable) principles have been established as best practice in the generation of health datasets for open science, research, and innovation. The FAIR4Health project (fair4health.eu) aims to encourage FAIRification and reuse of research data generated by publicly funded research projects. The Research Data Alliance (RDA) research community organization aims at building the social and technical infrastructure to enable open sharing and re-use of data. FAIR4Health and the RDA FAIR data maturity model WG initiated the HL7 FAIRness for FHIR implementation guide (IG) project to provide guidance on how to assess the FAIRness of health data sets with the RDA FAIR Data Maturity Model and support delivery of FAIR health datasets using the HL7 FHIR standard. In this paper, we present how we applied the guidance of the HL7 FAIRness for FHIR IG on selected PhysioNet datasets and reflect on the experience gained by discussing the recommendations of the I...

Advancing Exposure Science through Chemical Data Curation and Integration in the Comparative Toxicogenomics Database

Environmental Health Perspectives, 2016

Background: Exposure science studies the interactions and outcomes between environmental stressors and human or ecological receptors. To augment its role in understanding human health and the exposome, we aimed to centralize and integrate exposure science data into the broader biological framework of the Comparative Toxicogenomics Database (CTD), a public resource that promotes understanding of environmental chemicals and their effects on human health. Objectives: We integrated exposure data within the CTD to provide a centralized, freely available resource that facilitates identification of connections between real-world exposures, chemicals, genes/proteins, diseases, biological processes, and molecular pathways. Methods: We developed a manual curation paradigm that captures exposure data from the scientific literature using controlled vocabularies and free text within the context of four primary exposure concepts: stressor, receptor, exposure event, and exposure outcome. Using data from the Agricultural Health Study, we have illustrated the benefits of both centralization and integration of exposure information with CTD core data. Results: We have described our curation process, demonstrated how exposure data can be accessed and analyzed in the CTD, and shown how this integration provides a broad biological context for exposure data to promote mechanistic understanding of environmental influences on human health. Conclusions: Curation and integration of exposure data within the CTD provides researchers with new opportunities to correlate exposures with human health outcomes, to identify underlying potential molecular mechanisms, and to improve understanding about the exposome.

Data governance in predictive toxicology: A review

Journal of Cheminformatics, 2011

Background: Due to recent advances in data storage and sharing for further data processing in predictive toxicology, there is an increasing need for flexible data representations, secure and consistent data curation and automated data quality checking. Toxicity prediction involves multidisciplinary data. There are hundreds of collections of chemical, biological and toxicological data that are widely dispersed, mostly in the open literature, professional research bodies and commercial companies. In order to better manage and make full use of such a large amount of toxicity data, there is a trend to develop functionalities aiming towards data governance in predictive toxicology to formalise a set of processes to guarantee high data quality and better data management. In this paper, data quality is mainly meant in a data storage sense (e.g. accuracy, completeness and integrity) and not in a toxicological sense (e.g. the quality of experimental results). Results: This paper reviews seven widely used predictive toxicology data sources and applications, with a particular focus on their data governance aspects, including: data accuracy, data completeness, data integrity, metadata and its management, data availability and data authorisation. This review reveals the current problems (e.g. lack of systematic and standard measures of data quality) and desirable needs (e.g. better management and further use of captured metadata and the development of flexible multi-level user access authorisation schemas) in the development of predictive toxicology data sources. The analytical results will help to address a significant gap in toxicology data quality assessment and lead to the development of novel frameworks for predictive toxicology data and model governance. Conclusions: While the discussed public data sources are well developed, there nevertheless remain some gaps in the development of a data governance framework to support predictive toxicology. In this paper, data governance is identified as the new challenge in predictive toxicology, and a good use of it may provide a promising framework for developing high quality and easily accessible toxicity data repositories. This paper also identifies important research directions that require further investigation in this area.

Achieving FAIR Data Principles at the Environmental Data Initiative, the US-LTER Data Repository

Biodiversity Information Science and Standards

The Environmental Data Initiative (EDI) is a continuation and expansion of the original United States Long-Term Ecological Research Program (US-LTER) data repository, which went into production in 2013. Building on decades of data management experience in LTER, EDI is addressing the challenge of publishing a diverse corpus of research data (Servilla et al. 2016). EDI’s accomplishments span all aspects of the data curation and publication lifecycle, including repository cyberinfrastructure, outreach and training, and enhancements to data documentation methodologies used by the environmental and ecological research communities. EDI is managing almost 43,000 unique data packages and their revisions from a community of nearly 2,300 individual data authors, most of which are contributed by LTER sites, and are openly accessible and documented with rich science metadata in the Ecological Metadata Language (EML) standard. Here we will present how EDI achieves FAIR data principles (Wilkinson ...

The FAIR Funder pilot programme to make it easy for funders to require and for grantees to produce FAIR Data

2019

There is a growing acknowledgement in the scientific community of the importance of making experimental data machine findable, accessible, interoperable, and reusable (FAIR). Recognizing that high quality metadata are essential to make datasets FAIR, members of the GO FAIR Initiative and the Research Data Alliance (RDA) have initiated a series of workshops to encourage the creation of Metadata for Machines (M4M), enabling any self-identified stakeholder to define and promote the reuse of standardized, comprehensive machine-actionable metadata. The funders of scientific research recognize that they have an important role to play in ensuring that experimental results are FAIR, and that high quality metadata and careful planning for FAIR data stewardship are central to these goals. We describe the outcome of a recent M4M workshop that has led to a pilot programme involving two national science funders, the Health Research Board of Ireland (HRB) and the Netherlands Organisation for Health Research and Development (ZonMW). These funding organizations will explore new technologies with three aims: to define, at the time that a request for proposals is issued, the minimal set of machine-actionable metadata that they would like investigators to use to annotate their datasets; to enable investigators to create such metadata to help make their data FAIR; and to develop data-stewardship plans that ensure that experimental data will be managed appropriately, abiding by the FAIR principles. The FAIR Funders design envisions a data-management workflow having seven essential stages, where solution providers are openly invited to participate. The initial pilot programme will launch using existing computer-based tools of those who attended the M4M Workshop.