Eliciting Well-Formed Quality Indicators And Metadata In GEOSS Earth Observation Products

Díaz, P., Masó, J., Sevillano, E., Ninyerola, M., Zabala, A., Serral, I., Pons, X. (2012). Analysis of quality metadata in the GEOSS Clearinghouse. International Journal of Spatial Data Infrastructures Research. Vol 7 (2012), pp. 352-377

The Global Earth Observation System of Systems (GEOSS) Clearinghouse is part of the GEOSS Common Infrastructure (GCI) that supports the discovery of the data made available by the Group on Earth Observations (GEO) members and participant organizations in GEOSS. It also acts as a unified metadata catalogue that stores complete metadata records, not only about datasets but also for other kinds of components and services. By exploring these records, users often try to find fit-for-use data. Quality indicators and provenance are included in the metadata and are potentially useful variables that allow users to make an informed decision without having to download and assess the data themselves. However, no previous studies have been made on the completeness and correctness of the metadata records in the Clearinghouse. The objective of this paper is to analyze the data quality information distributed by the GEOSS Clearinghouse. The aim is to quantify its completeness and to provide clues on how the current status of the Clearinghouse could be improved and how useful quality-aware tools could be. The methodology used in the current analysis consists of first harvesting the Clearinghouse and then quantifying the quality information found in 97203 metadata records, using a semi-automatic approach. The results reveal that the inclusion of quality information in metadata records is not rare: 19.66% of the metadata records contain some quality element. However, this is not general enough and several aspects could be improved. For instance, 77.78% of quantitative measures lack measure units.
When quality indicators are not sufficient, the lineage metadata information could be used to mitigate this situation by analysing the process steps and sources used to create a dataset. However, even though lineage is reported in 15.55% of the records, only 1.27% of the cases return a complete list of process steps with sources. This paper also provides indications on what is lacking in the current producer metadata model and detects a gap in usage or user feedback metadata in GEOSS. Moreover, information extracted from GeoViQua interviews with users indicates that they value informal comments and user feedback on datasets as a complement to the more formal producer-oriented metadata description of the data. Although many efforts within the scientific community and the Quality Assurance Framework for Earth Observation (QA4EO) group have been invested in describing how to parameterize data quality and uncertainty, we conclude that extra work can still be done to provide complete quality information in the metadata catalogues. In brief, since the GEOSS Clearinghouse references data from the most important agencies and research organizations, the results presented in this paper provide a perspective on how well quality is disseminated in the Earth observation community in general.
This work is licensed under the Creative Commons Attribution-Non commercial Works 3.0 License. To view a copy of this license, visit http://creativecommons.org/licenses/by-nc-nd/3.0/

Data Quality Analysis in the GEOSS Clearinghouse

The Global Earth Observation System of Systems (GEOSS) Clearinghouse is part of the GEOSS Common Infrastructure (GCI) that supports the discovery of the data made available by the Group on Earth Observations (GEO) members and participant organizations in GEOSS. It also acts as a unified metadata catalogue that stores complete metadata records, not only for datasets but also for other kinds of components and services. By exploring these records, users often try to find fit-for-use data. Quality indicators and provenance are potentially useful variables that allow users to make an informed decision without having to download and assess the data themselves. The GEO Portal could be extended to better support users in this task by ranking, allowing metadata intercomparison, etc. However, no previous studies have been made on the completeness and correctness of the Clearinghouse. This paper analyzes the quality metadata in the catalogue, quantifies its completeness and provides clues on how the current status of the Clearinghouse could be improved and how useful new quality-aware tools could be. Since the GEOSS Clearinghouse references data coming from the most important agencies and research organizations, the results presented in this paper provide a perspective on how well quality is disseminated in the Earth observation metadata catalogues in general. Moreover, there is a general opinion that user feedback and informal comments on datasets can complement a more formal producer-oriented metadata description of the data. This study also provides some indications as to what is lacking in the current producer metadata and could help to design a future user feedback system for GEOSS. The methodology developed in the current analysis harvests the Clearinghouse and quantifies the quality information found in 97203 metadata records, using a semi-automatic approach.
Many efforts from within the scientific community and the QA4EO group have been invested in describing how to parameterize quality and uncertainty. This study reveals that the inclusion of such information in metadata documents is not rare (19.66%), but it is not general enough and several aspects could be improved. For instance, 77.78% of quantitative measurement results lack measurement units. When quality indicators are not sufficient, the lineage metadata information could be used to mitigate this situation by analysing the process steps and sources used to create a dataset. However, even though lineage is reported in 15.55% of the records, only 1.27% of the cases return a complete list of process steps with sources. This study is conducted in the context of the EC FP7 GeoViQua project, which intends to develop tools to elicit, search and visualize quality information in GEOSS.
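The kind of counting described above (records with some quality element, records with lineage, quantitative results lacking units) can be illustrated with a minimal sketch. It assumes the harvested records are ISO 19139 XML documents using the standard `gmd` namespace; the abstract does not describe the authors' actual tooling, so the function names `summarize_record` and `summarize_catalogue`, the rule used to decide that a unit is missing, and the two sample records are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Standard ISO 19139 namespaces; the records themselves are made-up samples.
NS = {
    "gmd": "http://www.isotc211.org/2005/gmd",
    "gco": "http://www.isotc211.org/2005/gco",
}
XLINK_HREF = "{http://www.w3.org/1999/xlink}href"


def summarize_record(xml_text):
    """Return simple quality and lineage counts for one metadata record."""
    root = ET.fromstring(xml_text.strip())
    reports = root.findall(".//gmd:DQ_DataQuality/gmd:report", NS)
    results = root.findall(".//gmd:DQ_QuantitativeResult", NS)
    missing_units = 0
    for result in results:
        unit = result.find("gmd:valueUnit", NS)
        # Assumption: an absent valueUnit, or an empty one without an
        # xlink:href, counts as "no unit reported".
        if unit is None or (len(unit) == 0 and XLINK_HREF not in unit.attrib):
            missing_units += 1
    return {
        "has_quality": len(reports) > 0,
        "has_lineage": root.find(".//gmd:LI_Lineage", NS) is not None,
        "process_steps": len(root.findall(".//gmd:LI_Lineage/gmd:processStep", NS)),
        "sources": len(root.findall(".//gmd:LI_Lineage/gmd:source", NS)),
        "quantitative_results": len(results),
        "missing_units": missing_units,
    }


def summarize_catalogue(records):
    """Aggregate per-record summaries into percentages."""
    summaries = [summarize_record(r) for r in records]
    total = len(summaries)
    results = sum(s["quantitative_results"] for s in summaries)
    missing = sum(s["missing_units"] for s in summaries)
    return {
        "records": total,
        "pct_with_quality": 100.0 * sum(s["has_quality"] for s in summaries) / total,
        "pct_with_lineage": 100.0 * sum(s["has_lineage"] for s in summaries) / total,
        "pct_results_without_unit": 100.0 * missing / results if results else 0.0,
    }


SAMPLE_WITH_QUALITY = """
<gmd:MD_Metadata xmlns:gmd="http://www.isotc211.org/2005/gmd"
                 xmlns:gco="http://www.isotc211.org/2005/gco">
  <gmd:dataQualityInfo>
    <gmd:DQ_DataQuality>
      <gmd:report>
        <gmd:DQ_AbsoluteExternalPositionalAccuracy>
          <gmd:result>
            <gmd:DQ_QuantitativeResult>
              <gmd:valueUnit/>
              <gmd:value><gco:Record>12.5</gco:Record></gmd:value>
            </gmd:DQ_QuantitativeResult>
          </gmd:result>
        </gmd:DQ_AbsoluteExternalPositionalAccuracy>
      </gmd:report>
      <gmd:lineage>
        <gmd:LI_Lineage>
          <gmd:processStep/>
        </gmd:LI_Lineage>
      </gmd:lineage>
    </gmd:DQ_DataQuality>
  </gmd:dataQualityInfo>
</gmd:MD_Metadata>
"""

SAMPLE_WITHOUT_QUALITY = """
<gmd:MD_Metadata xmlns:gmd="http://www.isotc211.org/2005/gmd"/>
"""

if __name__ == "__main__":
    print(summarize_catalogue([SAMPLE_WITH_QUALITY, SAMPLE_WITHOUT_QUALITY]))
```

A real harvest would also need to handle records in other metadata schemas and malformed XML, which is presumably one reason the study describes its approach as semi-automatic.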

Rubric-Q: Adding Quality-Related Elements to the GEOSS Clearinghouse Datasets

IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 2013

Geospatial data have become a crucial input for the scientific community for understanding the environment and developing environmental management policies. The Global Earth Observation System of Systems (GEOSS) Clearinghouse is a catalogue and search engine that provides access to Earth Observation metadata. However, metadata are often not easily understood by users, especially when presented in ISO XML encoding. Data quality information included in the metadata is essential for users to select datasets suitable for them. This work aims to help users understand the quality information held in metadata records and to provide the results to geospatial users in an understandable and comparable way. Thus, we have developed an enhanced tool (Rubric-Q) for visually assessing the metadata quality information and quantifying the degree of metadata population. Rubric-Q is an extension of a previous NOAA Rubric tool used as a metadata training and improvement instrument. The paper also presents a thorough assessment of the quality information by applying Rubric-Q to all dataset metadata records available in the GEOSS Clearinghouse. The results reveal that just 8.7% of the datasets have some quality element described in the metadata, 63.4% have some lineage element documented, and merely 1.2% have some usage element described.
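To make the idea of "quantifying the degree of metadata population" concrete, the following is a minimal sketch of a rubric-style score. It is not the Rubric-Q implementation, which the abstract does not detail: the checklist in `RUBRIC`, the `category.item` labels, and the function names are hypothetical, chosen only to mirror the three groups reported above (quality, lineage, usage).

```python
# Hypothetical checklist: each category lists the items a record may populate.
RUBRIC = {
    "quality": ["report", "quantitative_result", "value_unit"],
    "lineage": ["statement", "process_step", "source"],
    "usage": ["specific_usage", "user_contact"],
}


def population_scores(populated):
    """Return, per category, the fraction of checklist items that are present.

    `populated` is a set of "category.item" labels found in one record.
    """
    scores = {}
    for category, items in RUBRIC.items():
        hits = sum(1 for item in items if f"{category}.{item}" in populated)
        scores[category] = hits / len(items)
    return scores


def share_with_any(records, category):
    """Percentage of records with at least one populated item in a category."""
    with_any = sum(1 for r in records if population_scores(r)[category] > 0)
    return 100.0 * with_any / len(records)


if __name__ == "__main__":
    records = [
        {"lineage.statement", "lineage.source"},
        {"quality.report", "lineage.statement"},
        set(),
        {"lineage.process_step"},
    ]
    for category in RUBRIC:
        print(category, share_with_any(records, category))
```

Separating the per-record score from the catalogue-wide share keeps the two uses distinct: the former supports comparing individual datasets, the latter yields summary figures of the kind quoted in the abstract.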

Visualisation of quality information for geospatial and remote sensing data: providing the GIS community with the decision support tools for geospatial dataset quality evaluation

2015

The evaluation of geospatial data quality and trustworthiness presents a major challenge to geospatial data users when making a dataset selection decision. The research presented here therefore focused on defining and developing a GEO label – a decision support mechanism to assist data users in efficient and effective geospatial dataset selection on the basis of quality, trustworthiness and fitness for use. This thesis thus presents six phases of research and development conducted to: (1) identify the informational aspects upon which users rely when assessing geospatial dataset quality and trustworthiness; (2) elicit initial user views on the GEO label role in supporting dataset comparison and selection; (3) evaluate prototype label visualisations; (4) develop a Web service to support GEO label generation; (5) develop a prototype GEO label-based dataset discovery and intercomparison decision support tool; and (6) evaluate the prototype tool in a controlled human-subject study. The r...

Geospatial data quality indicators

2012

Indicators which summarise the characteristics of spatiotemporal data coverages significantly simplify quality evaluation, decision making and justification processes by providing a number of quality cues that are easy to manage and avoiding information overflow. Criteria which are commonly prioritised in evaluating spatial data quality and assessing a dataset's fitness for use include lineage, completeness, logical consistency, positional accuracy, temporal and attribute accuracy. However, user requirements may go far beyond these broadly accepted spatial quality metrics, to incorporate specific and complex factors which are less easily measured. This paper discusses the results of a study of high level user requirements in geospatial data selection and data quality evaluation. It reports on the geospatial data quality indicators which were identified as user priorities, and which can potentially be standardised to enable intercomparison of datasets against user requirements. We briefly describe the implications for tools and standards to support the communication and intercomparison of data quality, and the ways in which these can contribute to the generation of a GEO label.

Spatial data quality: From metadata to quality indicators and contextual end-user manual

The context within which geospatial data are used has changed significantly during the past ten years. Users now have easier access to geospatial data but typically have less knowledge of the geographical information domain, and so have limited knowledge of the risks related to the use of geospatial data. This sometimes leads to faulty decision-making that may have significant consequences. In order to reduce these risks, geospatial data producers provide metadata to help users assess the fitness for use of the data they are using within the context of their application. However, experience shows that these metadata have several limitations and do not reach their information goal for this new group of non-expert users. In addition, geospatial data are becoming a mass product that has to follow legal requirements related to this class of products. Metadata, as currently defined, do not meet these obligations, especially concerning the requirements for easily understood information abo...

An integrated view of data quality in Earth observation

International Journal of Geographical Information Science, 2011

Spatial data infrastructure (SDI) actors have great expectations for the second-generation SDI currently under development. However, SDIs have many implementation problems at different levels that are delaying the development of the SDI framework. The aims of this article are to identify these difficulties, in the literature and based on our own experience, in order to determine how mature and useful the current SDI phenomena are. We can then determine whether a general reconceptualization is necessary or rather a set of technical improvements and good practices needs to be developed before the second-generation SDI is completed. This study is based on the following aspects: metadata about data and services, data models, data download, data and processing services, data portrayal and symbolization, and mass market aspects. This work aims to find an equilibrium between user-focused geoportals and web service interconnection (the user side vs. the server side). These deep reflections are motivated by a use case in the healthcare area in which we employed the Catalan regional SDI. The use case shows that even one of the best regional SDI implementations can fail to provide the required information and processes even when the required data exist. Several previous studies recognize the value of applying Web 2.0 and user participation approaches but few of these studies provide a real implementation. Another objective of this work is to show that it is easy to complement the classical, international standard-based SDI with a participative Web 2.0 approach. To do so, we present a mash-up portal built on top of the Catalan SDI catalogues.