Dynamic, Rule-based Quality Control Framework for Real-time Sensor Data

Automated Data Quality Assessment of Marine Sensors

Sensors, 2011

The automated collection of data (e.g., through sensor networks) has led to a massive increase in the quantity of environmental and other data available. The sheer quantity of data and growing need for real-time ingestion of sensor data (e.g., alerts and forecasts from physical models) mean that automated Quality Assurance/Quality Control (QA/QC) is necessary to ensure that the data collected is fit for purpose. Current automated QA/QC approaches provide assessments based upon hard classifications of the gathered data, often as a binary decision of good or bad data that fails to quantify our confidence in the data for use in different applications. We propose a novel framework for automated data quality assessments that uses Fuzzy Logic to provide a continuous scale of data quality. This continuous quality scale is then used to compute error bars upon the data, which quantify the data uncertainty and provide a more meaningful measure of the data's fitness for purpose in a particular application compared with hard quality classifications. The design principles of the framework are presented and enable both data statistics and expert knowledge to be incorporated into the uncertainty assessment. We have implemented and tested the framework upon a real-time platform of temperature and conductivity sensors that have been deployed to monitor the Derwent Estuary in Hobart, Australia. Results indicate that the error bars generated from the Fuzzy QA/QC implementation are in good agreement with the error bars manually encoded by a domain expert.
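As a rough illustration of how a continuous quality scale can replace a binary good/bad flag, the sketch below combines two fuzzy membership tests and maps the result to an error bar. It is not the authors' implementation: the function names, the two rules (range and rate of change), the minimum operator used as fuzzy AND, and all numeric parameters are assumptions chosen for the example.

```python
def ramp_down(x, full, zero):
    """Membership that is 1 up to `full` and falls linearly to 0 at `zero`."""
    if x <= full:
        return 1.0
    if x >= zero:
        return 0.0
    return (zero - x) / (zero - full)


def quality_score(value, previous, lo=5.0, hi=25.0, margin=3.0, max_step=2.0):
    """Continuous quality in [0, 1] from two hypothetical fuzzy rules."""
    # Distance outside the expected range [lo, hi]; 0 when inside it.
    excess = max(lo - value, value - hi, 0.0)
    in_range = ramp_down(excess, 0.0, margin)
    # Rate-of-change rule: steps up to max_step are fully plausible,
    # steps of 2 * max_step or more are fully implausible.
    smooth = ramp_down(abs(value - previous), max_step, 2 * max_step)
    # Fuzzy AND (minimum t-norm) combines the two memberships.
    return min(in_range, smooth)


def error_bar(quality, sensor_accuracy=0.1, worst_case=5.0):
    """Half-width of the error bar: sensor accuracy at quality 1,
    an assumed worst-case uncertainty at quality 0."""
    return sensor_accuracy + (1.0 - quality) * (worst_case - sensor_accuracy)


if __name__ == "__main__":
    for value, previous in [(15.0, 14.5), (26.5, 26.0), (40.0, 15.0)]:
        q = quality_score(value, previous)
        print(value, round(q, 2), round(error_bar(q), 2))
```

A reading slightly outside the expected range receives an intermediate quality and a wider error bar rather than being rejected outright, which is the behaviour the abstract contrasts with hard classification.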

Automated quality control methods for sensor data: a novel observatory approach

Biogeosciences, 2013

National and international networks and observatories of terrestrial-based sensors are emerging rapidly. As such, there is demand for a standardized approach to data quality control, as well as interoperability of data among sensor networks. The National Ecological Observatory Network (NEON) has begun constructing their first terrestrial observing sites, with 60 locations expected to be distributed across the US by 2017. This will result in over 14 000 automated sensors recording more than 100 Tb of data per year. These data are then used to create other datasets and subsequent "higher-level" data products. In anticipation of this challenge, an overall data quality assurance plan has been developed and the first suite of data quality control measures defined. This data-driven approach focuses on automated methods for defining a suite of plausibility test parameter thresholds. Specifically, these plausibility tests scrutinize the data range and variance of each measurement type by employing a suite of binary checks. The statistical basis for each of these tests is developed, and the methods for calculating test parameter thresholds are explored here. While these tests have been used elsewhere, we apply them in a novel approach by calculating their relevant test parameter thresholds. Finally, implementing automated quality control is demonstrated with preliminary data from a NEON prototype site.
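The idea of deriving plausibility thresholds from the data and then applying binary range and variance checks can be sketched as follows. The abstract does not state how NEON calculates its thresholds, so the mean plus or minus k standard deviations rule, the windowing scheme, and every function name here are assumptions for illustration only.

```python
import statistics


def range_thresholds(history, k=3.0):
    """Assumed rule: plausible range is mean +/- k sample standard deviations."""
    mean = statistics.fmean(history)
    std = statistics.stdev(history)
    return mean - k * std, mean + k * std


def variance_threshold(history, window=4, k=3.0):
    """Upper limit on the standard deviation of one window, derived from the
    spread observed in non-overlapping windows of historical data."""
    stds = [statistics.stdev(history[i:i + window])
            for i in range(0, len(history) - window + 1, window)]
    return statistics.fmean(stds) + k * statistics.pstdev(stds)


def range_test(value, limits):
    """Binary check: True means the value passes."""
    lo, hi = limits
    return lo <= value <= hi


def variance_test(window_values, limit):
    """Binary check: True means the spread of the window is plausible."""
    return statistics.stdev(window_values) <= limit


if __name__ == "__main__":
    history = [10.0, 10.2, 9.9, 10.1, 10.0, 9.8, 10.3, 10.1]
    limits = range_thresholds(history)
    spread_limit = variance_threshold(history)
    print(range_test(10.0, limits), range_test(15.0, limits))
    print(variance_test([10.0, 14.0, 6.0, 10.0], spread_limit))
```

Computing the thresholds from historical data for each measurement type, rather than hard-coding them, is what makes the checks portable across many sensors and sites.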

Real-Time Quality Control (QC) Processing, Notification, and Visualization Services, Supporting Data Management of the Intelligent River©

Quality Assurance (QA) of real-time environmental data is realized through the planning and implementation of standard operating procedures. A critical component of any QA program supporting real-time data is Quality Control (QC). These processes need to be performed using automated and manual methods. The need for automatic QC has become even more critical as web-based technology has enabled access to data by end users following data collection, and limited resources can hinder the timeliness and scope of manual QC methods. Ultimately, the end goal of the Intelligent River© QA program is to produce research-quality data that is available to end users following data collection. To support this goal, automated QC methods were implemented that provide signals, realized with a flagging mechanism, in the event of a failure or anomaly and that, in some cases, correct data for systematic errors. These QC processes are organized into levels, with each level performing more robust checks and corrections as the data enter the Intelligent River© network. Level-1 algorithms apply simple heuristics to identify invalid or suspect observations. Specifically, observations with erroneous timestamps originating from the same device, data that exceed expected high and low thresholds for any given site location, missing observations, and observations exhibiting excessive variability originating from the same device are flagged. Flagged observations are republished to a level-2 process responsible for correcting the errors when appropriate. These QC processes are augmented by tools that support automated notification, such as email and RSS feeds, that can be tuned to varying levels of notification depending on end-user needs. QC error notifications are ingested by web-based mapping technologies and visualized in a similar manner to the environmental data observations.
Further, a level-3 product (in development) focuses on the identification of sensor drift and overall sensor performance, resulting in the automated publication of a PDF report that details a series of statistical analyses on any given sensor.
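The four level-1 heuristics described above (erroneous timestamps, threshold exceedance, missing observations, excessive variability) can be sketched as a single pass over incoming observations. This is a minimal illustration, not the Intelligent River© code: the flag names, the tuple layout, the 1.5 x interval gap rule, and the step-size test for variability are all assumptions.

```python
def level1_flags(observations, lo, hi, interval, max_step):
    """Return one list of flags per observation.

    observations: iterable of (device, timestamp_seconds, value) tuples
                  in arrival order (hypothetical layout).
    lo, hi:       expected low and high thresholds for the site.
    interval:     expected reporting interval in seconds.
    max_step:     largest plausible change between consecutive readings.
    """
    last = {}      # last observation with a valid timestamp, per device
    results = []
    for device, timestamp, value in observations:
        flags = []
        prev = last.get(device)
        if prev is not None:
            gap = timestamp - prev[0]
            if gap <= 0:
                # Timestamp repeats or goes backwards for this device.
                flags.append("BAD_TIMESTAMP")
            elif gap > 1.5 * interval:
                # At least one expected observation did not arrive.
                flags.append("MISSING_BEFORE")
            if abs(value - prev[1]) > max_step:
                flags.append("EXCESS_VARIABILITY")
        if not lo <= value <= hi:
            flags.append("OUT_OF_RANGE")
        if "BAD_TIMESTAMP" not in flags:
            last[device] = (timestamp, value)
        results.append(flags)
    return results


if __name__ == "__main__":
    stream = [("a", 0, 10.0), ("a", 60, 10.5), ("a", 60, 10.6), ("a", 240, 30.0)]
    for obs, flags in zip(stream, level1_flags(stream, 0.0, 25.0, 60, 5.0)):
        print(obs, flags)
```

In a layered design such as the one described, the flagged observations would then be handed to a later stage for correction; that stage is outside the scope of this sketch.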

Systematic quality control for long term ocean observations and applications

ACTA IMEKO, 2016

With recent advances in technology, many new observation platforms have been created and networked to disseminate numerous and diverse observations. These advances have also made it possible to connect people of all kinds, facilitating large-scale and long-term studies. This paper focuses on marine observations and the platforms employed for this purpose. Real-time data and big data must meet minimum data quality requirements. Usually, the task of ensuring these quality requirements falls to those responsible for the platforms. The aim of the paper is to explain the design of these quality control systems and their implementation in an ocean observation platform.

Mastering Real-Time Data Quality Control - How to Measure and Manage the Quality of (Rig) Sensor Data

SPE/IADC Middle East Drilling and Technology Conference, 2007

The amount of data collected in the information age has grown to barely manageable levels. Currently available technologies are already capable of transmitting the readings of any sensor to locations worldwide at high frequencies and with almost no time delay. With an ever-increasing flow of data, the need for criteria to measure and evaluate data quality is more pressing than ever, as this data forms the basis for many critical business decisions. This paper addresses these problems and shows essential steps to a successful data and quality management strategy: quality control and improvement, data quality benchmarking, and accessibility of controlled data. Simple but very effective signal processing algorithms are presented to ensure that the data is in the right value range, outliers are removed and missing values are substituted where possible. More complex control instances may not be able to correct data completely by automation. Here the human expert is still necessary for correction...
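The abstract does not say which signal processing algorithms the paper uses. As one possible reading of "outliers are removed and missing values are substituted", the sketch below uses a Hampel-style median test to drop outliers and linear interpolation to fill short gaps. The technique choice, function names, and parameters (including the `min_dev` floor) are assumptions, not the paper's method.

```python
import statistics


def remove_outliers(values, window=5, k=3.0, min_dev=0.5):
    """Replace points far from the local median with None.

    A point is an outlier when it deviates from the median of its window by
    more than k scaled median absolute deviations (MAD), with min_dev as a
    floor so that a flat neighbourhood does not give a zero tolerance.
    """
    half = window // 2
    cleaned = []
    for i, v in enumerate(values):
        if v is None:
            cleaned.append(None)
            continue
        neighbours = [x for x in values[max(0, i - half):i + half + 1]
                      if x is not None]
        med = statistics.median(neighbours)
        mad = statistics.median([abs(x - med) for x in neighbours])
        # 1.4826 scales the MAD to a standard deviation for Gaussian data.
        tolerance = max(k * 1.4826 * mad, min_dev)
        cleaned.append(None if abs(v - med) > tolerance else v)
    return cleaned


def fill_gaps(values, max_gap=3):
    """Linearly interpolate runs of None that are at most max_gap long and
    bounded by valid values on both sides; longer or open gaps stay None."""
    filled = list(values)
    n = len(filled)
    i = 0
    while i < n:
        if filled[i] is not None:
            i += 1
            continue
        start = i
        while i < n and filled[i] is None:
            i += 1
        if start > 0 and i < n and (i - start) <= max_gap:
            left, right = filled[start - 1], filled[i]
            span = i - start + 1
            for j in range(start, i):
                filled[j] = left + (right - left) * (j - start + 1) / span
    return filled


if __name__ == "__main__":
    raw = [10.0, 10.2, 50.0, 10.6, None, 11.0, 11.2]
    print(fill_gaps(remove_outliers(raw)))
```

Gaps longer than `max_gap` are deliberately left unfilled, which matches the abstract's point that some corrections cannot be automated and still need a human expert.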

Advancing Near-Real-Time Quality Controls of Meteorological Observations

Bulletin of the American Meteorological Society, 2022

Meteorological observations from ground weather stations are of the utmost importance to implement and run sectoral climate services and to perform scientific activities, such as model evaluation and data assimilation. However, meteorological observations may be affected by errors that degrade the quality and reliability of the products based on these data. To avoid such a cascade of effects, it is essential to quality check meteorological observations in near–real time. A new system based on several levels of checks of increasing complexity is proposed here. Besides the standard quality controls, this system applies dynamic and adaptive checks targeting exactly the locations and the data under analysis. The proposed system is designed to be flexible and easily modifiable thanks to its combined R and XML implementation. To facilitate its applicability, the proposed system is made available as free open-source software: QuackMe.

Applying Open Geospatial Consortium's Sensor Web Enablement to address real-time oceanographic data quality, secondary data use, and long-term preservation

… 2009, MTS/IEEE …, 2010

Key to the appropriate use of data is the knowledge of data quality. This knowledge is critical for products and decision-support tools that utilize real-time data, and it is also essential for the longer term application of data as well. Guidance by the National Archives and Records Administration (NARA) for appraising observational data for archive states that factors favoring long-term or permanent retention include the uniqueness, completeness, and quality of observational data and the quality and completeness of metadata [1]. The National Oceanographic Data Center (NODC), the designated archive center for oceanographic data in the U.S., requires that data submitted be documented to enable secondary use and ensure data posterity. Such metadata should include not only geospatial characteristics and time periods of observations, but also the collection methods, instrumentation used, units of measure, acceptable values, error tolerance, processing history, quality assessments and explanations of quality flags, data aggregation methods, and other pertinent information [2]. Providing this information in a consistent manner can be a challenge. However, an approach to capturing and conveying this metadata using community-developed practices for ocean observing system data and metadata is well underway. This paper presents methods of capturing data and provenance of data quality using the Open Geospatial Consortium (OGC) Sensor Web Enablement (SWE) framework. It describes the types of metadata content captured and demonstrates the utility and significance of defining and registering terms to enable semantic, as well as syntactic, interoperability. The SWE framework provides an avenue for conveying quality flags and methods used to make assurances about the integrity of oceanographic data for real-time consumption and for potential submittal to permanent archives such as NODC.