Data Quality Assessment and Enhancement on Social and Sensor Data

Data Quality Observation in Pervasive Environments

2012 IEEE 15th International Conference on Computational Science and Engineering, 2012

Pervasive applications are based on the acquisition and consumption of real-time data from various environments. The quality of such data fluctuates constantly because of the dynamic nature of pervasive environments. Although data quality has a notable impact on applications, little has been done to handle data quality in such environments. On the one hand, past data quality research falls mostly within the scope of database applications. On the other hand, work on Quality of Context still lacks practical feasibility and thus has not yet been adopted by most context-aware systems. This paper proposes three metric definitions, Currency, Availability, and Validity, for pervasive applications to quantitatively observe the quality of real-time data and data sources. Compared to previous work, the definitions ensure that all the parameters are interpretable and obtainable. Furthermore, the paper demonstrates the feasibility of the proposed metrics by applying them to real-world data sources on the open IoT platform Cosm (formerly Pachube).
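The abstract does not reproduce the paper's formal metric definitions. As a rough illustration of the idea, a currency-style score for a sensor reading can be derived from its age relative to the source's expected update interval, and availability from the fraction of expected readings that arrived. The function names and the linear decay formula below are assumptions for illustration, not the paper's definitions:

```python
# Illustrative sketch only: currency- and availability-style scores for
# real-time sensor data. The formulas and names are assumptions for
# illustration, not the paper's actual metric definitions.

def currency(age_s: float, update_interval_s: float) -> float:
    """Return 1.0 for a fresh reading, decaying linearly to 0.0 once the
    reading is older than one full expected update interval."""
    if update_interval_s <= 0:
        raise ValueError("update interval must be positive")
    return max(0.0, 1.0 - age_s / update_interval_s)

def availability(received: int, expected: int) -> float:
    """Fraction of expected readings that actually arrived in a window."""
    return received / expected if expected > 0 else 0.0

# Example: a reading 30 s old from a source that updates every 60 s.
print(currency(30.0, 60.0))   # 0.5
print(availability(58, 60))
```

A validity score in the same spirit would check each value against a per-source plausibility range before it is consumed.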

Data quality and the Internet of Things

Computing

The Internet of Things (IoT) is driving technological change and the development of new products and services that rely heavily on the quality of the data collected by IoT devices. There is a large body of research on data quality management and improvement in IoT; however, to date, no systematic review of data quality measurement in IoT is available. This paper presents a systematic literature review (SLR) of data quality in IoT, from the emergence of the term IoT in 1999 through 2018. We reviewed and analyzed 45 empirical studies to identify research themes on data quality in IoT. Based on this analysis, we establish the links between data quality dimensions, manifestations of data quality problems, and methods utilized to measure data quality. The findings of this SLR suggest new research areas for further investigation and identify implications for practitioners in defining and measuring data quality in IoT.

Validity as a Measure of Data Quality in Internet of Things Systems

Wireless Personal Communications, 2022

Data quality became significant with the emergence of data warehouse systems. While accuracy is an intrinsic data quality dimension, validity presents a wider perspective that is more representational and contextual in nature. In this article we present a different perspective on data collection and collation. We focus on faults experienced in data sets and present validity as a function of allied parameters such as completeness, usability, availability, and timeliness for determining data quality. We also analyze the applicability of these metrics and modify them to suit IoT applications. Another major focus of this article is verifying these metrics on aggregated data sets rather than individual data values. This work focuses on using the different validation parameters to determine the quality of data generated in a pervasive environment. The analysis approach presented is simple and can be employed to test the validity of collected data, isolate faults in the data set, and measure the suitability of data before applying analysis algorithms.
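The article treats validity of an aggregated data set as a function of allied parameters. A minimal sketch of that idea, where per-dimension scores are combined with weights, might look like the following; the scoring rules and weights here are illustrative assumptions, not the article's formulas:

```python
# Illustrative sketch: validity of an aggregated data set as a weighted
# combination of completeness and timeliness scores. The scoring rules
# and weights are assumptions for illustration only.

def completeness(values):
    """Fraction of non-missing values (None marks a missing reading)."""
    return sum(v is not None for v in values) / len(values) if values else 0.0

def timeliness(ages_s, deadline_s):
    """Fraction of readings that arrived within the deadline."""
    return sum(a <= deadline_s for a in ages_s) / len(ages_s) if ages_s else 0.0

def validity(values, ages_s, deadline_s, weights=(0.5, 0.5)):
    """Combine dimension scores into a single validity score in [0, 1]."""
    w_c, w_t = weights
    return w_c * completeness(values) + w_t * timeliness(ages_s, deadline_s)

readings = [21.4, None, 21.9, 22.1]   # one missing temperature value
ages = [5.0, 70.0, 12.0, 3.0]         # seconds since capture
print(validity(readings, ages, deadline_s=60.0))   # 0.75
```

Scoring the aggregate rather than each value separately is what lets a low overall score flag a faulty data set before analysis algorithms are applied.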

Context-aware big data quality assessment: a scoping review

Journal of Data and Information Quality

The term data quality refers to measuring the fitness of data for its intended usage. Poor data quality leads to inadequate, inconsistent, and erroneous decisions that can escalate computational cost, cause a decline in profits, and drive customer churn. Thus, data quality is crucial for researchers and industry practitioners. Different factors drive the assessment of data quality. Data context is deemed one of the key factors due to the contextual diversity of real-world use cases of various entities such as people and organizations. Data that is efficacious in one context (e.g., under an organization's policy) may not be efficacious in another. Hence, implementing a data quality assessment solution across different contexts is challenging. Traditional technologies for data quality assessment have reached the pinnacle of maturity, and existing solutions can solve most quality issues. The data context in these solutions is defined as validation rules applied within the ETL (...
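The review's point that traditional solutions encode context as validation rules inside the ETL pipeline can be made concrete with a toy example; the contexts, field, and thresholds below are invented for illustration:

```python
# Illustrative sketch: the same field validated differently depending on
# context, mimicking context-as-validation-rules in a traditional ETL
# pipeline. Contexts and thresholds are invented for illustration.

RULES = {
    # context name -> (min_temp_c, max_temp_c)
    "cold_chain": (-25.0, 8.0),    # refrigerated logistics
    "smart_home": (10.0, 35.0),    # indoor comfort monitoring
}

def is_valid(temp_c: float, context: str) -> bool:
    """Check a temperature reading against its context's rule."""
    lo, hi = RULES[context]
    return lo <= temp_c <= hi

print(is_valid(5.0, "cold_chain"))   # True: fine for a fridge
print(is_valid(5.0, "smart_home"))   # False: too cold indoors
```

The limitation the review highlights is that such rules are fixed at pipeline design time, so every new context requires rewriting them.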

Characterizing IoT Data and its Quality for Use

2019

The Internet of Things (IoT) is a cyber-physical social system that encompasses scientific, enterprise, and societal domains. Data is the most important commodity in IoT, enabling the "smarts" through analytics and decision making. IoT environments can generate and consume vast amounts of data, but managing this data effectively and gaining meaningful insights from it requires us to understand its characteristics. Traditional scientific, enterprise, and big data management approaches may not be adequate and will have to evolve. Further, these characteristics and the physical deployment environments also affect the quality of the data for use. In this paper, we offer a taxonomy of IoT data characteristics, along with data quality considerations, constructed from the ground up based on the diverse IoT domains and applications we review. We emphasize the essential features rather than a vast array of attributes. We also indicate factors that influence data quality. Su...

Statistical-Based Data Quality Model for Mobile Crowd Sensing Systems

Arabian Journal for Science and Engineering, 2018

Quality of information is an emerging issue in mobile crowd sensing (MCS). MCS is an essential computing paradigm that tasks everyday mobile devices with forming crowd sensing networks. Nowadays, there is an increasing demand to provide real-time environmental information such as air quality, noise levels, and traffic conditions. However, the openness of crowd sensing exposes the system to malicious and erroneous participation, inevitably resulting in poor data quality. This raises the important issue of false data detection and correction in crowd sensing. Furthermore, data collected by participants normally include considerable missing values, which poses challenges for accurate false data detection. To improve the quality of the collected sensory data, the system server needs to consider the factors that influence it. To acquire high-quality sensory data, MCS needs efficient platforms to enhance the collected data, select the best participants from a group of users, and determine the coverage type of the sensing location and the exact sensing time that will achieve high-quality sensory data at low cost. In this paper, we study the factors that affect MCS data quality and propose a statistical MCS data quality model that collects sensory data based on the data requester's requirements, improving data quality and selecting the best users to participate in the sensing task.
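The paper's statistical model is not specified in the abstract. As a generic illustration of statistically flagging false readings in crowd-sensed data, a simple z-score check against the crowd consensus can be used; the threshold and function names below are assumptions, not the paper's model:

```python
# Illustrative sketch: flag crowd-sensed readings that deviate strongly
# from the crowd consensus, a generic stand-in for the statistical
# false-data detection the paper discusses. Threshold is an assumption.
import statistics

def flag_outliers(readings, z_threshold=2.0):
    """Return indices of readings whose z-score exceeds the threshold."""
    mean = statistics.fmean(readings)
    stdev = statistics.stdev(readings)
    if stdev == 0:
        return []   # all readings identical: nothing to flag
    return [i for i, r in enumerate(readings)
            if abs(r - mean) / stdev > z_threshold]

# Nine participants report ~40 dB of noise; one reports 90 dB.
samples = [39.0, 41.0, 40.5, 38.0, 40.0, 42.0, 39.5, 41.5, 40.0, 90.0]
print(flag_outliers(samples))   # [9]
```

Flagged participants could then be down-weighted when the server selects users for subsequent sensing tasks; handling the missing values the paper mentions would require imputation before a check like this is applied.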

IoT Data Quality Issues and Potential Solutions: A Literature Review

ArXiv, 2021

The Internet of Things (IoT) is a paradigm that connects everyday items to the Internet. Over the past decade, the IoT's spreading popularity has created promising opportunities for people and industries. IoT is utilized in a wide range of domains such as agriculture, healthcare, smart cities, and manufacturing. IoT data quality is crucial in real-life IoT applications. IoT data quality dimensions and issues must be considered because we require data to make accurate and timely decisions, produce commodities, and gain insights about events, people, and the environment. It is essential to point out that we cannot reach valuable results using poor-quality data. This paper aims to develop a new categorization of IoT data quality. Hence, we examine existing IoT data quality dimensions, IoT data quality issues in general and in specific domains, and categories of IoT data quality dimensions. Notably, few such categorizations exist in the IoT context. We developed a new category...

Big Data Quality: From Content to Context

Journal of Information Technology Management, 2019

Over the last 20 years, and particularly with the advent of Big Data and analytics, Data and Information Quality (DIQ) has remained a fast-growing research area. There are many views and streams in DIQ research, generally aimed at improving the effectiveness of decision making in organizations. Although much research has aimed to clarify the role of Big Data quality for organizations, there is no comprehensive literature review showing the main differences between traditional data quality research and Big Data quality research. This paper analyzes the papers published on Big Data quality and finds that there is almost no new mainstream in Big Data quality research. It is shown that the main concepts of data quality do not change in the Big Data context and that only some new issues have been added to this area.