DMMP-Net: diffusion model-based missing part patching network for station air quality data generation completion
Abstract
Estimating the missing portions of data from ground-based environmental monitoring stations is important for environmental monitoring and prediction. However, existing methods struggle to capture, when completing missing data, the temporal correlation of station data, the spatial correlation between stations, and the correlation between the concentrations of different pollutants. This paper therefore proposes a diffusion model-based missing part patching network for station air quality data generation completion (DMMP-Net). First, a diffusion model learns the distribution of the data, and the data with missing values are used as conditional input to generate complete data, which then fill the missing entries as a form of data enhancement. The completed data can subsequently support tasks such as pollution source analysis, pollution composition analysis, and air quality prediction. Second, an attention mechanism is added to the noise estimation network to strengthen feature extraction from station air quality data along three dimensions: time, space, and the relationships among pollutant concentrations. This improves the ability of DMMP-Net to learn the data distribution and hence to generate accurate completion values. Experiments on data from air quality monitoring stations in the Beijing region demonstrate the effectiveness of DMMP-Net. Compared with forward filling, mean filling, and K-nearest neighbor (KNN) filling, DMMP-Net achieves lower mean absolute error (MAE) and mean relative error (MRE). Under random, time-continuous, and space-continuous missing patterns, its MAE is 5.532, 10.849, and 12.641 and its MRE is 0.129, 0.243, and 0.342, respectively, outperforming the traditional missing-value filling methods.
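The abstract describes training a diffusion model that is conditioned on the data with missing values. One common way to set this up is the denoising diffusion probabilistic model (DDPM) noise-prediction loss of Ho et al., combined with the conditional/target split used by CSDI (Tashiro et al.): some observed entries are hidden during training, noised with x_t = sqrt(abar_t) x_0 + sqrt(1 - abar_t) eps, and the network learns to predict eps from the noisy entries plus the remaining observations. The following is a minimal sketch on a flattened list of entries, not the authors' implementation; the linear beta schedule, the 0/1 mask convention and the hypothetical `eps_model` signature are all assumptions.

```python
import math
import random

def linear_betas(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linear noise schedule beta_1..beta_T (assumed values, as in Ho et al.)."""
    return [beta_start + (beta_end - beta_start) * i / (num_steps - 1)
            for i in range(num_steps)]

def alpha_bars(betas):
    """Cumulative products abar_t = prod_{s<=t} (1 - beta_s)."""
    out, prod = [], 1.0
    for b in betas:
        prod *= 1.0 - b
        out.append(prod)
    return out

def q_sample(x0, t, abar, noise):
    """Closed-form forward noising: x_t = sqrt(abar_t) x_0 + sqrt(1 - abar_t) eps."""
    a = abar[t]
    return [math.sqrt(a) * x + math.sqrt(1.0 - a) * e for x, e in zip(x0, noise)]

def conditional_loss(x0, observed, target, t, abar, eps_model, rng):
    """Noise-prediction loss on target entries only.

    observed: 1 where a real value exists.
    target:   1 where an observed value is hidden for training (target implies observed).
    eps_model(noisy_target, cond_values, cond_mask, t) is a hypothetical noise
    estimator returning one prediction per entry.
    """
    noise = [rng.gauss(0.0, 1.0) for _ in x0]
    xt = q_sample(x0, t, abar, noise)
    # entries that stay visible as the condition
    cond_mask = [o * (1 - g) for o, g in zip(observed, target)]
    cond_values = [x * c for x, c in zip(x0, cond_mask)]
    # only the hidden entries are passed in their noised form
    noisy_target = [x * g for x, g in zip(xt, target)]
    eps_hat = eps_model(noisy_target, cond_values, cond_mask, t)
    sq = [(e - h) ** 2 for e, h, g in zip(noise, eps_hat, target) if g]
    return sum(sq) / max(len(sq), 1)
```

For example, `conditional_loss([1.0, 2.0, 3.0], [1, 1, 1], [0, 1, 0], 5, alpha_bars(linear_betas(50)), lambda xt, c, m, t: [0.0] * len(xt), random.Random(0))` scores a placeholder estimator on the single hidden entry. Scoring only the hidden entries means the network is never rewarded for copying values it can already see.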
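At inference time, completion with a conditional diffusion model generally runs the reverse (denoising) chain from Gaussian noise while supplying the observed entries as the condition, and then keeps the real measurements, using generated values only where data are missing. A minimal sketch of that loop, assuming the standard DDPM ancestral update with sigma_t^2 = beta_t and a linear schedule; `eps_model` stands in for a trained noise estimator and is hypothetical, and this is not claimed to be the exact DMMP-Net sampler.

```python
import math
import random

def make_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linear betas and their cumulative products abar_t (assumed schedule)."""
    betas = [beta_start + (beta_end - beta_start) * i / (num_steps - 1)
             for i in range(num_steps)]
    abar, prod = [], 1.0
    for b in betas:
        prod *= 1.0 - b
        abar.append(prod)
    return betas, abar

def p_sample_step(xt, t, betas, abar, eps_hat, rng):
    """One DDPM reverse step x_t -> x_{t-1}:
    mean = (x_t - beta_t / sqrt(1 - abar_t) * eps_hat) / sqrt(1 - beta_t)."""
    beta = betas[t]
    coef = beta / math.sqrt(1.0 - abar[t])
    mean = [(x - coef * e) / math.sqrt(1.0 - beta) for x, e in zip(xt, eps_hat)]
    if t == 0:
        return mean  # no noise is added on the final step
    sigma = math.sqrt(beta)
    return [m + sigma * rng.gauss(0.0, 1.0) for m in mean]

def impute(x_obs, observed, eps_model, num_steps=50, seed=0):
    """Generate values for missing entries, conditioned on the observed ones.

    x_obs holds any placeholder (e.g. 0.0) where observed == 0.
    eps_model(x_t, cond_values, cond_mask, t) is a hypothetical trained estimator.
    """
    rng = random.Random(seed)
    betas, abar = make_schedule(num_steps)
    cond = [x * m for x, m in zip(x_obs, observed)]
    x = [rng.gauss(0.0, 1.0) for _ in x_obs]  # start from pure noise
    for t in reversed(range(num_steps)):
        eps_hat = eps_model(x, cond, observed, t)
        x = p_sample_step(x, t, betas, abar, eps_hat, rng)
    # keep real measurements; use generated values only at missing positions
    return [xo if m else g for xo, g, m in zip(x_obs, x, observed)]
```

Because the sampler is stochastic, running `impute` with several seeds and averaging (or taking the median of) the generated values is a common way to obtain a point estimate.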
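DMMP-Net adds attention to the noise estimation network along three dimensions: time, space (stations) and pollutant. One way to realise this, assumed here and not taken from the paper, is axis-wise self-attention: for a feature tensor `h[t][n][k]` (one feature vector per time step, station and pollutant), attention is applied separately along each axis while the other two are held fixed, with a residual connection after each pass. The sketch uses identity query/key/value projections and omits multi-head splitting and normalisation, so it only illustrates how information is mixed along each axis.

```python
import math

def self_attention(seq):
    """Scaled dot-product self-attention with identity Q/K/V projections."""
    d = len(seq[0])
    out = []
    for q in seq:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in seq]
        m = max(scores)  # subtract the max for a numerically stable softmax
        w = [math.exp(s - m) for s in scores]
        z = sum(w)
        out.append([sum(wi * v[c] for wi, v in zip(w, seq)) / z for c in range(d)])
    return out

def attend_along(h, axis):
    """Attend along one axis of h[t][n][k]: 0 = time, 1 = station, 2 = pollutant."""
    dims = (len(h), len(h[0]), len(h[0][0]))
    out = [[[None] * dims[2] for _ in range(dims[1])] for _ in range(dims[0])]
    others = [a for a in range(3) if a != axis]
    for i in range(dims[others[0]]):
        for j in range(dims[others[1]]):
            # indices of one 1-D slice along `axis`, other two axes fixed
            idx = []
            for p in range(dims[axis]):
                ix = [0, 0, 0]
                ix[axis], ix[others[0]], ix[others[1]] = p, i, j
                idx.append(tuple(ix))
            mixed = self_attention([h[a][b][c] for a, b, c in idx])
            for (a, b, c), v in zip(idx, mixed):
                out[a][b][c] = v
    return out

def add_tensors(x, y):
    """Element-wise sum of two [T][N][K][C] nested lists (residual connection)."""
    return [[[[p + q for p, q in zip(u, v)] for u, v in zip(ru, rv)]
             for ru, rv in zip(px, py)] for px, py in zip(x, y)]

def tri_axis_block(h):
    """Time, then station, then pollutant attention (assumed ordering)."""
    for axis in (0, 1, 2):
        h = add_tensors(h, attend_along(h, axis))
    return h
```

In a trained network each pass would have its own learned projections; factorising attention by axis keeps the cost proportional to the sum of the axis lengths per slice rather than to their product.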
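The evaluation uses three missing-data patterns: random, time-continuous and space-continuous. The exact masking protocol is not given in this abstract, so the following sketch assumes a `[time][station][pollutant]` layout and one plausible reading of each pattern: independent random drops, one station missing for a run of consecutive time steps, and several adjacent stations missing at the same time step. All function names are hypothetical.

```python
import random

def full_mask(T, N, K):
    """mask[t][n][k] = 1 means the value is kept (observed)."""
    return [[[1] * K for _ in range(N)] for _ in range(T)]

def random_missing(T, N, K, rate, rng):
    """Drop each entry independently with probability `rate`."""
    return [[[0 if rng.random() < rate else 1 for _ in range(K)]
             for _ in range(N)] for _ in range(T)]

def time_continuous_missing(T, N, K, length, rng):
    """One station loses all pollutants for `length` consecutive time steps."""
    mask = full_mask(T, N, K)
    station = rng.randrange(N)
    start = rng.randrange(T - length + 1)
    for t in range(start, start + length):
        mask[t][station] = [0] * K
    return mask

def space_continuous_missing(T, N, K, n_stations, rng):
    """At one time step, `n_stations` consecutively indexed stations are missing.

    Assumes (hypothetically) that stations are indexed so that neighbouring
    indices are spatially close.
    """
    mask = full_mask(T, N, K)
    t = rng.randrange(T)
    first = rng.randrange(N - n_stations + 1)
    for n in range(first, first + n_stations):
        mask[t][n] = [0] * K
    return mask

def n_missing(mask):
    """Count masked entries."""
    return sum(1 - v for plane in mask for row in plane for v in row)
```

For instance, `n_missing(time_continuous_missing(24, 12, 6, 8, random.Random(42)))` is 48 (8 time steps times 6 pollutants). The continuous patterns are harder than random drops because fewer nearby observations remain, which is consistent with the larger errors the abstract reports for them.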
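The three baselines can be sketched as follows. These are common textbook variants assumed for illustration, since the abstract does not specify their configuration: forward filling carries the last observed value forward in time, mean filling substitutes the mean of the observed values, and KNN filling averages the values of the k most similar stations at the missing time step, with similarity measured (as an assumption) by root-mean-square difference over co-observed time steps. Missing values are represented as `None`.

```python
import math

def forward_fill(series):
    """Carry the last observed value forward; leading gaps stay None."""
    out, last = [], None
    for v in series:
        if v is not None:
            last = v
        out.append(last)
    return out

def mean_fill(series):
    """Replace gaps with the mean of observed values (needs at least one)."""
    observed = [v for v in series if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in series]

def knn_fill(matrix, k):
    """matrix[s][t]: value at station s and time t, None if missing.

    Each gap is filled with the mean value at time t of the k stations
    closest to s (RMS difference over co-observed times) that are observed at t.
    """
    filled = [row[:] for row in matrix]
    for s, row in enumerate(matrix):
        for t, v in enumerate(row):
            if v is not None:
                continue
            candidates = []
            for o, other in enumerate(matrix):
                if o == s or other[t] is None:
                    continue
                pairs = [(a, b) for a, b in zip(row, other)
                         if a is not None and b is not None]
                if not pairs:
                    continue
                dist = math.sqrt(sum((a - b) ** 2 for a, b in pairs) / len(pairs))
                candidates.append((dist, other[t]))
            candidates.sort(key=lambda c: c[0])
            nearest = candidates[:k]
            if nearest:
                filled[s][t] = sum(val for _, val in nearest) / len(nearest)
    return filled
```

None of these baselines models time, space and pollutant correlations jointly: forward and mean filling look at a single series, and KNN filling uses only cross-station similarity, which is the gap the abstract says DMMP-Net targets.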
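MAE and MRE are normally computed only over the entries that were artificially removed and then completed. The abstract does not define MRE, so the sketch below assumes the definition common in imputation work, total absolute error divided by total absolute true value, on flattened lists; `masked_mae_mre` is a hypothetical helper.

```python
def masked_mae_mre(truth, pred, eval_mask):
    """MAE and MRE over entries where eval_mask is 1 (the artificially removed ones).

    MAE = sum|pred - truth| / count
    MRE = sum|pred - truth| / sum|truth|   (assumed definition)
    """
    abs_err = abs_true = 0.0
    count = 0
    for x, y, m in zip(truth, pred, eval_mask):
        if m:
            abs_err += abs(y - x)
            abs_true += abs(x)
            count += 1
    if count == 0:
        raise ValueError("eval_mask selects no entries")
    return abs_err / count, abs_err / abs_true

# The last pair is not evaluated; MAE = 5/3, MRE = 5/60.
mae, mre = masked_mae_mre([10.0, 20.0, 30.0, 40.0], [12.0, 20.0, 27.0, 0.0], [1, 1, 1, 0])
```

Normalising by the total true value makes MRE comparable across pollutants with very different concentration scales, whereas MAE keeps the units of the data.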
Data availability
Data will be made available on request.
Acknowledgements
The authors gratefully acknowledge the support of the Key Scientific Research Projects of Colleges and Universities in Henan Province (No. 23A170013).
Funding
The Key Scientific Research Projects of Colleges and Universities in Henan Province (No. 23A170013).
Author information
Authors and Affiliations
- College of Information Science and Engineering, Henan University of Technology, Zhengzhou, 450001, Henan, China
Zhenying Li, Weidong Li, Xuehai Zhang & Jinlong Duan
- Key Laboratory of Digital Earth Science, Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing, 100094, China
Linyan Bai
Corresponding authors
Correspondence to Weidong Li or Linyan Bai.
Ethics declarations
Conflict of interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Li, Z., Li, W., Zhang, X. et al. DMMP-Net: diffusion model-based missing part patching network for station air quality data generation completion. Int. J. Mach. Learn. & Cyber. 16, 3601–3612 (2025). https://doi.org/10.1007/s13042-024-02468-x
- Received: 17 June 2024
- Accepted: 11 November 2024
- Published: 27 November 2024
- Version of record: 27 November 2024
- Issue date: June 2025
- DOI: https://doi.org/10.1007/s13042-024-02468-x