DMMP-Net: diffusion model-based missing part patching network for station air quality data generation completion

Abstract

Estimating missing values in ground-station environmental monitoring data is important for environmental monitoring and prediction. However, existing completion methods struggle to capture three kinds of dependence at once: temporal correlation within station records, spatial correlation between stations, and correlation among pollutant concentrations. This paper therefore proposes DMMP-Net, a diffusion model-based missing part patching network for station air quality data generation completion. First, a diffusion model learns the data distribution, and the data with missing values serve as the conditional input for generating complete data that fill the gaps. This data enhancement lets the completed data be used in downstream tasks such as pollution source analysis, pollution component exploration, and air quality prediction. Second, an attention mechanism is used to improve the noise estimation network, strengthening feature extraction from station air quality data along three dimensions (time, space, and the relationships among pollutant concentrations) and improving the ability of DMMP-Net to learn the data distribution so that it generates accurate completions. Experiments on data from air quality monitoring stations in the Beijing region demonstrate the effectiveness of DMMP-Net. It achieves better MAE and MRE than the forward substitution method, the mean padding method, and the K-nearest neighbor padding algorithm. Under random, time-continuous, and space-continuous missing patterns, the MAE reaches 5.532, 10.849, and 12.641, and the MRE reaches 0.129, 0.243, and 0.342, respectively, outperforming the traditional missing-value filling models.
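
The evaluation setup described in the abstract can be illustrated with a small sketch. It does not implement DMMP-Net itself; it only builds the three missing patterns, applies two of the baselines named above (forward substitution and mean padding), and scores them on the held-out entries. Everything here is an assumption made for illustration: the `(time, station, pollutant)` array layout, the function names, the synthetic data, and in particular the MRE definition (summed absolute error divided by summed absolute ground truth), which the abstract does not spell out.

```python
import numpy as np

def random_mask(shape, rate, rng):
    """True marks an observed entry; each entry is dropped independently."""
    return rng.random(shape) >= rate

def time_continuous_mask(shape, start, length, station):
    """Drop a contiguous block of time steps at one station."""
    mask = np.ones(shape, dtype=bool)
    mask[start:start + length, station, :] = False
    return mask

def space_continuous_mask(shape, stations):
    """Drop every time step for a group of stations (assumed to be neighbours)."""
    mask = np.ones(shape, dtype=bool)
    mask[:, stations, :] = False
    return mask

def mean_fill(data, mask):
    """Replace missing entries with the per-pollutant mean of observed entries."""
    observed = np.where(mask, data, np.nan)
    means = np.nanmean(observed, axis=(0, 1), keepdims=True)
    return np.where(mask, data, means)

def forward_fill(data, mask):
    """Carry the last observed value forward along the time axis."""
    filled = np.where(mask, data, np.nan)
    for t in range(1, filled.shape[0]):
        gap = np.isnan(filled[t])
        filled[t][gap] = filled[t - 1][gap]
    # Series with no earlier observation fall back to the mean (assumption).
    return np.where(np.isnan(filled), mean_fill(data, mask), filled)

def mae(truth, imputed, mask):
    """Mean absolute error over the missing entries only."""
    missing = ~mask
    return np.abs(truth - imputed)[missing].mean()

def mre(truth, imputed, mask):
    """Assumed MRE: summed absolute error over summed absolute ground truth."""
    missing = ~mask
    return np.abs(truth - imputed)[missing].sum() / np.abs(truth)[missing].sum()

# Synthetic stand-in for station data: 48 time steps, 5 stations, 6 pollutants.
rng = np.random.default_rng(0)
truth = rng.uniform(10.0, 100.0, size=(48, 5, 6))

masks = {
    "random": random_mask(truth.shape, 0.2, rng),
    "time-continuous": time_continuous_mask(truth.shape, start=10, length=12, station=2),
    "space-continuous": space_continuous_mask(truth.shape, stations=[3, 4]),
}

for name, mask in masks.items():
    for method in (forward_fill, mean_fill):
        imputed = method(truth, mask)
        print(f"{name:>16} {method.__name__:>12}  "
              f"MAE={mae(truth, imputed, mask):.3f}  MRE={mre(truth, imputed, mask):.3f}")
```

Scoring only the masked entries keeps the metrics from being diluted by values that were never missing. The numbers printed on this synthetic data are not comparable to the results reported in the abstract.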

Data availability

Data will be made available on request.

Acknowledgements

The authors express gratitude to the Key Scientific Research Projects of Colleges and Universities in Henan Province (No. 23A170013).

Funding

This work was supported by the Key Scientific Research Projects of Colleges and Universities in Henan Province (No. 23A170013).

Author information

Authors and Affiliations

  1. College of Information Science and Engineering, Henan University of Technology, Zhengzhou, 450001, Henan, China
    Zhenying Li, Weidong Li, Xuehai Zhang & Jinlong Duan
  2. Key Laboratory of Digital Earth Science, Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing, 100094, China
    Linyan Bai

Authors

  1. Zhenying Li
  2. Weidong Li
  3. Xuehai Zhang
  4. Jinlong Duan
  5. Linyan Bai

Corresponding authors

Correspondence to Weidong Li or Linyan Bai.

Ethics declarations

Conflict of interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

About this article

Cite this article

Li, Z., Li, W., Zhang, X. et al. DMMP-Net: diffusion model-based missing part patching network for station air quality data generation completion. Int. J. Mach. Learn. & Cyber. 16, 3601–3612 (2025). https://doi.org/10.1007/s13042-024-02468-x

Keywords