Handling of incomplete data sets using ICA and SOM in data mining

Abstract

Based on independent component analysis (ICA) and self-organizing maps (SOM), this paper proposes the ISOM-DH model for handling incomplete data in data mining. When the data remain dependent and non-Gaussian, the model can make full use of the information in the given data to estimate the missing values, and it can visualize the handled high-dimensional data. Compared with the mixture of principal component analyzers (MPCA), the mean method, and the standard SOM-based fuzzy map model, the ISOM-DH model can be applied to more cases, which demonstrates its advantage. The correctness and reasonableness of the ISOM-DH model are also validated by the experiment carried out in this paper.

Author information

Authors and Affiliations

  1. Department of Applied Mathematics, Sun Yat-sen University, Guangzhou, 510275, China
    Hongyi Peng & Siming Zhu

Corresponding author

Correspondence to Hongyi Peng.

About this article

Cite this article

Peng, H., Zhu, S. Handling of incomplete data sets using ICA and SOM in data mining. Neural Comput & Applic 16, 167–172 (2007). https://doi.org/10.1007/s00521-006-0058-6
