DLPR: A Distributed Locality Preserving Dimension Reduction Algorithm

Abstract

Document indexing using dimension reduction has been widely studied in recent years. Applying these methods in large distributed systems can be inefficient because of their computational, storage, and communication costs. In this paper, we propose DLPR, a distributed locality preserving dimension reduction algorithm, to project a large distributed data set into a lower dimensional space. Partitioning methods divide the data set into several clusters. The system nodes communicate through virtual groups to project the clusters to the target space, either independently or in conjunction with each other.

The reduction transforms themselves are computed using Locality Preserving Indexing, a method that has received little study in distributed environments. Experimental results demonstrate the efficiency of DLPR in terms of preserving the local structure of the data set and reducing computing and storage costs.
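
The cluster-wise projection described above can be illustrated with a minimal single-machine sketch. It is not the DLPR algorithm itself: it assumes cluster labels are already given, covers only the case where clusters are projected independently, omits the virtual groups and all inter-node communication, and uses a plain locality preserving projection as a stand-in for Locality Preserving Indexing. The function names (`knn_affinity`, `lpp`, `project_clusters`) and the parameters `k` and `reg` are hypothetical and chosen for illustration only.

```python
import numpy as np

def knn_affinity(X, k=3):
    """Symmetric 0/1 k-nearest-neighbour affinity matrix; rows of X are points."""
    n = X.shape[0]
    k = min(k, n - 1)
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    np.fill_diagonal(d2, np.inf)           # a point is never its own neighbour
    idx = np.argsort(d2, axis=1)[:, :k]    # indices of the k closest points
    W = np.zeros((n, n))
    W[np.arange(n)[:, None], idx] = 1.0
    return np.maximum(W, W.T)              # symmetrise

def lpp(X, dim, k=3, reg=1e-6):
    """Projection P (features x dim) minimising sum_ij W_ij ||P^T x_i - P^T x_j||^2.

    Solves X^T L X a = lambda X^T D X a for the smallest eigenvalues.
    `reg` is an assumed ridge term that keeps X^T D X positive definite.
    """
    W = knn_affinity(X, k)
    D = np.diag(W.sum(axis=1))
    L = D - W                               # graph Laplacian
    A = X.T @ L @ X
    B = X.T @ D @ X + reg * np.eye(X.shape[1])
    C = np.linalg.cholesky(B)               # B = C C^T
    C_inv = np.linalg.inv(C)
    M = C_inv @ A @ C_inv.T                 # reduce to a standard symmetric problem
    M = (M + M.T) / 2                       # remove round-off asymmetry
    _, vecs = np.linalg.eigh(M)             # eigenvalues in ascending order
    return C_inv.T @ vecs[:, :dim]

def project_clusters(X, labels, dim, k=3):
    """Project each cluster independently with its own locality preserving transform."""
    Y = np.zeros((X.shape[0], dim))
    for c in np.unique(labels):
        mask = labels == c
        Y[mask] = X[mask] @ lpp(X[mask], dim, k)
    return Y
```

Because each cluster gets its own transform here, coordinates from different clusters are not directly comparable; the abstract's option of projecting clusters in conjunction with each other addresses that case and is not covered by this sketch.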

Author information

Authors and Affiliations

  1. Computer Engineering Department, Sharif University of Technology, Tehran, Iran
    Mina Ghashami, Hoda Mashayekhi & Jafar Habibi

Editor information

Editors and Affiliations

  1. School of Information Technology, Deakin University, Melbourne Burwood Campus, 221 Burwood Highway, 3125, Burwood, VIC, Australia
    Yang Xiang
  2. Media Distribution, Telstra Corporation Limited, 21/35 Collins St, 3000, Melbourne, VIC, Australia
    Mukaddim Pathan
  3. Department of Mathematics and Computing, The University of Southern Queensland, Toowoomba, QLD, Australia
    Xiaohui Tao & Hua Wang

Rights and permissions

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Ghashami, M., Mashayekhi, H., Habibi, J. (2012). DLPR: A Distributed Locality Preserving Dimension Reduction Algorithm. In: Xiang, Y., Pathan, M., Tao, X., Wang, H. (eds) Internet and Distributed Computing Systems. IDCS 2012. Lecture Notes in Computer Science, vol 7646. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-34883-9_8
