DLPR: A Distributed Locality Preserving Dimension Reduction Algorithm
Abstract
Document indexing using dimension reduction has been widely studied in recent years. Application of these methods in large distributed systems may be inefficient due to the required computational, storage, and communication costs. In this paper, we propose DLPR, a distributed locality preserving dimension reduction algorithm, to project a large distributed data set into a lower dimensional space. Partitioning methods are applied to divide the data set into several clusters. The system nodes communicate through virtual groups to project the clusters to the target space, independently or in conjunction with each other.
The actual computation of reduction transforms is performed using Locality Preserving Indexing, which is a less studied method in distributed environments. Experimental results demonstrate the efficiency of DLPR in terms of preserving the local structure of the data set, and reducing the computing and storage costs.
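The pipeline described in the abstract (partition the data, then compute a locality preserving transform per cluster) can be illustrated with a minimal single-machine sketch. This is not the authors' DLPR implementation: it assumes plain k-means for the partitioning step, uses basic Locality Preserving Projections (He and Niyogi) rather than the full Locality Preserving Indexing procedure, projects each cluster independently, and leaves out the virtual groups and all inter-node communication. The function names (`knn_heat_weights`, `lpp`, `kmeans`, `project_clusters`) and the parameters `k`, `t`, and `reg` are hypothetical choices made for this illustration.

```python
import numpy as np

def knn_heat_weights(X, k=5, t=None):
    """Symmetric k-nearest-neighbour affinity matrix with heat-kernel weights."""
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    if t is None:
        t = sq.mean() or 1.0  # assumed bandwidth: mean squared distance
    n = X.shape[0]
    W = np.zeros((n, n))
    for i in range(n):
        nbrs = np.argsort(sq[i])[1:k + 1]  # position 0 is the point itself
        W[i, nbrs] = np.exp(-sq[i, nbrs] / t)
    return np.maximum(W, W.T)

def lpp(X, dim, k=5, t=None, reg=1e-6):
    """Return a (features x dim) projection solving X^T L X a = lambda X^T D X a."""
    W = knn_heat_weights(X, k, t)
    D = np.diag(W.sum(axis=1))
    L = D - W                                   # graph Laplacian
    A = X.T @ L @ X
    B = X.T @ D @ X + reg * np.eye(X.shape[1])  # regularised to stay positive definite
    C = np.linalg.cholesky(B)                   # B = C C^T
    C_inv = np.linalg.inv(C)
    M = C_inv @ A @ C_inv.T                     # reduces to a standard symmetric problem
    _, vecs = np.linalg.eigh((M + M.T) / 2)     # eigenvalues in ascending order
    return C_inv.T @ vecs[:, :dim]              # keep the smallest eigenvalues

def kmeans(X, n_clusters, n_iter=20, seed=0):
    """Plain Lloyd iterations; stands in for the paper's partitioning step."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), n_clusters, replace=False)]
    for _ in range(n_iter):
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = np.argmin(dists, axis=1)
        for c in range(n_clusters):
            if np.any(labels == c):             # leave an empty cluster's center unchanged
                centers[c] = X[labels == c].mean(axis=0)
    return labels

def project_clusters(X, n_clusters, dim, k=5):
    """Project every cluster independently into a dim-dimensional space."""
    labels = kmeans(X, n_clusters)
    Y = np.zeros((X.shape[0], dim))
    for c in range(n_clusters):
        idx = np.where(labels == c)[0]
        if idx.size == 0:
            continue
        Xc = X[idx] - X[idx].mean(axis=0)       # centre the cluster before projecting
        Y[idx] = Xc @ lpp(Xc, dim, k=k)
    return labels, Y

rng = np.random.default_rng(42)
data = rng.normal(size=(60, 8))                 # synthetic data, for illustration only
labels, reduced = project_clusters(data, n_clusters=3, dim=2)
```

Because each cluster gets its own transform, the cost of the eigenproblem depends on the cluster size rather than on the whole data set, which is the kind of saving the abstract attributes to partitioning. Coordinates from different clusters are not directly comparable in this sketch; the paper's joint projection of clusters is not modelled here.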
Author information
Authors and Affiliations
- Computer Engineering Department, Sharif University of Technology, Tehran, Iran
Mina Ghashami, Hoda Mashayekhi & Jafar Habibi
Editor information
Editors and Affiliations
- School of Information Technology, Deakin University, Melbourne Burwood Campus, 221 Burwood Highway, 3125, Burwood, VIC, Australia
Yang Xiang
- Media Distribution, Telstra Corporation Limited, 21/35 Collins St, 3000, Melbourne, VIC, Australia
Mukaddim Pathan
- Department of Mathematics and Computing, The University of Southern Queensland, Toowoomba, QLD, Australia
Xiaohui Tao & Hua Wang
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Ghashami, M., Mashayekhi, H., Habibi, J. (2012). DLPR: A Distributed Locality Preserving Dimension Reduction Algorithm. In: Xiang, Y., Pathan, M., Tao, X., Wang, H. (eds) Internet and Distributed Computing Systems. IDCS 2012. Lecture Notes in Computer Science, vol 7646. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-34883-9_8
- DOI: https://doi.org/10.1007/978-3-642-34883-9_8
- Publisher Name: Springer, Berlin, Heidelberg
- Print ISBN: 978-3-642-34882-2
- Online ISBN: 978-3-642-34883-9
- eBook Packages: Computer Science, Computer Science (R0), Springer Nature Proceedings Computer Science