Discovery of Web Usage Profiles Using Various Clustering Techniques
Related papers
Web User Session Cluster Discovery Based on k-Means and k-Medoids Techniques
The explosive growth of the World Wide Web (WWW) has necessitated the development of Web personalization systems that understand user preferences in order to dynamically serve customized content to individual users. To reveal information about user preferences from Web usage data, Web Usage Mining (WUM) techniques are being extensively applied to Web log data. Clustering techniques are widely used in WUM to capture similar interests and trends among users accessing a Web site. Clustering aims to divide a data set into groups, or clusters, such that inter-cluster similarities are minimized while intra-cluster similarities are maximized. This paper describes the discovery of user session clusters using the k-Means and k-Medoids clustering techniques. These techniques are implemented and tested against Web user navigational data. Performance and validity results for each technique are presented and compared.
Keywords: web usage mining; k-means clustering; k-medoids clustering
I. INTRODUCTION
Web Usage Mining [1] is described as the automatic discovery and analysis of patterns in web logs and associated data collected as a result of user interactions with Web resources on one or more Web sites. The goal of Web usage mining is to capture, model, and analyse the behavioural patterns and profiles of users interacting with a Web site. The discovered patterns are usually represented as collections of URLs that are frequently accessed by groups of users with common interests. Web usage mining has been used in a variety of applications, such as i) Web personalization systems [2], ii) adaptive Web sites [3][4], iii) business intelligence [5], iv) system improvement, where an understanding of web traffic behaviour helps decide strategies for web caching [6], load balancing and data distribution [7], and v) fraud detection, i.e., detecting unusual accesses to secured data [8].
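The k-Means step described in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: each session is assumed to be a binary vector over the site's pages (1 if the page was visited), distance is squared Euclidean, and centroids are seeded with a deterministic farthest-first rule instead of random picks. The page names in the example are hypothetical.

```python
from typing import List, Sequence, Tuple

Vector = List[float]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_first(sessions: List[Vector], k: int) -> List[Vector]:
    """Deterministic seeding (an assumption): the first session, then repeatedly
    the session farthest from every centroid chosen so far."""
    centroids = [list(sessions[0])]
    while len(centroids) < k:
        far = max(sessions, key=lambda s: min(squared_distance(s, c) for c in centroids))
        centroids.append(list(far))
    return centroids


def k_means(sessions: List[Vector], k: int, max_iter: int = 100) -> Tuple[List[int], List[Vector]]:
    """Lloyd-style k-Means over session vectors; returns (labels, centroids)."""
    dim = len(sessions[0])
    centroids = farthest_first(sessions, k)
    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: every session joins its nearest centroid.
        new_labels = [min(range(k), key=lambda c: squared_distance(s, centroids[c]))
                      for s in sessions]
        if new_labels == labels:
            break  # converged: no session changed cluster
        labels = new_labels
        # Update step: move each centroid to the mean of its member sessions.
        for c in range(k):
            members = [s for s, lab in zip(sessions, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(m[d] for m in members) / len(members) for d in range(dim)]
    return labels, centroids


# Hypothetical sessions over the pages [/home, /products, /cart, /blog, /about].
sessions = [
    [1, 1, 1, 0, 0],
    [1, 1, 1, 0, 0],
    [0, 1, 1, 0, 0],
    [1, 0, 0, 1, 1],
    [0, 0, 0, 1, 1],
    [0, 0, 0, 1, 1],
]
labels, centroids = k_means(sessions, k=2)
print(labels)  # [0, 0, 0, 1, 1, 1]
```

Because centroids are means, they are generally not real sessions; each centroid coordinate can be read as the share of the cluster's sessions that visited that page.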
Discovery of Web User Session Clusters Using Partitioning Based Clustering Techniques
The explosive growth of the World Wide Web (WWW) has necessitated the development of Web personalization systems that understand user preferences in order to dynamically serve customized content to individual users. To reveal information about user preferences from Web usage data, Web Usage Mining (WUM) techniques are being extensively applied to Web log data. Clustering techniques are widely used in WUM to capture similar interests and trends among users accessing a Web site. Clustering aims to divide a data set into groups, or clusters, such that inter-cluster similarities are minimized while intra-cluster similarities are maximized. This paper describes the discovery of user session clusters using the two most popular partition-based clustering techniques, namely k-Means and k-Medoids. These techniques are implemented and tested against Web user navigational data. Performance and validity results for each technique are presented and compared.
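A k-Medoids counterpart can be sketched in the same spirit. This is an assumption-laden illustration rather than the authors' code: sessions are modelled as sets of visited pages compared with Jaccard distance, and medoids are refined with the simple alternating (Voronoi-iteration) scheme rather than the full PAM swap search. The page names are hypothetical.

```python
from typing import FrozenSet, List, Tuple

Session = FrozenSet[str]


def jaccard_distance(a: Session, b: Session) -> float:
    """1 minus (pages shared by both sessions / pages in either session); two empty sessions are identical."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def k_medoids(sessions: List[Session], k: int, max_iter: int = 100) -> Tuple[List[int], List[int]]:
    """Alternating (Voronoi-iteration) k-Medoids; returns (labels, medoid indices)."""
    n = len(sessions)
    dist = [[jaccard_distance(sessions[i], sessions[j]) for j in range(n)] for i in range(n)]
    # Deterministic seeding: session 0, then the session farthest from the chosen medoids.
    medoids = [0]
    while len(medoids) < k:
        medoids.append(max(range(n), key=lambda i: min(dist[i][m] for m in medoids)))
    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: each session joins the cluster of its closest medoid.
        labels = [min(range(k), key=lambda c: dist[i][medoids[c]]) for i in range(n)]
        new_medoids = []
        for c in range(k):
            members = [i for i in range(n) if labels[i] == c]
            if not members:  # keep the old medoid if its cluster is empty
                new_medoids.append(medoids[c])
                continue
            # Update step: the member with the smallest total distance to its cluster.
            new_medoids.append(min(members, key=lambda m: sum(dist[m][j] for j in members)))
        if new_medoids == medoids:
            break  # medoids are stable, so the assignment is final
        medoids = new_medoids
    return labels, medoids


sessions = [
    frozenset({"/home", "/products", "/cart"}),
    frozenset({"/products", "/cart"}),
    frozenset({"/home", "/products", "/cart", "/checkout"}),
    frozenset({"/blog", "/about"}),
    frozenset({"/blog"}),
    frozenset({"/home", "/blog", "/about"}),
]
labels, medoids = k_medoids(sessions, k=2)
print(labels, medoids)  # [0, 0, 0, 1, 1, 1] [0, 3]
```

Unlike k-Means centroids, each medoid is an actual session, so it can be read directly as a representative navigation pattern for its cluster.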
Web Personalization Using Clustering of Web Usage Data
International Journal in Foundations of Computer Science & Technology, 2014
The exponential growth in the number and complexity of information resources and services on the Web has made log data an indispensable resource for characterizing users in Web-based environments. The approach organizes related web data into a hierarchical structure through approximation. This hierarchy can be used as the input for a variety of data mining tasks, such as clustering, association rule mining, and sequence mining. In this paper, we present an approach for dynamically personalizing a web user's environment as the user interacts with the Web, by clustering web usage data using a concept hierarchy. The system infers information about users from the web server's access logs by means of data mining and web usage mining techniques. The extracted knowledge is used to offer users a personalized view of the services.
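To make the concept-hierarchy idea concrete, the sketch below generalizes page URLs to coarser concepts before usage data is clustered. The abstract does not say how its hierarchy is built; purely for illustration, we assume the site's directory structure serves as the hierarchy, so each path prefix is a concept. The functions `generalize` and `concept_profile` and all URLs are hypothetical.

```python
from collections import Counter
from typing import Dict, List


def generalize(url: str, level: int) -> str:
    """Map a URL path to its ancestor concept at the given depth.

    Assumption: the site's directory tree doubles as the concept hierarchy,
    so "/products/phones/p1.html" at level 1 becomes "/products".
    """
    parts = [p for p in url.split("?")[0].split("/") if p]  # drop query string and empty segments
    return "/" + "/".join(parts[:level])


def concept_profile(session: List[str], level: int) -> Dict[str, int]:
    """Count how often a session touches each concept at the chosen level."""
    return dict(Counter(generalize(u, level) for u in session))


session = [
    "/products/phones/p1.html",
    "/products/phones/p2.html",
    "/products/laptops/l1.html",
    "/blog/post-7",
]
print(concept_profile(session, 1))  # {'/products': 3, '/blog': 1}
print(concept_profile(session, 2))  # {'/products/phones': 2, '/products/laptops': 1, '/blog/post-7': 1}
```

Choosing a higher level yields fewer, broader features, which can make sparse session data easier to cluster at the cost of detail.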
As one of the most important tasks of web usage mining, web user clustering, which establishes groups of users exhibiting similar browsing patterns, provides useful knowledge for personalized web services. Many clustering algorithms exist. In this paper, the similarity between users is calculated, and then a comparative analysis of two clustering algorithms, namely the K-means algorithm and a hierarchical algorithm, is performed. Web users are clustered with these algorithms based on web user log data. Given a set of web users and their associated historical web usage data, we study their behavioral characteristics and cluster them. In terms of accuracy, K-means produces better results than the hierarchical algorithm. https://sites.google.com/site/ijcsis/
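The abstract does not specify which hierarchical algorithm or similarity measure is used. As a hedged sketch, the code below assumes each user is a vector of page-visit counts, measures user similarity with cosine similarity, and applies average-link agglomerative clustering until `k` clusters remain. All data are made up.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two visit-count vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def agglomerative(users: List[Sequence[float]], k: int) -> List[List[int]]:
    """Average-link agglomerative clustering on cosine similarity, stopped at k clusters."""
    n = len(users)
    sim = [[cosine_similarity(users[i], users[j]) for j in range(n)] for i in range(n)]
    clusters = [[i] for i in range(n)]  # every user starts in its own cluster
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Average link: mean pairwise similarity between the two clusters.
                pairs = [sim[i][j] for i in clusters[a] for j in clusters[b]]
                score = sum(pairs) / len(pairs)
                if best is None or score > best[0]:
                    best = (score, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]  # merge the most similar pair (a < b)
        del clusters[b]
    return clusters


# Hypothetical page-visit counts for four users over four pages.
users = [[5, 3, 0, 0], [4, 4, 0, 0], [0, 0, 2, 6], [0, 1, 3, 5]]
print(agglomerative(users, k=2))  # [[0, 1], [2, 3]]
```

Stopping at `k` clusters cuts the dendrogram at a fixed size, which makes the result directly comparable with a K-means run using the same `k`.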
Clustering Model Based on Web Behavior
2014
Web log mining is an emerging area of data mining. It provides invaluable information by discovering trends and regularities in web users' access patterns. Clustering based on access patterns is an important research topic in web usage mining. Knowledge obtained from web user clusters has been used in different areas of web mining technology. This paper presents an algorithm for measuring the similarity between web users and automatically segmenting them based on their past access patterns. The compatibility measures are based on content extracted from users' browser data. Furthermore, it provides a locality-based clustering method for people who are not yet known to their most compatible friends.
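The abstract does not define its compatibility measure or its notion of locality. As one possible reading, the sketch below scores compatibility as the Jaccard similarity between the sets of sites two users visited, and treats locality as a shared location label supplied through a hypothetical `location` mapping. The function `most_compatible` and the sample data are illustrative only.

```python
from typing import Dict, List, Optional, Set, Tuple


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Share of sites visited by both users among sites visited by either."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def most_compatible(user: str,
                    visits: Dict[str, Set[str]],
                    location: Optional[Dict[str, str]] = None,
                    top_n: int = 3) -> List[Tuple[str, float]]:
    """Rank the other users by Jaccard similarity of their visited sites.

    If `location` (a hypothetical user -> locality mapping) is given, only
    users in the same locality as `user` are considered.
    """
    candidates = [
        other for other in visits
        if other != user and (location is None or location.get(other) == location.get(user))
    ]
    scored = [(other, jaccard(visits[user], visits[other])) for other in candidates]
    # Highest similarity first; ties are broken alphabetically for a stable order.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:top_n]


visits = {
    "ann": {"news.example", "maps.example", "mail.example"},
    "bob": {"news.example", "mail.example"},
    "cy": {"games.example"},
    "dee": {"news.example", "maps.example", "mail.example", "shop.example"},
}
location = {"ann": "city-a", "bob": "city-a", "cy": "city-a", "dee": "city-b"}
print([name for name, _ in most_compatible("ann", visits, top_n=2)])            # ['dee', 'bob']
print([name for name, _ in most_compatible("ann", visits, location, top_n=2)])  # ['bob', 'cy']
```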
Influence of Various Clustering Algorithms on Web Personalization
2009
Today, many e-commerce websites incorporate personalization features that provide users with relevant content based on their past browsing behavior, in order to improve their browsing experience. In turn, site owners gain more loyal customers. This paper compares various clustering algorithms used for grouping web user sessions, such as K-means, Fuzzy C-means, Subtractive Clustering and K-modes. The clusters formed by applying these algorithms are aggregated to form web user profiles.
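One common way to turn session clusters into usage profiles, assumed here rather than taken from the paper, is to weight each page by the fraction of a cluster's sessions that contain it and keep only pages above a threshold. The threshold `min_weight=0.5`, the function `aggregate_profiles`, and the sessions below are hypothetical; the cluster labels could come from any of the algorithms named above.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Set


def aggregate_profiles(sessions: List[Set[str]], labels: List[int],
                       min_weight: float = 0.5) -> Dict[int, Dict[str, float]]:
    """Turn session clusters into weighted page profiles.

    A page's weight is the fraction of the cluster's sessions that contain it;
    pages whose weight falls below `min_weight` are dropped.
    """
    by_cluster: Dict[int, List[Set[str]]] = defaultdict(list)
    for session, label in zip(sessions, labels):
        by_cluster[label].append(session)
    profiles: Dict[int, Dict[str, float]] = {}
    for label, members in by_cluster.items():
        counts = Counter(page for session in members for page in session)
        weights = {page: n / len(members) for page, n in counts.items()}
        profiles[label] = {page: w for page, w in weights.items() if w >= min_weight}
    return profiles


sessions = [
    {"/home", "/products", "/cart"},
    {"/products", "/cart"},
    {"/products", "/faq"},
    {"/blog", "/about"},
    {"/blog"},
]
labels = [0, 0, 0, 1, 1]  # e.g. the output of any clustering step
profiles = aggregate_profiles(sessions, labels)
print(sorted(profiles[0].items()))  # [('/cart', 0.6666666666666666), ('/products', 1.0)]
print(sorted(profiles[1].items()))  # [('/about', 0.5), ('/blog', 1.0)]
```

Raising `min_weight` yields shorter profiles containing only the pages most characteristic of each cluster.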
Automatic clustering for the web usage mining
2003
In this paper, we present an approach based on two hybrid clustering methods for Web Usage Mining (WUM). The WUM process contains three steps: pre-processing, data mining, and result analysis. First, we give a brief description of the WUM process and Web data; Section 2 then presents the pre-processing step and the data warehouse that we employed. Two hybrid clustering methods, based on Principal Components Analysis (PCA), Multiple Classification Analysis (MCA) and Dynamic Clustering, are used to analyse the Web logs taken from INRIA's Web servers. The results obtained after applying these methods, and their interpretations, are presented in Section 4. Finally, we provide some perspectives and future work.
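The hybrid idea of first reducing the data with a factorial method and then clustering the result can be sketched as follows. This is a simplified stand-in, not the authors' method: only PCA is shown (no MCA), the data are projected onto the first principal component found by power iteration, and plain k-means on the resulting 1-D scores substitutes for Dynamic Clustering, which generalizes k-means. The sample data and function names are invented.

```python
import math
from typing import List


def center(data: List[List[float]]) -> List[List[float]]:
    """Subtract the column means so every feature has mean zero."""
    n = len(data)
    means = [sum(col) / n for col in zip(*data)]
    return [[v - m for v, m in zip(row, means)] for row in data]


def first_principal_component(x: List[List[float]], iters: int = 500) -> List[float]:
    """Leading eigenvector of the sample covariance of centred data (power iteration)."""
    n, d = len(x), len(x[0])
    cov = [[sum(r[i] * r[j] for r in x) / (n - 1) for j in range(d)] for i in range(d)]
    v = [1.0] + [0.5] * (d - 1)  # arbitrary non-zero starting vector
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(t * t for t in w))
        if norm == 0.0:
            break
        v = [t / norm for t in w]
    return v


def pca_then_kmeans(data: List[List[float]], k: int, max_iter: int = 100) -> List[int]:
    """Project onto PC1, then run 1-D k-means (standing in for dynamic clustering)."""
    x = center(data)
    pc1 = first_principal_component(x)
    scores = [sum(a * b for a, b in zip(row, pc1)) for row in x]
    ordered = sorted(scores)
    # Spread the k initial centres across the sorted scores (requires k >= 2).
    centres = [ordered[i * (len(ordered) - 1) // (k - 1)] for i in range(k)]
    labels: List[int] = []
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda c: abs(s - centres[c])) for s in scores]
        if new_labels == labels:
            break  # no score changed cluster
        labels = new_labels
        for c in range(k):
            members = [s for s, lab in zip(scores, labels) if lab == c]
            if members:
                centres[c] = sum(members) / len(members)
    return labels


# Invented feature vectors for six sessions (e.g. visit counts on three page groups).
data = [
    [1.0, 1.0, 0.0], [1.2, 0.9, 0.1], [0.9, 1.1, 0.0],
    [5.0, 5.0, 1.0], [5.1, 4.8, 1.2], [4.9, 5.2, 0.9],
]
print(pca_then_kmeans(data, k=2))  # [0, 0, 0, 1, 1, 1]
```

Keeping a single component is only reasonable when, as here, one direction dominates the variance; a realistic pipeline would keep several components.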
IJERT-Clustering Model Based on Web Behavior
International Journal of Engineering Research and Technology (IJERT), 2014
https://www.ijert.org/clustering-model-based-on-web-behavior https://www.ijert.org/research/clustering-model-based-on-web-behavior-IJERTV3IS031029.pdf
Clustering techniques are widely used in "Web Usage Mining" to capture similar interests and trends among users accessing a Web site. For this purpose, the web access logs generated at a particular web site are preprocessed to discover user navigational sessions. Clustering techniques are then applied to group the session data into user session clusters, where inter-cluster similarities are minimized while intra-cluster similarities are maximized. Since different clustering algorithms generally produce different sets of clusters, it is important to evaluate these methods, using appropriate performance measures, in terms of the accuracy and validity of the clusters and the time required to generate them. This paper describes various validity and accuracy measures, including Dunn's Index, the Davies-Bouldin Index, the C Index, the Rand Index, the Jaccard Index, the Silhouette Index, the Fowlkes-Mallows Index and the Sum of the Squared Error (SSE). We evaluated the performance of the following clustering techniques: k-Means, k-Medoids, Leader, Single Link Agglomerative Hierarchical and DBSCAN. These techniques are implemented and tested against Web user navigational data. Finally, their performance results are presented and compared.
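Several of the measures listed above are straightforward to compute. The sketch below implements SSE and the mean Silhouette width as internal measures, and the pair-counting Rand, Jaccard and Fowlkes-Mallows indices as external measures that compare a clustering with reference labels. It assumes Euclidean distance, and the points and labels are toy data. Dunn's Index, the Davies-Bouldin Index and the C Index are omitted for brevity.

```python
import math
from itertools import combinations
from typing import List, Sequence, Tuple


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def sse(points: List[List[float]], labels: List[int]) -> float:
    """Sum of squared distances from each point to its cluster centroid."""
    total = 0.0
    for c in set(labels):
        members = [p for p, lab in zip(points, labels) if lab == c]
        centroid = [sum(col) / len(members) for col in zip(*members)]
        total += sum(sq_dist(p, centroid) for p in members)
    return total


def silhouette(points: List[List[float]], labels: List[int]) -> float:
    """Mean silhouette width, s(i) = (b - a) / max(a, b); needs at least two clusters."""
    def dist(p, q):
        return math.sqrt(sq_dist(p, q))

    widths = []
    for i, p in enumerate(points):
        own = [q for j, q in enumerate(points) if labels[j] == labels[i] and j != i]
        if not own:  # singleton cluster: s(i) = 0 by convention
            widths.append(0.0)
            continue
        a = sum(dist(p, q) for q in own) / len(own)
        # b: smallest mean distance from p to the members of another cluster.
        b = min(
            sum(dist(p, q) for j, q in enumerate(points) if labels[j] == c) / labels.count(c)
            for c in set(labels) if c != labels[i]
        )
        widths.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(widths) / len(widths)


def pair_counts(pred: List[int], truth: List[int]) -> Tuple[int, int, int, int]:
    """Count pairs: (same in both, same in pred only, same in truth only, different in both)."""
    ss = sd = ds = dd = 0
    for i, j in combinations(range(len(pred)), 2):
        same_pred, same_truth = pred[i] == pred[j], truth[i] == truth[j]
        if same_pred and same_truth:
            ss += 1
        elif same_pred:
            sd += 1
        elif same_truth:
            ds += 1
        else:
            dd += 1
    return ss, sd, ds, dd


def rand_index(pred: List[int], truth: List[int]) -> float:
    ss, sd, ds, dd = pair_counts(pred, truth)
    return (ss + dd) / (ss + sd + ds + dd)


def jaccard_index(pred: List[int], truth: List[int]) -> float:
    ss, sd, ds, _ = pair_counts(pred, truth)
    return ss / (ss + sd + ds)  # assumes at least one pair shares a cluster


def fowlkes_mallows(pred: List[int], truth: List[int]) -> float:
    ss, sd, ds, _ = pair_counts(pred, truth)
    return ss / math.sqrt((ss + sd) * (ss + ds))  # same assumption as above


points = [[0, 0], [0, 1], [5, 5], [5, 6]]
pred = [0, 0, 1, 1]
truth = [0, 0, 1, 2]  # hypothetical reference labels
print(sse(points, pred))                       # 1.0
print(round(silhouette(points, pred), 2))      # 0.86
print(round(rand_index(pred, truth), 4))       # 0.8333
print(jaccard_index(pred, truth))              # 0.5
print(round(fowlkes_mallows(pred, truth), 4))  # 0.7071
```

SSE and Silhouette need only the data and the clustering, whereas the pair-counting indices require reference labels, so the two groups answer different questions about cluster quality.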