Analysis of Web User Clustering based on Users’ Access Behavior
Related papers
Clustering Web Users Based on K-means Algorithm for Reducing Time Access Cost
2019 First International Conference of Intelligent Computing and Engineering (ICOICE), 2019
Numerous organizations are providing web-based services due to the steady growth of web development and of the number of available web search tools. However, these advancements in web-based services come with increasing difficulty in information retrieval. Efforts are now directed toward reducing the Internet traffic load and the cost of user access to important information. Web clustering, an important web usage mining (WUM) task, groups web users based on their browsing patterns to provide useful knowledge for personalized web services. Based on the web structure, each Uniform Resource Locator (URL) in the web log data is parsed into tokens that are uniquely identified for URL classification. The sequence of URLs a user navigated over a period of 30 minutes is considered a session, and the session represents the user's navigation pattern. In this paper, the K-means algorithm was used to cluster web users based on their similarity in a vector matrix. K-means was run several times, with k = 2, 3, 4, up to k = 8; the results showed the best similarity at k = 8, where the Residual Sum of Squares (RSS) evaluation measure achieved a high intra-cluster similarity value (3.049).
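The session and tokenization steps described above can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes log records are `(user, timestamp_in_seconds, url)` tuples, reads the 30-minute rule as a fixed window measured from a session's first request (the abstract does not say whether an inactivity timeout is meant instead), and tokenizes a URL simply by splitting its path on `/`. The names `sessionize` and `url_tokens` are hypothetical.

```python
from collections import defaultdict
from urllib.parse import urlparse

SESSION_WINDOW = 30 * 60  # assumed: 30 minutes, measured in seconds


def url_tokens(url):
    """Split a URL path into non-empty tokens (hypothetical tokenizer)."""
    return [part for part in urlparse(url).path.split("/") if part]


def sessionize(records, window=SESSION_WINDOW):
    """Group (user, timestamp, url) records into per-user sessions.

    Assumption: a session holds every URL a user requests within `window`
    seconds of that session's first request; a later request starts a new one.
    """
    by_user = defaultdict(list)
    for user, ts, url in records:
        by_user[user].append((ts, url))

    sessions = []
    for user in sorted(by_user):
        current, start = [], None
        for ts, url in sorted(by_user[user]):  # chronological order
            if start is None or ts - start > window:
                if current:
                    sessions.append((user, current))
                current, start = [], ts
            current.append(url)
        if current:
            sessions.append((user, current))
    return sessions


log = [
    ("u1", 0, "http://example.com/a/b.html"),
    ("u1", 600, "http://example.com/a/c.html"),
    ("u1", 2000, "http://example.com/d"),
    ("u2", 100, "http://example.com/e"),
]
print(sessionize(log))
print(url_tokens("http://example.com/a/b.html"))  # ['a', 'b.html']
```

With this reading, user `u1` gets two sessions because the request at 2000 s falls more than 1800 s after the session that began at 0 s.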
As one of the most important tasks of web usage mining, web user clustering, which establishes groups of users exhibiting similar browsing patterns, provides useful knowledge for personalized web services. There are many clustering algorithms. In this paper, users' similarity is calculated, and then a comparative analysis of two clustering algorithms, namely the K-means algorithm and the hierarchical algorithm, is performed. Web users are clustered with these algorithms based on web user log data. Given a set of web users and their associated historical web usage data, we study their behavioral characteristics and cluster them. In terms of accuracy, K-means produces better results than the hierarchical algorithm. https://sites.google.com/site/ijcsis/
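The similarity-then-hierarchical-clustering idea can be sketched under explicit assumptions: each user is represented as a vector of page-visit counts, similarity is cosine similarity, and the hierarchical algorithm is average-linkage agglomerative clustering. The abstract names neither the similarity measure nor the linkage, so these are illustrative choices, and `cosine` and `average_linkage` are hypothetical names.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity of two equal-length page-visit count vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def average_linkage(vectors, n_clusters):
    """Agglomerative clustering: repeatedly merge the two clusters with the
    highest average pairwise cosine similarity until n_clusters remain."""
    clusters = [[i] for i in range(len(vectors))]
    # Precompute user-to-user similarities once (keys use i < j).
    sim = {(i, j): cosine(vectors[i], vectors[j])
           for i, j in combinations(range(len(vectors)), 2)}

    def pair_sim(i, j):
        return sim[(i, j)] if i < j else sim[(j, i)]

    while len(clusters) > n_clusters:
        best, best_score = None, -math.inf
        for a, b in combinations(range(len(clusters)), 2):
            total = sum(pair_sim(i, j) for i in clusters[a] for j in clusters[b])
            score = total / (len(clusters[a]) * len(clusters[b]))
            if score > best_score:
                best, best_score = (a, b), score
        a, b = best  # a < b, so deleting b leaves index a valid
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return sorted(sorted(c) for c in clusters)


# Four hypothetical users, three pages: users 0-1 favour page 0, users 2-3 page 1.
users = [[3, 0, 1], [2, 0, 1], [0, 4, 1], [0, 3, 2]]
print(average_linkage(users, 2))
```

Average linkage is used here only because it is a common default; single or complete linkage would plug into the same merge loop by changing how `score` is computed.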
Web log mining is a new subfield of data mining research. It aims at the discovery of trends and regularities in web users' access patterns. This paper presents a new algorithm for automated segmentation of web users based on their access patterns. The results may lead to an improved organization of web documents for navigational convenience.
Clustering Model Based on Web Behavior
2014
Web log mining is an emerging part of data mining. It provides invaluable information by discovering trends and regularities in web users' access patterns. Clustering based on access patterns is an important research topic in web usage mining. Knowledge obtained from web user clusters has been used in different fields of web mining technology. This paper presents an algorithm for measuring similarities and for automated segmentation of web users based on their past access patterns. The compatibility measures are based on content extracted from users' browser data. Furthermore, the paper provides a locality-based clustering method for people who are not yet known to their most compatible friends.
CAS based clustering algorithm for Web users
Nonlinear Dynamics, 2010
This article devises a clustering technique for detecting groups of Web users from Web access logs. In this technique, Web users are clustered by a new clustering algorithm based on the mechanism of the chaotic ant swarm (CAS). This CAS-based clustering algorithm, called CAS-C, solves clustering problems from the perspective of chaotic optimization. The performance of CAS-C in detecting Web user clusters is compared with that of the popular k-means algorithm. Clustering quality is evaluated by calculating the average intra-cluster and inter-cluster distances. Experimental results demonstrate that CAS-C is an effective clustering technique with larger average intra-cluster distance and smaller average inter-cluster distance than the k-means algorithm. Statistical analysis of the resulting distances also shows that the CAS-C based Web user clustering algorithm has better stability. To show its utility, the proposed approach is applied to a pre-fetching task that predicts user requests, with encouraging results.
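The two evaluation measures mentioned above can be computed as in the following sketch. It assumes Euclidean distance, takes the average intra-cluster distance as the mean distance from each point to its own cluster centroid, and takes the average inter-cluster distance as the mean distance between pairs of centroids; the CAS-C paper may define them differently. `centroid` and `cluster_distances` are hypothetical helpers.

```python
import math
from itertools import combinations


def centroid(points):
    """Coordinate-wise mean of a list of equal-length points."""
    return [sum(coord) / len(points) for coord in zip(*points)]


def cluster_distances(points, labels):
    """Return (average intra-cluster distance, average inter-cluster distance).

    Intra: mean Euclidean distance from each point to its own cluster centroid.
    Inter: mean Euclidean distance between every pair of cluster centroids.
    """
    groups = {}
    for p, lab in zip(points, labels):
        groups.setdefault(lab, []).append(p)
    cents = {lab: centroid(ps) for lab, ps in groups.items()}

    intra = sum(math.dist(p, cents[lab]) for p, lab in zip(points, labels)) / len(points)
    pairs = list(combinations(cents.values(), 2))
    inter = sum(math.dist(a, b) for a, b in pairs) / len(pairs) if pairs else 0.0
    return intra, inter


print(cluster_distances([[0, 0], [0, 2], [10, 0], [10, 2]], [0, 0, 1, 1]))
```

`math.dist` requires Python 3.8 or later.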
Web User Session Cluster Discovery Based on k-Means and k-Medoids Techniques
The explosive growth of the World Wide Web (WWW) has necessitated the development of Web personalization systems that understand user preferences in order to dynamically serve customized content to individual users. To reveal information about user preferences from Web usage data, Web Usage Mining (WUM) techniques are extensively applied to Web log data. Clustering techniques are widely used in WUM to capture similar interests and trends among users accessing a Web site. Clustering aims to divide a data set into groups or clusters such that inter-cluster similarities are minimized while intra-cluster similarities are maximized. This paper describes the discovery of user session clusters using the k-Means and k-Medoids clustering techniques. These techniques are implemented and tested against Web user navigational data. Performance and validity results for each technique are presented and compared.
Keywords: web usage mining; k-means clustering; k-medoids clustering
I. INTRODUCTION
Web Usage Mining [1] is described as the automatic discovery and analysis of patterns in web logs and associated data collected as a result of user interactions with Web resources on one or more Web sites. The goal of Web usage mining is to capture, model, and analyse the behavioural patterns and profiles of users interacting with a Web site. The discovered patterns are usually represented as collections of URLs that are frequently accessed by groups of users with common interests. Web usage mining has been used in a variety of applications, such as i) Web personalization systems [2], ii) adaptive Web sites [3][4], iii) business intelligence [5], iv) system improvement, i.e. understanding web traffic behaviour to decide strategies for web caching [6], load balancing and data distribution [7], and v) fraud detection, i.e. detecting unusual accesses to secured data [8].
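To make the k-Medoids technique concrete, here is a minimal sketch of the simple alternating (Voronoi-iteration) variant, which is not PAM's swap search and not necessarily the variant used in the paper. It assumes session vectors are numeric points, uses Euclidean distance, and takes the initial medoids as given indices; `k_medoids` is a hypothetical name.

```python
import math


def k_medoids(points, init_medoids, max_iter=100):
    """Alternating k-medoids: assign each point to its nearest medoid, then
    move each medoid to the cluster member with the smallest total distance
    to the other members. Stops when the medoids no longer change."""
    medoids = list(init_medoids)  # indices into `points`
    for _ in range(max_iter):
        # Assignment step: label each point with the index of its nearest medoid.
        labels = [min(range(len(medoids)),
                      key=lambda c: math.dist(p, points[medoids[c]]))
                  for p in points]
        # Update step: the new medoid minimises summed distance within its cluster.
        new_medoids = []
        for c in range(len(medoids)):
            members = [i for i, lab in enumerate(labels) if lab == c]
            if not members:  # keep an empty cluster's medoid where it was
                new_medoids.append(medoids[c])
                continue
            new_medoids.append(min(
                members,
                key=lambda m: sum(math.dist(points[m], points[i]) for i in members)))
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return medoids, labels


pts = [[0, 0], [1, 0], [0, 1], [10, 10], [11, 10], [10, 11]]
print(k_medoids(pts, [0, 1]))
```

Unlike k-Means, each cluster representative is always an actual data point (here, an actual session), which is the usual reason to prefer k-Medoids when centroids are hard to interpret.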
Web Users Clustering Based on Fuzzy C-MEANS
VAWKUM Transactions on Computer Sciences
The Web contributes greatly to our lives in many fields, such as education, entertainment, Internet banking, online shopping, and software downloading. This has led to rapid growth in the number of Internet users, which has resulted in an explosive increase in traffic and in bottlenecks that degrade Internet performance. This paper proposes a new approach to grouping users according to their Web access patterns. The proposed approach is based on the Fuzzy c-means technique, which allows a web user to be assigned to more than one cluster or interest: each web user has a degree of membership in each cluster. The experimental results showed that web users were quickly and successfully clustered into similar groups using Fuzzy c-means. In addition, Fuzzy c-means performed well, and its results improved considerably as the number of clusters increased, on two real datasets, Bo2 and NY. The proposed intelligent web user clustering based on Fuzzy c-means can be used to discover users' interests in Web pages, which can contribute to enhancing several approaches recently used to improve Web performance, such as Web caching, Web pre-fetching, and Web recommender systems.
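The degree-of-membership idea can be illustrated with the standard Fuzzy c-means updates, shown below as a minimal sketch. The fuzzifier `m = 2`, the fixed iteration count, the given initial centres, and the small `eps` guard against zero distances are assumptions, not settings from the paper; `fuzzy_c_means` is a hypothetical name.

```python
import math


def fuzzy_c_means(points, centers, m=2.0, iters=50, eps=1e-9):
    """Standard FCM: alternate membership and centre updates.

    u[i][j] = 1 / sum_k (d(x_i, c_j) / d(x_i, c_k)) ** (2 / (m - 1))
    c_j     = sum_i u[i][j]**m * x_i / sum_i u[i][j]**m
    The returned memberships are those computed before the final centre update.
    """
    centers = [list(c) for c in centers]
    n_c = len(centers)
    for _ in range(iters):
        # Membership update: each row sums to 1 across the clusters.
        u = []
        for x in points:
            d = [max(math.dist(x, c), eps) for c in centers]  # eps avoids 0-division
            u.append([1.0 / sum((d[j] / d[k]) ** (2 / (m - 1)) for k in range(n_c))
                      for j in range(n_c)])
        # Centre update: membership-weighted mean of all points.
        for j in range(n_c):
            w = [u[i][j] ** m for i in range(len(points))]
            total = sum(w)
            centers[j] = [sum(w[i] * points[i][dim] for i in range(len(points))) / total
                          for dim in range(len(points[0]))]
    return centers, u


centers, u = fuzzy_c_means([[0, 0], [0, 1], [10, 10], [10, 11]], [[1, 1], [9, 9]])
print(centers)
print(u[0])  # membership of the first point in each cluster
```

Every point keeps a nonzero membership in every cluster, so a user whose pages straddle two interest groups is represented in both rather than forced into one.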
Discovery of Web Usage Profiles Using Various Clustering Techniques
The explosive growth of the World Wide Web (WWW) has necessitated the development of Web personalization systems that understand user preferences in order to dynamically serve customized content to individual users. To reveal information about user preferences from Web usage data, Web Usage Mining (WUM) techniques are extensively applied to Web log data. Clustering techniques are widely used in WUM to capture similar interests and trends among users accessing a Web site. Clustering aims to divide a data set into groups or clusters such that inter-cluster similarities are minimized while intra-cluster similarities are maximized. This paper reviews four popularly used clustering techniques: k-Means, k-Medoids, Leader, and DBSCAN. These techniques are implemented and tested against Web user navigational data. Performance and validity results for each technique are presented and compared.
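Of the four techniques listed, the Leader algorithm is the least widely known; a minimal sketch of its usual single-pass form follows. It assumes Euclidean distance and a user-chosen distance `threshold`; `leader_clustering` is a hypothetical name, and the paper's implementation may differ.

```python
import math


def leader_clustering(points, threshold):
    """Single-pass Leader algorithm: each point joins the first existing leader
    within `threshold`; otherwise it becomes a new leader."""
    leaders, labels = [], []
    for p in points:
        for idx, leader in enumerate(leaders):
            if math.dist(p, leader) <= threshold:
                labels.append(idx)
                break
        else:  # no leader was close enough, so p starts a new cluster
            leaders.append(p)
            labels.append(len(leaders) - 1)
    return leaders, labels


print(leader_clustering([[0, 0], [1, 0], [10, 0], [0, 1.5], [10.5, 0]], 2))
```

The single pass makes Leader fast, but its output depends on the order in which sessions are presented and on the threshold, since the first point seen in a region always becomes that region's leader.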
Automatic clustering for the web usage mining
2003
In this paper, we present an approach based on two hybrid clustering methods for Web Usage Mining (WUM). The WUM process contains three steps: pre-processing, data mining, and result analysis. First, we give a brief description of the WUM process and Web data, followed in Section 2 by a presentation of the pre-processing step and the data warehouse that we employed. Two hybrid clustering methods, based on Principal Components Analysis (PCA), Multiple Classification Analysis (MCA), and Dynamic Clustering, are used to analyse the Web logs taken from INRIA's Web servers. The results obtained by applying these methods, and the corresponding interpretations, are presented in Section 4. Finally, we provide some perspectives and future work.
A Comparative Study of Mining Web Usage Patterns Using Variants of k-Means Clustering Algorithm
The explosive growth in the information available on the Web has prompted the need for Web personalization systems that understand and exploit user preferences to dynamically serve customized content to individual users [1]. To reveal information about user preferences from Web usage data, Data Mining techniques can naturally be applied, leading to so-called Web Usage Mining (WUM) [2]. Clustering is widely used in WUM to capture similar interests and trends among users accessing a Web site [3]. k-Means is a popular clustering algorithm based on partitioning the data. However, it has drawbacks: the user must specify the number of clusters at the beginning, and the result is sensitive to the initial selection of cluster centres. The global k-Means algorithm proposed by Likas [4] provides an incremental approach to clustering, dynamically adding one cluster centre at a time through a deterministic global search procedure. It does not depend on any initial conditions and considerably outperforms the k-Means algorithm, but its drawback is a heavy computational cost. A faster version of the global k-Means algorithm substantially reduces the execution time by improving the way the next cluster centre is created. We implemented and tested these algorithms against web usage data in order to discover user navigational session clusters. In this paper, we present the implementation details of each of the above-mentioned k-Means clustering techniques, along with their underlying mathematical foundations, and compare their results. Our results show that the fast global k-Means algorithm significantly reduces computational time without affecting the performance of the global k-Means algorithm. It also outperforms the global k-Means algorithm in terms of the validity measure.
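The incremental idea behind global k-Means can be sketched as below. This is an illustrative reconstruction of the general procedure, not the paper's implementation: it runs the exhaustive variant (every data point tried as the initial position of the new centre), not the fast variant, and uses plain Lloyd iterations with squared Euclidean distance. `sq_dist`, `kmeans`, and `global_kmeans` are hypothetical names. Each entry of the returned dictionary holds the centres and the Residual Sum of Squares (RSS) for one value of k.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, centers, max_iter=100):
    """Lloyd's k-means from the given initial centres; returns (centres, RSS)."""
    centers = [list(c) for c in centers]
    for _ in range(max_iter):
        clusters = [[] for _ in centers]
        for p in points:
            j = min(range(len(centers)), key=lambda c: sq_dist(p, centers[c]))
            clusters[j].append(p)
        # Move each centre to its cluster mean; an empty cluster keeps its centre.
        new = [[sum(col) / len(cl) for col in zip(*cl)] if cl else centers[j]
               for j, cl in enumerate(clusters)]
        if new == centers:  # assignments are stable
            break
        centers = new
    rss = sum(min(sq_dist(p, c) for c in centers) for p in points)
    return centers, rss


def global_kmeans(points, k_max):
    """Grow from 1 to k_max centres; at each step try every data point as the
    initial position of the new centre and keep the lowest-RSS solution."""
    mean = [sum(col) / len(points) for col in zip(*points)]
    centers, rss = kmeans(points, [mean])
    results = {1: (centers, rss)}
    for k in range(2, k_max + 1):
        best = None
        for p in points:
            cand = kmeans(points, centers + [list(p)])
            if best is None or cand[1] < best[1]:
                best = cand
        centers = best[0]
        results[k] = best
    return results


pts = [[0, 0], [0, 1], [5, 5], [5, 6], [10, 0], [10, 1]]
for k, (_, rss) in global_kmeans(pts, 3).items():
    print(k, round(rss, 3))
```

Because each step starts from the previous solution plus one extra centre, the RSS reported for k is never larger than the RSS for k - 1 in this sketch. The inner loop over all points is what makes the exhaustive variant expensive, which is the cost the fast variant targets.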