Model-based cluster analysis for web users sessions

AN UNSUPERVISED APPROACH FOR USER BEHAVIOUR CLUSTERING OF WEBSITES USING THE NAVIGATION PATTERNS OF WEB USERS

Journal of Software Engineering & Intelligent Systems, 2018

Web traffic and e-commerce activity are increasing rapidly. Hence, understanding the behavior of users based on their interactions with a website is becoming important. Web usage mining addresses this need: it works on web clickstream data to extract usage patterns. Two major challenges are involved. One is preprocessing the raw data to provide an accurate picture of how a website is used. The other is presenting the rules and patterns that are potentially interesting to the users. This paper proposes and develops an architecture for both tasks. First, we clean the web server logs using a traditional clustering approach. Then, we apply a Discrete Time Markov Chain (DTMC) approach to generate a model of user behavior. To generate the nodes of the model, we use regular expressions to identify the atomic propositions. The DTMC inference process outputs a directed graph. Next, we apply spectral clustering to that directed graph, which works on the affinity of the graph nodes and divides the nodes into clusters. Finally, we use graph traversal algorithms to discover the navigation patterns of web users for each cluster. To evaluate the approach, we use server log files from the website www.ualberta.ca. This approach is useful for supporting better web personalization and website organization. It automatically finds clusters of usage patterns undertaken by the users and makes this data available to the web designer. Hence the web designer will know the interests of the users, which will help them develop a more personalized space for users.
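The DTMC step described above can be illustrated with a minimal sketch. It assumes each session is already reduced to an ordered list of page labels and estimates transition probabilities by simple counting; the page names and the function name estimate_dtmc are hypothetical, and the paper's own inference process (built on regular-expression-derived atomic propositions) may differ.

```python
from collections import Counter, defaultdict


def estimate_dtmc(sessions):
    """Estimate first-order transition probabilities by counting page-to-page moves."""
    counts = defaultdict(Counter)
    for session in sessions:
        # Pair every page with the page requested immediately after it.
        for src, dst in zip(session, session[1:]):
            counts[src][dst] += 1
    # Normalise each row of counts so outgoing probabilities sum to 1.
    return {
        src: {dst: n / sum(nexts.values()) for dst, n in nexts.items()}
        for src, nexts in counts.items()
    }


# Hypothetical sessions, for illustration only.
sessions = [
    ["home", "courses", "apply"],
    ["home", "courses", "home"],
    ["home", "news"],
]
P = estimate_dtmc(sessions)
```

The resulting nested dictionary is a weighted directed graph: each key is a node and each inner entry is an outgoing edge with its probability, which is the kind of structure a graph clustering step could consume.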

Discovery of Web User Session Clusters Using Partitioning Based Clustering Techniques

The explosive growth of the World Wide Web (WWW) has necessitated the development of Web personalization systems that understand user preferences in order to dynamically serve customized content to individual users. To reveal information about user preferences from Web usage data, Web Usage Mining (WUM) techniques are extensively applied to Web log data. Clustering techniques are widely used in WUM to capture similar interests and trends among users accessing a Web site. Clustering aims to divide a data set into groups or clusters where inter-cluster similarities are minimized while intra-cluster similarities are maximized. This paper describes the discovery of user session clusters using the two most popular partition-based clustering techniques, namely k-Means and k-Medoids. These techniques are implemented and tested against Web user navigational data. Performance and validity results of each technique are presented and compared.
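As a rough illustration of partition-based clustering of sessions, the sketch below runs plain k-Means on binary page-visit vectors. The vector encoding, the page columns, and the choice to seed the centroids with the first k sessions are all assumptions made for reproducibility, not details taken from the paper.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(points, k, max_iter=100):
    """Plain k-Means; centroids are seeded with the first k points (an assumption)."""
    centroids = [list(p) for p in points[:k]]
    labels = None
    for _ in range(max_iter):
        # Assignment step: attach each session to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the previous centroid if a cluster empties
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical sessions; columns: home, products, cart, support (1 = visited).
session_vectors = [
    [1, 1, 1, 0],
    [1, 0, 0, 1],
    [0, 1, 1, 0],
    [0, 0, 0, 1],
]
labels, centroids = k_means(session_vectors, k=2)
```

k-Medoids follows the same assign-and-update loop but restricts each cluster centre to an actual session, which makes it less sensitive to outlying sessions.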

Web User Session Cluster Discovery Based on k-Means and k-Medoids Techniques

The explosive growth of the World Wide Web (WWW) has necessitated the development of Web personalization systems that understand user preferences in order to dynamically serve customized content to individual users. To reveal information about user preferences from Web usage data, Web Usage Mining (WUM) techniques are extensively applied to Web log data. Clustering techniques are widely used in WUM to capture similar interests and trends among users accessing a Web site. Clustering aims to divide a data set into groups or clusters where inter-cluster similarities are minimized while intra-cluster similarities are maximized. This paper describes the discovery of user session clusters using k-Means and k-Medoids clustering techniques. These techniques are implemented and tested against Web user navigational data. Performance and validity results of each technique are presented and compared.

Keywords: web usage mining; k-means clustering; k-medoids clustering

I. INTRODUCTION

Web Usage Mining [1] is described as the automatic discovery and analysis of patterns in web logs and associated data collected as a result of user interactions with Web resources on one or more Web sites. The goal of Web usage mining is to capture, model, and analyse the behavioural patterns and profiles of users interacting with a Web site. The discovered patterns are usually represented as collections of URLs that are frequently accessed by groups of users with common interests. Web usage mining has been used in a variety of applications such as i) Web Personalization systems [2], ii) Adaptive Web Sites [3][4], iii) Business Intelligence [5], iv) System Improvement to understand web traffic behaviour, which can be used to decide strategies for web caching [6], load balancing and data distribution [7], v) Fraud detection: detection of unusual accesses to secured data [8], etc.

Validation and interpretation of Web users’ sessions clusters

2007

Understanding users' navigation on the Web is important for improving the quality of information and the speed of accessing large-scale Web data sources. Clustering of users' navigation into sessions has been proposed in order to identify patterns and similarities, which are then managed in the context of Web user-oriented applications (searching, e-commerce, etc.). This paper deals with the problem of assessing the quality of user session clusters in order to make inferences regarding the users' navigation behavior.

Model-Based Clustering and Visualization of Navigation Patterns on a Web Site

Data Mining and …, 2003

We present a new methodology for exploring and analyzing navigation patterns on a web site. The patterns that can be analyzed consist of sequences of URL categories traversed by users. In our approach, we first partition site users into clusters such that users with similar navigation paths through the site are placed into the same cluster. Then, for each cluster, we display these paths for users within that cluster. The clustering approach we employ is model-based (as opposed to distance-based) and partitions users according to the order in which they request web pages. In particular, we cluster users by learning a mixture of first-order Markov models using the Expectation-Maximization algorithm. The runtime of our algorithm scales linearly with the number of clusters and with the size of the data and our implementation easily handles hundreds of thousands of user sessions in memory. In the paper, we describe the details of our method and a visualization tool based on it called WebCANVAS. We illustrate the use of our approach on user-traffic data from msnbc.com.
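To make the idea of a mixture of first-order Markov models concrete, here is a minimal EM sketch. It assumes sessions are non-empty lists of integer category indices; the smoothing constant alpha, the round-robin starting assignment, and the function name fit_markov_mixture are illustrative assumptions rather than the authors' implementation.

```python
import math


def fit_markov_mixture(sessions, n_states, n_clusters, n_iter=50, alpha=0.1):
    """EM for a mixture of first-order Markov chains (illustrative sketch)."""
    n = len(sessions)
    # Assumption: a round-robin hard assignment gives a reproducible start;
    # random restarts are the more usual choice in practice.
    resp = [[1.0 if i % n_clusters == k else 0.0 for k in range(n_clusters)]
            for i in range(n)]
    for _ in range(n_iter):
        # M-step: re-estimate weights, initial-state and transition
        # probabilities from responsibility-weighted, smoothed counts.
        weights, inits, trans = [], [], []
        for k in range(n_clusters):
            init_counts = [alpha] * n_states
            trans_counts = [[alpha] * n_states for _ in range(n_states)]
            total = 0.0
            for seq, r in zip(sessions, resp):
                total += r[k]
                init_counts[seq[0]] += r[k]
                for a, b in zip(seq, seq[1:]):
                    trans_counts[a][b] += r[k]
            weights.append((total + alpha) / (n + n_clusters * alpha))
            inits.append([c / sum(init_counts) for c in init_counts])
            trans.append([[c / sum(row) for c in row] for row in trans_counts])
        # E-step: posterior probability of each cluster given each session,
        # computed in log space to avoid underflow on long sessions.
        for i, seq in enumerate(sessions):
            log_p = []
            for k in range(n_clusters):
                lp = math.log(weights[k]) + math.log(inits[k][seq[0]])
                lp += sum(math.log(trans[k][a][b]) for a, b in zip(seq, seq[1:]))
                log_p.append(lp)
            top = max(log_p)
            unnorm = [math.exp(v - top) for v in log_p]
            z = sum(unnorm)
            resp[i] = [u / z for u in unnorm]
    return resp, weights, inits, trans


# Hypothetical data: three sessions alternating between categories 0 and 1,
# and three sessions that stay in category 2.
toy_sessions = [[0, 1, 0, 1, 0]] * 3 + [[2, 2, 2, 2, 2]] * 3
resp, weights, inits, trans = fit_markov_mixture(toy_sessions, n_states=3, n_clusters=2)
cluster_of = [max(range(2), key=r.__getitem__) for r in resp]
```

Each iteration makes one pass over every transition in every session for every cluster, which is consistent with the linear scaling in data size and number of clusters that the abstract describes.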

A Study of User Navigation Patterns for Web Usage Mining

Web Usage Mining is used to discover navigation patterns and to show how those discoveries can help in assessing and improving the quality of the site. By "quality" we mean providing the desired content to the web user with minimum effort. The insight of the visitors is indirectly reflected in their navigation behavior, as represented in their browsing patterns. The study of web server logs has been used to model the behavior of web users to provide services like the recommendation of pages, providing the facility of prefetching pages to minimize access time,

Discovery of Web Usage Profiles Using Various Clustering Techniques

The explosive growth of the World Wide Web (WWW) has necessitated the development of Web personalization systems that understand user preferences in order to dynamically serve customized content to individual users. To reveal information about user preferences from Web usage data, Web Usage Mining (WUM) techniques are extensively applied to Web log data. Clustering techniques are widely used in WUM to capture similar interests and trends among users accessing a Web site. Clustering aims to divide a data set into groups or clusters where inter-cluster similarities are minimized while intra-cluster similarities are maximized. This paper reviews four popular clustering techniques: k-Means, k-Medoids, Leader and DBSCAN. These techniques are implemented and tested against Web user navigational data. Performance and validity results of each technique are presented and compared.
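Of the four techniques, Leader is the simplest to show in a few lines: it makes a single pass over the sessions and needs only a distance threshold. The sketch below assumes sessions are represented as sets of visited pages compared by Jaccard distance; the representation, the threshold of 0.5, and the page names are hypothetical choices for illustration.

```python
def jaccard_distance(a, b):
    """1 - |a & b| / |a | b| for two sets of visited pages."""
    union = a | b
    if not union:
        return 0.0  # two empty sessions are treated as identical
    return 1.0 - len(a & b) / len(union)


def leader_clustering(sessions, threshold, distance=jaccard_distance):
    """Single-pass Leader algorithm: join the first leader within the threshold,
    otherwise become a new leader."""
    leaders, labels = [], []
    for session in sessions:
        for idx, leader in enumerate(leaders):
            if distance(session, leader) <= threshold:
                labels.append(idx)
                break
        else:
            # No existing leader is close enough, so start a new cluster.
            leaders.append(session)
            labels.append(len(leaders) - 1)
    return leaders, labels


# Hypothetical sessions, for illustration only.
page_sets = [
    {"home", "products", "cart"},
    {"home", "products"},
    {"support", "faq"},
    {"support"},
]
leaders, leader_labels = leader_clustering(page_sets, threshold=0.5)
```

Because the result depends on the order in which sessions arrive, a different ordering of the same log can produce different leaders.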

Quantitative Evaluation of Performance and Validity Indices for Clustering the Web Navigational Sessions

Clustering techniques are widely used in "Web Usage Mining" to capture similar interests and trends among users accessing a Web site. For this purpose, web access logs generated at a particular web site are preprocessed to discover the user navigational sessions. Clustering techniques are then applied to group the user session data into user session clusters, where inter-cluster similarities are minimized while intra-cluster similarities are maximized. Since different clustering algorithms generally produce different sets of clusters, it is important to evaluate the performance of these methods in terms of the accuracy and validity of the clusters, and also the time required to generate them, using appropriate performance measures. This paper describes various validity and accuracy measures including Dunn's Index, Davies-Bouldin Index, C Index, Rand Index, Jaccard Index, Silhouette Index, Fowlkes-Mallows Index and Sum of the Squared Error (SSE). We conducted the performance evaluation of the following clustering techniques: k-Means, k-Medoids, Leader, Single Link Agglomerative Hierarchical and DBSCAN. These techniques are implemented and tested against Web user navigational data. Finally, their performance results are presented and compared.
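Three of the listed measures (Rand, Jaccard and Fowlkes-Mallows) can be computed from the same pair counts, as the sketch below shows. It assumes a reference labelling of the sessions is available for comparison; the two label lists are made-up examples, not results from the paper.

```python
import math
from itertools import combinations


def pair_counts(reference, predicted):
    """Count session pairs by whether each labelling puts them in the same cluster."""
    tp = fp = fn = tn = 0
    for i, j in combinations(range(len(reference)), 2):
        same_ref = reference[i] == reference[j]
        same_pred = predicted[i] == predicted[j]
        if same_ref and same_pred:
            tp += 1
        elif same_pred:
            fp += 1  # grouped together by the clustering only
        elif same_ref:
            fn += 1  # grouped together by the reference only
        else:
            tn += 1
    return tp, fp, fn, tn


def external_indices(reference, predicted):
    """Return (Rand, Jaccard, Fowlkes-Mallows) for two labelings of the same items."""
    tp, fp, fn, tn = pair_counts(reference, predicted)
    total = tp + fp + fn + tn
    rand = (tp + tn) / total if total else 1.0
    jaccard = tp / (tp + fp + fn) if (tp + fp + fn) else 1.0
    fm_den = math.sqrt((tp + fp) * (tp + fn))
    fowlkes_mallows = tp / fm_den if fm_den else 0.0
    return rand, jaccard, fowlkes_mallows


# Hypothetical labelings of six sessions.
reference_labels = [0, 0, 0, 1, 1, 1]
predicted_labels = [0, 0, 1, 1, 1, 1]
rand, jaccard, fowlkes_mallows = external_indices(reference_labels, predicted_labels)
```

All three indices equal 1 when the two labelings agree on every pair. Measures such as Dunn's Index, Davies-Bouldin Index, Silhouette Index and SSE work differently: they need only the data and the clustering, not a reference labelling.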

Clustering Model Based on Web Behavior

2014

Web log mining is an emerging part of data mining. It provides valuable information by discovering trends and regularities in web users' access patterns. Clustering based on access patterns is an important research topic in web usage mining. Knowledge obtained from web user clusters has been used in different fields of web mining. This paper presents an algorithm for measuring similarity and automatically segmenting web users based on their past access patterns. The compatibility measures are based on content extracted from users' browser data. Furthermore, it also provides a locality-based clustering method for the people who are unknown to their most compatible friends.