Log Classification using K-Means Clustering for Identify Internet User Behaviors

Cyber Profiling Using Log Analysis And K-Means Clustering

International Journal of Advanced Computer Science and Applications, 2016

The activities of Internet users increase from year to year and have had an impact on the behavior of the users themselves. Assessment of user behavior is often based only on interaction across the Internet, without knowledge of any other activities. Activity logs can be used as another way to study user behavior. Internet activity logs are a type of big data, so data mining with the K-Means technique can be used to analyze user behavior. In this study, clustering with the K-Means algorithm divided the data into three clusters, namely high, medium, and low. The results from a higher education institution show that each of these clusters produces websites that are frequented in the following order: search engines, social media, news, and information. The study also shows that the cyber profiling performed is strongly influenced by environmental factors and daily activities.
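As a rough illustration of the three-cluster grouping described in this abstract, the following minimal sketch runs plain k-means on per-site visit counts and names the clusters low, medium, and high by centroid order. The site names, counts, and the choice of visit count as the only feature are assumptions made for the example, not details taken from the study.

```python
def kmeans_1d(values, k=3, iters=100):
    """Plain Lloyd's k-means on a list of numbers; returns the centroids."""
    ordered = sorted(values)
    # Deterministic start: spread the initial centroids across the sorted data.
    centroids = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda c: abs(v - centroids[c]))
            clusters[nearest].append(v)
        # An empty cluster keeps its previous centroid.
        updated = [sum(c) / len(c) if c else centroids[i]
                   for i, c in enumerate(clusters)]
        if updated == centroids:
            break
        centroids = updated
    return centroids


def label_sites(visits):
    """Map each site to 'low', 'medium', or 'high' by its nearest centroid."""
    centroids = sorted(kmeans_1d(list(visits.values()), k=3))
    names = ["low", "medium", "high"]
    return {
        site: names[min(range(3), key=lambda c: abs(count - centroids[c]))]
        for site, count in visits.items()
    }


# Hypothetical visit counts extracted from an access log.
visits = {
    "search.example": 950,
    "social.example": 900,
    "news.example": 420,
    "info.example": 380,
    "blog.example": 40,
    "misc.example": 25,
}
labels = label_sites(visits)
print(labels)
```

Sorting the centroids before naming them keeps the low/medium/high labels stable regardless of the order in which k-means happens to produce the clusters.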

Analyzing Logs from Proxy Server and Captive Portal Using K-Means Clustering Algorithm

Middle East Journal of Applied Science & Technology, 2020

The traffic on the World Wide Web is rapidly increasing, and an enormous amount of data is generated by users’ various interactions with websites. Thus, web data has become one of the most valuable resources for information retrieval and knowledge discovery. The study utilized logs from the Proxy Server and Captive Portal database and used Web Usage Mining to discover useful and interesting patterns in the web data. Moreover, the k-means clustering algorithm was used to group user access patterns, specifically by the number of user sessions and the websites accessed by network users. The results indicate that, most of the time, users are more engaged in using the internet.

A Comparative Study of Mining Web Usage Patterns Using Variants of k-Means Clustering Algorithm

The explosive growth in the information available on the Web has prompted the need for developing Web personalization systems that understand and exploit user preferences to dynamically serve customized content to individual users [1]. To reveal information about user preferences from Web usage data, Data Mining techniques can be naturally applied, leading to the so-called Web Usage Mining (WUM) [2]. Clustering is widely used in WUM to capture similar interests and trends among users accessing a Web site [3]. k-Means clustering is a popular clustering algorithm based on the partitioning of data. However, one of its drawbacks is that it requires the user to specify the number of clusters at the beginning, and it is also sensitive to the initial selection of cluster centres. The global k-Means algorithm proposed by Likas [4] provides an incremental approach to clustering by dynamically adding one cluster centre at a time through a deterministic global search procedure. It does not depend on any initial conditions and considerably outperforms the k-Means algorithm, but it demands heavy computational effort. A faster version of the global k-Means algorithm substantially reduces the execution time by improving the way the next cluster centre is created. We implemented and tested these algorithms against web usage data in order to discover user navigational session clusters. In this paper we present the implementation details of each of the above-mentioned k-Means clustering techniques along with the underlying mathematical foundations. The results are presented with a comparison of the different techniques. Our results show that the fast global k-Means clustering algorithm significantly reduces the computational time without affecting the performance of the global k-Means algorithm. It also outperforms the global k-Means algorithm in terms of the validity measure.
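The incremental idea described in this abstract (start from one centre, then add one centre at a time by trying every data point as the candidate for the new centre) can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the function names, the squared-Euclidean error, and the toy points are all chosen for the example, and the faster variant is not shown.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points):
    """Coordinate-wise mean of a non-empty list of tuples."""
    return tuple(sum(coord) / len(points) for coord in zip(*points))


def lloyd(points, centers, iters=100):
    """Standard k-means refinement from the given starting centers."""
    centers = list(centers)
    for _ in range(iters):
        groups = [[] for _ in centers]
        for p in points:
            idx = min(range(len(centers)), key=lambda c: sq_dist(p, centers[c]))
            groups[idx].append(p)
        # An empty group keeps its previous center.
        updated = [mean(g) if g else centers[i] for i, g in enumerate(groups)]
        if updated == centers:
            break
        centers = updated
    error = sum(min(sq_dist(p, c) for c in centers) for p in points)
    return centers, error


def global_kmeans(points, k_max):
    """Grow the solution one center at a time, trying every point as the new center."""
    centers, error = lloyd(points, [mean(points)])
    for _ in range(2, k_max + 1):
        trials = (lloyd(points, centers + [candidate]) for candidate in points)
        centers, error = min(trials, key=lambda trial: trial[1])
    return centers, error


# Toy data: three well-separated groups of session feature vectors (hypothetical).
points = [(0, 0), (0, 1), (1, 0),
          (10, 10), (10, 11), (11, 10),
          (20, 0), (20, 1), (21, 0)]
centers, error = global_kmeans(points, 3)
print(centers, error)
```

Each added centre costs one full k-means run per data point, which is the heavy computational effort the abstract refers to.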

Analysis of Students’ Web Browsing Behaviours Using Data Mining at a Campus Network

Turkish Journal of Computer and Mathematics Education (TURCOMAT), 2021

Analytics provides insight based on past usage by applying techniques such as statistics, data mining, machine learning and artificial intelligence. The lack of a browsing monitoring system leads to low engagement, which reduces the growth of certain businesses, as a result of unnecessary browsing during students' learning time. This paper presents an analysis of browsing behavior that classifies browsed words according to their ethical word groups. An analytics platform is created as a monitoring system for browsing behavior. Data mining, indexing and classification methods are used in this research, as data is the essential key to creating a predictive model, and four ethical groups have been filtered based on the browsing behaviors. The browsed words are categorized into four types of browsing: queries, applications, social media, and campus-related sites. The research method uses software tools and data mining process on the browsing data and analytics is prese...

Clustering Web Users Based on K-means Algorithm for Reducing Time Access Cost

2019 First International Conference of Intelligent Computing and Engineering (ICOICE), 2019

Numerous organizations are providing web-based services due to the consistent increase in web development and in the number of available web searching tools. However, the advancements in web-based services are associated with increasing difficulties in information retrieval. Efforts are now directed toward reducing the Internet traffic load and the cost of user access to important information. Web clustering, an important web usage mining (WUM) task, groups web users based on their browsing patterns to provide useful knowledge for personalized web services. Based on the web structure, each Uniform Resource Locator (URL) in the web log data is parsed into tokens, which are uniquely identified for URL classification. The collective sequence of URLs a user navigated over a period of 30 minutes is considered a session, and the session represents the user's navigation pattern. In this paper, the K-Means algorithm was used to cluster web users based on their similarity in a vector matrix. The algorithm was run several times, for k = 2, 3, 4, up to k = 8; the results showed that the best similarity was obtained at k = 8, where the Residual Sum of Squares (RSS) evaluation measure achieved a high intra-cluster similarity value (3.049).
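The URL tokenization and 30-minute session rule mentioned in this abstract might look roughly like the sketch below. It assumes a fixed 30-minute window measured from the first request of each session (the abstract does not say whether the window is fixed or based on inactivity), and the record layout `(user, unix_timestamp, url)`, the function names, and the sample log are hypothetical.

```python
from urllib.parse import urlparse

SESSION_SECONDS = 30 * 60  # 30-minute session window


def tokenize(url):
    """Split a URL into its host followed by its non-empty path segments."""
    parsed = urlparse(url)
    return [parsed.netloc] + [part for part in parsed.path.split("/") if part]


def sessionize(requests):
    """Group (user, unix_timestamp, url) records into per-user sessions.

    Assumption: a session covers 30 minutes counted from its first request.
    """
    sessions = {}  # user -> list of sessions, each a list of URLs
    starts = {}    # user -> start time of that user's current session
    for user, ts, url in sorted(requests, key=lambda r: (r[0], r[1])):
        if user not in starts or ts - starts[user] >= SESSION_SECONDS:
            starts[user] = ts
            sessions.setdefault(user, []).append([])
        sessions[user][-1].append(url)
    return sessions


# Hypothetical log records.
log = [
    ("u1", 0, "http://a.example/x/y"),
    ("u1", 600, "http://a.example/z"),
    ("u1", 2000, "http://b.example/"),
    ("u2", 100, "http://a.example/x"),
]
sessions = sessionize(log)
tokens = tokenize("http://a.example/x/y")
print(sessions)
print(tokens)
```

The token lists of a session can then be turned into a feature vector per user before clustering; how the paper builds that vector matrix is not specified in the abstract.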

SrilekhaVedulaKameswari “Web Usage Mining using Statistical Classifiers and Fuzzy

2014

There are many models in literature and practice that analyse user behaviour based on user navigation data and use clustering algorithms to characterize their access patterns. The navigation patterns identified are expected to capture the user’s interests. In this paper, we model user behaviour as a vector of the time the user spends at each URL, and further classify a given new user access pattern. The clustering and classification methods of k-means with non-Euclidean similarity measure, Bayesian classifiers and artificial neural networks, with standardised fuzzy inputs are implemented and compared. Apart from identifying user behaviour, the model can also be used as a prediction system where we can identify deviational behaviour.
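To make the time-per-URL representation concrete, the sketch below builds such a vector and assigns a new access pattern to the most similar cluster centroid. Cosine similarity is used here only as one example of a non-Euclidean measure; the abstract does not name the measure the authors used, and the URLs, centroids, and function names are hypothetical.

```python
import math


def time_vector(visits, urls):
    """Total seconds spent on each URL, in the fixed order given by `urls`."""
    totals = dict.fromkeys(urls, 0.0)
    for url, seconds in visits:
        if url in totals:
            totals[url] += seconds
    return [totals[u] for u in urls]


def cosine(a, b):
    """Cosine similarity; defined as 0.0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def classify(vector, centroids):
    """Return the name of the centroid most similar to `vector`."""
    return max(centroids, key=lambda name: cosine(vector, centroids[name]))


# Hypothetical URL set and cluster centroids (seconds per URL).
urls = ["/news", "/mail", "/video"]
centroids = {"reader": [300, 60, 0], "viewer": [10, 20, 600]}

new_user = time_vector([("/video", 400), ("/news", 30), ("/video", 200)], urls)
assigned = classify(new_user, centroids)
print(new_user, assigned)
```

A pattern whose best similarity to every centroid is low could be flagged as deviational behaviour, although the threshold for that would have to be chosen separately.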

IJERT-Clustering Model Based on Web Behavior

International Journal of Engineering Research and Technology (IJERT), 2014

https://www.ijert.org/clustering-model-based-on-web-behavior https://www.ijert.org/research/clustering-model-based-on-web-behavior-IJERTV3IS031029.pdf Web log mining is an emerging part of data mining. It provides invaluable information by discovering trends and regularities in web users' access patterns. Clustering based on access patterns is an important research topic in web usage mining. Knowledge obtained from web user clusters has been used in different fields of web mining technologies. This paper presents an algorithm for measuring similarities and automatically segmenting web users based on their past access patterns. The compatibility measures are based on content extracted from users' browser data. Furthermore, it provides a locality-based clustering method for people who do not know their most compatible friends.

Well-organized Data Mining Techniques for Clustering of Users on Web Log Data

Web usage mining is one of the essential frameworks for finding domain knowledge from users' interaction with the web. This domain knowledge is used for effective management of predictive websites, the creation of adaptive websites, enhancing business and web services, personalization, and so on. For a non-profit organization's website, it is difficult to identify who the users are, what information they need, and how their interests change over time. Web usage mining based on log data provides a solution to this problem. The proposed work focuses on web log data preprocessing, sparse matrix construction based on each user's web navigation, and clustering of users with similar interests. The performance of web usage mining is also compared across the k-means, X-means, and farthest-first clustering algorithms.
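A sparse user-by-page structure and a farthest-first style clustering over it could be sketched as below. This is only an illustration of the general technique: the set-based sparse representation, the Jaccard distance, the deterministic choice of the first centre, and the sample log are assumptions, not details reported by the authors.

```python
def build_sparse(log):
    """Sparse user-by-page structure: user -> set of pages visited."""
    matrix = {}
    for user, page in log:
        matrix.setdefault(user, set()).add(page)
    return matrix


def jaccard_distance(a, b):
    """1 - |a & b| / |a | b|; two empty sets are treated as identical."""
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0


def farthest_first(matrix, k):
    """Pick k centers by farthest-first traversal, then assign users to the nearest."""
    users = sorted(matrix)
    centers = [users[0]]  # assumption: start from the first user for determinism
    while len(centers) < min(k, len(users)):
        # Next center: the user whose nearest existing center is farthest away.
        nxt = max(
            (u for u in users if u not in centers),
            key=lambda u: min(jaccard_distance(matrix[u], matrix[c]) for c in centers),
        )
        centers.append(nxt)
    clusters = {c: [] for c in centers}
    for u in users:
        nearest = min(centers, key=lambda c: jaccard_distance(matrix[u], matrix[c]))
        clusters[nearest].append(u)
    return clusters


# Hypothetical preprocessed log of (user, page) pairs.
log = [
    ("u1", "a"), ("u1", "b"),
    ("u2", "a"), ("u2", "b"), ("u2", "c"),
    ("u3", "x"), ("u3", "y"),
    ("u4", "x"), ("u4", "y"), ("u4", "z"),
]
clusters = farthest_first(build_sparse(log), 2)
print(clusters)
```

Storing only the pages each user actually visited avoids materializing the mostly empty user-by-page matrix.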

Web Users Clustering Analysis

As one of the most important tasks of web usage mining, web user clustering, which establishes groups of users exhibiting similar browsing patterns, provides useful knowledge for personalized web services. There are many clustering algorithms. In this paper, users' similarity is calculated, and then a comparative analysis of two clustering algorithms, namely the K-means algorithm and the hierarchical algorithm, is performed. Web users are clustered with these algorithms based on web user log data. Given a set of web users and their associated historical web usage data, we study their behavioral characteristics and cluster them. In terms of accuracy, K-means produces better results than the hierarchical algorithm. https://sites.google.com/site/ijcsis/