Web Log Mining: A study of user sessions

Predicting user behavior through Sessions using the Web log mining

This paper presents a method for extracting user sessions from log files. Each user is first identified by the IP address recorded in the log file, and the corresponding user sessions are then extracted. Two types of logs, server-side and client-side, are commonly used for web usage and usability analysis. Server-side logs can be generated automatically by web servers, with each entry corresponding to a user request. Client-side logs can capture accurate, comprehensive usage data for usability analysis. Usability is defined as the satisfaction, efficiency and effectiveness with which specific users can complete specific tasks in a particular environment. The process comprises three stages: data cleaning, user identification and session identification. In this paper, we implement these three stages. Mining is performed according to how frequently users visit each page. By identifying a user's sessions, we can analyze user behavior through the time spent on a particular page.
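The user identification and session identification stages described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the record layout `(ip, timestamp, url)`, the function names, and the 30-minute inactivity timeout are all assumptions introduced here, since the abstract does not say how session boundaries are chosen.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Assumed inactivity threshold; the abstract does not specify one.
SESSION_TIMEOUT = timedelta(minutes=30)


def identify_sessions(records, timeout=SESSION_TIMEOUT):
    """Group (ip, timestamp, url) records by IP address, then split each
    user's requests into sessions wherever the gap between two consecutive
    requests exceeds `timeout`. Returns a list of (ip, hits) pairs."""
    by_user = defaultdict(list)
    for ip, ts, url in records:
        by_user[ip].append((ts, url))

    sessions = []
    for ip, hits in by_user.items():
        hits.sort(key=lambda h: h[0])  # order each user's requests by time
        current = [hits[0]]
        for prev, hit in zip(hits, hits[1:]):
            if hit[0] - prev[0] > timeout:
                sessions.append((ip, current))  # close the running session
                current = []
            current.append(hit)
        sessions.append((ip, current))
    return sessions


def time_on_pages(session_hits):
    """Approximate the time spent on each page as the gap to the next
    request in the same session. The last page has no successor, so its
    viewing time is unknown and it is omitted."""
    return [
        (url, (next_ts - ts).total_seconds())
        for (ts, url), (next_ts, _) in zip(session_hits, session_hits[1:])
    ]
```

Identifying users by IP address alone, as the abstract describes, treats everyone behind a shared proxy as one user; the sketch inherits that limitation.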

Preprocessing and mining web log data for web personalization

AI* IA 2003: Advances in …, 2003

We describe the web usage mining activities of an on-going project, called ClickWorld, that aims at extracting models of the navigational behaviour of a web site's users. The models are inferred from the access logs of a web server by means of data and web mining techniques. The extracted knowledge is deployed to offer users a personalized and proactive view of the web services. We first describe the preprocessing steps on access logs necessary to clean, select and prepare data for knowledge extraction. Then we show two sets of experiments: the first one tries to predict the sex of a user based on the visited web pages, and the second one tries to predict whether a user might be interested in visiting a section of the site.

An Improved Session Identification Approach in Web Log Mining for Web Personalization

Web-based applications are growing at an enormous pace and, as a result, their user numbers are also growing rapidly. Innovative and evolutionary changes in technology have made it possible to capture users' behavior and interactions with web applications as web usage data in web server log files. In order to design attractive web sites, designers must understand their users' needs. Analyzing the navigational behavior of users is therefore an important part of web page design. Web Usage Mining (WUM) is the application of data mining techniques to web usage data in order to discover patterns that can be used to analyze users' navigational behavior. Since the web log contains a large amount of “irrelevant information”, the original log file cannot be used directly in the WUM process. Preprocessing of the web log file is therefore very important for improving the accuracy of web log mining. This paper first introduces the basic procedure of data preprocessing and analyzes the traditional session identification algorithm, on the basis of which a session identification algorithm based on page threshold and dynamic timeout is presented. Finally, the initial timeout is computed for each page according to the sessions formed, combined with the importance degree of an improved dynamic threshold algorithm that discards uninteresting attributes from the log file. A comparative experiment shows that the proposed algorithm obtains better performance on session identification and user interests, which is key for web personalization.

Log Data Preparation for Mining Web Usage Patterns

2007

In this paper we focus on log data preprocessing, the first step of a common Web Usage Mining process. In particular, we present LODAP (LOg DAta Preprocessor), a software tool which we designed and implemented to perform preprocessing of log data. The working scheme of LODAP comprises several steps. First, log files are cleaned by removing irrelevant data. Then, the remaining requests are structured into user sessions, encoding the browsing behavior of users. Subsequently, uninteresting sessions and the least visited pages are removed in order to reduce the size of the data in the previously extracted user sessions. In addition, LODAP can create reports containing the results obtained in each step and summaries mined from the analysis of the log files considered. During preprocessing with LODAP, the analyst is guided by a sequence of panels that make up the tool's wizard-based interface. Each panel is a graphical window which offers a basic function of the preprocessor. Preliminary results on log files of a specific Web site show that the implemented tool can effectively reduce the log data size and identify user sessions that encode the user browsing behavior in a meaningful way.
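The cleaning and reduction steps described above might look roughly like the sketch below. It is not LODAP's code: the abstract does not state which requests count as irrelevant or which sessions count as uninteresting, so the suffix list, the dictionary keys (`method`, `url`, `status`), and both thresholds are assumptions made for illustration.

```python
from collections import Counter

# Assumed list of embedded-resource suffixes treated as irrelevant.
IRRELEVANT_SUFFIXES = (".css", ".js", ".png", ".jpg", ".gif", ".ico")


def clean_requests(requests):
    """Keep successful GET requests for pages and drop everything else.
    Each request is a dict with the hypothetical keys method, url, status."""
    return [
        r for r in requests
        if r["method"] == "GET"
        and 200 <= r["status"] < 300
        and not r["url"].lower().endswith(IRRELEVANT_SUFFIXES)
    ]


def prune_sessions(sessions, min_page_visits=2, min_session_length=2):
    """Reduce session data in two passes. `sessions` is a list of URL lists.
    First remove pages visited fewer than `min_page_visits` times overall,
    then remove sessions left with fewer than `min_session_length` pages."""
    visits = Counter(url for session in sessions for url in session)
    kept = [[u for u in s if visits[u] >= min_page_visits] for s in sessions]
    return [s for s in kept if len(s) >= min_session_length]
```

Pruning pages before sessions means a session can become too short only because its rare pages were dropped; applying the two filters in the opposite order would keep more sessions.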

Web log analysis: a review of a decade of studies about information acquisition, inspection and interpretation of user interaction

Data Mining and Knowledge …, 2011

In the last decade, the importance of analyzing information management systems logs has grown, because log data constitute a relevant aspect in evaluating the quality of such systems. A review of 10 years of research on log analysis is presented in this paper. About 50 papers and posters from five major conferences and about 30 related journal papers have been selected to trace the history of the state-of-the-art in this field. The paper presents an overview of two main themes: Web search engine log analysis and Digital Library System log analysis. The problem of the analysis of different sources of log data and the distribution of data are investigated.

Research on Web Log Mining to Predicting User Behavior through Session

International Journal for Research in Applied Science and Engineering Technology, 2018

Web usage mining is a leading research area in Web mining concerned with the behavior of web users. Web log mining is one of the recent areas of research in data mining. Web usage mining has become important because the quantity of data is continuously increasing. We deal with web server logs, which maintain the history of page requests. Web log files contain complete information about users' browsing activities on the web server. Web mining is the application of data mining techniques to discover patterns from the World Wide Web. This paper focuses on Web usage mining to predict the behavior of web users based on web server log files. The pages that users visit, frequent access paths, and frequently accessed pages and links are stored in web server log files. Mining is performed according to how frequently users visit each page. By identifying a user's sessions, we can analyze user behavior through the time spent on a particular page. The web log, together with the identity of the user, captures browsing behavior on a website, and we discuss this behavior based on an analysis of different algorithms and methods. Web log analysis is a web analytics technique that parses a log file from a web server and, based on the values contained in it, derives indicators about when, how, and by whom the web server is visited. Using frequent links, we predict user behavior and identify the sites most viewed by users. These results are used to predict the behavior of users and faculty in our college.
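As a rough illustration of parsing a server log and counting how often each page is visited, the sketch below assumes the log uses the Common Log Format; the abstract does not name a format, and the function names are hypothetical.

```python
import re
from collections import Counter
from datetime import datetime

# Regular expression for one Common Log Format entry (assumed format).
CLF_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) [^"]*" (?P<status>\d{3}) (?P<size>\S+)'
)


def parse_line(line):
    """Parse one log line into a dict, or return None if it does not match."""
    match = CLF_PATTERN.match(line)
    if match is None:
        return None
    record = match.groupdict()
    # Timestamps look like 10/Oct/2000:13:55:36 -0700
    record["time"] = datetime.strptime(record["time"], "%d/%b/%Y:%H:%M:%S %z")
    record["status"] = int(record["status"])
    return record


def page_frequencies(lines):
    """Count requests per URL, skipping lines that fail to parse."""
    records = (parse_line(line) for line in lines)
    return Counter(r["url"] for r in records if r is not None)
```

Calling `page_frequencies(lines).most_common(10)` would list the ten most requested URLs, which corresponds to the "frequent access pages" the abstract mentions.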

COMPREHENSIVE FRAMEWORK FOR PATTERN ANALYSIS THROUGH WEB LOGS USING WEB MINING: A REVIEW

Here we present a personalization process based on Web usage mining. This paper reviews the process of discovering useful patterns from the web server log file. It covers a host of Web usage mining activities required for this process, including the pre-processing and integration of data from multiple sources, and common pattern discovery techniques that are applied to the integrated usage data.

Efficient Approach for Web Search Personalization in User Behavior Supported Web Server Log Files Using Web Usage Mining

At present, the web is a colossal store of data, and it will continue to expand as web technologies develop. However, a person's ability to read, access and understand content does not increase at the same rate. Hence it becomes difficult for website owners to present appropriate information to users. This has led to the provision of personalized web services. One of the well-known approaches to web personalization is Web Usage Mining. In this paper, our motive for web usage mining is to discover users' access patterns of web pages automatically and quickly from large server access log records, such as frequently visited hyperlinks, frequently accessed web pages and user groups. We also propose a new method for discovering users' access patterns and recommending them to the user.

Data Preprocessing Evaluation for Web Log Mining: Reconstruction of Activities of a Web Visitor

The prerequisite of any data analysis is the data itself, regardless of the focus of the analysis (visit rate analysis, portal optimization, portal personalization, etc.). The results of a given analysis depend heavily on the quality of the analyzed data. In the case of portal usage analysis, these data can be obtained by monitoring the web server log file. From these data we can create data matrices and a web map, which serve for finding users' behaviour patterns. Data preparation from the log file is the most time-consuming phase of the whole analysis. We carried out an experiment to find out to what extent this time-consuming data preparation is necessary. We aimed to specify the steps that are required for obtaining valid data from the log file. In particular, we focused on the reconstruction of the activities of a web visitor. This advanced data preprocessing technique is itself time-consuming. In the article we assess the impact of reconstructing the activities of a web visitor on the quantity and quality of the extracted rules, which represent web users' behaviour patterns.