Log File Analysis Research Papers

Purpose – To explore the use of LexiURL as a Web intelligence tool for collecting and analysing links to digital libraries, focusing specifically on the National electronic Library for Health (NeLH). Design/methodology/approach – The Web intelligence techniques in this study are a combination of link analysis (web structure mining), web server log file analysis (web usage mining), and text analysis (web content mining), utilizing the power of commercial search engines and drawing upon the information science fields of bibliometrics and webometrics. LexiURL is a computer program designed to calculate summary statistics for lists of links or URLs. Its output is a series of standard reports, for example listing and counting all of the different domain names in the data. Findings – Link data, when analysed together with user transaction log files (i.e., Web referring domains) can provide insights into who is using a digital library and when, and who could be using the digital library if...
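The domain-counting report described above can be illustrated with a minimal Python sketch. It is not LexiURL's implementation; the function name `count_domains` and the example URLs are assumptions made for illustration only.

```python
from collections import Counter
from urllib.parse import urlsplit


def count_domains(urls):
    """Count how often each domain name occurs in a list of URLs."""
    counts = Counter()
    for url in urls:
        host = urlsplit(url).hostname  # None when the URL has no network location
        if host:
            counts[host] += 1
    return counts


# Hypothetical link data; the example.* hosts are placeholders.
links = [
    "http://www.example.org/health/page1.html",
    "http://www.example.org/health/page2.html",
    "https://library.example.ac.uk/resources",
]
for domain, n in count_domains(links).most_common():
    print(domain, n)
```

Entries without a recognisable host are skipped rather than counted, which keeps malformed lines from polluting the report.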

With the growing popularity of mobile commerce (m-commerce), it becomes vital for both researchers and practitioners to understand m-commerce usage behavior. In this study, we investigate browsing behavior patterns based on the analysis of clickstream data recorded in server-side log files. We compare consumers' browsing behavior in the m-commerce channel against the traditional e-commerce channel. For the comparison, we offer an integrative web usage mining approach, combining visualization graphs, association rules and classification models to analyze the Web server log files of a large Internet retailer in Israel, which introduced m-commerce to its existing e-commerce offerings. The analysis is expected to reveal typical m-commerce and e-commerce browsing behavior, in terms of session timing and intensity of use and in terms of session navigation patterns. The obtained results will contribute to the emerging research area of m-commerce and can also be used to guide future development of mobile websites and increase their effectiveness. Our preliminary findings are promising: they reveal that browsing behaviors in m-commerce and e-commerce are different.

This research outline concerns the assessment of motivation in online learning environments. It includes a presentation of previous approaches, most of them based on Keller’s ARCS model, and argues for an approach based on Social Cognitive Learning Theory, in particular building on self-efficacy and self-regulation concepts. The research plan includes two steps: first, detect the learners in danger of dropping out based on their interaction with the system; second, create a model of the learner’s motivation (including self-efficacy, self-regulation, goal orientation, attribution and perceived task characteristics) on which interventions can be based.

Scientific LogAnalyzer is a platform-independent interactive Web service for the analysis of log files. Scientific LogAnalyzer offers several features not available in other log file analysis tools — for example, organizational criteria and computational algorithms suited to aid behavioral and social scientists. Scientific LogAnalyzer is highly flexible on the input side (unlimited types of log file formats), while strictly keeping a scientific output format. Features include (1) free definition of log file format, (2) searching and marking dependent on any combination of strings (necessary for identifying conditions in experiment data), (3) computation of response times, (4) detection of multiple sessions, (5) speedy analysis of large log files, (6) output in HTML and/or tab-delimited form, suitable for import into statistics software, and (7) a module for analyzing and visualizing drop-out. Several methodological features specifically needed in the analysis of data collected in Internet-based experiments have been implemented in the Web-based tool and are described in this article. A regression analysis with data from 44 log file analyses shows that the size of the log file and the domain name lookup are the two main factors determining the duration of an analysis. It is less than a minute for a standard experimental study with a 2 X 2 design, a dozen Web pages, and 48 participants (ca. 800 lines, including data from drop-outs). The current version of Scientific LogAnalyzer is freely available for small log files. Its Web address is h
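Two of the listed features, computation of response times and detection of multiple sessions, can be sketched in Python as follows. This is an illustration under stated assumptions, not the tool's actual algorithm: the function names are hypothetical, and the 30-minute gap used to separate sessions is an assumed threshold that the abstract does not specify.

```python
from datetime import datetime, timedelta


def response_times(timestamps):
    """Seconds elapsed between each request and the next (input must be sorted)."""
    return [(b - a).total_seconds() for a, b in zip(timestamps, timestamps[1:])]


def split_sessions(timestamps, max_gap=timedelta(minutes=30)):
    """Start a new session whenever the gap between requests exceeds max_gap."""
    sessions = []
    for ts in sorted(timestamps):
        if sessions and ts - sessions[-1][-1] <= max_gap:
            sessions[-1].append(ts)  # still within the current session
        else:
            sessions.append([ts])    # gap too large (or first hit): new session
    return sessions


# Hypothetical request times for one participant.
hits = [
    datetime(2024, 1, 1, 10, 0, 0),
    datetime(2024, 1, 1, 10, 0, 12),
    datetime(2024, 1, 1, 10, 0, 40),
    datetime(2024, 1, 1, 15, 0, 0),
]
for session in split_sessions(hits):
    print(len(session), response_times(session))
```

Computing response times per session, as in the loop above, avoids treating the long pause between two visits as a single very slow response.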

Learning environments aim to deliver efficacious instruction, but rarely take into consideration the motivational factors involved in the learning process. However, motivational aspects like engagement play an important role in effective learning: engaged learners gain more. E-Learning systems could be improved by tracking students' disengagement, which, in turn, would allow personalized interventions at appropriate times in order to reengage students. This idea has been exploited several times for Intelligent Tutoring Systems, but not yet in other types of learning environments that are less structured. To address this gap, our research looks at online learning-content-delivery systems using educational data mining techniques. Previously, several attributes relevant for disengagement prediction were identified by means of log-file analysis on HTML-Tutor, a web-based learning environment. In this paper, we investigate the extendibility of our approach to other systems by studying the relevance of these attributes for predicting disengagement in a different e-learning system. To this end, two validation studies were conducted, indicating that the previously identified attributes are pertinent for disengagement prediction, and that two new meta-attributes derived from log-data observations improve prediction and may potentially be used for automatic log-file annotation.

Log analysis is an important process in most system and network activities, where log data is used for various purposes such as performance monitoring, security auditing, or even reporting and profiling. However, as the years have passed, the volume of log data has increased along with the size of the systems and the number of users involved. Traditional or existing log analyser tools are not able to handle this massive amount of data. Big Data is therefore the solution to overcome this problem. The main purpose of this paper is to present a review of log file analysis in a Big Data environment based on previous research works. The paper also highlights the characteristics of Big Data as well as the Hadoop framework, which has been widely used as a Big Data application. Results from the papers reviewed show that the majority of researchers applied MapReduce, the main component of Hadoop, for analysing the log files, with HDFS as the data storage. Previous researchers have also used other tools and algorithms together with the Hadoop framework for analysis purposes. The findings of this paper give a clear review of how Hadoop has been used in analysing different types of log files and recommend understandable results for end users to use in future work.
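The map, shuffle and reduce steps that this kind of log analysis relies on can be illustrated in plain Python. The sketch below does not use Hadoop or any of its APIs; it only mimics the programming model in a single process. The function names are hypothetical, and it assumes Common Log Format lines, in which the HTTP status code is the second-to-last whitespace-separated field.

```python
from collections import defaultdict


def map_phase(line):
    """Emit (status_code, 1) for one access-log line; emit nothing if malformed."""
    fields = line.split()
    # Assumption: Common Log Format, status code is the second-to-last field.
    if len(fields) >= 2 and fields[-2].isdigit():
        yield fields[-2], 1


def shuffle(pairs):
    """Group all emitted values by key, as the framework would between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Sum the counts collected for one status code."""
    return key, sum(values)


def count_status_codes(lines):
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


# Hypothetical log lines (addresses from the documentation range 192.0.2.0/24).
log = [
    '192.0.2.1 - - [10/Oct/2000:13:55:36 -0700] "GET /index.html HTTP/1.0" 200 2326',
    '192.0.2.2 - - [10/Oct/2000:13:56:01 -0700] "GET /missing HTTP/1.0" 404 209',
    '192.0.2.1 - - [10/Oct/2000:13:57:12 -0700] "GET /about.html HTTP/1.0" 200 1180',
]
print(count_status_codes(log))
```

In a real cluster the map and reduce functions would run in parallel on separate nodes over data stored in HDFS; here they run sequentially purely to show the data flow.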

The learners’ motivation has an impact on the quality of learning, especially in e-Learning environments. Most of these environments store data about the learner’s actions in log files. Logging the users’ interactions in educational systems makes it possible to track their actions at a refined level of detail. Data mining and machine learning techniques can “give meaning” to these data and provide valuable information for learning improvement. An area where improvement is absolutely necessary and of great importance is motivation, known to be an essential factor for preventing attrition in e-Learning. In this paper we investigate whether log file data analysis can be used to estimate the motivational level of the learner. A decision tree is built from a limited number of log files from a web-based learning environment. The results suggest that time spent reading is an important factor for predicting motivation; also, performance in tests was found to be a relevant indicator of t...

This paper presents a comparative analysis of user queries to a web search engine, questions to a Q&A service (answers.com), and questions employed in question answering (QA) evaluations at TREC and CLEF. The analysis shows that user queries to search engines contain mostly content words (i.e. keywords) but lack structure words (i.e. stopwords) and capitalization. Thus, they resemble natural language input after case folding and stopword removal. In contrast, topics for QA evaluation and questions to answers.com mainly consist of fully capitalized and syntactically well-formed questions. Classification experiments using a naïve Bayes classifier show that stopwords play an important role in determining the expected answer type. A classification based on stopwords is considerably more accurate (47.5% accuracy) than a classification based on all query words (40.1% accuracy) or on content words (33.9% accuracy). To simulate user input, questions are preprocessed by case folding and stopword removal. Additional classification experiments aim at reconstructing the syntactic wh-word frame of a question, i.e. the embedding of the interrogative word. Results indicate that this part of questions can be reconstructed with moderate accuracy (25.7%), but for a classification problem with a much larger number of classes compared to classifying queries by expected answer type (2096 classes vs. 130 classes). Furthermore, eliminating stopwords can lead to multiple reconstructed questions with a different or with the opposite meaning (e.g. if negations or temporal restrictions are included). In conclusion, question reconstruction from short user queries can be seen as a new realistic evaluation challenge for QA systems.
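The preprocessing step described above (case folding followed by stopword removal) can be sketched in Python. The stopword list here is a small illustrative assumption, since the abstract does not give the list used in the paper, and the function names are hypothetical.

```python
# Small illustrative stopword list (an assumption; the paper's list is not given).
STOPWORDS = {"who", "what", "when", "where", "which", "how", "is", "was",
             "the", "a", "an", "of", "in", "did", "not"}


def tokenize(question):
    """Case-fold the question and split it into words, dropping question marks."""
    return question.lower().replace("?", " ").split()


def simulate_query(question):
    """Keep only content words, mimicking a short search-engine query."""
    return [t for t in tokenize(question) if t not in STOPWORDS]


def stopword_features(question):
    """Keep only stopwords, the kind of feature the abstract reports as most useful."""
    return [t for t in tokenize(question) if t in STOPWORDS]


positive = "Which countries did sign the treaty?"
negative = "Which countries did not sign the treaty?"
print(simulate_query(positive))
print(simulate_query(negative))
print(stopword_features(negative))
```

Both example questions reduce to the same content words, which illustrates the abstract's point that removing stopwords can erase a negation and leave reconstruction ambiguous.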

In this paper it is shown how a real-world electronic dictionary can be simultaneously compiled and its use studied. While the results of the dictionary use study may be successfully fed back into the compilation, the semi-automatic analysis of the use itself for the first time reveals how electronic dictionaries are really used. In order to achieve this, an intricate and multifaceted integrated log file tracks every single action of every single user – date and time stamping each lookup, ordering founds and not-founds, monitoring long-term vocabulary retention, etc. – with a multitude of summaries being presented to the lexicographers. The ultimate goal is that with such data the parameters of various user profiles could be pinpointed, with which self-tailoring electronic dictionaries could be built.

This paper outlines a typology for online communities of practice. The typology is based on findings from observations of three online communities of practice, a content analysis of messages, and a review of the existing literature. The three examples of communities of practice are of electronic discussion lists that cover topics of interest to university webmasters, librarians, and educators. This work expands on a typology that consolidated prior research and focused on online communities of practice within ...

To ensure the normal operation of a large computer network system, the common practice is to constantly collect system logs and analyze the network activities for detecting anomalies. Most of the analysis methods in use today are highly automated due to the enormous size of the collected data. Conventional automated methods are largely based on statistical modeling, and

Problem: The emphasis on scientific practices in the NGSS requires the ability to teach and assess students' abilities in those areas. Yet the process of doing science is often messy, materials-intensive, and expensive. Virtual experiments provide a cost-effective alternative, and have been useful in studying how students build models (Gill, Marcum Dietrich, & Becker Klein, 2014) and how to assess student learning (DeBoer et al., 2014). This study investigates learners' behaviors in a simulation designed to allow them to explore relationships among different variables in ecosystems. First, we outline a theoretical framework for scientific experimentation as an activity. Then, we briefly explore research on virtual simulations as educational tools for scientific experimentation, including implications of using simulations on learning assessment. Finally, we present the results of a study of students' approaches to experimentation by bringing together interview data and log file data gathered during the interviews.

Most e-Learning systems store data about the learner’s actions in log files, which give us detailed information about learner behaviour. Data mining and machine learning techniques can give meaning to these data and provide valuable information for learning improvement. One area that is of particular importance in the design of e-Learning systems is learner motivation, as it is a key factor in the quality of learning and in the prevention of attrition. One aspect of motivation is engagement, a necessary condition for effective learning. Using data mining techniques for log file analysis, our research investigates the possibility of predicting users’ level of engagement, with a focus on disengaged learners. As demonstrated previously across two different e-Learning systems, HTML-Tutor and iHelp, disengagement can be predicted by monitoring the learners’ actions (e.g. reading pages and taking tests/quizzes). In this paper we present the findings of three studies that refine this prediction approach. Results from the first study show that two additional reading speed attributes can increase the accuracy of prediction. The second study suggests that distinguishing between two different patterns of disengagement (spending a long time on a page/test and browsing quickly through pages/tests) may improve prediction in some cases. The third study demonstrates the influence of exploratory behaviour on prediction, as most users at the first login familiarize themselves with the system before starting to learn.

Educational Data Mining (EDM) has emerged as an independent research area in recent years. Moreover, student learning environments are rapidly moving online. Compared to traditional teaching methods, online tutoring attracts younger generations. However, student engagement is an important aspect of effective learning. Most of the students performed well academically and also spent more time on the internet. Thus, measuring disengagement is likely to help poorly performing students. In this paper we propose a new framework, called Quasi Framework, which tries to measure the significant relationship between students' disengagement level and their academic achievement.

Engagement is an important aspect of effective learning. Time spent using an e-Learning system is not quality time if the learner is not engaged. Tracking student disengagement would offer the possibility to intervene in order to motivate the learner at the appropriate time. In previous research we demonstrated the possibility of predicting engagement from log files using a web-based e-Learning system. In this paper we present the results obtained from another web-based system and compare them to the previous ones. The similarity of results across systems demonstrates that our approach is system-independent and that engagement can be elicited from basic information logged by most e-Learning systems: number of pages read, time spent reading pages, number of tests/quizzes and time spent on tests/quizzes.
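The four basic attributes named above can be derived from a simple event log, as the following Python sketch shows. The event format `(learner, action, seconds)`, the action labels `"page"` and `"test"`, and the function name are assumptions for illustration; the abstract does not describe the actual log layout or the prediction model built on these attributes.

```python
from collections import defaultdict


def engagement_attributes(events):
    """Aggregate per-learner counts and durations from (learner, action, seconds) events."""
    totals = defaultdict(lambda: {"pages": 0, "page_time": 0.0,
                                  "tests": 0, "test_time": 0.0})
    for learner, action, seconds in events:
        row = totals[learner]
        if action == "page":      # a page was read
            row["pages"] += 1
            row["page_time"] += seconds
        elif action == "test":    # a test or quiz was taken
            row["tests"] += 1
            row["test_time"] += seconds
        # any other action type is ignored in this sketch
    return dict(totals)


# Hypothetical events for two learners.
events = [
    ("anna", "page", 40.0),
    ("anna", "page", 65.0),
    ("anna", "test", 300.0),
    ("ben", "page", 3.0),
    ("ben", "login", 0.0),
]
for learner, attrs in engagement_attributes(events).items():
    print(learner, attrs)
```

The resulting per-learner rows are the kind of input a classifier would consume; choosing and training that classifier is outside the scope of this sketch.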

Fuzzy SF, a novel concept for an electronic-dictionary package, is presented. In Fuzzy SF, log-file based Artificial Intelligence components enable the implicit retrieval of personalised user feedback with which the package customises each user’s own and unique dictionary. To that end, all the data in both the databases and the multimedia (sub)corpora are graded using Fuzzy Sets, so that the package only answers queries on the user’s (current) level.

Background and Aim: Libraries are among the places where context-aware systems and services would be very useful. The purpose of this study is to achieve a coherent definition of context-aware systems and applications, especially in digital libraries.
Method: This review article was conducted using the library method, by searching articles and e-books on websites and databases.
Results: The findings of this study indicate that context-aware services in digital libraries, by understanding the specific conditions of each user (such as demographic characteristics, time position, and location) and by collecting and analyzing these data, could provide smart and appropriate services.
Conclusion: Digital libraries are constantly evolving and must coordinate themselves with ongoing changes. Equipping libraries with context-aware services leads them to best meet the information needs of their users, without limitations of time and place.
Keywords: Context-aware, digital libraries, context-aware systems, context-aware models, application programs, context-aware architecture.

This paper presents initial results from the ongoing evaluation of the Greek Go-Online e-business awareness and training web portal. The Go-Online portal aims to support the very small end of the small and medium enterprises (vSMEs) that participate in the Greek Go-Online initiative by providing a wide range of e-services that address the dynamic needs of a user group with diverse profiles. This paper presents results from the most recent evaluation activities conducted: the log file analysis of the Go-Online web portal and the online survey of the web portal users' satisfaction.

The growing need for content customization in websites has fostered the development of systems which try to identify the user’s navigation patterns. These can normally be identified by means of log file analysis. However, this solution does not identify the semantic intention behind the user’s navigation. This paper provides an approach to incorporating semantic knowledge into the process of identifying the user’s intentions in the navigation of a website with semantic support. The capture of the user’s intentions is achieved by the semantic enrichment of the log files and the use of an approach that takes into account the linguistic and cognitive aspects in the development of the user model.

In this paper we introduce a novel interactive video retrieval approach which uses sub-needs of an information need for querying and organising the search process. The underlying assumption of this approach is that search effectiveness will be enhanced when it is employed for interactive video retrieval. We explore the performance bounds of a faceted system by using the simulated user evaluation methodology on TRECVID data sets and also on the logs of a prior user experiment with the system. We discuss the simulated evaluation strategies employed in our evaluation and the effect of using both textual and visual features. The facets are simulated by clustering the video shots using textual and visual features. The experimental results of our study demonstrate that the faceted browser can potentially improve the search effectiveness.

Web-based learning environments are now used extensively as integral components of course delivery in tertiary education. To provide an effective learning environment, it is important that educators understand how these environments are used by their students. In conventional teaching environments educators are able to obtain feedback on student learning experiences in face-to-face interactions with their students, enabling continual evaluation of