Information Gain Research Papers - Academia.edu
Dimensionality refers to the number of terms in a web page. High dimensionality causes problems when classifying web pages, so the main objective of reducing it is to improve the performance of the classifier. Processing time and accuracy are the two parameters that influence that performance. To reduce processing time, less informative and redundant terms have to be removed from web pages. This research describes a hybrid approach to dimensionality reduction in web page classification using rough sets and the naïve Bayesian method. Both feature selection and dimensionality reduction methods are used: information gain serves as the feature selection method, and the rough-set-based Quick Reduct algorithm performs the dimensionality reduction. The naïve Bayesian method then classifies web pages into predefined categories; each page is assigned to the category with the maximum posterior probability. Words remaining ...
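As a rough illustration of the maximum-posterior assignment mentioned in this abstract, the sketch below trains a small multinomial naïve Bayes model on tokenized pages. It is not the paper's implementation: the function names, the add-one (Laplace) smoothing, and the toy data in the usage note are all assumptions made for the example.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(docs):
    """docs: list of (tokens, category) pairs.

    Returns category priors, per-category term counts, and the vocabulary.
    """
    class_counts = Counter(category for _, category in docs)
    term_counts = defaultdict(Counter)
    vocab = set()
    for tokens, category in docs:
        term_counts[category].update(tokens)
        vocab.update(tokens)
    priors = {c: n / len(docs) for c, n in class_counts.items()}
    return priors, term_counts, vocab


def classify(tokens, priors, term_counts, vocab):
    """Return the category with the highest (log) posterior score."""
    best_category, best_score = None, -math.inf
    for category, prior in priors.items():
        total_terms = sum(term_counts[category].values())
        score = math.log(prior)
        for term in tokens:
            if term in vocab:  # terms never seen in training are ignored
                # Add-one smoothing (an assumption of this sketch).
                likelihood = (term_counts[category][term] + 1) / (total_terms + len(vocab))
                score += math.log(likelihood)
        if score > best_score:
            best_category, best_score = category, score
    return best_category
```

For instance, with three hypothetical training pages labelled "sports" or "politics", `classify(["match", "team"], *train_naive_bayes(pages))` picks whichever category gives those terms the larger posterior. In the setting the abstract describes, the token lists would contain only the terms that survive feature selection and reduction.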
In this paper, an enhanced ant colony optimization (EACO) is proposed for the capacitated vehicle routing problem. The capacitated vehicle routing problem is to service customers with known demands using a homogeneous fleet of fixed-capacity vehicles starting from a depot. It plays a major role in logistics and is NP-hard, so it is difficult to solve directly: the number of solutions increases exponentially with the number of customers served. The proposed algorithm combines the advantages of simulated annealing and ant colony optimization. Simulated annealing provides a good initial solution for ant colony optimization, and an information-gain-based ant colony optimization is then used to improve search performance. Computational results on fourteen small-scale instances and twenty large-scale instances show that the proposed algorithm is superior to the original ant colony optimization and to simulated annealing, each applied separately.
The purpose of this article is to describe statistical procedures to assess how prevention and intervention programs achieve their effects. The analyses require the measurement of intervening or mediating variables hypothesized to represent the causal mechanism by which the prevention program achieves its effects. Methods to estimate mediation are illustrated in the evaluation of a health promotion program designed to reduce dietary cholesterol and a school-based drug prevention program. The methods are relatively easy to apply and the information gained from such analyses should add to our understanding of prevention.
... To the extent the REIS market rents and IREM market expenses represent the benchmark, then differences in an individual REIT portfolio relative to the benchmark arise through management decisions. ... Same Store New Store Total New Store** Total Efficiency Gains ...
We propose to use two seemingly different R² measures of fit in PROC LOGISTIC and PROC GENMOD (SAS/STAT), and we show that they are closely related to each other in terms of the amount of ...
It has been widely accepted by many studies that non-linearity exists in the financial markets and that neural networks can be effectively used to uncover this relationship. Unfortunately, many of these studies fail to consider alternative forecasting techniques, the relevance of input variables, or the performance of the models when using different trading strategies. This paper introduces an information gain technique used in machine learning for data mining to evaluate the predictive relationships of numerous financial and economic variables. Neural network models for level estimation and classification are then examined for their ability to provide an effective forecast of future values. A cross-validation technique is also employed to improve the generalization ability of several models. The results show that the trading strategies guided by the classification models generate higher risk-adjusted profits than the buy-and-hold strategy, as well as those guided by the level-estim...
International Journal of Medical Informatics 76 (2007). 1. Introduction. Online evidence retrieval systems provide health professionals with fast and easy access to a wide range of information resources to inform their decision-making processes. Provision of online evidence systems to support clinical work at the point-of-care has been adopted as a strategy to support evidence-based practice in the UK, USA and Australia [1–3]. Few evaluations of their effectiveness have been reported [4]. Most studies have focused on assessing ...
The driving force behind the evolution of computing from automating automation [3] to automating learning [8] can be attributed to Machine Learning algorithms. By being able to generalize from examples, today's computers have the ability to perform autonomously and take critical decisions in almost any given task. In this report, we briefly discuss some of the foundations of what makes Machine Learning so powerful. Within the realm of supervised learning, we explore one of the most widely used classification techniques, Decision Tree Learning. We develop and analyze the ID3 algorithm; in particular, we demonstrate how concepts such as Shannon's Entropy and Information Gain enable this form of learning to yield such powerful results. Furthermore, we introduce avenues through which the ID3 algorithm can be improved, such as Gain Ratio and the Random Forest approach [4].
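The quantities named in this abstract have standard definitions, which the sketch below spells out: Shannon entropy of a label list, the information gain of splitting on a categorical attribute, and the gain ratio that normalizes the gain by the entropy of the split itself. The function names are chosen for this example and are not taken from the report.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(values, labels):
    """Entropy reduction obtained by partitioning labels on attribute values."""
    n = len(labels)
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(value, []).append(label)
    # Weighted average entropy of the partitions after the split.
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def gain_ratio(values, labels):
    """Information gain divided by the entropy of the attribute itself."""
    split_info = entropy(values)
    if split_info == 0:  # attribute takes a single value; no useful split
        return 0.0
    return information_gain(values, labels) / split_info
```

ID3 picks, at each node, the attribute with the highest `information_gain`. Gain ratio penalizes attributes with many distinct values, which information gain alone tends to favour.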
Improvements in educational data mining (EDM) and machine learning have motivated academic staff to implement educational models that predict the performance of students and find the factors that increase their success. Many approaches have been used in EDM for classifying, analyzing and predicting a student's academic performance. This paper presents a prediction model based on an artificial neural network (ANN) combined with feature selection (FS). A questionnaire is built to collect students' answers using LimeSurvey and Google Forms. The questionnaire holds 61 questions that cover many fields such as sports, health, residence, academic activities, and social and managerial information. 161 students from two departments participated in the survey. The data set combines the responses from the two applications and is pre-processed by removing incomplete answers, leaving 151 answers for use in the model. Apart from the model, the FS approach is implemented to find the top correlated questions that affect the final class (Grade). The aim of FS is to eliminate the unimportant questions and find those which are important, besides improving the accuracy of the model. A combination of four FS methods (Info Gain, Correlation, SVM and PCA) is tested, and the average rank across these methods is used to find the top 30 of the 61 questions. The artificial neural network is implemented to predict the grade (Pass (P) or Failed (F)). The model performance is compared with three previous models to prove its optimality.
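The average-rank step can be sketched as follows. This is an illustration under assumptions, not the authors' code: it supposes each feature selection method yields one numeric score per question (higher meaning more important), and the function name, method labels, and tie-breaking rule are hypothetical.

```python
def top_by_average_rank(scores_by_method, top_k):
    """Rank features within each method, average the ranks, keep the best top_k.

    scores_by_method: {method_name: {feature_name: score}}, higher score = better.
    Ties keep the order in which features appear in the first method's dict.
    """
    features = list(next(iter(scores_by_method.values())))
    rank_sums = {feature: 0 for feature in features}
    for scores in scores_by_method.values():
        ordered = sorted(features, key=lambda f: scores[f], reverse=True)
        for rank, feature in enumerate(ordered, start=1):  # rank 1 = best
            rank_sums[feature] += rank
    n_methods = len(scores_by_method)
    average_rank = {f: total / n_methods for f, total in rank_sums.items()}
    return sorted(features, key=lambda f: average_rank[f])[:top_k]
```

With four score dictionaries (one per method) over 61 questions, calling `top_by_average_rank(scores, 30)` would return the 30 questions with the lowest average rank. How SVM and PCA are turned into per-question scores is not stated in the abstract and is left outside this sketch.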
- by Nuno Carvalhais and +1
- Earth Sciences, Time Series, Monte Carlo, Data Assimilation
There is a need for a methodology to fairly compare stochastic global optimization algorithms and present the results of evaluation studies. This need raises two important questions: (i) which set of benchmark test problems the algorithms should be tested on, and (ii) how to present the results compactly and completely. To address the first question, we compiled a collection of test problems, some better known than others. Although the compilation is not exhaustive, it provides an easily accessible collection of standard test problems for continuous global optimization. Five different stochastic global optimization algorithms have been tested on these problems, and a performance profile plot based on the improvement of objective function values is constructed to investigate the macroscopic behavior of the algorithms. The paper also investigates the microscopic behavior of the algorithms through quartile sequential plots, and contrasts the information gained from these two kinds of plots. The effect of the length of run is explored by using three maximum numbers of function evaluations, and it is shown to significantly impact the behavior of the algorithms.