Information Gain Research Papers - Academia.edu

2025, Neurocomputing

Feature selection problems arise in a variety of applications, such as microarray analysis, clinical prediction, text categorization, image classification and face recognition, multi-label learning, and classification of internet traffic. Among the various classes of methods, forward feature selection methods based on mutual information have become very popular and are widely used in practice. However, comparative evaluations of these methods have been limited by being based on specific datasets and classifiers. In this paper, we develop a theoretical framework that allows evaluating the methods based on their theoretical properties. Our framework is grounded on the properties of the target objective function that the methods try to approximate, and on a novel categorization of features, according to their contribution to the explanation of the class; we derive upper and lower bounds for the target objective function and relate these bounds with the feature types. Then, we characterize the types of approximations taken by the methods, and analyze how these approximations cope with the good properties of the target objective function. Additionally, we develop a distributional setting designed to illustrate the various deficiencies of the methods, and provide several examples of wrong feature selections. Based on our work, we identify clearly the methods that should be avoided, and the methods that currently have the best performance.
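
A minimal sketch of the greedy forward selection that these methods implement, using the simplest criterion (MIM: score each candidate only by its mutual information with the class); the dataset, the value of k, and the absence of redundancy or complementarity terms (as in mRMR or JMI) are illustrative assumptions, not the paper's own setup.

```python
# Hedged sketch of mutual-information-based forward feature selection (MIM).
# The criteria analyzed in this literature (mRMR, JMI, CMIM, ...) add
# redundancy/complementarity terms to the per-feature relevance score below.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import mutual_info_classif

X, y = load_breast_cancer(return_X_y=True)  # placeholder dataset

selected, remaining = [], list(range(X.shape[1]))
k = 5  # number of features to select (illustrative)
for _ in range(k):
    # Relevance of each remaining candidate: I(X_j; Y).
    relevance = mutual_info_classif(X[:, remaining], y, random_state=0)
    best = remaining[int(np.argmax(relevance))]
    selected.append(best)
    remaining.remove(best)

print("selected feature indices:", selected)
```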

2025, 2020 IEEE 17th International Conference on Mobile Ad Hoc and Sensor Systems (MASS)

Collaborative Filtering (CF) is a popular recommendation system that makes recommendations based on similar users' preferences. Though it is widely used, CF is prone to Shilling/Profile Injection attacks, where fake profiles are injected into the CF system to alter its outcome. Most of the existing shilling attacks do not work on online systems and cannot be efficiently implemented in real-world applications. In this paper, we introduce an efficient Multi-Armed-Bandit-based reinforcement learning method to practically execute online shilling attacks. Our method works by reducing the uncertainty associated with the item selection process and finds the most optimal items to enhance attack reach. Such practical online attacks open new avenues for research in building more robust recommender systems. We treat the recommender system as a black box, making our method effective irrespective of the type of CF used. Finally, we also experimentally test our approach against popular state-of-the-art shilling attacks.

2025, 2012 IEEE Congress on Evolutionary Computation

One of the major challenges in automatic classification is to deal with highly dimensional data. Several dimensionality reduction strategies, including popular feature selection metrics such as Information Gain and χ², have already been proposed to deal with this situation. However, these strategies are not well suited when the data is very skewed, a common situation in real-world data sets. This occurs when the number of samples in one class is much larger than the others, causing common feature selection metrics to be biased towards the features observed in the largest class. In this paper, we propose the use of Genetic Programming (GP) to implement an aggressive, yet very effective, selection of attributes. Our GP-based strategy is able to largely reduce dimensionality, while dealing effectively with skewed data. To this end, we exploit some of the most common feature selection metrics and, with GP, combine their results into new sets of features, obtaining a better unbiased estimate for the discriminative power of each feature. Our proposal was evaluated against each individual feature selection metric used in our GP-based solution (namely, Information Gain, χ², Odds-Ratio, Correlation Coefficient) using a K8 cancer-rescue mutants data set, a very unbalanced collection of p53 protein examples. For this data set, our solution not only increases the efficiency of the learning algorithms, with an aggressive reduction of the input space, but also significantly increases their accuracy.

2025

One of the main themes supporting text mining is text representation, i.e., looking for the appropriate terms to transform the documents into numerical vectors. Recently, many efforts have been invested in this topic to enrich text representation using the vector space model (VSM) and thereby improve the performance of text mining techniques such as classification, clustering, etc. The main concern of this paper is to investigate the effectiveness of using multi-words for text representation on classification performance. Firstly, a practical method is proposed to implement multi-word extraction from documents based on syntactical structure. Secondly, two strategies, general concept representation and subtopic representation, are presented to represent the documents using the extracted multi-words. In particular, the dynamic k-mismatch is proposed to determine the presence of a long multi-word which is a subtopic of the content of a document. Finally, we carried out a series of exp...

2025

We carried out a series of experiments on text classification using multi-word features. A hand-crafted method was proposed to extract the multi-words from the text data set, and two different strategies were developed to normalize the multi-words into two different versions of multi-word features. After the texts were represented using these two multi-word feature sets, text classification was conducted to compare the effectiveness of the two strategies. The linear and nonlinear polynomial kernels of the support vector machine (SVM) were also compared on the performance of the text classification task.

2025

In many applications, Unmanned Aerial Vehicles (UAVs) provide an indispensable platform for gathering information about the situation on the ground. However, to maximise information gained about the environment, such platforms require increased autonomy to coordinate the actions of multiple UAVs. This has led to the development of flight planning and coordination algorithms designed to maximise information gain during sensing missions. However, these have so far neglected the need to maintain wireless network connectivity. In this paper, we address this limitation by enhancing an existing multi-UAV planning algorithm with two new features that together make a significant contribution to the state-of-the-art: (1) we incorporate an on-line learning procedure that enables UAVs to adapt to the radio propagation characteristics of their environment, and (2) we integrate flight path and network routing decisions, so that modelling uncertainty and the effect of UAV position on network performance are taken into account.

2025, Proceedings on Intelligent Systems and Knowledge Engineering (ISKE2007)

The irrational construction of weights in traditional management decision-making leads to subjectivity and ignores the redundancy of attributes. The σ-important rating and ξ-important rating are first proposed based on the attribute reduct in rough set theory. Then, the g-important rating is given by the information gain in information entropy. An approach for acquiring attribute weights that employs these three important ratings is presented to solve the problems of subjectivity and redundancy. An empirical case study validates the rationality and validity of our method.

2025, IEEE Access

Feature selection (FS) is one of the important tasks of data preprocessing in data analytics. Data with a large number of features increase computational complexity, resource usage and time consumption in data analytics. The objective of this study is to identify relevant and significant features of huge network traffic in order to improve the accuracy of traffic anomaly detection and to decrease its execution time. Information Gain is the feature selection technique most commonly used in Intrusion Detection System (IDS) research. This study uses Information Gain to rank and group features according to minimum weight values and thereby select relevant and significant features, and then applies the Random Forest (RF), Bayes Net (BN), Random Tree (RT), Naive Bayes (NB) and J48 classifier algorithms in experiments on the CICIDS-2017 dataset. The experimental results show that the number of relevant and significant features yielded by Information Gain significantly affects detection accuracy and execution time. Specifically, the Random Forest algorithm has the highest accuracy of 99.86% using 22 relevant selected features, whereas the J48 classifier algorithm provides an accuracy of 99.87% using 52 relevant selected features, with a longer execution time.
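
A rough sketch of the ranking-and-selection step described above, assuming a scikit-learn stack; the synthetic data, the choice of mutual information as the information gain scorer, and k = 22 stand in for the paper's CICIDS-2017 setup rather than reproduce it.

```python
# Hedged sketch: rank features by information gain (mutual information),
# keep the top-k, then train a Random Forest on the reduced feature set.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

X, y = make_classification(n_samples=2000, n_features=60, n_informative=15,
                           random_state=0)  # stand-in for the traffic data
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

selector = SelectKBest(mutual_info_classif, k=22).fit(X_tr, y_tr)
rf = RandomForestClassifier(n_estimators=100, random_state=0)
rf.fit(selector.transform(X_tr), y_tr)
print("accuracy:", accuracy_score(y_te, rf.predict(selector.transform(X_te))))
```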

2025, Poljoprivreda

Natural occurrence of Trichogramma wasps was not significantly affected by agricultural practices in maize, and the population of these parasitoids was too low to significantly affect ECB feeding activity.

2025

The aim of this study was to determine the natural infestation of European corn borer (ECB) eggs by Trichogramma wasps (Hymenoptera: Trichogrammatidae) under field conditions. The experiment was set up in Osijek, Croatia in 2013. The experiment included two levels of irrigation, two nitrogen rates and two maize genotypes. Parameters of ECB feeding activity and maize tolerance (cob mass, tunnel length, number of ECB larvae per plant), as well as the number of ECB eggs parasitized by Trichogramma wasps, were evaluated. Genotypes were significantly different in terms of tolerance to ECB injury. In treatments with nitrogen fertilization, ECB feeding activity was increased at both nitrogen rates. Agricultural practices did not significantly affect parasitism of ECB eggs by Trichogramma. Correlation between parameters of ECB feeding activity and parasitism by Trichogramma was slight to moderate and not significant. Natural occurrence of Trichogramma wasps was not significantly affected by agr...

2025, Climate

Ensembles of general circulation model (GCM) integrations yield predictions for meteorological conditions in future months. Such predictions have implicit uncertainty resulting from model structure, parameter uncertainty, and fundamental randomness in the physical system. In this work, we build probabilistic models for long-term forecasts that include the GCM ensemble values as inputs but incorporate statistical correction of GCM biases and different treatments of uncertainty. Specifically, we present, and evaluate against observations, several versions of a probabilistic forecast for gridded air temperature 1 month ahead based on ensemble members of the National Centers for Environmental Prediction (NCEP) Climate Forecast System Version 2 (CFSv2). We compare the forecast performance against a baseline climatology based probabilistic forecast, using average information gain as a skill metric. We find that the error in the CFSv2 output is better represented by the climatological variance than by the distribution of ensemble members because the GCM ensemble sometimes suffers from unrealistically little dispersion. Lack of ensemble spread leads a probabilistic forecast whose variance is based on the ensemble dispersion alone to underperform relative to a baseline probabilistic forecast based only on climatology, even when the ensemble mean is corrected for bias. We also show that a combined regression based model that includes climatology, temperature from recent months, trend, and the GCM ensemble mean yields a probabilistic forecast that outperforms approaches using only past observations or GCM outputs. Improvements in predictive skill from the combined probabilistic forecast vary spatially, with larger gains seen in traditionally hard to predict regions such as the Arctic.
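
As an illustration of the skill metric named here, average information gain can be computed as the mean log-score difference (in bits) between the forecast distribution and a climatological baseline at the verifying observations; the Gaussian forms and toy numbers below are assumptions, not the paper's data or exact definition.

```python
# Hedged sketch of an "average information gain" skill score: the mean
# difference (in bits) between the log predictive density of the forecast and
# that of a climatological baseline, evaluated at the verifying observations.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
obs = rng.normal(0.5, 1.0, size=200)            # verifying observations (toy)
clim_mean, clim_std = 0.0, 1.0                  # climatological distribution
fcst_mean = obs + rng.normal(0, 0.7, size=200)  # stand-in for a skillful forecast
fcst_std = 0.7                                  # forecast spread

ig = (norm.logpdf(obs, fcst_mean, fcst_std)
      - norm.logpdf(obs, clim_mean, clim_std)) / np.log(2.0)
print("average information gain (bits):", ig.mean())
```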

2025

The Emigrant Slimhole Drilling Project ("ESDP") was a highly successful, phased resource evaluation program designed to evaluate the commercial geothermal potential of the eastern margin of the northern Fish Lake Valley pull-apart basin in west-central Nevada. The program involved three phases: (1) Resource evaluation; (2) Drilling and resource characterization; and (3) Resource testing and assessment. Efforts included detailed geologic mapping; 3-D modeling; compilation of a GIS database; and production of a conceptual geologic model followed by the successful drilling of the 2,938-foot-deep 17-31 slimhole (core hole), which encountered commercial geothermal temperatures (327 °F) and exhibits an increasing, conductive temperature gradient to total depth; completion of a short injection test; and compilation of a detailed geologic core log and revised geologic cross-sections. Results of the project greatly increased the understanding of the geologic model controlling the Emigrant geothermal resource. Information gained from the 17-31 core hole revealed the existence of commercial temperatures beneath the area in the Silver Peak Core Complex, which is composed of formations that exhibit excellent reservoir characteristics. Knowledge gained from the ESDP may lead to the development of a new commercial geothermal field in Nevada. Completion of the 17-31 core hole also demonstrated the cost-effectiveness of deep core drilling as an exploration tool and the unequaled value of core in understanding the geology, mineralogy, evolutional history and structural aspects of a geothermal resource. The goals and objectives of the ESDP, which was proposed in 2004, were for the most part met or exceeded. Project objectives were laid out in three tasks. This phase of the project was designed to involve: (1) assembly and review of relevant published and proprietary literature and previous geothermal investigations in the region; (2) detailed geologic mapping (1:4,000 scale) of the Emigrant Miocene sedimentary basin and surrounding Paleozoic basement, with the aid of state-of-the-art remote-sensing technology; (3) analysis of 32 lithologic logs from U.S. Borax Chemical Corporation ("U.S. Borax") and Amax Exploration, Inc. ("Amax") mineral exploration and thermal-gradient drill holes; (4) synthesis of geologic mapping results and lithologic logs for 3-D geologic characterization of the prospect area; (5) thermal anomaly mapping using remotely sensed thermal infrared data; (6) compilation of relevant data from the foregoing sub-activities into a Geographic Information System (GIS) database for use in knowledge-based modeling; and (7) development of a refined conceptual geologic model to guide the site selection of the Emigrant core hole (Hulen,

2025, Satyanarayana Ballamudi

Image mining, an essential process in many industrial image applications, has demonstrated significant utility in fields such as medical diagnostics, agriculture, industrial operations, space research, and education. This process involves extracting both information and image segments, but these tasks are often conducted independently, resulting in different workflows. This paper proposes an approach that integrates feature extraction and object recognition, leading to improved object identification. We introduce a novel method that improves recognition accuracy by increasing the percentage of optimal features. The ORB algorithm, known for its speed, is used in the initial pass, while the SURF algorithm is used as a secondary confirmation step for unrecognized objects. This approach supports the simultaneous processing of many images, which makes it suitable for large-scale applications such as image repositories in social media and expands the scope of research.

2025, WIT Transactions on Information and Communication Technologies

Text categorization is the task of classifying natural language documents into a set of predefined categories. Documents are typically represented by sparse vectors under the vector space model, where each word in the vocabulary is mapped to one coordinate axis and its occurrence in the document gives rise to one nonzero component in the vector representing that document. When training classifiers on large collections of documents, both the time and memory requirements connected with processing of these vectors may be prohibitive. This calls for using a feature selection method, not only to reduce the number of features but also to increase the sparsity of document vectors. We propose a feature selection method based on linear Support Vector Machines (SVMs). First, we train the linear SVM on a subset of training data and retain only those features that correspond to highly weighted components (in the absolute value sense) of the normal to the resulting hyperplane that separates positive and negative examples. This reduced feature space is then used to train a classifier over a larger training set because more documents now fit into the same amount of memory. In our experiments we compare the effectiveness of the SVM-based feature selection with that of more traditional feature selection methods, such as odds ratio and information gain, in achieving the desired tradeoff between vector sparsity and classification performance. Experimental results indicate that, at the same level of vector sparsity, feature selection based on SVM normals yields better classification performance than odds ratio- or information gain-based feature selection when linear SVM classifiers are used.
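
A sketch of the selection scheme under stated assumptions (scikit-learn, a two-category placeholder corpus, illustrative subset size and feature budget): train a linear SVM on part of the training data, keep the features with the largest |w| in the hyperplane normal, then retrain on the reduced space.

```python
# Hedged sketch of SVM-normal feature selection; corpus, subset size and the
# number of retained features are illustrative, not the paper's settings.
import numpy as np
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC

train = fetch_20newsgroups(subset="train",
                           categories=["sci.space", "rec.autos"])
X = TfidfVectorizer().fit_transform(train.data)
y = train.target

svm = LinearSVC(C=1.0).fit(X[:500], y[:500])      # small subset for selection
keep = np.argsort(np.abs(svm.coef_[0]))[-2000:]   # highest-|w| features

svm_full = LinearSVC(C=1.0).fit(X[:, keep], y)    # retrain on reduced space
print("kept", keep.size, "of", X.shape[1], "features")
```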

2025, Biosystems

Standard" information theory says nothing about the semantic content of information. Nevertheless, applications such as evolutionary theory demand consideration of precisely this aspect of information, a need that has motivated a largely... more

Standard" information theory says nothing about the semantic content of information. Nevertheless, applications such as evolutionary theory demand consideration of precisely this aspect of information, a need that has motivated a largely unsuccessful search for a suitable measure of an "amount of meaning". This paper represents an attempt to move beyond this impasse, based on the observation that the meaning of a message can only be understood relative to its receiver. Positing that the semantic value of information is its usefulness in making an informed decision, we define pragmatic information as the information gain in the probability distributions of the receiver's actions, both before and after receipt of a message in some pre-defined ensemble. We then prove rigorously that our definition is the only one that satisfies obvious desiderata, such as the additivity of information from logically independent messages. This definition, when applied to the information "learned" by the time evolution of a process, defies the intuitions of the few previous researchers thinking along these lines by being monotonic in the uncertainty that remains after receipt of the message, but non-monotonic in the Shannon entropy of the input ensemble. It follows that the pragmatic information of the genetic "messages" in an evolving population is a global Lyapunov function for Eigen's quasi-species model of biological evolution. A concluding section argues that a theory such as ours must explicitly acknowledge purposeful action, or "agency", in such diverse fields as evolutionary theory and finance.

2025, Jurnal Teknik Informatika (Jutif)

One of the most widely used data classification methods is the K-Nearest Neighbor (K-NN) algorithm. In this method, data are classified by computing the distances to the training data and taking the K closest neighbors. The class of the new data point is then determined by majority vote among those K nearest neighbors. However, the performance of this method is still lower than that of other data classification methods. The causes are the majority-vote system used to determine new data classes and the influence of less relevant features in the dataset. This study compares several feature selection methods on the dataset to see their effect on the performance of the K-NN algorithm in data classification. The feature selection methods in this research are Information gain, Gain ratio, and Gini index. The methods were tested on the Water Quality dataset from the Kaggle Repository to find the most optimal feature selection method. The tes...
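
A hedged sketch of this comparison: rank discretized features by information gain, gain ratio, and Gini index, keep the top-ranked features under each criterion, and compare K-NN cross-validation accuracy. The wine data, the binning, and the number of kept features are placeholders for the Water Quality dataset and the settings used in the study.

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import KBinsDiscretizer

def entropy(labels):
    p = np.bincount(labels) / labels.size
    p = p[p > 0]
    return -(p * np.log2(p)).sum()

def gini(labels):
    p = np.bincount(labels) / labels.size
    return 1.0 - (p ** 2).sum()

def impurity_reduction(feature, y, impurity):
    # Information gain (entropy) or Gini gain of splitting on a discrete feature.
    weighted = sum((feature == v).mean() * impurity(y[feature == v])
                   for v in np.unique(feature))
    return impurity(y) - weighted

X, y = load_wine(return_X_y=True)                      # placeholder dataset
Xd = KBinsDiscretizer(n_bins=5, encode="ordinal",
                      strategy="quantile").fit_transform(X).astype(int)

info_gain = np.array([impurity_reduction(Xd[:, j], y, entropy)
                      for j in range(Xd.shape[1])])
split_info = np.array([entropy(Xd[:, j]) for j in range(Xd.shape[1])])
gain_ratio = info_gain / np.where(split_info > 0, split_info, 1.0)
gini_gain = np.array([impurity_reduction(Xd[:, j], y, gini)
                      for j in range(Xd.shape[1])])

knn = KNeighborsClassifier(n_neighbors=5)
for name, score in [("info gain", info_gain), ("gain ratio", gain_ratio),
                    ("gini index", gini_gain)]:
    top = np.argsort(score)[-6:]                       # keep the 6 best features
    acc = cross_val_score(knn, X[:, top], y, cv=5).mean()
    print(f"{name:10s} mean CV accuracy: {acc:.3f}")
```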

2025, Geoderma

The scale at which a soil landscape (soilscape) is viewed has a significant impact on soil pattern and interpretations made from those patterns. Recently deglaciated soilscapes are particularly spatially complex. In order to understand how scale impacts pattern on complex soilscapes, we used a GIS to examine soil maps for 13 counties in the northern United States, all affected by Late Wisconsinan glaciation. We used an Arc Macro Language (AML) script to change the map scale and, when the change was to a smaller scale, group/dissolve soil map units based on similarities to a prescribed list of neighboring map unit characteristics. Similarity criteria included drainage class, taxonomic great group, parent material and slope. Soilscape complexity was measured at nine different scales and is based on various pattern metrics: number of punctate soil units km⁻², map unit polygons km⁻², map unit boundary length km⁻², and boundary length polygon⁻¹ km⁻². Soilscape complexity as a function of scale was then examined by regressing pattern metric data against the size of the minimum map unit for each of the nine scales. Extrapolation of the regression lines to 1:10,000 (a scale larger than is typically mapped) illustrated how much additional information might accrue if these counties were to be mapped at that larger scale. In most cases, 2-10 times more map units would have been recognized and delineated at the two times larger map scale, but map unit boundary lengths would have increased by only about 1.5 times. Whether this additional information is of such a magnitude that it could justify remapping some of these complex landscapes at larger scales is an economic decision; our study provides much needed data on the magnitude of information gained by mapping soilscapes at larger scales.

2025, The Journal of Neuroscience

Single unit responses were recorded in the medial geniculate body (MGB) of anesthetized cats. In response to acoustical stimulation the properties of response latency, discharge pattern, frequency tuning, binaural interaction, and habituation were examined to allow an appraisal of the differentiation of the MGB by electrophysiological means. It is found that definite boundaries can be determined at which there is a distinct change in response properties; the position of these “physiological boundaries” seems to correspond with the boundaries between the seven subnuclei of the MGB described by Morest (Morest, D. K. (1964) J. Anat. 98: 611–630) in Golgi-stained material. Using these physiological boundaries to determine unit locations, population comparisons are made allowing the description of each subnucleus in terms of its auditory response properties. It is suggested that these properties, together with the limited information gained from Nissl cytoarchitecture, are sufficient to ...

2025

Orthodox Quantum Mechanics (OQM) is wrong for living matter. OQM is to Post-Quantum Mechanics (PQM) as flat Euclidean geometry is to curved Non-Euclidean Geometry or as Special Relativity is to General Relativity.
Special Relativity cannot explain Gravity.
Quantum Mechanics cannot explain Consciousness.

2025

Text classification is one of the most widely studied problems across application domains and research areas, so there is a need for effective and efficient text classification algorithms. Many algorithms for accurate text classification have been presented by different researchers; each is specific to particular applications or research domains, and some are based on data mining and machine learning techniques. The main aim of this paper is to summarize the different types of algorithms presented for text classification. In this paper we present the key components of text classification, which will help researchers understand the existing techniques. First we give an overview of why feature reduction is needed and of the different techniques for feature selection, followed by the key components of a text classification system. Later we will disc...

2025, OCEANS 2000 MTS/IEEE Conference and Exhibition. Conference Proceedings (Cat. No.00CH37158)

We address the problem of surveying oceanic parameters using autonomous instrumented mobile platforms. As an example, we consider the problem of current mapping in coastal areas. We study the impact on survey efficiency of using a priori knowledge about the surveyed field for on-line guidance of the sensors, as an alternative to the classical approach of executing a predefined trajectory, or to the more recently proposed perception-driven observation strategies. Availability of this a priori model enables extrapolation of the measurements, as well as the determination of the information yielded by future observations, allowing the search for the best next observation point. In the paper, we present simulation results of the proposed on-line guidance based on information gain, and compare its efficiency to standard survey strategies.
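
A toy sketch of information-gain-driven guidance under stated assumptions: a Gaussian process stands in for the a priori field model, and the next observation point is the candidate with the largest predictive uncertainty; the 1-D field, kernel, and candidate grid are illustrative only.

```python
# Hedged sketch: pick each next survey point where the model's predictive
# uncertainty (a proxy for expected information gain) is largest.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def field(x):
    return np.sin(x).ravel()                     # toy stand-in for the current field

grid = np.linspace(0, 10, 200).reshape(-1, 1)    # candidate observation points
visited = [0.0, 5.0]                             # points measured so far

for _ in range(5):
    X_obs = np.array(visited).reshape(-1, 1)
    gp = GaussianProcessRegressor(kernel=RBF(length_scale=1.0),
                                  optimizer=None).fit(X_obs, field(X_obs))
    _, std = gp.predict(grid, return_std=True)
    visited.append(float(grid[np.argmax(std), 0]))  # most uncertain candidate

print("survey sequence:", [round(v, 2) for v in visited])
```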

2025, 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)

Efficient exploration of unknown terrains by extraterrestrial rovers requires the development of strategies that reduce the entropy in the geological classification of a given terrain. Without such intelligent strategies, teleoperation of the rover is reliant either on human intuition or on the exhaustive exploration of the entire terrain. This paper highlights the use of low-resolution reconnaissance using satellite imagery to generate plans for rovers that reduce the overall uncertainty in the various geological classes. This becomes pivotal when exploration to collect diverse samples is resource constrained through exploration budgets and transmission bandwidths. We put forward two major contributions: a science-aware planner that uses information gain, and a novel method of estimating this information gain. We propose an exploration strategy, based on the Multi-Heuristic A*, to solve the tradeoff between optimizing path lengths and geological exploration through Pareto-optimal solutions. We show that our algorithm, which explicitly uses projected entropy-reduction in planning, significantly outperforms science-agnostic approaches and other science-aware strategies like greedy best-first searches. We further propose a feature-space based entropy formulation in contrast to the frequently used differential entropy formulation and show superior results when reconstructing the unsampled data from the set of sampled points.
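
A small sketch of the entropy bookkeeping such a planner relies on, under the assumption that each map cell holds a probability distribution over geological classes and that sampling a cell resolves its class; the grid size and class count are toy values.

```python
# Hedged sketch: rank candidate sampling cells by the Shannon entropy of
# their class beliefs, i.e. by how much uncertainty a visit would remove.
import numpy as np

rng = np.random.default_rng(0)
n_cells, n_classes = 100, 4
probs = rng.dirichlet(np.ones(n_classes), size=n_cells)   # per-cell class beliefs

cell_entropy = -(probs * np.log2(probs)).sum(axis=1)      # uncertainty per cell

# Assuming a sample resolves the cell's class, the expected information gain
# of visiting a cell equals its current entropy.
targets = np.argsort(cell_entropy)[::-1][:5]
print("most informative cells to sample:", targets)
```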

2025, International Journal of Advanced Research in Artificial Intelligence

The number of documents is increasing rapidly; therefore, organizing them in digitized form makes text categorization a challenging issue. A major issue for text categorization is its large number of features. Most of the features are noisy, irrelevant and redundant, which may mislead the classifier. Hence, it is important to reduce the dimensionality of the data to obtain a smaller subset of features that provides the most information gain. Feature selection techniques reduce the dimensionality of the feature space and also improve overall accuracy and performance. Hence, to overcome the issues of text categorization, feature selection is considered an efficient technique. We therefore propose a multistage feature selection model to improve the overall accuracy and performance of classification. In the first stage, document preprocessing is performed. Secondly, each term within the documents is ranked according to its importance for classification using information gain. Thirdly, the rough set technique is applied to the importantly ranked terms and feature reduction is carried out. Finally, document classification is performed on the core features using Naive Bayes and KNN classifiers. Experiments are carried out on three UCI datasets: Reuters 21578, Classic 04 and Newsgroup 20. Results show the better accuracy and performance of the proposed model.
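
A hedged sketch of the information gain ranking stage (the rough set reduction stage is omitted), assuming a scikit-learn pipeline and a placeholder two-category corpus rather than the Reuters/Classic/Newsgroup setup of the paper.

```python
# Hedged sketch: vectorize documents, rank terms by information gain
# (mutual information with the class), keep the top-ranked terms, and
# classify with Naive Bayes.
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

docs = fetch_20newsgroups(subset="train", categories=["sci.med", "sci.space"])
pipe = make_pipeline(
    CountVectorizer(stop_words="english", max_features=5000),
    SelectKBest(mutual_info_classif, k=500),     # information-gain ranking
    MultinomialNB(),
)
print("CV accuracy:", cross_val_score(pipe, docs.data, docs.target, cv=3).mean())
```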

2025, Journal of Experimental and Theoretical Physics

Quantum cryptography (secure key distribution) systems must include procedures for correcting errors in the raw key transmitted over a quantum communication channel. Several reconciliation protocols are discussed and compared in terms of efficiency.

2025, ArXiv

In this paper, we propose an optimal active perception method for recognizing multimodal object categories. Multimodal categorization methods enable a robot to form several multimodal categories through interaction with daily objects autonomously. In most settings, the robot has to obtain all of the modality information when it attempts to recognize a new target object. However, even though a robot obtains visual information at a distance, it cannot obtain haptic and auditory information without taking action on the object. The robot has to determine its next action to obtain information about the object to recognize it. We propose an action selection method for multimodal object category recognition on the basis of the multimodal hierarchical Dirichlet process (MHDP) and an information gain criterion. We also prove its optimality from the viewpoint of the Kullback–Leibler divergence between a final recognition state and a current recognition state. In addition, we show that the...

2025, Journal of Colloid and Interface Science

(1,2-benzenedicarboxylate) at the water-metal (hydr)oxide interface. Previously published infrared spectroscopic, potentiometric, and adsorption data characterizing the boehmite (γ-AlOOH) system are compared with new data collected for o-phthalate adsorption on aged γ-Al₂O₃ and goethite (α-FeOOH). The study focuses on identifying bonding mechanisms, stoichiometries, and stabilities of the formed complexes, and comparing these among the three systems. Furthermore, the effects of ionic strength and composition of the ionic medium are investigated. The infrared spectroscopic data provided direct, molecular-level evidence for the existence of two dominating surface complexes on all three solids. One was shown to be a deprotonated outer-sphere species and the other was an inner-sphere surface complex. The inner-sphere complexes on the three solids were structurally related, and they were tentatively assigned to a mononuclear, chelating structure involving both carboxylate groups. The outer-sphere complexes were shown to increase in relative importance at high pH and low ionic strengths, while low pH and high ionic strengths favored the inner-sphere complexes. The information gained from the infrared spectroscopic investigations was used as qualitative input in the formulation of the surface complexation models. New models, based on the extended constant capacitance approach, were presented for the o-phthalate/aged γ-Al₂O₃ and o-phthalate/goethite systems.

2025

MetaSearch is the use of multiple other search systems to perform a simultaneous search. A MetaSearch Engine (MSE) is a search system that enables MetaSearch. To perform a MetaSearch, the user query is sent to multiple search engines; once the search results are returned, they are received by the MSE, merged into a single ranked list, and the ranked list is presented to the user. When a query is submitted to a MSE, decisions are made with respect to the underlying search engines to be used, what modifications will be made to the query and how to score the results. These decisions are typically made by considering only the user's keyword query, neglecting the larger information need. The cornerstone of MSE technology is the rank aggregation method. In other words, result merging is a key component in a MSE. The effectiveness of a MSE is closely related to the result merging algorithm it employs. In this paper, we want to investigate a variety of result merging methods based on a wide ra...
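
As a generic illustration of the merging step (not the paper's own algorithm), a Borda-count aggregation is one of the simplest baselines a metasearch study would compare: each engine's ranked list awards positional points, and documents are re-ranked by total points. The engine names and document IDs are placeholders.

```python
# Hedged illustration of result merging by Borda count.
from collections import defaultdict

engine_results = {
    "engine_a": ["doc1", "doc3", "doc2", "doc5"],
    "engine_b": ["doc3", "doc1", "doc4"],
    "engine_c": ["doc2", "doc3", "doc1"],
}

scores = defaultdict(float)
for ranking in engine_results.values():
    n = len(ranking)
    for pos, doc in enumerate(ranking):
        scores[doc] += n - pos            # higher positions earn more points

merged = sorted(scores, key=scores.get, reverse=True)
print("merged ranking:", merged)
```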

2025, Physical Oceanography

This paper discusses a mathematical model for the dynamics of pollution in the Arctic basin, considering spatial non-uniformities in the distribution of ecological and hydrodynamic parameters. The model features blocks simulating contaminant fluxes through the trophic chains. The results of numerical experiments are provided which demonstrate the model's capability to predict and assess the dynamics of heavy metals, radionuclides, and petroleum hydrocarbons in the Arctic basin. The model is adaptable to a global model. Climatic phenomena and anthropogenic forcing are described in the form of scenarios.

2025, IEEE Transactions on Neural Systems and Rehabilitation Engineering

Conductive elastomers are a novel strain sensing technology which can be unobtrusively embedded into a garment's fabric, allowing a new type of sensorized clothing for motion analysis. A possible application for this technology is remote monitoring and control of motor rehabilitation exercises. The present work describes a sensorized shirt for upper limb posture recognition. Supervised learning techniques have been employed to compare classification models for the analysis of strains, simultaneously measured at multiple points of the shirt. The instantaneous position of the limb was classified into a finite set of predefined postures, and the movement was decomposed into an ordered sequence of discrete states. The amount of information given by the observation of each sensor during the execution of a specific exercise was quantitatively estimated by computing the information gain for each sensor, which in turn allows the data-driven optimization of the garment. Real-time feedback on exercise progress can also be provided by reconstructing the sequence of consecutive positions assumed by the limb.
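
A brief sketch of the per-sensor scoring step, assuming mutual information as the information gain estimate; the synthetic readings and class counts stand in for the real conductive-elastomer signals and posture labels.

```python
# Hedged sketch: estimate the information gain (mutual information) between
# each strain sensor's readings and the posture label, then rank sensors.
import numpy as np
from sklearn.feature_selection import mutual_info_classif

rng = np.random.default_rng(0)
n_samples, n_sensors = 500, 12
postures = rng.integers(0, 4, size=n_samples)          # 4 posture classes (toy)
readings = rng.normal(size=(n_samples, n_sensors))
readings[:, :3] += postures[:, None]                   # 3 informative sensors

gain = mutual_info_classif(readings, postures, random_state=0)
print("sensors ranked by information gain:", np.argsort(gain)[::-1])
```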

2025

Reprinted from the preprint volume of the SEVENTH CONFERENCE ON SATELLITE METEOROLOGY AND OCEANOGRAPHY.

2025

Since it first appeared, there has been much research and critical discussion on the theory of optimal data selection as an explanation of Wason's selection task. In this paper, this literature is reviewed, and the theory of optimal data selection is reevaluated in its light. The information gain model is first located in the current theoretical debate in the psychology of reasoning concerning dual processes in human reasoning. A model comparison exercise is then presented that compares a revised version of the model with its theoretical competitors. Tests of the novel predictions of the model are then reviewed. This section also reviews experiments claimed not to be consistent with optimal data selection. Finally, theoretical criticisms of optimal data selection are discussed. It is argued either that the revised model accounts for them or that they do not stand up under analysis. It is concluded that some version of the optimal data selection model still provides the best account of the selection task. Consequently, the conclusion of Oaksford and Chater's (1994) original rational analysis, that people's hypothesis-testing behavior on this task is rational and well adapted to the environment, still stands.

2025, IEEE Transactions on Systems, Man, and Cybernetics

2025, Maedica

Despite the fact that melanoma is an easily approachable tumor for diagnosis, the incidence of this skin cancer is still increasing. Histopathological assessment of melanocytic tumors is the gold standard in melanoma diagnosis and represents a problematic aspect of dermatology and pathology. Over the past decades many efforts have been made in determining histological characteristics influencing the prognosis and survival of patients with clinically localized primary melanoma. Some of these parameters also proved to be essential for tumor staging and choosing adequate clinical management. We present a retrospective study of 21 melanoma cases with histopathological errors or incomplete pathology reports, with the intention to raise awareness about the importance of an accurate diagnosis for the management of these cases and for patient prognosis. We retrospectively reviewed data from pathology reports and discharge medical records from 21 patients diagnosed with melanoma between 2006 and 20...

2025, Journal of Molecular Biology

We developed an algorithm to analyze the distribution and geometry of simple and complex salt bridges in 94 proteins selected from the Protein Data Bank. In this study, the term "salt bridging" denotes both non-bonded

2025

The purpose of this study is to assess how well college students use Google Classroom as a useful and informative teaching and learning tool. The survey method was utilized in the study to measure student involvement in Google Classroom. This study's sample population included 292 college students from Northern Negros State College of Science and Technology. Algorithms such as Random Forest (RF), C4.5, and Naive Bayes (NB) were applied under three evaluation techniques (a 60% percentage split, the training set, and 8-fold cross-validation) to analyze the student data. After analyzing different performance metrics (Correctly Classified Instances, FP Rate, ROC Area, F-Measure, TP Rate, Recall, Precision, Time taken to build the model, Mean Absolute Error, Root Mean Squared Error, Root Relative Squared Error, Relative Absolute Error) for the various data mining algorithms, the researchers determined which algorithm performs better than the others on the gathered student dataset, allowing the researchers to make a recommendation for future improvement in students' Google Classroom engagement.
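
A hedged sketch of the evaluation protocol, assuming scikit-learn equivalents (CART as a stand-in for C4.5, Gaussian Naive Bayes, synthetic records in place of the survey data) and reporting a few of the listed metrics under 8-fold cross-validation.

```python
# Hedged sketch: compare classifiers under 8-fold cross-validation on several
# of the metrics listed in the abstract.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_validate
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=292, n_features=10, random_state=0)
models = {
    "RandomForest": RandomForestClassifier(random_state=0),
    "DecisionTree (CART, ~C4.5)": DecisionTreeClassifier(random_state=0),
    "NaiveBayes": GaussianNB(),
}
for name, model in models.items():
    cv = cross_validate(model, X, y, cv=8,
                        scoring=["accuracy", "precision", "recall", "roc_auc"])
    summary = {k: round(v.mean(), 3) for k, v in cv.items() if k.startswith("test_")}
    print(name, summary)
```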

2025

An important part of the interpretation of a decision process lies in the ascertainment of the influence of the input features, that is, of how much the implemented model relies on a given input feature to perform the desired task. Recently, data analysis techniques based on fuzzy logic have gained attention because of their interpretability. Many real-world applications, however, have very high dimensionality and require very complex decision borders. In this case the number of fuzzy rules can proliferate and the easy interpretability of the fuzzy model can progressively disappear. A method is presented that quantifies the discriminative power of the input features in a fuzzy model. The proposed quantification helps the interpretation of fuzzy models constructed on high dimensional and very fragmented training sets. First, a measure of the information contained in the fuzzy model is defined on the basis of its fuzzy rules. The classification is then performed along one of the input features, that is, the fuzzy rules are split according to that feature's linguistic values. For each linguistic value, a fuzzy sub-model is generated from the original fuzzy model. The average information contained in these fuzzy sub-models is measured, and its relative comparison with the information measure of the original fuzzy model quantifies the information gain that derives from the classification performed on the selected input feature. This information gain characterizes the discriminative power of that input feature. Therefore, the proposed information gain can be used to obtain better insights into the selected fuzzy classification strategy, even in very high dimensional cases, and possibly to reduce the input dimension. Several artificial and real-world data analyses are reported as examples, in order to illustrate the characteristics and potentialities of the proposed algorithm. As real-world examples, the most informative electrocardiographic measures are detected for an arrhythmia classification problem, and the role of duration, amplitude and pitch variations of syllabic nuclei in American English spoken sentences is investigated for prosodic stress classification.

2025, Türk Tarım ve Doğa Bilimleri Dergisi

The biggest problem in the use of resynthesised rapeseed forms in quality breeding is their high glucosinolate content, arising from the same character originating from the B. oleracea parent. Glucosinolates are sulphur- and nitrogen-containing plant secondary metabolites common in the Brassicaceae and related plant families. The hydrolyzed products of glucosinolates, namely isothiocyanates and other sulphur-containing compounds, were shown to interfere with the uptake of iodine by the thyroid gland, contribute to liver disease, and reduce growth and weight gain in animals. Consequently, plant breeders realized that if rapeseed (Brassica napus L.) meal was to be used in animal feed, the glucosinolate content had to be reduced. Up to now, interspecific rapeseed (Brassica napus L.) hybrids displaying low erucic acid quality have been developed, but their glucosinolate content is high because of the B. oleracea parent. To introduce canola quality into RS-lines, crosses with adapted material and subsequent backcrosses to resynthesized material are required, followed by recurrent selection for agronomic performance. A second approach would be the reduction of the glucosinolate content of the B. oleracea parent. Possible methods may be the irradiation of B. oleracea seeds or interspecific hybridization of B. oleracea with related Brassica species, because the selection of cabbage genotypes with low glucosinolate content may be the longer and less efficient way. Another method would be the cultivation of the low erucic acid genotypes in vitro, since tissue culture is well known to cause somaclonal variation, which may lead to the breakdown of the high glucosinolate level.

2025

The biggest problem in using interspecific hybrid forms in rapeseed quality breeding is the high glucosinolate content inherited from the B. oleracea parent. Glucosinolates are sulphur- and nitrogen-containing secondary metabolites widely found in the Brassicaceae and related plant families. The breakdown products of glucosinolates, isothiocyanates and other sulphur-containing compounds, have been shown to affect iodine uptake by the thyroid gland, which contributes to liver disease and causes live-weight loss in animals. Consequently, if rapeseed (Brassica napus L.) meal is to be used in animal feed, the glucosinolate content must be reduced. To date, interspecific hybrid rapeseed (Brassica napus L.) forms with edible-oil quality have been developed, but their glucosinolate contents are high because of the B. oleracea parent. To transfer canola quality to interspecific hybrid rapeseed lines, crossing with adapted rapeseed material is first required ...

2025

Abstract: The objective of this study was to examine the feasibility of applying the State and Territorial Injury Prevention Directors Association (STIPDA) consensus recommendations for using hospital discharge data in injury and adverse event surveillance to the Veterans ...

2025, Feature Extraction

We report on our approach, CBAmethod3E, which was submitted to the NIPS 2003 Feature Selection Challenge on Dec. 8, 2003. Our approach consists of combining filtering techniques for variable selection, information gain and feature correlation, with Support Vector Machines for induction. We ranked 13th overall and ranked 6th as a group. It is worth pointing out that our feature selection method was very successful in selecting the second smallest set of features among the top-20 submissions, and in identifying almost all probes in the datasets, resulting in the challenge's best performance on the latter benchmark.

2025

The central challenge with computer security is determining the difference between normal and potentially harmful activity. A promising solution is emerging in the form of Artificial Immune Systems (AIS). These include theories regarding how the immune system responds to pathogenic material. This paper takes a relatively new theory, the Danger Theory and dendritic cells, explores its relevance to the security application domain, and evaluates it on the KDD'99 data.

2025, Lecture Notes in Computer Science

In text categorization, different supervised term weighting methods have been applied to improve classification performance by weighting terms with respect to different categories, for example, Information Gain, χ² statistic, and Odds Ratio. In the literature there are three term ranking methods for summarizing term weights across different categories for multi-class text categorization: Summation, Average, and Maximum. In this paper we present a new term ranking method to summarize term weights, i.e. Maximum Gap. Using the two weighting methods of information gain and χ² statistic, we set up controlled experiments for the different term ranking methods. The Reuters-21578 text corpus is used as the dataset. Two popular classification algorithms, SVM and Boostexter, are adopted to evaluate the performance of the different term ranking methods. Experimental results show that the new term ranking method performs better.
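
A sketch of how per-category term weights might be summarized into a single ranking score per term. Summation, Average, and Maximum follow their names; "Maximum Gap" is implemented here as the gap between the largest and second-largest category weights, which is an assumed reading of the name rather than a confirmed detail of the paper.

```python
# Hedged sketch of summarizing a terms-by-categories weight matrix (e.g. per-
# category information gain or chi-square scores) into one score per term.
import numpy as np

rng = np.random.default_rng(0)
weights = rng.random((1000, 10))          # terms x categories (toy weights)

summation = weights.sum(axis=1)
average = weights.mean(axis=1)
maximum = weights.max(axis=1)
top2 = np.sort(weights, axis=1)[:, -2:]
max_gap = top2[:, 1] - top2[:, 0]         # assumed definition of Maximum Gap

for name, score in [("sum", summation), ("avg", average),
                    ("max", maximum), ("max-gap", max_gap)]:
    print(name, "top terms:", np.argsort(score)[::-1][:5])
```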

2025

Scaling from hundreds to millions of objects is the next challenge in visual recognition. We investigate and benchmark the scalability properties (memory requirements, runtime, recognition performance) of the state-of-the-art object recognition techniques: the forest of k-d trees, the locality sensitive hashing (LSH) method, and the approximate clustering procedure with the tf-idf inverted index. The characterization of the images was performed with SIFT features. We conduct experiments on two new datasets of more than 100,000 images each, and quantify the performance using artificial and natural deformations. We analyze the results and point out the pitfalls of each of the compared methodologies suggesting potential new research avenues for the field.

2025

I consider the tradeoff between the information gained about an initially unknown quantum state, and the disturbance caused to that state by the measurement process. I show that for any distribution of initial states, the information-disturbance frontier is convex, and disturbance is nondecreasing with information gain. I consider the most general model of quantum measurements, and all post-measurement dynamics compatible with a given measurement. For the uniform initial distribution over states, I show that an optimal information-disturbance combination may always be achieved by a measurement procedure which satisfies a generalization of the projection postulate, the "square-root dynamics." I use this to show that the information-disturbance frontier for the uniform ensemble may be achieved with "isotropic" (unitarily covariant) dynamics. This results in a significant simplification of the optimization problem for calculating the tradeoff in this case, giving hope for a closed-form solution. I also show that the discrete ensembles uniform on the d(d + 1) vectors of a certain set of d + 1 "mutually unbiased" or conjugate bases in d dimensions form spherical 2-designs in CP^(d-1) when d is a power of an odd prime. This implies that many of the results of the paper apply also to these discrete ensembles.
