Razvan Andonie | Central Washington University
Papers by Razvan Andonie
Artificial Neural Networks and Machine Learning – ICANN 2017
International Journal of Computers Communications & Control
Nearly all model algorithms used in machine learning use two different sets of parameters: the training parameters and the meta-parameters (hyperparameters). While the training parameters are learned during the training phase, the values of the hyperparameters have to be specified before learning starts. For a given dataset, we would like to find the optimal combination of hyperparameter values in a reasonable amount of time. This is a challenging task because of its computational complexity. In previous work [11], we introduced the Weighted Random Search (WRS) method, a combination of Random Search (RS) and a probabilistic greedy heuristic. In the current paper, we compare the WRS method with several state-of-the-art hyperparameter optimization methods with respect to Convolutional Neural Network (CNN) hyperparameter optimization. The criterion is the classification accuracy achieved within the same number of tested combinations of hyperparameter values. According to our experiments...
International Journal of Computers Communications & Control
We introduce an improved version of Random Search (RS), used here for hyperparameter optimization of machine learning algorithms. Unlike the standard RS, which generates new values for all hyperparameters at each trial, we generate a new value for each hyperparameter only with a given probability of change. The intuition behind our approach is that a value that already triggered a good result is a good candidate for the next step, and should be tested in new combinations of hyperparameter values. Within the same computational budget, our method yields better results than the standard RS. Our theoretical results prove this statement. We test our method on a variation of one of the most commonly used objective functions for this class of problems (the Griewank function) and for the hyperparameter optimization of a deep learning CNN architecture. Our results can be generalized to any optimization problem defined on a discrete domain.
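The idea of redrawing each hyperparameter only with some probability can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `rs_with_change_probability`, the single fixed probability `p_change`, the two-variable grid, and the use of the standard Griewank function (rather than the variation mentioned in the abstract) are all choices made here for the example.

```python
import math
import random

def griewank(x):
    """Standard Griewank function; its global minimum is 0 at x = (0, ..., 0)."""
    s = sum(v * v for v in x) / 4000.0
    p = 1.0
    for i, v in enumerate(x, start=1):
        p *= math.cos(v / math.sqrt(i))
    return 1.0 + s - p

def rs_with_change_probability(space, objective, n_trials, p_change=0.5, seed=0):
    """Random search that redraws each value with probability p_change
    and otherwise reuses the value from the best combination found so far.

    space: dict mapping a name to a list of candidate values (discrete domain).
    objective: callable taking a dict of values, returning a score to maximize.
    """
    rng = random.Random(seed)
    # First trial: plain random search, every value is drawn fresh.
    best = {name: rng.choice(values) for name, values in space.items()}
    best_score = objective(best)
    for _ in range(n_trials - 1):
        candidate = {
            name: rng.choice(values) if rng.random() < p_change else best[name]
            for name, values in space.items()
        }
        score = objective(candidate)
        if score > best_score:
            best, best_score = candidate, score
    return best, best_score

# Hypothetical discrete domain: a grid from -10 to 10 in steps of 0.5.
grid = [i * 0.5 for i in range(-20, 21)]
space = {"x1": grid, "x2": grid}

def objective(cfg):
    # Maximizing the negated Griewank value is the same as minimizing it.
    return -griewank([cfg["x1"], cfg["x2"]])

best, best_score = rs_with_change_probability(space, objective, n_trials=300)
```

Setting `p_change=1.0` recovers plain random search, which gives a simple baseline for comparing both variants under the same trial budget.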
International Journal of Computers Communications & Control
A fundamental concept frequently applied to statistical machine learning is the detection of dependencies between unknown random variables found from data samples. In previous work, we have introduced a nonparametric unilateral dependence measure based on Onicescu's informational energy and a kNN method for estimating this measure from an available sample set of discrete or continuous variables. This paper provides the formal proofs which show that the estimator is asymptotically unbiased and has asymptotic zero variance when the sample size increases. It implies that the estimator has good statistical qualities. We investigate the performance of the estimator in sensor data analysis and financial time series applications.
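For readers unfamiliar with the quantity involved: Onicescu's informational energy of a discrete random variable with probabilities p_1, ..., p_n is IE(X) = sum_k p_k^2. The sketch below estimates it from relative frequencies. It is a plain plug-in estimate for discrete samples, written only to illustrate the definition; it is not the kNN estimator analyzed in the paper, and the function name is an assumption.

```python
from collections import Counter

def informational_energy(samples):
    """Plug-in estimate of IE(X) = sum_k p_k^2 from a discrete sample."""
    n = len(samples)
    if n == 0:
        raise ValueError("need at least one sample")
    counts = Counter(samples)
    # Each relative frequency c / n stands in for the unknown probability p_k.
    return sum((c / n) ** 2 for c in counts.values())

uniform = informational_energy(["a", "b", "c", "d"])   # four equally likely values
constant = informational_energy(["a", "a", "a", "a"])  # a single certain value
```

The energy is largest (1) when one value is certain and smallest (1/n) for a uniform distribution over n values, so it behaves roughly opposite to entropy.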
2016 IEEE Symposium Series on Computational Intelligence (SSCI), 2016
2016 IEEE 40th Annual Computer Software and Applications Conference (COMPSAC), 2016
We present the first algorithm for finding holes in high-dimensional data that runs in polynomial time with respect to the number of dimensions. Previous algorithms are exponential. Finding large empty rectangles or boxes in a set of points in 2D and 3D space has been well studied. Efficient algorithms exist to identify the empty regions in these low-dimensional spaces. Unfortunately, such efficiency is lacking in higher dimensions, where the problem has been shown to be NP-complete when the dimensions are included in the input. Applications for algorithms that find large empty spaces include big data analysis, recommender systems, automated knowledge discovery, and query optimization. Our Monte Carlo-based algorithm discovers interesting maximal empty hyper-rectangles in cases where dimensionality and input size would otherwise make analysis impractical. The run-time is polynomial in the size of the input and the number of dimensions. We apply the algorithm on a 39-dimensional data set for protein structures and discover interesting properties that we think could not be inferred otherwise.
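One way to picture a Monte Carlo search for empty hyper-rectangles is sketched below. It is an illustrative heuristic written for this listing, not the algorithm from the paper: it samples random query points, shrinks the bounding box around each one until no data point lies strictly inside, and keeps the largest box found. The resulting boxes are empty but not guaranteed to be maximal. All function names, the sample count, and the synthetic 5-dimensional data are assumptions.

```python
import random

def empty_box_around(points, q, lo, hi, rng):
    """Shrink the domain [lo, hi] into a box that contains q and has no
    data point strictly inside it."""
    box_lo, box_hi = list(lo), list(hi)
    dims = range(len(q))
    order = points[:]
    rng.shuffle(order)
    for p in order:
        if not all(box_lo[d] < p[d] < box_hi[d] for d in dims):
            continue  # p is already outside the box or on its boundary
        # Cut along a random dimension where p differs from q, so q stays inside.
        candidates = [d for d in dims if p[d] != q[d]]
        if not candidates:
            raise ValueError("query point coincides with a data point")
        d = rng.choice(candidates)
        if p[d] < q[d]:
            box_lo[d] = p[d]
        else:
            box_hi[d] = p[d]
    return box_lo, box_hi

def volume(box_lo, box_hi):
    v = 1.0
    for a, b in zip(box_lo, box_hi):
        v *= b - a
    return v

def largest_empty_box(points, lo, hi, n_samples=200, seed=0):
    """Try n_samples random query points and keep the largest empty box."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_samples):
        q = [rng.uniform(a, b) for a, b in zip(lo, hi)]
        box = empty_box_around(points, q, lo, hi, rng)
        if best is None or volume(*box) > volume(*best):
            best = box
    return best

# Synthetic example: 100 random points in the 5-dimensional unit cube.
data_rng = random.Random(1)
dim = 5
points = [[data_rng.random() for _ in range(dim)] for _ in range(100)]
box_lo, box_hi = largest_empty_box(points, [0.0] * dim, [1.0] * dim)
```

Each query point costs one pass over the data with a check per dimension, so the total work of this sketch grows as n_samples × number of points × number of dimensions.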
Relevance Learning Vector Quantization (RLVQ) (introduced in [1]) is a variation of Learning Vector Quantization (LVQ) which allows a heuristic determination of relevance factors for the input dimensions. The method is based on Hebbian learning and defines weighting factors of the input dimensions which are automatically adapted to the specific problem. These relevance factors increase the overall performance of the LVQ algorithm. At the same time, relevances can be used for feature ranking and input dimensionality reduction. We introduce a different method for computing the relevance of the input dimensions in RLVQ. The relevances are computed on-line as Ordered Weighted Aggregation (OWA) weights. OWA operators are a family of mean type aggregation operators [2]. The principal benefit of our OWA-RLVQ algorithm is that it connects RLVQ to the mathematically consistent OWA models.
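To make the phrase "mean type aggregation operators" concrete, the sketch below implements a generic OWA operator: the inputs are sorted in decreasing order and combined with non-negative weights that sum to 1. It only illustrates the operator itself; how OWA-RLVQ derives its weights on-line is not described in the abstract and is not attempted here. The function name and the example values are assumptions.

```python
def owa(values, weights):
    """OWA aggregation: weights apply to the sorted values, not to fixed inputs."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    if any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and sum to 1")
    ordered = sorted(values, reverse=True)  # largest value first
    return sum(w * b for w, b in zip(weights, ordered))

values = [0.2, 0.9, 0.5]
as_max = owa(values, [1.0, 0.0, 0.0])    # all weight on the largest value
as_min = owa(values, [0.0, 0.0, 1.0])    # all weight on the smallest value
as_mean = owa(values, [1 / 3, 1 / 3, 1 / 3])  # equal weights give the arithmetic mean
```

Because the weights attach to rank positions, every OWA result lies between the minimum and the maximum of the inputs, which is why the family is described as mean-like.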
Lecture Notes in Computer Science, 2005
We describe a kernel method which uses the maximization of Onicescu's informational energy as a criterion for computing the relevances of input features. This adaptive relevance determination is used in combination with the neural-gas and the generalized relevance LVQ algorithms. Our quadratic optimization function, as an L2 type method, leads to a linear gradient and thus easier computation. We obtain an approximation formula similar to the mutual information based method, but in a simpler way.
Neural Network World, 1996
Recent results have essentially changed our view concerning the generality of neural network models. Presently, we know that such models i) are more powerful than Turing machines, ii) are universal approximators, iii) can represent any logical function, iv) can solve efficiently ...
ESANN, 2004
Abstract. FAMR (Fuzzy ARTMAP with Relevance factor) is a FAM (Fuzzy ARTMAP) neural network used for classification, probability estimation [3],[2], and function approximation [4]. FAMR uses a relevance factor assigned to each sample pair, proportional to the ...
PDPTA, 2009
The paper focuses on distributed file systems in P2P networks. We introduce a novel file replication scheme which is adaptive, reacting to changes in the patterns of access to the file system by dynamically creating or deleting replicas. Replication is used to increase data availability in the presence of site or communication failures and to decrease retrieval costs by local access if possible. Our system is completely decentralized and nodes can be removed or added dynamically. We also propose an overlay architecture for file search. This architecture is structured, but also based on random walks. Our system has a mobile agent which performs dynamic load-balancing. This agent is event-driven and circulates in the network to find and "destroy" the least important files and thus limit the proliferation of superfluous replicas. We have implemented our method at the TCP/IP socket level.
ESANN, 2004
An Informational Energy LVQ Approach for Feature Ranking. Răzvan Andonie (1) and Angel Cataron (2). (1) Computer Science Department, Central Washington University, USA ... Denote the set of all codebook vectors by {w_1, ..., w_K}. The components of a vector w_j are [w_j1, ..., w_jn]. ...
International Journal of Computers Communications & Control, 2010
Neural networks have been applied successfully in many fields. However, satisfactory results can only be obtained under large sample conditions. With small training sets, the performance may be poor, or the learning task may not be accomplished at all. This deficiency severely limits the applications of neural networks. The main reason why small datasets cannot provide enough information is that there are gaps between samples, and even the domain of the samples cannot be ensured. Several computational intelligence techniques have been proposed to overcome the limits of learning from small datasets. We have the following goals: i. To discuss the meaning of "small" in the context of inferring from small datasets. ii. To overview computational intelligence solutions for this problem. iii. To illustrate the introduced concepts with a real-life application.
1 Introduction
Small dataset conditions exist in many applications, such as disease diagnosis, fault diagnosis or deficiency detection in biology and biotechnology, mechanics, flexible manufacturing system scheduling, drug design, and short-term load forecasting (an activity conducted on a daily basis by electrical utilities). In this section, we describe a computational chemistry problem, review a class of neural networks to be used, and summarize our previous work in this area.
1.1 A Real-World Problem: Assist Drug Discovery
Current treatments for HIV/AIDS consist of co-administering a protease inhibitor and two reverse transcriptase inhibitors (usually referred to as combination therapy). This therapy is effective in reducing viremia to very low levels; however, in 30-50% of patients it is ineffective due to resistance development, often caused by viral mutations. Due to resistance and poor bioavailability [footnote 1] profiles, as well as toxicity associated with these therapies, there is an urgent need for more efficient design of drugs. We focus on inhibitors to the HIV-1 protease enzyme, using the IC50 as the target value. A detailed description of the problem, from a computational chemistry point of view, can be found in our papers [1-3]. The IC50 value represents the concentration of a compound that is required to reduce enzyme activity by 50%. A low IC50 value indicates good inhibitory activity. The available dataset consists of 196 compounds with experimentally determined IC50 values. Twenty of these molecules are used as an external test set after the training is completed. The remaining 176 molecules are used for training and cross-validation. Our practical goal is to predict the (unknown) IC50 values for 26 novel compounds which are candidates for HIV-1 protease inhibitors. We use two IC50 prediction accuracy measures: the Root Mean Squared Error (RMSE) and the Symmetric Mean Absolute Percentage Error (sMAPE).
Footnote 1: Bioavailability is the rate at which the drug reaches the systemic circulation.
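The two accuracy measures named above can be computed as in the sketch below. RMSE has a single standard definition. sMAPE has several variants in the literature; the one used here divides each absolute error by the mean of the absolute true and predicted values, which is an assumption, since the excerpt does not state which variant the paper uses. The sample numbers are made up for illustration and are not data from the paper.

```python
import math

def rmse(y_true, y_pred):
    """Root mean squared error."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)

def smape(y_true, y_pred):
    """Symmetric MAPE, assumed variant: mean of |t - p| / ((|t| + |p|) / 2)."""
    n = len(y_true)
    total = 0.0
    for t, p in zip(y_true, y_pred):
        denom = (abs(t) + abs(p)) / 2.0
        # A pair with both values equal to zero contributes no error.
        total += 0.0 if denom == 0 else abs(t - p) / denom
    return total / n

# Made-up target and predicted values, for illustration only.
y_true = [1.0, 2.0, 4.0]
y_pred = [1.0, 3.0, 2.0]
err_rmse = rmse(y_true, y_pred)
err_smape = smape(y_true, y_pred)
```

RMSE is expressed in the units of the target and penalizes large errors more heavily, while sMAPE is a relative measure, so the two can rank models differently when targets span several orders of magnitude.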
IEEE/ACM Transactions on Computational Biology and Bioinformatics, 2011
Obtaining satisfactory results with neural networks depends on the availability of large data samples. The use of small training sets generally reduces performance. Most classical Quantitative Structure-Activity Relationship (QSAR) studies for a specific enzyme system have been performed on small data sets. We focus on the neuro-fuzzy prediction of biological activities of HIV-1 protease inhibitory compounds when inferring from small training sets. We propose two computational intelligence prediction techniques which are suitable for small training sets, at the expense of some computational overhead. Both techniques are based on the FAMR model. The FAMR is a Fuzzy ARTMAP (FAM) incremental learning system used for classification and probability estimation. During the learning phase, each sample pair is assigned a relevance factor proportional to the importance of that pair. The two proposed algorithms in this paper are: 1) The GA-FAMR algorithm, which is new, consists of two stages: a) During the first stage, we use a genetic algorithm (GA) to optimize the relevances assigned to the training data. This improves the generalization capability of the FAMR. b) In the second stage, we use the optimized relevances to train the FAMR. 2) The Ordered FAMR is derived from a known algorithm. Instead of optimizing relevances, it optimizes the order of data presentation using the algorithm of Dagher et al. In our experiments, we compare these two algorithms with an algorithm not based on the FAM, the FS-GA-FNN introduced in [4], [5]. We conclude that when inferring from small training sets, both techniques are efficient, in terms of generalization capability and execution time. The computational overhead introduced is compensated by better accuracy.
Finally, the proposed techniques are used to predict the biological activities of newly designed potential HIV-1 protease inhibitors.
The 2006 IEEE International Joint Conference on Neural Network Proceedings, 2006
Using a neural network-fuzzy logic-genetic algorithm approach, we generate an optimal predictor for biological activities of HIV-1 protease potential inhibitory compounds. We use genetic algorithms (GAs) in the two optimization stages. In the first stage, we generate an optimal subset of features. In the second stage, we optimize the architecture of the fuzzy neural network. The optimized network is trained and used for the prediction of biological activities of newly designed chemical compounds. Finally, we extract fuzzy IF/THEN rules. These rules map physico-chemical structure descriptors to predicted inhibitory values. The optimal subset of features, combined with the generated rules, can be used to analyze the influence of descriptors.
2015 International Joint Conference on Neural Networks (IJCNN), 2015
Our research area is the unilateral dependency (UD) analysis of non-linear relationships within pairs of simultaneous data. The application is in financial analysis, using the data reported by Kodak and Apple for the period 1999-2014. We compute and analyze the UD between Kodak's and Apple's financial time series in order to understand how they influence each other over their company assets and liabilities. We also analyze, within each of the two companies, the UD between assets and liabilities. Our formal approach is based on the informational energy UD measure derived by us in previous work. This measure is estimated here from available sample data, using a non-parametric, asymptotically unbiased and consistent kNN estimator.
2015 International Joint Conference on Neural Networks (IJCNN), 2015
2015 International Joint Conference on Neural Networks (IJCNN), 2015