Robi Polikar | Rowan University
Papers by Robi Polikar
2015 International Joint Conference on Neural Networks (IJCNN), 2015
The 2013 International Joint Conference on Neural Networks (IJCNN), 2013
ABSTRACT An increasing number of practical applications that involve streaming nonstationary data have led to a recent surge in algorithms designed to learn from such data. One challenging version of this problem that has not received as much attention, however, is learning from streaming nonstationary data when only a small initial set of data is labeled, with unlabeled data being available thereafter. We have recently introduced the COMPOSE algorithm for learning in such scenarios, which we refer to as initially labeled nonstationary streaming data. COMPOSE works remarkably well; however, it requires limited (gradual) drift, and cannot address special cases such as the introduction of a new class or significant overlap of existing classes, as such scenarios cannot be learned without additional labeled data. Scenarios that provide occasional or periodic limited labeled data are not uncommon, however, and for such scenarios many of COMPOSE's restrictions can be lifted. In this contribution, we describe a new version of COMPOSE as a proof-of-concept algorithm that can identify the instances whose labels, if available, would be most beneficial, and then combine those instances with unlabeled data to actively learn from streaming nonstationary data, even when the distribution of the data experiences abrupt changes. On two carefully designed experiments that include abrupt changes as well as the addition of new classes, we show that COMPOSE.AL significantly outperforms the original COMPOSE, while closely tracking the performance of the optimal Bayes classifier.
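The idea of identifying the instances whose labels would be most beneficial can be sketched with a standard margin-based uncertainty query. This is a generic active-learning stand-in, not the paper's actual selection criterion; the function name and interface are assumptions for illustration:

```python
import numpy as np

def query_most_uncertain(proba: np.ndarray, budget: int) -> np.ndarray:
    """Illustrative active-learning query (not COMPOSE.AL's own rule):
    given an (n_instances, n_classes) array of predicted class
    probabilities for an unlabeled batch, return the indices of the
    `budget` instances with the smallest margin between the top two
    classes, i.e. those the current model is least sure about."""
    part = np.sort(proba, axis=1)
    margin = part[:, -1] - part[:, -2]   # top-1 minus top-2 probability
    return np.argsort(margin)[:budget]
```

Instances selected this way would then be sent for labeling, with the remaining unlabeled batch handled semi-supervised as in the original COMPOSE.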
Selection of the most informative features that lead to a small loss on future data is arguably one of the most important steps in classification, data analysis, and model selection. Several feature selection (FS) algorithms are available; however, due to the noise present in any data set, FS algorithms are typically accompanied by an appropriate cross-validation scheme. In this brief, we propose a statistical hypothesis test derived from the Neyman–Pearson lemma for determining whether a feature is statistically relevant. The proposed approach can be applied as a wrapper to any FS algorithm, regardless of the FS criteria used by that algorithm, to determine whether a feature belongs in the relevant set. Perhaps more importantly, this procedure efficiently determines the number of relevant features given an initial starting point. We provide freely available software implementations of the proposed methodology.
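The wrapper idea can be sketched as follows. The binomial-tail threshold below is a simplified reading of the Neyman–Pearson construction, and the `select_k` callable is an assumed stand-in for an arbitrary base FS algorithm; neither should be read as the paper's exact procedure:

```python
import math
from typing import Callable, List

def np_feature_relevance(select_k: Callable[[int], List[int]],
                         n_runs: int, n_features: int, k: int,
                         alpha: float = 0.01) -> List[bool]:
    # Run the base FS algorithm on n_runs resampled splits and count how
    # often each feature is selected. Under the null hypothesis that
    # selections are uniform at random, each count is distributed as
    # Binomial(n_runs, k / n_features); a feature is declared relevant
    # when its count exceeds the alpha-level tail threshold.
    counts = [0] * n_features
    for run in range(n_runs):
        for f in select_k(run):        # base FS returns k feature indices
            counts[f] += 1
    p0 = k / n_features

    def tail(c: int) -> float:         # P(X >= c) for X ~ Binomial(n_runs, p0)
        return sum(math.comb(n_runs, i) * p0**i * (1 - p0)**(n_runs - i)
                   for i in range(c, n_runs + 1))

    # smallest count whose tail probability drops to the significance level
    zeta = next(c for c in range(n_runs + 1) if tail(c) <= alpha)
    return [c >= zeta for c in counts]
```

Because the test depends only on selection counts, it is agnostic to the FS criterion the base algorithm uses, which is the property the abstract emphasizes.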
Introduction: Reductions of cerebrospinal fluid (CSF) amyloid-beta (Aβ42) and elevated phosphorylated tau (p-Tau) reflect in vivo Alzheimer's disease (AD) pathology and show utility in predicting conversion from mild cognitive impairment (MCI) to dementia. We investigated the P50 event-related potential component as a noninvasive biomarker of AD pathology in non-demented elderly.
Methods: 36 MCI patients were stratified into amyloid-positive (MCI-AD, n=17) and amyloid-negative (MCI-Other, n=19) groups using CSF levels of Aβ42. All amyloid-positive patients were also p-Tau positive. P50s were elicited with an auditory oddball paradigm.
Results: MCI-AD patients yielded larger P50s than MCI-Other patients. The best amyloid-status predictor model showed 94.7% sensitivity, 94.1% specificity, and 94.4% total accuracy.
Discussion: P50 predicted amyloid status in MCI patients, thereby showing a relationship with AD pathology versus MCI of another etiology. The P50 may have clinical utility for inexpensive prescreening and assessment of Alzheimer's pathology.
Recent advances in machine learning, specifically in deep learning with neural networks, have made a profound impact on fields such as natural language processing, image classification, and language modeling; however, the feasibility and potential benefits of these approaches for metagenomic data analysis have been largely under-explored. Deep learning exploits many layers of nonlinear feature representations, typically learned in an unsupervised fashion, and recent results have shown outstanding generalization performance on previously unseen data. Furthermore, some deep learning methods can also represent the structure in a data set. Consequently, deep learning and neural networks may prove to be an appropriate approach for metagenomic data. To determine whether such approaches are indeed appropriate for metagenomics, we experiment with two deep learning methods: i) a deep belief network, and ii) a recursive neural network, the latter of which provides a tree representing the structure of the data. We compare these approaches to the standard multi-layer perceptron, which has been well established in the machine learning community as a powerful prediction algorithm, though its presence is largely missing in the metagenomics literature. We find that traditional neural networks can be quite powerful classifiers on metagenomic data compared to baseline methods, such as random forests. On the other hand, while the deep learning approaches did not result in improvements to the classification accuracy, they do provide the ability to learn hierarchical representations of a data set that standard classification methods do not allow.
Our goal in this effort is not to determine the best algorithm in terms of accuracy, as that depends on the specific application, but rather to highlight the benefits and drawbacks of each of the approaches we discuss and to provide insight on how they can be improved for predictive metagenomic analysis.
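As a concrete reference point for the multi-layer perceptron baseline discussed above, a minimal one-hidden-layer network can be written directly in NumPy. The synthetic data below merely stands in for abundance-style feature vectors and is not from the paper; the architecture and hyperparameters are illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for abundance-style feature vectors (illustrative
# only, not the paper's data): two classes whose mean profiles differ.
n, d = 200, 20
X = np.vstack([rng.normal(0.0, 1.0, (n, d)), rng.normal(1.0, 1.0, (n, d))])
y = np.hstack([np.zeros(n), np.ones(n)])

# One-hidden-layer MLP with a sigmoid output, trained by batch gradient
# descent on the cross-entropy loss.
h, lr = 16, 0.1
W1 = rng.normal(0, 0.1, (d, h)); b1 = np.zeros(h)
W2 = rng.normal(0, 0.1, h);      b2 = 0.0

def forward(X):
    A = np.tanh(X @ W1 + b1)                  # hidden activations
    p = 1.0 / (1.0 + np.exp(-(A @ W2 + b2)))  # P(class 1)
    return A, p

for _ in range(500):
    A, p = forward(X)
    g = (p - y) / len(y)                  # dL/dlogit for sigmoid + CE
    gA = np.outer(g, W2) * (1.0 - A**2)   # backprop through tanh
    W2 -= lr * (A.T @ g); b2 -= lr * g.sum()
    W1 -= lr * (X.T @ gA); b1 -= lr * gA.sum(axis=0)

_, p = forward(X)
acc = float(np.mean((p > 0.5) == y))      # training accuracy
```

Deep belief networks and recursive neural networks add unsupervised pretraining and tree-structured composition on top of this basic building block, which is where the hierarchical representations mentioned above come from.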
An increasing number of real-world applications are associated with streaming data drawn from drifting and nonstationary distributions that change over time. These applications demand new algorithms that can learn from and adapt to such changes, also known as concept drift. Proper characterization of such data with existing approaches typically requires a substantial amount of labeled instances, which may be difficult, expensive, or even impractical to obtain. In this paper, we introduce compacted object sample extraction (COMPOSE), a computational geometry-based framework to learn from nonstationary streaming data, where labels are unavailable (or presented very sporadically) after initialization. We introduce the algorithm in detail, and discuss its results and performance on several synthetic and real-world data sets, which demonstrate the ability of the algorithm to learn under several different scenarios of initially labeled streaming environments. On carefully designed synthetic data sets, we compare the performance of COMPOSE against the optimal Bayes classifier, as well as the arbitrary subpopulation tracker algorithm, which addresses a similar environment referred to as extreme verification latency. Furthermore, using the real-world National Oceanic and Atmospheric Administration weather data set, we demonstrate that COMPOSE is competitive even with a well-established and fully supervised nonstationary learning algorithm that receives labeled data in every batch. Index Terms: alpha shape, concept drift, nonstationary environment, semisupervised learning (SSL), verification latency.
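The core loop of COMPOSE can be sketched as below. Note two loud simplifications: the paper's alpha-shape compaction is replaced here by a much cruder centroid-distance rule, and the semi-supervised learner is left as a user-supplied callable; the function names and interfaces are assumptions for illustration:

```python
import numpy as np

def compact(Xc: np.ndarray, keep: float = 0.7) -> np.ndarray:
    # Compaction step, heavily simplified: COMPOSE shrinks each class
    # region with alpha shapes; here we merely keep the `keep` fraction
    # of instances nearest the class centroid as a crude stand-in.
    dist = np.linalg.norm(Xc - Xc.mean(axis=0), axis=1)
    idx = np.argsort(dist)[: max(1, int(keep * len(Xc)))]
    return Xc[idx]

def compose_step(X_lab, y_lab, X_unl, classify_fit):
    # One timestep of the COMPOSE loop: (1) label the unlabeled batch
    # with a user-supplied semi-supervised routine, (2) compact each
    # class, (3) carry the compacted cores forward as the labeled set
    # for the next timestep. `classify_fit(X, y, X_new)` is an assumed
    # interface returning predicted labels for X_new.
    y_unl = classify_fit(X_lab, y_lab, X_unl)
    X_next, y_next = [], []
    for c in np.unique(y_lab):
        core = compact(X_unl[y_unl == c])
        X_next.append(core)
        y_next.append(np.full(len(core), c))
    return np.vstack(X_next), np.hstack(y_next)
```

The compacted cores are the instances most likely to remain correctly labeled after gradual drift, which is why carrying only them forward lets the algorithm track a slowly moving distribution without new labels.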
Applications that generate data from nonstationary environments, where the underlying phenomena change over time, are becoming increasingly prevalent. Examples of these applications include making inferences or predictions based on financial data, energy demand and climate data analysis, web usage or sensor network monitoring, and malware/spam detection, among many others. In nonstationary environments, particularly those that generate streaming or multi-domain data, the probability density function of the data-generating process may change (drift) over time. Therefore, the fundamental and rather naïve assumption made by most computational intelligence approaches, that the training and testing data are sampled from the same fixed, albeit unknown, probability distribution, is simply not true. Learning in nonstationary environments requires adaptive or evolving approaches that can monitor and track the underlying changes, and adapt a model to accommodate those changes accordingly. In this effort, we provide a comprehensive survey and tutorial of established as well as state-of-the-art approaches, highlighting the two primary perspectives, active and passive, for learning in nonstationary environments. Finally, we provide an inventory of existing real and synthetic datasets, as well as tools and software for getting started, evaluating, and comparing different approaches.
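Among the active approaches surveyed, the classic pattern is to monitor a classifier's online error rate and flag drift when it rises significantly above its historical minimum, in the spirit of the Drift Detection Method (DDM) of Gama et al. The sketch below follows that pattern with the conventional 2-sigma/3-sigma thresholds; it is an illustrative minimal version, not a faithful reimplementation:

```python
class DriftDetector:
    """Active drift detection in the spirit of DDM: track the running
    error rate p and its standard deviation s; when p + s rises well
    above the historical minimum of that quantity, signal drift.
    (For brevity, the detector is not reset after a drift signal.)"""

    def __init__(self, warn: float = 2.0, drift: float = 3.0):
        self.n = 0
        self.p = 1.0                       # running error-rate estimate
        self.p_min = float("inf")          # p at the historical minimum
        self.s_min = float("inf")          # s at the historical minimum
        self.warn, self.drift = warn, drift

    def update(self, error: bool) -> str:
        self.n += 1
        # incremental mean of the 0/1 error stream
        self.p += (float(error) - self.p) / self.n
        s = (self.p * (1.0 - self.p) / self.n) ** 0.5
        if self.p + s < self.p_min + self.s_min:
            self.p_min, self.s_min = self.p, s
        if self.n > 30 and self.p + s > self.p_min + self.drift * self.s_min:
            return "drift"
        if self.n > 30 and self.p + s > self.p_min + self.warn * self.s_min:
            return "warning"
        return "ok"
```

Passive approaches, by contrast, skip explicit detection and simply adapt the model (or an ensemble of models) at every batch, which is the other perspective the survey develops.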
IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS SPEECH AND SIGNAL …, Jan 1, 2002
Using a computational model to learn under various environments has been a well-researched field that has produced relevant results; unfortunately, the majority of these efforts rely on three fundamental assumptions: i) there is a sufficient and representative data set to configure and assess ...
... on the specific class and the complexity of the particular laboratory exercise, the ... a broad background in biomedical engineering, and it is not just conducting random experiments ... In all experiments described below (unless noted otherwise) students acquired their own biological ...