Active learning in nonstationary environments (original) (raw)

ABSTRACT Increasing number of practical applications that involve streaming nonstationary data have led to a recent surge in algorithms designed to learn from such data. One challenging version of this problem that has not received as much attention, however, is learning streaming nonstationary data when a small initial set of data are labeled, with unlabeled data being available thereafter. We have recently introduced the COMPOSE algorithm for learning in such scenarios, which we refer to as initially labeled nonstationary streaming data. COMPOSE works remarkably well, however it requires limited (gradual) drift, and cannot address special cases such as introduction of a new class or significant overlap of existing classes, as such scenarios cannot be learned without additional labeled data. Scenarios that provide occasional or periodic limited labeled data are not uncommon, however, for which many of COMPOSE's restrictions can be lifted. In this contribution, we describe a new version of COMPOSE as a proof-of-concept algorithm that can identify the instances whose labels — if available — would be most beneficial, and then combine those instances with unlabeled data to actively learn from streaming nonstationary data, even when the distribution of the data experiences abrupt changes. On two carefully designed experiments that include abrupt changes as well as addition of new classes, we show that COMPOSE.AL significantly outperforms original COMPOSE, while closely tracking the optimal Bayes classifier performance.