Yuheng Bu - Academia.edu
Papers by Yuheng Bu
GLOBECOM 2022 - 2022 IEEE Global Communications Conference
2022 IEEE 32nd International Workshop on Machine Learning for Signal Processing (MLSP)
Cornell University - arXiv, Oct 18, 2022
Cornell University - arXiv, Oct 15, 2022
Cornell University - arXiv, Oct 28, 2021
Selective regression allows abstention from prediction if the confidence to make an accurate prediction is not sufficient. In general, by allowing a reject option, one expects the performance of a regression model to increase at the cost of reducing coverage (i.e., by predicting on fewer samples). However, as we show, in some cases the performance of a minority subgroup can decrease while we reduce the coverage, and thus selective regression can magnify disparities between different sensitive subgroups. Motivated by these disparities, we propose new fairness criteria for selective regression requiring the performance of every subgroup to improve with a decrease in coverage. We prove that if a feature representation satisfies the sufficiency criterion or is calibrated for mean and variance, then the proposed fairness criteria are met. Further, we introduce two approaches to mitigate the performance disparity across subgroups: (a) regularizing an upper bound of conditional mutual information under a Gaussian assumption and (b) regularizing a contrastive loss for conditional mean and conditional variance prediction. The effectiveness of these approaches is demonstrated on synthetic and real-world datasets.
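The reject-option mechanism described above can be illustrated with a minimal sketch. This is not the paper's method (and in particular does not implement its fairness criteria); it only shows, on hypothetical heteroscedastic data, how abstaining on low-confidence inputs trades coverage for accuracy. All names and distributional choices here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy heteroscedastic data: noise grows with |x| (an illustrative choice,
# not the setup used in the paper).
x = rng.uniform(-3, 3, size=2000)
y = 2.0 * x + rng.normal(scale=0.2 + np.abs(x), size=x.size)

pred = 2.0 * x                   # point prediction (true conditional mean)
conf_score = -(0.2 + np.abs(x))  # higher score = more confident (less noise)

def selective_mse(coverage):
    """MSE on the `coverage` fraction of samples we are most confident about."""
    k = int(coverage * x.size)
    keep = np.argsort(conf_score)[-k:]  # indices of the most confident samples
    return np.mean((y[keep] - pred[keep]) ** 2)

for c in (1.0, 0.5, 0.2):
    print(f"coverage={c:.1f}  mse={selective_mse(c):.3f}")
```

On this toy data the overall MSE drops as coverage shrinks; the paper's point is that the same quantity computed per subgroup need not improve, which is what its fairness criteria require.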
Cornell University - arXiv, Jul 28, 2021
Bounding the generalization error of a supervised learning algorithm is one of the most important problems in learning theory, and various approaches have been developed. However, existing bounds are often loose and lack guarantees, so they may fail to characterize the exact generalization ability of a learning algorithm. Our main contribution is an exact characterization of the expected generalization error of the well-known Gibbs algorithm in terms of the symmetrized KL information between the input training samples and the output hypothesis. This result can be applied to tighten existing expected generalization error bounds. Our analysis provides more insight into the fundamental role the symmetrized KL information plays in controlling the generalization error of the Gibbs algorithm.
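The exact characterization referred to here can be stated compactly. The following is a sketch of the form this result takes in this line of work (notation is mine and may differ from the paper's): the Gibbs algorithm with inverse temperature γ, prior π, and empirical risk L_E samples a hypothesis from

```latex
P_{W \mid S}(w \mid s) \;=\; \frac{\pi(w)\, e^{-\gamma L_E(w, s)}}{V(s, \gamma)},
\qquad
V(s, \gamma) \;=\; \int \pi(w)\, e^{-\gamma L_E(w, s)}\, dw,
```

and its expected generalization error equals

```latex
\overline{\mathrm{gen}}(\mu, P_{W \mid S}) \;=\; \frac{I_{\mathrm{SKL}}(W; S)}{\gamma},
\qquad
I_{\mathrm{SKL}}(W; S) \;=\; I(W; S) + L(W; S),
```

where I is mutual information and L is Lautum information, so that I_SKL is the symmetrized KL information between the training samples S and the output hypothesis W.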
The focus of this thesis is on understanding machine learning algorithms from an information-theoretic point of view. More specifically, we apply information-theoretic tools to construct performance bounds for learning algorithms, with the goal of deepening the understanding of current algorithms and inspiring new learning techniques. The first problem considered involves a sequence of machine learning problems that vary in a bounded manner from one time-step to the next. To solve these problems in an accurate and data-efficient way, an active and adaptive learning framework is proposed, in which the labels of the most informative samples are actively queried from an unlabeled data pool, and the adaptation to the change is achieved by utilizing the information acquired in previous steps. The goal is to satisfy a pre-specified bound on the excess risk at each time-step. More specifically, the design of the active querying algorithm is based on minimizing the excess risk using stochastic gradient descent in the maximum likelihood estimation setting.
2022 IEEE International Symposium on Information Theory (ISIT)
Generalization error bounds are essential to understanding machine learning algorithms. This paper presents novel expected generalization error upper bounds based on the average joint distribution between the output hypothesis and each input training sample. Multiple generalization error upper bounds based on different information measures are provided, including Wasserstein distance, total variation distance, KL divergence, and Jensen-Shannon divergence. Due to the convexity of the information measures, the proposed bounds in terms of Wasserstein distance and total variation distance are shown to be tighter than their counterparts based on individual samples in the literature. An example is provided to demonstrate the tightness of the proposed generalization error bounds.
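For context, the individual-sample baseline that this abstract compares against is, in the prior literature, typically stated as follows (a sketch in my own notation, assuming the loss is σ-sub-Gaussian; the exact conditions in the paper may differ):

```latex
\left|\overline{\mathrm{gen}}(\mu, P_{W \mid S})\right|
\;\le\;
\frac{1}{n}\sum_{i=1}^{n} \sqrt{2\sigma^{2}\, I(W; Z_i)},
```

where Z_i is the i-th training sample and W the output hypothesis. The bounds in this paper are instead stated through the averaged joint distribution (1/n) Σ_i P_{W, Z_i}, and convexity of the Wasserstein and total variation measures is what makes those averaged versions no looser than the per-sample ones.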
ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Model change detection is studied, in which there are two sets of samples that are independently and identically distributed (i.i.d.) according to a pre-change probabilistic model with parameter θ and a post-change model with parameter θ′, respectively. The goal is to detect whether the change in the model is significant, i.e., whether the difference between the pre-change and post-change parameters, ‖θ − θ′‖₂, is larger than a predetermined threshold ρ. The problem is considered in a Neyman-Pearson setting, where the goal is to maximize the probability of detection under a false alarm constraint. Since the generalized likelihood ratio test (GLRT) is difficult to compute in this problem, we construct an empirical difference test (EDT), which approximates the GLRT and has low computational complexity. Moreover, we provide an approximation method to set the threshold of the EDT to meet the false alarm constraint. Experiments with linear regression and logistic regression are conducted to validate the proposed algorithms.
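The core of an empirical difference test can be sketched in a few lines: estimate the parameter separately from each sample set and threshold the norm of the difference. This is only a sketch of that idea on a hypothetical linear-regression example; the paper's threshold-calibration method for the false alarm constraint is not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(1)

def fit_theta(X, y):
    """Least-squares estimate of the linear-model parameter."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

def empirical_difference_stat(X1, y1, X2, y2):
    """||theta_hat_pre - theta_hat_post||_2, the statistic the EDT thresholds."""
    return np.linalg.norm(fit_theta(X1, y1) - fit_theta(X2, y2))

d, n = 5, 4000
theta_pre = rng.normal(size=d)
theta_post = theta_pre + 0.5  # a genuine change in every coordinate

X1, X2 = rng.normal(size=(n, d)), rng.normal(size=(n, d))
y1 = X1 @ theta_pre + 0.1 * rng.normal(size=n)
y2 = X2 @ theta_post + 0.1 * rng.normal(size=n)

stat = empirical_difference_stat(X1, y1, X2, y2)
rho = 0.5  # significance threshold on ||theta - theta'||_2
print("detect change:", stat > rho)
```

With this much data the statistic concentrates near the true difference ‖θ − θ′‖₂ = 0.5·√5 ≈ 1.12, so the change is declared significant.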
ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
As machine learning algorithms grow in popularity and diversify to many industries, ethical and legal concerns regarding their fairness have become increasingly relevant. We explore the problem of algorithmic fairness, taking an information-theoretic view. The maximal correlation framework is introduced for expressing fairness constraints and is shown to yield regularizers that enforce independence and separation-based fairness criteria; these admit optimization algorithms, for both discrete and continuous variables, that are more computationally efficient than existing algorithms. We show that these algorithms provide smooth performance-fairness tradeoff curves and perform competitively with state-of-the-art methods on both discrete datasets (COMPAS, Adult) and continuous datasets (Communities and Crimes).
2021 IEEE International Symposium on Information Theory (ISIT)
We generalize the information bottleneck (IB) and privacy funnel (PF) problems by introducing the notion of a sensitive attribute, which arises in a growing number of applications. In this generalization, we seek to construct representations of observations that are maximally (or minimally) informative about a target variable, while also satisfying constraints with respect to a variable corresponding to the sensitive attribute. In the Gaussian and discrete settings, we show that by suitably approximating the Kullback-Leibler (KL) divergence defining traditional Shannon mutual information, the generalized IB and PF problems can be formulated as semi-definite programs (SDPs), and thus efficiently solved, which is important in applications of high-dimensional inference. We validate our algorithms on synthetic data and demonstrate their use in imposing fairness in machine learning on real data as an illustrative application.
Due to privacy, storage, and other constraints, there is a growing need for unsupervised domain adaptation techniques in machine learning that do not require access to the data used to train a collection of source models. Existing methods for such multi-source-free domain adaptation typically train a target model using supervised techniques in conjunction with pseudo-labels for the target data, which are produced by the available source models. However, we show that assigning pseudo-labels to only a subset of the target data leads to improved performance. In particular, we develop an information-theoretic bound on the generalization error of the resulting target model that demonstrates an inherent bias-variance trade-off controlled by the subset choice. Guided by this analysis, we develop a method that partitions the target data into pseudo-labeled and unlabeled subsets to balance the trade-off. In addition to exploiting the pseudo-labeled subset, our algorithm further leverages the...
Entropy, 2022
As machine learning algorithms grow in popularity and diversify to many industries, ethical and legal concerns regarding their fairness have become increasingly relevant. We explore the problem of algorithmic fairness, taking an information-theoretic view. The maximal correlation framework is introduced for expressing fairness constraints and is shown to be capable of being used to derive regularizers that enforce independence and separation-based fairness criteria, which admit optimization algorithms for both discrete and continuous variables that are more computationally efficient than existing algorithms. We show that these algorithms provide smooth performance-fairness tradeoff curves and perform competitively with state-of-the-art methods on both discrete datasets (COMPAS, Adult) and continuous datasets (Communities and Crimes).
Cornell University - arXiv, Nov 2, 2021
We provide an information-theoretic analysis of the generalization ability of Gibbs-based transfer learning algorithms by focusing on two popular transfer learning approaches, α-weighted-ERM and two-stage-ERM. Our key result is an exact characterization of the generalization behaviour using the conditional symmetrized KL information between the output hypothesis and the target training samples given the source samples. Our results can also be applied to provide novel distribution-free generalization error upper bounds on these two aforementioned Gibbs algorithms. Our approach is versatile, as it also characterizes the generalization errors and excess risks of these two Gibbs algorithms in the asymptotic regime, where they converge to the α-weighted-ERM and two-stage-ERM, respectively. Based on our theoretical results, we show that the benefits of transfer learning can be viewed as a bias-variance trade-off, with the bias induced by the source distribution and the variance induced by the lack of target samples. We believe this viewpoint can guide the choice of transfer learning algorithms in practice.
Cornell University - arXiv, Nov 30, 2018
While recommendation systems generally observe user behavior passively, there has been an increased interest in directly querying users to learn their specific preferences. In such settings, considering queries at different levels of granularity to optimize user information acquisition is crucial to efficiently providing a good user experience. In this work, we study the active learning problem with multi-level user preferences within the collective matrix factorization (CMF) framework. CMF jointly captures multi-level user preferences with respect to items and relations between items (e.g., book genre, cuisine type), generally resulting in improved predictions. Motivated by finite-sample analysis of the CMF model, we propose a theoretically optimal active learning strategy based on the Fisher information matrix and use this to derive a realizable approximation algorithm for practical recommendations. Experiments are conducted using both the Yelp dataset directly and an illustrative synthetic dataset in the three settings of personalized active learning, cold-start recommendations, and noisy data, demonstrating strong improvements over several widely used active learning methods.
Various approaches have been developed to upper bound the generalization error of a supervised learning algorithm. However, existing bounds are often loose and lack guarantees, so they may fail to characterize the exact generalization ability of a learning algorithm. Our main contribution is an exact characterization of the expected generalization error of the well-known Gibbs algorithm (a.k.a. Gibbs posterior) using the symmetrized KL information between the input training samples and the output hypothesis. Our result can be applied to tighten existing expected generalization error and PAC-Bayesian bounds. Our approach is versatile, as it also characterizes the generalization error of the Gibbs algorithm with a data-dependent regularizer and that of the Gibbs algorithm in the asymptotic regime, where it converges to the empirical risk minimization algorithm. Of particular relevance, our results highlight the role the symmetrized KL information plays in controlling the generalization error of the Gibbs algorithm.
2016 IEEE International Symposium on Information Theory (ISIT), 2016
The problem of estimating the KL divergence between two unknown distributions is studied. The alphabet size k of the distributions can scale to infinity. The estimation is based on m and n independent samples respectively drawn from the two distributions. It is first shown that there does not exist any consistent estimator that guarantees asymptotically small worst-case quadratic risk over the set of all pairs of distributions. A restricted set that contains pairs of distributions with ratio bounded by f(k) is further considered. An augmented plug-in estimator is proposed, and is shown to be consistent if and only if m = ω(k ∨ log²(f(k))) and n = ω(k f(k)). Furthermore, if f(k) ≥ log²k and log²(f(k)) = o(k), it is shown that any consistent estimator must satisfy the necessary conditions: m = ω(k/log k ∨ log²(f(k))) and n = ω(k f(k)/log k).
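A plug-in KL estimator with smoothing of the second distribution can be sketched as follows. This is a generic add-constant variant for illustration; the exact augmentation used in the paper may differ. Smoothing keeps every estimated q-probability positive so the log ratio stays finite.

```python
import numpy as np

def kl_plugin_augmented(samples_p, samples_q, k, c=1.0):
    """Plug-in estimate of D(p || q) over alphabet {0, ..., k-1}, with
    add-constant smoothing of q (a generic choice, not necessarily the
    paper's exact augmentation)."""
    m, n = len(samples_p), len(samples_q)
    p_hat = np.bincount(samples_p, minlength=k) / m
    q_hat = (np.bincount(samples_q, minlength=k) + c) / (n + c * k)
    mask = p_hat > 0  # 0 * log(0 / q) = 0 by convention
    return float(np.sum(p_hat[mask] * np.log(p_hat[mask] / q_hat[mask])))

rng = np.random.default_rng(2)
k = 100
p = np.full(k, 1.0 / k)              # uniform, so the true KL(p || p) is 0
x = rng.choice(k, size=50_000, p=p)  # samples from the first distribution
y = rng.choice(k, size=50_000, p=p)  # samples from the second distribution
print(round(kl_plugin_augmented(x, y, k), 3))
```

With m = n = 50,000 samples over k = 100 symbols the estimate lands near the true value 0, up to a small plug-in bias; the abstract's conditions characterize exactly how m and n must scale with k and f(k) for such estimators to remain consistent.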
2019 53rd Asilomar Conference on Signals, Systems, and Computers, 2019
We consider solving a sequence of machine learning problems that vary in a bounded manner from one time-step to the next. To solve these problems in an accurate and data-efficient way, we propose an active and adaptive learning framework, in which we actively query the labels of the most informative samples from an unlabeled data pool, and adapt to the change by utilizing the information acquired in the previous steps. Our goal is to satisfy a pre-specified bound on the excess risk at each time-step. We first design the active querying algorithm by minimizing the excess risk using stochastic gradient descent in the maximum likelihood estimation setting. Then, we propose a sample size selection rule that minimizes the number of samples by adapting to the change in the learning problems, while satisfying the required bound on excess risk at each time-step. Based on the actively queried samples, we construct an estimator for the change in the learning problems, which we prove to be an asymptotically tight upper bound of its true value. We validate our algorithm and theory through experiments with real data.
To be considered for the 2017 IEEE Jack Keil Wolf ISIT Student Paper Award. We study an outlying sequence detection problem, in which there are M sequences of samples out of which a small subset of outliers needs to be detected. A sequence is considered an outlier if the observations therein are generated by a distribution different from those generating the observations in the majority of sequences. In the universal setting, the goal is to identify all the outliers without any knowledge about the underlying generating distributions. In prior work, this problem was studied as a universal hypothesis testing problem, and a generalized likelihood (GL) test with high computational complexity was constructed and its asymptotic performance characterized. We propose a novel test based on distribution clustering. Such a test is shown to be exponentially consistent, and its time complexity is linear in the total number of sequences. Furthermore, our tests based on clustering are applicable ...
ArXiv, 2018
A framework is introduced for actively and adaptively solving a sequence of machine learning problems that change in a bounded manner from one time step to the next. An algorithm is developed that actively queries the labels of the most informative samples from an unlabeled data pool, and that adapts to the change by utilizing the information acquired in the previous steps. Our analysis shows that the proposed active learning algorithm based on stochastic gradient descent achieves near-optimal excess risk performance for maximum likelihood estimation. Furthermore, an estimator of the change in the learning problems is constructed using the active learning samples, which provides an adaptive sample size selection rule that guarantees the excess risk is bounded for a sufficiently large number of time steps. Experiments with synthetic and real data are presented to validate our algorithm and theoretical results.
GLOBECOM 2022 - 2022 IEEE Global Communications Conference
2022 IEEE 32nd International Workshop on Machine Learning for Signal Processing (MLSP)
Cornell University - arXiv, Oct 18, 2022
Cornell University - arXiv, Oct 15, 2022
Cornell University - arXiv, Oct 28, 2021
Selective regression allows abstention from prediction if the confidence to make an accurate pred... more Selective regression allows abstention from prediction if the confidence to make an accurate prediction is not sufficient. In general, by allowing a reject option, one expects the performance of a regression model to increase at the cost of reducing coverage (i.e., by predicting on fewer samples). However, as we show, in some cases, the performance of a minority subgroup can decrease while we reduce the coverage, and thus selective regression can magnify disparities between different sensitive subgroups. Motivated by these disparities, we propose new fairness criteria for selective regression requiring the performance of every subgroup to improve with a decrease in coverage. We prove that if a feature representation satisfies the sufficiency criterion or is calibrated for mean and variance, then the proposed fairness criteria is met. Further, we introduce two approaches to mitigate the performance disparity across subgroups: (a) by regularizing an upper bound of conditional mutual information under a Gaussian assumption and (b) by regularizing a contrastive loss for conditional mean and conditional variance prediction. The effectiveness of these approaches is demonstrated on synthetic and real-world datasets.
Cornell University - arXiv, Jul 28, 2021
Bounding the generalization error of a supervised learning algorithm is one of the most important... more Bounding the generalization error of a supervised learning algorithm is one of the most important problems in learning theory, and various approaches have been developed. However, existing bounds are often loose and lack of guarantees. As a result, they may fail to characterize the exact generalization ability of a learning algorithm. Our main contribution is an exact characterization of the expected generalization error of the well-known Gibbs algorithm in terms of symmetrized KL information between the input training samples and the output hypothesis. Such a result can be applied to tighten existing expected generalization error bound. Our analysis provides more insight on the fundamental role the symmetrized KL information plays in controlling the generalization error of the Gibbs algorithm.
The focus of this thesis is on understanding machine learning algorithms from an information-theo... more The focus of this thesis is on understanding machine learning algorithms from an information-theoretic point of view. More specifically, we apply information-theoretic tools to construct performance bounds for the learning algorithms, with the goal of deepening the understanding of current algorithms and inspiring new learning techniques. The first problem considered involves a sequence of machine learning problems that vary in a bounded manner from one time-step to the next. To solve these problems in an accurate and data-efficient way, an active and adaptive learning framework is proposed, in which the labels of the most informative samples are actively queried from an unlabeled data pool, and the adaptation to the change is achieved by utilizing the information acquired in previous steps. The goal is to satisfy a pre-specified bound on the excess risk at each time-step. More specifically, the design of the active querying algorithm is based on minimizing the excess risk using sto...
2022 IEEE International Symposium on Information Theory (ISIT)
Generalization error bounds are essential to understanding machine learning algorithms. This pape... more Generalization error bounds are essential to understanding machine learning algorithms. This paper presents novel expected generalization error upper bounds based on the average joint distribution between the output hypothesis and each input training sample. Multiple generalization error upper bounds based on different information measures are provided, including Wasserstein distance, total variation distance, KL divergence, and Jensen-Shannon divergence. Due to the convexity of the information measures, the proposed bounds in terms of Wasserstein distance and total variation distance are shown to be tighter than their counterparts based on individual samples in the literature. An example is provided to demonstrate the tightness of the proposed generalization error bounds.
ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Model change detection is studied, in which there are two sets of samples that are independently ... more Model change detection is studied, in which there are two sets of samples that are independently and identically distributed (i.i.d.) according to a pre-change probabilistic model with parameter θ, and a post-change model with parameter θ , respectively. The goal is to detect whether the change in the model is significant, i.e., whether the difference between the prechange parameter and the post-change parameter θ − θ 2 is larger than a predetermined threshold ρ. The problem is considered in a Neyman-Pearson setting, where the goal is to maximize the probability of detection under a false alarm constraint. Since the generalized likelihood ratio test (GLRT) is difficult to compute in this problem, we construct an empirical difference test (EDT), which approximates the GLRT and has low computational complexity. Moreover, we provide an approximation method to set the threshold of the EDT to meet the false alarm constraint. Experiments with linear regression and logistic regression are conducted to validate the proposed algorithms.
ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
As machine learning algorithms grow in popularity and diversify to many industries, ethical and l... more As machine learning algorithms grow in popularity and diversify to many industries, ethical and legal concerns regarding their fairness have become increasingly relevant. We explore the problem of algorithmic fairness, taking an informationtheoretic view. The maximal correlation framework is introduced for expressing fairness constraints and shown to be capable of being used to derive regularizers that enforce independence and separation-based fairness criteria, which admit optimization algorithms for both discrete and continuous variables which are more computationally efficient than existing algorithms. We show that these algorithms provide smooth performance-fairness tradeoff curves and perform competitively with state-of-the-art methods on both discrete datasets (COMPAS, Adult) and continuous datasets (Communities and Crimes).
2021 IEEE International Symposium on Information Theory (ISIT)
We generalize the information bottleneck (IB) and privacy funnel (PF) problems by introducing the... more We generalize the information bottleneck (IB) and privacy funnel (PF) problems by introducing the notion of a sensitive attribute, which arises in a growing number of applications. In this generalization, we seek to construct representations of observations that are maximally (or minimally) informative about a target variable, while also satisfying constraints with respect to a variable corresponding to the sensitive attribute. In the Gaussian and discrete settings, we show that by suitably approximating the Kullback-Liebler (KL) divergence defining traditional Shannon mutual information, the generalized IB and PF problems can be formulated as semi-definite programs (SDPs), and thus efficiently solved, which is important in applications of high-dimensional inference. We validate our algorithms on synthetic data and demonstrate their use in imposing fairness in machine learning on real data as an illustrative application.
Due to privacy, storage, and other constraints, there is a growing need for unsupervised domain a... more Due to privacy, storage, and other constraints, there is a growing need for unsupervised domain adaptation techniques in machine learning that do not require access to the data used to train a collection of source models. Existing methods for such multi-source-free domain adaptation typically train a target model using supervised techniques in conjunction with pseudo-labels for the target data, which are produced by the available source models. However, we show that assigning pseudo-labels to only a subset of the target data leads to improved performance. In particular, we develop an information-theoretic bound on the generalization error of the resulting target model that demonstrates an inherent bias-variance trade-off controlled by the subset choice. Guided by this analysis, we develop a method that partitions the target data into pseudo-labeled and unlabeled subsets to balance the trade-off. In addition to exploiting the pseudo-labeled subset, our algorithm further leverages the...
Entropy, 2022
As machine learning algorithms grow in popularity and diversify to many industries, ethical and l... more As machine learning algorithms grow in popularity and diversify to many industries, ethical and legal concerns regarding their fairness have become increasingly relevant. We explore the problem of algorithmic fairness, taking an information–theoretic view. The maximal correlation framework is introduced for expressing fairness constraints and is shown to be capable of being used to derive regularizers that enforce independence and separation-based fairness criteria, which admit optimization algorithms for both discrete and continuous variables that are more computationally efficient than existing algorithms. We show that these algorithms provide smooth performance–fairness tradeoff curves and perform competitively with state-of-the-art methods on both discrete datasets (COMPAS, Adult) and continuous datasets (Communities and Crimes).
Cornell University - arXiv, Nov 2, 2021
We provide an information-theoretic analysis of the generalization ability of Gibbs-based transfe... more We provide an information-theoretic analysis of the generalization ability of Gibbs-based transfer learning algorithms by focusing on two popular transfer learning approaches, α-weighted-ERM and two-stage-ERM. Our key result is an exact characterization of the generalization behaviour using the conditional symmetrized KL information between the output hypothesis and the target training samples given the source samples. Our results can also be applied to provide novel distribution-free generalization error upper bounds on these two aforementioned Gibbs algorithms. Our approach is versatile, as it also characterizes the generalization errors and excess risks of these two Gibbs algorithms in the asymptotic regime, where they converge to the α-weighted-ERM and two-stage-ERM, respectively. Based on our theoretical results, we show that the benefits of transfer learning can be viewed as a bias-variance trade-off, with the bias induced by the source distribution and the variance induced by the lack of target samples. We believe this viewpoint can guide the choice of transfer learning algorithms in practice.
Cornell University - arXiv, Nov 30, 2018
While recommendation systems generally observe user behavior passively, there has been an increas... more While recommendation systems generally observe user behavior passively, there has been an increased interest in directly querying users to learn their specific preferences. In such settings, considering queries at different levels of granularity to optimize user information acquisition is crucial to efficiently providing a good user experience. In this work, we study the active learning problem with multi-level user preferences within the collective matrix factorization (CMF) framework. CMF jointly captures multi-level user preferences with respect to items and relations between items (e.g., book genre, cuisine type), generally resulting in improved predictions. Motivated by finite-sample analysis of the CMF model, we propose a theoretically optimal active learning strategy based on the Fisher information matrix and use this to derive a realizable approximation algorithm for practical recommendations. Experiments are conducted using both the Yelp dataset directly and an illustrative synthetic dataset in the three settings of personalized active learning, cold-start recommendations, and noisy data-demonstrating strong improvements over several widely used active learning methods.
Various approaches have been developed to upper bound the generalization error of a supervised le... more Various approaches have been developed to upper bound the generalization error of a supervised learning algorithm. However, existing bounds are often loose and lack of guarantees. As a result, they may fail to characterize the exact generalization ability of a learning algorithm. Our main contribution is an exact characterization of the expected generalization error of the well-known Gibbs algorithm (a.k.a. Gibbs posterior) using symmetrized KL information between the input training samples and the output hypothesis. Our result can be applied to tighten existing expected generalization error and PAC-Bayesian bounds. Our approach is versatile, as it also characterizes the generalization error of the Gibbs algorithm with data-dependent regularizer and that of the Gibbs algorithm in the asymptotic regime, where it converges to the empirical risk minimization algorithm. Of particular relevance, our results highlight the role the symmetrized KL information plays in controlling the genera...
2016 IEEE International Symposium on Information Theory (ISIT), 2016
The problem of estimating the KL divergence between two unknown distributions is studied. The alphabet size k of the distributions can scale to infinity. The estimation is based on m and n independent samples drawn from the two distributions, respectively. It is first shown that there does not exist any consistent estimator that guarantees asymptotically small worst-case quadratic risk over the set of all pairs of distributions. A restricted set that contains pairs of distributions with density ratio bounded by f(k) is further considered. An augmented plug-in estimator is proposed, and is shown to be consistent if and only if m = ω(k ∨ log²(f(k))) and n = ω(k f(k)). Furthermore, if f(k) ≥ log²k and log²(f(k)) = o(k), it is shown that any consistent estimator must satisfy the necessary conditions: m = ω(k/log k ∨ log²(f(k))) and n = ω(k f(k)/log k).
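The augmented plug-in idea can be sketched as follows: plug the raw empirical distribution of the first sample into D(P||Q), but smooth the empirical distribution of the second sample with pseudo-counts so the plug-in ratio stays finite. This is a toy illustration of the augmentation step, with a hypothetical smoothing constant `c`, not the paper's tuned estimator.

```python
import numpy as np

def augmented_plugin_kl(x, y, k, c=1.0):
    """Augmented plug-in estimate of D(P||Q) over alphabet {0, ..., k-1}.

    p is the ordinary empirical distribution of x; q is the empirical
    distribution of y smoothed by adding c pseudo-counts per symbol
    (the 'augmentation'), so that p/q is always well defined."""
    m, n = len(x), len(y)
    p = np.bincount(x, minlength=k) / m
    q = (np.bincount(y, minlength=k) + c) / (n + c * k)
    mask = p > 0  # terms with p_i = 0 contribute nothing to D(P||Q)
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))
```

With identical degenerate samples the estimate is a small positive bias term, log((n + 2)/(n + 1)) for k = 2 and c = 1, which vanishes as n grows.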
2019 53rd Asilomar Conference on Signals, Systems, and Computers, 2019
We consider solving a sequence of machine learning problems that vary in a bounded manner from one time-step to the next. To solve these problems in an accurate and data-efficient way, we propose an active and adaptive learning framework, in which we actively query the labels of the most informative samples from an unlabeled data pool, and adapt to the change by utilizing the information acquired in the previous steps. Our goal is to satisfy a pre-specified bound on the excess risk at each time-step. We first design the active querying algorithm by minimizing the excess risk using stochastic gradient descent in the maximum likelihood estimation setting. Then, we propose a sample size selection rule that minimizes the number of samples by adapting to the change in the learning problems, while satisfying the required bound on excess risk at each time-step. Based on the actively queried samples, we construct an estimator for the change in the learning problems, which we prove to be an asymptotically tight upper bound of its true value. We validate our algorithm and theory through experiments with real data.
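One time-step of the query-then-update loop can be sketched with a toy logistic model: query the labels of the points the current model is least certain about, then run SGD on the queried labels. Uncertainty sampling stands in here for the paper's informativeness criterion, and `oracle_label` is a hypothetical labeling oracle; this is an illustrative sketch, not the paper's algorithm.

```python
import numpy as np

def active_sgd_step(w, pool_X, oracle_label, budget, lr=0.1):
    """One toy active-learning time-step for logistic regression.

    From the unlabeled pool, query the `budget` points with the smallest
    margin |w.x| (most uncertain under the current model), then take one
    SGD step on the logistic loss for each queried label."""
    scores = np.abs(pool_X @ w)          # small |margin| = most uncertain
    idx = np.argsort(scores)[:budget]    # indices whose labels we query
    for i in idx:
        x = pool_X[i]
        y = oracle_label(x)              # hypothetical oracle, label in {0, 1}
        p = 1.0 / (1.0 + np.exp(-x @ w))
        w = w - lr * (p - y) * x         # SGD step on logistic loss
    return w, idx
```

In the full framework, the queried budget at each step would itself be chosen by the adaptive sample-size rule so the excess-risk bound is met.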
To be considered for the 2017 IEEE Jack Keil Wolf ISIT Student Paper Award. We study an outlying sequence detection problem, in which there are M sequences of samples, out of which a small subset of outliers needs to be detected. A sequence is considered an outlier if the observations therein are generated by a distribution different from those generating the observations in the majority of sequences. In the universal setting, the goal is to identify all the outliers without any knowledge of the underlying generating distributions. In prior work, this problem was studied as a universal hypothesis testing problem, and a generalized likelihood (GL) test with high computational complexity was constructed and its asymptotic performance characterized. In this work, we propose a test based on distribution clustering. This test is shown to be exponentially consistent, and its time complexity is linear in the total number of sequences. Furthermore, our tests based on clustering are applicable ...
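The clustering intuition can be sketched in a few lines: estimate each sequence's empirical distribution, take a robust center of the crowd, and flag sequences far from it in KL divergence. This toy version uses a coordinate-wise median center and a crude mean-plus-two-sigma threshold; it illustrates the clustering idea only, and is not the paper's exponentially consistent test.

```python
import numpy as np

def detect_outliers(sequences, k):
    """Toy clustering-style outlier test over alphabet {0, ..., k-1}.

    Each sequence is summarized by its empirical distribution; the
    coordinate-wise median distribution serves as the 'typical' center,
    and sequences whose KL divergence to the center is far above the
    crowd's are flagged as outliers."""
    emp = np.array([np.bincount(s, minlength=k) / len(s) for s in sequences])
    center = np.median(emp, axis=0)
    center = center / center.sum()       # renormalize the median profile
    eps = 1e-12                          # numerical guard for zero entries
    dists = np.array([np.sum(p * np.log((p + eps) / (center + eps)))
                      for p in emp])
    thresh = dists.mean() + 2 * dists.std()
    return [i for i, d in enumerate(dists) if d > thresh]
```

Note the cost is linear in the number of sequences, matching the complexity claim in the abstract.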
arXiv, 2018
A framework is introduced for actively and adaptively solving a sequence of machine learning problems, which change in a bounded manner from one time step to the next. An algorithm is developed that actively queries the labels of the most informative samples from an unlabeled data pool, and that adapts to the change by utilizing the information acquired in the previous steps. Our analysis shows that the proposed active learning algorithm based on stochastic gradient descent achieves a near-optimal excess risk performance for maximum likelihood estimation. Furthermore, an estimator of the change in the learning problems is constructed from the actively queried samples, which provides an adaptive sample-size selection rule that guarantees the excess risk is bounded for a sufficiently large number of time steps. Experiments with synthetic and real data are presented to validate our algorithm and theoretical results.