Seungjin Choi - Academia.edu

Papers by Seungjin Choi

Bayesian Hierarchical Clustering with Exponential Family: Small-Variance Asymptotics and Reducibility

arXiv (Cornell University), Jan 29, 2015

On Uncertainty Estimation by Tree-based Surrogate Models in Sequential Model-based Optimization

arXiv (Cornell University), Feb 21, 2022

Blind separation of nonstationary and temporally correlated sources from noisy mixtures

Neural Networks for Signal Processing X. Proceedings of the 2000 IEEE Signal Processing Society Workshop (Cat. No.00TH8501)

Local Stability Analysis of Flexible Independent Component Analysis Algorithm

1988 International Conference on Acoustics, Speech, and Signal Processing (ICASSP-88)

This paper addresses local stability analysis for the flexible independent component analysis (ICA) algorithm [6], where the generalized Gaussian density model was employed for blind separation of mixtures of sub- and super-Gaussian sources. In the flexible ICA algorithm, the shape of the nonlinear function in the learning algorithm varies depending on the Gaussian exponent, which is properly selected according to the kurtosis of the estimated source. In the framework of the natural gradient on the Stiefel manifold, the flexible ICA algorithm is revisited and some new results on its local stability analysis are presented.
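As a rough illustration (my own sketch, not the authors' code), the role of the Gaussian exponent can be shown as a score function whose shape changes with the exponent, plugged into a plain natural-gradient ICA update; `gg_score` and `natural_gradient_step` are hypothetical helper names.

```python
import numpy as np

def gg_score(y, beta):
    """Score function of a generalized Gaussian p(y) ∝ exp(-|y|^beta):
    phi(y) = sign(y) |y|^(beta-1). Larger beta suits sub-Gaussian sources,
    smaller beta (below 2) suits super-Gaussian sources."""
    return np.sign(y) * np.abs(y) ** (beta - 1.0)

def natural_gradient_step(W, x_batch, beta, lr=0.01):
    """One natural-gradient ICA update: W <- W + lr (I - E[phi(y) y^T]) W."""
    y = W @ x_batch                          # estimated sources, shape (n, T)
    T = x_batch.shape[1]
    phi = gg_score(y, beta)
    G = np.eye(W.shape[0]) - (phi @ y.T) / T
    return W + lr * G @ W
```

For beta = 2 the score is linear (Gaussian case); moving beta away from 2 bends the nonlinearity toward the sub- or super-Gaussian regime.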

Topographic Independent Component Analysis of Gene Expression Time Series Data

Iterative Projection Approximation Algorithms for PCA

2006 IEEE International Conference on Acoustics, Speech and Signal Processing Proceedings

In this paper we introduce a new error measure, the integrated reconstruction error (IRE), whose minimization leads to the principal eigenvectors (without rotational ambiguity) of the data covariance matrix. We then present iterative algorithms for IRE minimization through the projection approximation. The proposed algorithm is referred to as the COnstrained Projection Approximation (COPA) algorithm, and its limiting case is called COPAL. We also discuss regularized variants, referred to as R-COPA and R-COPAL. Numerical experiments demonstrate that these algorithms successfully find exact principal eigenvectors of the data covariance matrix.
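The link between reconstruction-error minimization and principal eigenvectors can be checked numerically. The sketch below illustrates that generic fact only; it is not an implementation of COPA or COPAL, and the function names are my own.

```python
import numpy as np

def reconstruction_error(W, X):
    """Mean squared reconstruction error ||x - W W^T x||^2 over columns of X,
    for an orthonormal basis W (columns)."""
    R = X - W @ (W.T @ X)
    return np.mean(np.sum(R ** 2, axis=0))

def pca_eigenvectors(X, k):
    """Top-k eigenvectors of the sample covariance (reference solution)."""
    C = (X @ X.T) / X.shape[1]
    vals, vecs = np.linalg.eigh(C)           # ascending eigenvalues
    return vecs[:, np.argsort(vals)[::-1][:k]]
```

Any orthonormal basis other than the principal subspace yields a larger (or equal) reconstruction error, which is what IRE-style objectives exploit.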

Bayesian Multi-task Learning for Common Spatial Patterns

2011 International Workshop on Pattern Recognition in NeuroImaging, 2011

Common spatial pattern (CSP) is a widely-used feature extraction method for electroencephalogram (EEG) classification, and corresponding probabilistic models were recently developed, adopting a linear generative model for each class. These models are trained on a subject-by-subject basis, so inter-subject information is neglected. Moreover, when only a few training samples are available for a subject, performance degrades. In this paper we employ Bayesian multi-task learning so that subject-to-subject information is transferred in learning the model for a subject of interest. We present two probabilistic models in which the precision parameters of a multivariate or matrix-variate Gaussian prior on the dictionary are shared across subjects. Numerical experiments on the BCI competition IV 2a dataset confirm that our methods improve classification performance over standard CSP (on a subject-by-subject basis), especially for subjects with fewer training samples.
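For context, standard (non-Bayesian) CSP reduces to a generalized eigenvalue problem on the two class covariances; filters at both ends of the spectrum maximize the variance ratio for one class or the other. A minimal numpy sketch, with `csp_filters` as an illustrative helper name (this is the baseline method, not the paper's multi-task models):

```python
import numpy as np

def csp_filters(C1, C2, n_filters=2):
    """Standard CSP via whitening: whiten C1 + C2, then eigendecompose the
    whitened C1. Returned columns w satisfy w^T (C1 + C2) w = 1, with the
    extreme eigenvalues giving the most class-discriminative filters."""
    d, U = np.linalg.eigh(C1 + C2)
    P = U / np.sqrt(d)                        # P^T (C1+C2) P = I
    vals, V = np.linalg.eigh(P.T @ C1 @ P)
    order = np.argsort(vals)
    keep = np.r_[order[: n_filters // 2],
                 order[-(n_filters - n_filters // 2):]]
    return P @ V[:, keep]                     # spatial filters as columns
```

Features are then typically the log-variances of the filtered signals per trial.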

Sequential Spectral Learning to Hash with Multiple Representations

Computer Vision – ECCV 2012, 2012

Learning to hash involves learning hash functions from a set of images for embedding high-dimensional visual descriptors into a similarity-preserving low-dimensional Hamming space. Most existing methods resort to a single representation of images; that is, only one type of visual descriptor is used to learn a hash function that assigns binary codes to images. However, images are often described by multiple visual descriptors (such as SIFT, GIST, and HOG), so it is desirable to incorporate these multiple representations into learning a hash function, leading to multi-view hashing. In this paper we present a sequential spectral learning approach to multi-view hashing in which a hash function is determined sequentially by solving successive maximizations of local variances subject to decorrelation constraints. We compute multi-view local variances by α-averaging view-specific distance matrices, such that the best averaged distance matrix is determined by minimizing its α-divergence from the view-specific distance matrices. We also present a scalable implementation, exploiting a fast approximate k-NN graph construction method, in which α-averaged distances computed in small partitions determined by recursive spectral bisection are gradually merged in conquer steps until all examples are used. Numerical experiments on the Caltech-256, CIFAR-20, and NUS-WIDE datasets confirm the high performance of our method in comparison to single-view spectral hashing as well as existing multi-view hashing methods.
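The α-averaging step can be illustrated with Amari's α-mean applied entrywise to positive matrices. This is a hedged sketch under the usual representation f(z) = z^((1-α)/2) for α ≠ 1 and f(z) = log z for α = 1 (so α = -1 gives the arithmetic mean and α = 1 the geometric mean); it is not the paper's full divide-and-conquer pipeline, and `alpha_mean` is my own name.

```python
import numpy as np

def alpha_mean(mats, alpha, weights=None):
    """Weighted entrywise alpha-mean of positive arrays:
    result = f^{-1}(sum_i w_i f(mats[i])), with
    f(z) = z^((1-alpha)/2) for alpha != 1 and f(z) = log z for alpha = 1."""
    mats = [np.asarray(M, dtype=float) for M in mats]
    if weights is None:
        weights = np.full(len(mats), 1.0 / len(mats))
    if alpha == 1:                            # geometric-mean limit
        return np.exp(sum(w * np.log(M) for w, M in zip(weights, mats)))
    p = (1.0 - alpha) / 2.0
    return sum(w * M ** p for w, M in zip(weights, mats)) ** (1.0 / p)
```

Note the α = 1 branch requires strictly positive entries, so in practice zero diagonals of distance matrices need special handling.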

Hashing with Generalized Nyström Approximation

2012 IEEE 12th International Conference on Data Mining, 2012

Hashing, which involves learning binary codes to embed high-dimensional data into a similarity-preserving low-dimensional Hamming space, is often formulated as linear dimensionality reduction followed by binary quantization. Linear dimensionality reduction, based on a maximum-variance formulation, requires the leading eigenvectors of the data covariance or graph Laplacian matrix. Computing leading singular vectors or eigenvectors for high-dimensional data with a large sample size is the main bottleneck in most data-driven hashing methods. In this paper we address the use of the generalized Nyström method, where a subset of rows and columns is used to approximately compute the leading singular vectors of the data matrix, in order to improve the scalability of hashing methods for high-dimensional data with large sample sizes. In particular, we validate the useful behavior of the generalized Nyström approximation with uniform sampling in the case of a recently-developed hashing method based on principal component analysis (PCA) followed by iterative quantization, referred to as PCA+ITQ, developed by Gong and Lazebnik. We compare the performance of the generalized Nyström approximation with uniform and non-uniform sampling to the full singular value decomposition (SVD) method, confirming that uniform sampling improves the computational and space complexities dramatically while sacrificing little performance. In addition, we present low-rank approximation error bounds for the generalized Nyström approximation with uniform sampling, which is not a trivial extension of available results for the non-uniform sampling case.
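The generalized Nyström approximation itself is compact: sample a subset of columns C and rows R of A, take their intersection W, and reconstruct A ≈ C W⁺ R. A minimal sketch (sampling indices supplied by the caller; for uniform sampling they would be drawn uniformly at random):

```python
import numpy as np

def generalized_nystrom(A, row_idx, col_idx):
    """Generalized Nystrom approximation A ≈ C W^+ R, where
    C = A[:, col_idx], R = A[row_idx, :], and W is their intersection."""
    C = A[:, col_idx]
    R = A[row_idx, :]
    W = A[np.ix_(row_idx, col_idx)]
    return C @ np.linalg.pinv(W) @ R
```

When A has low rank and the sampled rows/columns span it, the reconstruction is exact; otherwise it is a cheap approximation whose quality the paper's bounds quantify.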

Multi-view anchor graph hashing

2013 IEEE International Conference on Acoustics, Speech and Signal Processing, 2013

Multi-view hashing seeks compact integrated binary codes which preserve similarities averaged over multiple representations of objects. Most existing multi-view hashing methods resort to linear hash functions, in which the data manifold is not considered. In this paper we present multi-view anchor graph hashing (MVAGH), where nonlinear integrated binary codes are efficiently determined by a subset of eigenvectors of an averaged similarity matrix. The efficiency behind MVAGH is due to a low-rank form of the averaged similarity matrix induced by the multi-view anchor graph, in which the similarity between two points is measured by the two-step transition probability through view-specific anchor (i.e., landmark) points. In addition, we observe that MVAGH suffers from performance degradation when high recall is required. To overcome this drawback, we propose a simple heuristic that combines MVAGH with locality-sensitive hashing (LSH). Numerical experiments on the CIFAR-10 dataset confirm that MVAGH(+LSH) outperforms existing multi- and single-view hashing methods.
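The low-rank similarity S = Z Λ⁻¹ Zᵀ induced by an anchor graph can be sketched in a single-view toy form (the paper averages over views; the Gaussian kernel and `anchor_graph_similarity` name are my own choices for illustration):

```python
import numpy as np

def anchor_graph_similarity(X, anchors, sigma=1.0):
    """Anchor-graph similarity S = Z Lambda^{-1} Z^T, where Z holds
    row-normalized data-to-anchor affinities (so S encodes the two-step
    transition probability through the anchors) and Lambda = diag(Z^T 1)."""
    d2 = ((X[:, None, :] - anchors[None, :, :]) ** 2).sum(-1)
    Z = np.exp(-d2 / (2.0 * sigma ** 2))
    Z /= Z.sum(axis=1, keepdims=True)         # rows sum to 1
    Lam_inv = np.diag(1.0 / Z.sum(axis=0))
    return Z @ Lam_inv @ Z.T
```

Because S factors through the n-by-m matrix Z (m anchors, m ≪ n), its eigenvectors can be obtained from an m-by-m problem, which is the source of MVAGH's efficiency.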

Incremental Tree-Based Inference with Dependent Normalized Random Measures

Clustering sequence sets for motif discovery

Self-labeling for P300 detection

2012 IEEE International Conference on Systems, Man, and Cybernetics (SMC), 2012

Learning α-integration with partially-labeled data

2010 IEEE International Conference on Acoustics, Speech and Signal Processing, 2010

Sensory data integration is an important task in the human brain for multimodal processing, as well as in machine learning for multisensor processing. α-integration was proposed by Amari as a principled way of blending multiple positive measures (e.g., stochastic models in the form of probability distributions), providing an optimal integration in the sense of minimizing the α-divergence. It also encompasses existing integration methods as special cases, e.g., the weighted average and the exponential mixture. In α-integration, the value of α determines the characteristics of the integration, and the weight vector w assigns a degree of importance to each measure. In most existing work, however, α and w are given in advance rather than learned. In this paper we present two algorithms for learning α and w from data when only a few integrated target values are available. Numerical experiments on synthetic as well as real-world data confirm the effectiveness of the proposed method.
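A hedged sketch of the setting: α-integration of pointwise-evaluated positive measures, plus a toy grid search that recovers α from a few integrated targets. The paper's algorithms are gradient-based; `fit_alpha` is an illustrative stand-in, not the authors' method.

```python
import numpy as np

def alpha_integrate(values, alpha, w):
    """Alpha-integration of positive measures evaluated pointwise.
    values: (n_measures, n_points); w: (n_measures,) importance weights.
    Uses f(z) = z^((1-alpha)/2) for alpha != 1 and f(z) = log z for alpha = 1."""
    values = np.asarray(values, dtype=float)
    if alpha == 1:                            # geometric (exponential) mixture
        return np.exp(w @ np.log(values))
    p = (1.0 - alpha) / 2.0
    return (w @ values ** p) ** (1.0 / p)

def fit_alpha(values, targets, w, grid=np.linspace(-5, 5, 201)):
    """Toy illustration: pick alpha on a grid minimizing squared error
    against a few observed integrated target values."""
    errs = [np.sum((alpha_integrate(values, a, w) - targets) ** 2)
            for a in grid]
    return grid[int(np.argmin(errs))]
```

With α = -1 the integration reduces to the weighted arithmetic average, so targets generated that way should pull the fitted α toward -1.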

Source Separation with Gaussian Process Models

Lecture Notes in Computer Science

Online multi-label learning with accelerated nonsmooth stochastic gradient descent

2013 IEEE International Conference on Acoustics, Speech and Signal Processing, 2013

Rao-Blackwellized Particle Filtering for Sequential Speech Enhancement

The 2006 IEEE International Joint Conference on Neural Network Proceedings

In this paper we present a method of sequential speech enhancement, where we infer the clean speech signal using a Rao-Blackwellized particle filter (RBPF), given a noise-contaminated observed signal. In contrast to Kalman filtering-based methods, we consider a non-Gaussian speech generative model that is based on the generalized auto-regressive (GAR) model. Model parameters are learned by sequential expectation maximization incorporating the RBPF. Empirical comparison to the Kalman filter confirms the high performance of the proposed method.
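The predict/weight/resample loop underlying particle filtering can be shown in a toy form. This is a plain bootstrap filter for a scalar AR(1) state observed in Gaussian noise, far simpler than the Rao-Blackwellized GAR filter of the paper; all parameter values are illustrative.

```python
import numpy as np

def particle_filter(obs, a=0.9, q=1.0, r=0.5, n_particles=500, seed=0):
    """Toy bootstrap particle filter for x_t = a x_{t-1} + N(0, q),
    y_t = x_t + N(0, r). Returns posterior-mean estimates of x_t."""
    rng = np.random.default_rng(seed)
    x = rng.normal(0.0, 1.0, n_particles)     # initial particle cloud
    means = []
    for y in obs:
        x = a * x + rng.normal(0.0, np.sqrt(q), n_particles)  # predict
        logw = -0.5 * (y - x) ** 2 / r        # Gaussian log-likelihood
        w = np.exp(logw - logw.max())
        w /= w.sum()
        means.append(w @ x)                   # weighted posterior mean
        idx = rng.choice(n_particles, n_particles, p=w)       # resample
        x = x[idx]
    return np.array(means)
```

Rao-Blackwellization improves on this by integrating out the conditionally Gaussian part of the state analytically (e.g., with a Kalman filter), so particles are spent only on the non-Gaussian part.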

Orthogonal Nonnegative Matrix Factorization for Blind Image Separation

Lecture Notes in Computer Science, 2013

Online Video Segmentation by Bayesian Split-Merge Clustering

Lecture Notes in Computer Science, 2012

Minimum entropy, k-means, spectral clustering

2004 IEEE International Joint Conference on Neural Networks (IEEE Cat. No.04CH37541)
