Seungjin Choi - Academia.edu
Papers by Seungjin Choi
arXiv (Cornell University), Jan 29, 2015
arXiv (Cornell University), Feb 21, 2022
Neural Networks for Signal Processing X. Proceedings of the 2000 IEEE Signal Processing Society Workshop (Cat. No.00TH8501)
1988 International Conference on Acoustics, Speech, and Signal Processing (ICASSP-88)
This paper addresses local stability analysis for the flexible independent component analysis (ICA) algorithm [6], where the generalized Gaussian density model was employed for blind separation of mixtures of sub- and super-Gaussian sources. In the flexible ICA algorithm, the shape of the nonlinear function in the learning algorithm varies with the Gaussian exponent, which is selected according to the kurtosis of the estimated source. In the framework of the natural gradient on the Stiefel manifold, the flexible ICA algorithm is revisited and some new results on its local stability analysis are presented.
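To make the adaptive-nonlinearity idea concrete, here is a minimal NumPy sketch of a natural-gradient ICA update whose generalized-Gaussian score function switches with the sign of the estimated kurtosis. The exponent values (1 for super-Gaussian, 4 for sub-Gaussian), the learning rate, and the name `flexible_ica` are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def flexible_ica(x, n_iter=2000, lr=0.01, seed=0):
    """Natural-gradient ICA with a kurtosis-adaptive generalized-Gaussian
    score function -- an illustrative sketch, not the authors' code."""
    rng = np.random.default_rng(seed)
    n, T = x.shape
    W = np.eye(n) + 0.01 * rng.standard_normal((n, n))
    for _ in range(n_iter):
        y = W @ x
        # Excess kurtosis of each estimated source.
        kurt = (y**4).mean(axis=1) / (y**2).mean(axis=1) ** 2 - 3.0
        # Gaussian exponent: small for super-Gaussian sources (kurt > 0),
        # large for sub-Gaussian ones (kurt < 0); values are illustrative.
        nu = np.where(kurt > 0, 1.0, 4.0)
        # Generalized-Gaussian score: phi(y) = sign(y) |y|^(nu - 1).
        phi = np.sign(y) * np.abs(y) ** (nu[:, None] - 1.0)
        # Natural-gradient update: W <- W + lr * (I - E[phi(y) y^T]) W.
        W += lr * (np.eye(n) - (phi @ y.T) / T) @ W
    return W

# Toy demo: separate one super-Gaussian and one sub-Gaussian source.
rng = np.random.default_rng(1)
s = np.vstack([rng.laplace(size=5000),           # super-Gaussian
               rng.uniform(-1, 1, size=5000)])   # sub-Gaussian
A = rng.standard_normal((2, 2))
W = flexible_ica(A @ s)
print(W @ A)  # close to a scaled permutation matrix after convergence
```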
2006 IEEE International Conference on Acoustics, Speech and Signal Processing Proceedings
In this paper we introduce a new error measure, the integrated reconstruction error (IRE), whose minimization leads to principal eigenvectors (without rotational ambiguity) of the data covariance matrix. We then present iterative algorithms for IRE minimization through the projection approximation. The proposed algorithm is referred to as the COnstrained Projection Approximation (COPA) algorithm, and its limiting case is called COPAL. We also discuss regularized variants, referred to as R-COPA and R-COPAL. Numerical experiments demonstrate that these algorithms successfully find exact principal eigenvectors of the data covariance matrix.
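The COPA/COPAL updates themselves are not reproduced here; as a point of reference, the sketch below uses plain orthogonal iteration, which also recovers individual principal eigenvectors (rather than an arbitrarily rotated basis of the principal subspace) when the leading eigenvalues are distinct.

```python
import numpy as np

def orthogonal_iteration(C, k, n_iter=200, seed=0):
    """Top-k eigenvectors of a symmetric matrix C by orthogonal iteration.
    The QR step at each iteration pins down individual eigenvectors
    (no rotational ambiguity) when the top eigenvalues are distinct."""
    rng = np.random.default_rng(seed)
    Q, _ = np.linalg.qr(rng.standard_normal((C.shape[0], k)))
    for _ in range(n_iter):
        Q, _ = np.linalg.qr(C @ Q)
    return Q

# Data with well-separated variances so the eigenvalues are distinct.
rng = np.random.default_rng(2)
X = rng.standard_normal((1000, 8)) * np.arange(1.0, 9.0)
C = X.T @ X / len(X)
Q = orthogonal_iteration(C, k=3)
_, evecs = np.linalg.eigh(C)
# Agreement (up to sign) with the top-3 eigenvectors from eigh.
print(np.abs(Q.T @ evecs[:, ::-1][:, :3]).round(2))
```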
2011 International Workshop on Pattern Recognition in NeuroImaging, 2011
Common spatial pattern (CSP) is a widely used feature extraction method for electroencephalogram (EEG) classification, and corresponding probabilistic models, adopting a linear generative model for each class, were recently developed. These models are trained on a subject-by-subject basis, so inter-subject information is neglected. Moreover, when only a few training samples are available for each subject, performance degrades. In this paper we employ Bayesian multi-task learning so that subject-to-subject information is transferred in learning the model for a subject of interest. We present two probabilistic models in which the precision parameters of a multivariate or matrix-variate Gaussian prior on the dictionary are shared across subjects. Numerical experiments on the BCI competition IV 2a dataset confirm that our methods improve classification performance over standard CSP (trained on a subject-by-subject basis), especially for subjects with fewer training samples.
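For context, the standard CSP baseline that the Bayesian multi-task models extend fits in a few lines: spatial filters come from a generalized eigenproblem on the two class covariances, and features are log-variances of the filtered signals. This is only the baseline, with synthetic data; the subject-shared priors are not shown.

```python
import numpy as np
from scipy.linalg import eigh

def csp_filters(C1, C2, m=2):
    """Standard CSP: solve C1 w = lambda (C1 + C2) w and keep the m
    filters from each end of the (ascending) eigenvalue spectrum."""
    _, W = eigh(C1, C1 + C2)
    return np.hstack([W[:, :m], W[:, -m:]])

def csp_features(trial, W):
    """Normalized log-variance features of a filtered trial (channels x time)."""
    v = (W.T @ trial).var(axis=1)
    return np.log(v / v.sum())

# Toy demo: class 1 has extra variance on channel 0, class 2 on channel 3.
rng = np.random.default_rng(3)
trials1 = rng.standard_normal((20, 4, 200)) * np.array([2, 1, 1, 1])[:, None]
trials2 = rng.standard_normal((20, 4, 200)) * np.array([1, 1, 1, 2])[:, None]
C1 = np.mean([t @ t.T / t.shape[1] for t in trials1], axis=0)
C2 = np.mean([t @ t.T / t.shape[1] for t in trials2], axis=0)
W = csp_filters(C1, C2, m=1)
print(csp_features(trials1[0], W), csp_features(trials2[0], W))
```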
Computer Vision – ECCV 2012, 2012
Learning to hash involves learning hash functions from a set of images for embedding high-dimensional visual descriptors into a similarity-preserving low-dimensional Hamming space. Most existing methods resort to a single representation of images; that is, only one type of visual descriptor is used to learn a hash function that assigns binary codes to images. However, images are often described by multiple visual descriptors (such as SIFT, GIST, and HOG), so it is desirable to incorporate these multiple representations into learning a hash function, leading to multi-view hashing. In this paper we present a sequential spectral learning approach to multi-view hashing, where a hash function is determined sequentially by solving successive maximizations of local variances subject to decorrelation constraints. We compute multi-view local variances by α-averaging view-specific distance matrices, such that the best averaged distance matrix is determined by minimizing its α-divergence from the view-specific distance matrices. We also present a scalable implementation, exploiting a fast approximate k-NN graph construction method, in which α-averaged distances computed in small partitions determined by recursive spectral bisection are gradually merged in conquer steps until all examples are used. Numerical experiments on the Caltech-256, CIFAR-20, and NUS-WIDE datasets confirm the high performance of our method in comparison to single-view spectral hashing as well as existing multi-view hashing methods.
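The α-averaging step can be illustrated directly. Below is a sketch of Amari's α-mean applied entrywise to view-specific distance matrices, assuming the representation f_α(x) = x^((1-α)/2) for α ≠ 1 and f_α(x) = log x for α = 1; the surrounding spectral-hashing machinery is omitted.

```python
import numpy as np

def alpha_mean(values, w, alpha):
    """Entrywise alpha-mean m = f^{-1}(sum_i w_i f(x_i)) of positive
    arrays, with f(x) = x^((1 - alpha) / 2) for alpha != 1 and
    f(x) = log x for alpha = 1. alpha = -1 gives the arithmetic mean,
    alpha = 1 the geometric mean, alpha = 3 the harmonic mean."""
    values = np.asarray(values, dtype=float)
    w = np.asarray(w, dtype=float)
    if alpha == 1:
        return np.exp(np.tensordot(w, np.log(values), axes=1))
    p = (1.0 - alpha) / 2.0
    return np.tensordot(w, values ** p, axes=1) ** (1.0 / p)

# Alpha-average two view-specific distance matrices (positive entries).
rng = np.random.default_rng(4)
D1 = np.abs(rng.standard_normal((5, 5))) + 0.1
D2 = np.abs(rng.standard_normal((5, 5))) + 0.1
D_avg = alpha_mean([D1, D2], w=[0.5, 0.5], alpha=-1)
print(np.allclose(D_avg, 0.5 * (D1 + D2)))  # True: arithmetic special case
```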
2012 IEEE 12th International Conference on Data Mining, 2012
Hashing, which involves learning binary codes to embed high-dimensional data into a similarity-preserving low-dimensional Hamming space, is often formulated as linear dimensionality reduction followed by binary quantization. Linear dimensionality reduction, based on the maximum-variance formulation, requires leading eigenvectors of the data covariance or graph Laplacian matrix. Computing leading singular vectors or eigenvectors in the case of high dimension and large sample size is a main bottleneck in most data-driven hashing methods. In this paper we address the use of the generalized Nyström method, where a subset of rows and columns is used to approximately compute leading singular vectors of the data matrix, in order to improve the scalability of hashing methods for high-dimensional data with large sample size. In particular, we validate the useful behavior of generalized Nyström approximation with uniform sampling in the case of a recently developed hashing method based on principal component analysis (PCA) followed by iterative quantization, referred to as PCA+ITQ, developed by Gong and Lazebnik. We compare the performance of generalized Nyström approximation with uniform and non-uniform sampling to the full singular value decomposition (SVD) method, confirming that uniform sampling improves the computational and space complexities dramatically while sacrificing little performance. In addition, we present low-rank approximation error bounds for generalized Nyström approximation with uniform sampling, which is not a trivial extension of available results for the non-uniform sampling case.
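The generalized Nyström approximation itself is compact: sample rows and columns uniformly, then reconstruct through the pseudoinverse of their intersection. The sketch below is a minimal illustration of that approximation, not the paper's hashing pipeline.

```python
import numpy as np

def generalized_nystrom(X, r, c, seed=0):
    """Approximate X ~= C @ pinv(W) @ R from r uniformly sampled rows
    and c uniformly sampled columns, where C = X[:, cols],
    R = X[rows, :], and W is their intersection."""
    rng = np.random.default_rng(seed)
    rows = rng.choice(X.shape[0], size=r, replace=False)
    cols = rng.choice(X.shape[1], size=c, replace=False)
    C, R, W = X[:, cols], X[rows, :], X[np.ix_(rows, cols)]
    return C @ np.linalg.pinv(W) @ R

# On a rank-10 matrix, sampling r = c = 20 recovers X almost exactly.
rng = np.random.default_rng(5)
X = rng.standard_normal((500, 10)) @ rng.standard_normal((10, 400))
X_hat = generalized_nystrom(X, r=20, c=20)
print(np.linalg.norm(X - X_hat) / np.linalg.norm(X))  # near machine precision
```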
2013 IEEE International Conference on Acoustics, Speech and Signal Processing, 2013
Multi-view hashing seeks compact integrated binary codes that preserve similarities averaged over multiple representations of objects. Most existing multi-view hashing methods resort to linear hash functions, where the data manifold is not considered. In this paper we present multi-view anchor graph hashing (MVAGH), where nonlinear integrated binary codes are efficiently determined by a subset of eigenvectors of an averaged similarity matrix. The efficiency behind MVAGH is due to a low-rank form of the averaged similarity matrix induced by the multi-view anchor graph, where the similarity between two points is measured by the two-step transition probability through view-specific anchor (i.e., landmark) points. In addition, we observe that MVAGH suffers from performance degradation when high recall is required. To overcome this drawback, we propose a simple heuristic that combines MVAGH with locality-sensitive hashing (LSH). Numerical experiments on the CIFAR-10 dataset confirm that MVAGH(+LSH) outperforms existing multi- and single-view hashing methods.
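The low-rank similarity at the heart of anchor graph methods can be sketched for a single view as follows: each point connects to its s nearest anchors, and similarity is the two-step transition probability through the anchors. The Gaussian bandwidth choice and function names are illustrative assumptions; the multi-view averaging and the eigenvector-based codes are omitted.

```python
import numpy as np

def anchor_graph_similarity(X, anchors, s=3):
    """Low-rank anchor-graph similarity S = Z diag(Z^T 1)^{-1} Z^T,
    where Z holds each point's normalized affinities to its s nearest
    anchors; S_ij is a two-step transition probability through anchors."""
    d2 = ((X[:, None, :] - anchors[None, :, :]) ** 2).sum(-1)
    sigma = d2.mean()                       # illustrative bandwidth choice
    Z = np.zeros_like(d2)
    idx = np.argsort(d2, axis=1)[:, :s]     # s nearest anchors per point
    rows = np.arange(len(X))[:, None]
    Z[rows, idx] = np.exp(-d2[rows, idx] / sigma)
    Z /= Z.sum(axis=1, keepdims=True)       # rows sum to one
    lam = Z.sum(axis=0)                     # anchor "degrees"
    return (Z / lam) @ Z.T                  # n x n, rank at most len(anchors)

rng = np.random.default_rng(6)
X = rng.standard_normal((100, 2))
anchors = X[rng.choice(100, size=10, replace=False)]
S = anchor_graph_similarity(X, anchors)
print(S.shape, np.linalg.matrix_rank(S))    # (100, 100), at most 10
```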
2012 IEEE International Conference on Systems, Man, and Cybernetics (SMC), 2012
2010 IEEE International Conference on Acoustics, Speech and Signal Processing, 2010
Sensory data integration is an important task in the human brain for multimodal processing, as well as in machine learning for multisensor processing. α-integration was proposed by Amari as a principled way of blending multiple positive measures (e.g., stochastic models in the form of probability distributions), providing an optimal integration in the sense of minimizing the α-divergence. It also encompasses existing integration methods, such as the weighted average and the exponential mixture, as special cases. In α-integration, the value of α determines the characteristics of the integration and the weight vector w assigns a degree of importance to each measure. In most existing work, however, α and w are given in advance rather than learned. In this paper we present two algorithms for learning α and w from data when only a few integrated target values are available. Numerical experiments on synthetic as well as real-world data confirm the effectiveness of the proposed method.
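A hedged sketch of the learning task follows: α-integration is implemented from the formula above, and α and w are fitted to a handful of target values by squared-error minimization with finite-difference gradients. The paper's two algorithms derive the gradients analytically; this numeric version, with hypothetical function names, only illustrates the setup.

```python
import numpy as np

def f(x, alpha):
    return np.log(x) if alpha == 1 else x ** ((1 - alpha) / 2)

def f_inv(y, alpha):
    return np.exp(y) if alpha == 1 else y ** (2 / (1 - alpha))

def alpha_integrate(M, w, alpha):
    """alpha-integration of k positive measures (rows of M) with weights w."""
    return f_inv(w @ f(M, alpha), alpha)

def fit_alpha_w(M, targets, n_steps=500, lr=0.05, eps=1e-4):
    """Fit alpha and w to a few integrated target values by squared-error
    minimization with finite-difference gradients (illustration only)."""
    k = M.shape[0]
    theta = np.concatenate([[0.0], np.full(k, 1.0 / k)])  # [alpha, w]

    def loss(t):
        w = np.abs(t[1:])
        w = w / w.sum()                                   # keep w on the simplex
        return ((alpha_integrate(M, w, t[0]) - targets) ** 2).mean()

    for _ in range(n_steps):
        g = np.array([(loss(theta + eps * e) - loss(theta - eps * e)) / (2 * eps)
                      for e in np.eye(k + 1)])
        theta -= lr * g
    w = np.abs(theta[1:])
    return theta[0], w / w.sum()

# Targets generated with alpha = -1 (weighted arithmetic mean).
rng = np.random.default_rng(7)
M = rng.uniform(0.5, 2.0, size=(3, 8))
targets = alpha_integrate(M, np.array([0.5, 0.3, 0.2]), alpha=-1)
alpha_hat, w_hat = fit_alpha_w(M, targets)
print(round(alpha_hat, 2), w_hat.round(2))  # should move toward -1 and the true w
```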
Lecture Notes in Computer Science
2013 IEEE International Conference on Acoustics, Speech and Signal Processing, 2013
The 2006 IEEE International Joint Conference on Neural Networks Proceedings
In this paper we present a method for sequential speech enhancement, where we infer the clean speech signal using a Rao-Blackwellized particle filter (RBPF), given a noise-contaminated observed signal. In contrast to Kalman filtering-based methods, we consider a non-Gaussian speech generative model based on the generalized auto-regressive (GAR) model. Model parameters are learned by sequential expectation maximization, incorporating the RBPF. Empirical comparison to the Kalman filter confirms the high performance of the proposed method.
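The RBPF with a GAR model is not reproduced here; the sketch below is a generic bootstrap particle filter for a scalar AR(1) signal in Gaussian noise, which only illustrates sequential Monte Carlo inference of a clean signal from a noisy observation. All parameter values and names are synthetic assumptions.

```python
import numpy as np

def bootstrap_pf(y, a, q, r, n_particles=500, seed=0):
    """Bootstrap particle filter for s_t = a s_{t-1} + v_t, v_t ~ N(0, q),
    observed as y_t = s_t + n_t, n_t ~ N(0, r). Returns posterior-mean
    estimates of the clean signal s_t."""
    rng = np.random.default_rng(seed)
    particles = rng.standard_normal(n_particles) * np.sqrt(q)
    s_hat = np.empty(len(y))
    for t, yt in enumerate(y):
        # Propagate through the AR(1) transition.
        particles = a * particles + rng.standard_normal(n_particles) * np.sqrt(q)
        # Weight by the Gaussian observation likelihood.
        logw = -0.5 * (yt - particles) ** 2 / r
        w = np.exp(logw - logw.max())
        w /= w.sum()
        s_hat[t] = w @ particles
        # Multinomial resampling.
        particles = rng.choice(particles, size=n_particles, p=w)
    return s_hat

# Toy demo: the filtered estimate is closer to the clean signal than y is.
rng = np.random.default_rng(8)
T, a, q, r = 400, 0.95, 0.1, 0.5
s = np.zeros(T)
for t in range(1, T):
    s[t] = a * s[t - 1] + rng.standard_normal() * np.sqrt(q)
y = s + rng.standard_normal(T) * np.sqrt(r)
s_hat = bootstrap_pf(y, a, q, r)
print(np.mean((y - s) ** 2), np.mean((s_hat - s) ** 2))
```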
Lecture Notes in Computer Science, 2013
Lecture Notes in Computer Science, 2012
2004 IEEE International Joint Conference on Neural Networks (IEEE Cat. No.04CH37541)