Guy Lebanon - Academia.edu

Papers by Guy Lebanon

Fast Bregman divergence NMF using Taylor expansion and coordinate descent

Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining - KDD '12, 2012

Non-negative matrix factorization (NMF) provides a lower-rank approximation of a matrix. Because nonnegativity is imposed on the factors, it yields a latent structure that is often more physically meaningful than other lower-rank approximations such as the singular value decomposition (SVD). Most NMF algorithms proposed in the literature minimize the Frobenius norm, partly because the Frobenius-norm objective is much more amenable to algebraic manipulation than other divergences. In this paper we propose a fast NMF algorithm that applies to general Bregman divergences. Through a Taylor series expansion of the Bregman divergences, we reveal a relationship between Bregman divergences and the Euclidean distance. Combined with the scalar block coordinate descent method, this key relationship provides a new direction for NMF algorithms with general Bregman divergences. The proposed algorithm generalizes several recently proposed methods for computing NMF with Bregman divergences and is computationally faster than existing alternatives. We demonstrate the effectiveness of our approach with experiments on artificial as well as real-world data.
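The Taylor-expansion idea can be illustrated with one member of the Bregman family, the generalized KL divergence D(a||b) = a·log(a/b) - a + b. Expanding it to second order in a single entry of W or H turns each scalar subproblem into a weighted least-squares problem with a closed-form minimizer, so scalar coordinate descent reduces to one Newton step per entry. The sketch below is a minimal illustration under that assumption, not the authors' algorithm; the function names, the `eps` floor that keeps factors positive, and the sweep order are hypothetical choices.

```python
import numpy as np


def kl_divergence(A, B, eps=1e-12):
    """Generalized KL divergence sum(A*log(A/B) - A + B), treating 0*log(0) as 0."""
    A = np.asarray(A, dtype=float)
    B = np.maximum(B, eps)
    log_term = np.where(A > 0, A * np.log(np.maximum(A, eps) / B), 0.0)
    return float(np.sum(log_term - A + B))


def scd_kl_nmf(A, rank, iters=50, eps=1e-10, seed=0):
    """Approximate A ~= W @ H under generalized KL (hypothetical sketch).

    Each entry is updated by minimizing the second-order Taylor expansion of
    the divergence in that single entry (a scalar Newton step), then clipped
    to stay positive.
    """
    A = np.asarray(A, dtype=float)
    rng = np.random.default_rng(seed)
    m, n = A.shape
    W = rng.random((m, rank)) + 0.1
    H = rng.random((rank, n)) + 0.1
    B = W @ H  # current reconstruction, kept in sync after every update
    for _ in range(iters):
        # W[i, k] only affects row i of B.
        for i in range(m):
            for k in range(rank):
                ratio = A[i] / B[i]
                grad = np.sum((1.0 - ratio) * H[k])         # first derivative
                hess = np.sum(ratio / B[i] * H[k] ** 2)      # second derivative
                new = max(eps, W[i, k] - grad / max(hess, eps))
                B[i] += (new - W[i, k]) * H[k]
                W[i, k] = new
        # H[k, j] only affects column j of B.
        for j in range(n):
            for k in range(rank):
                ratio = A[:, j] / B[:, j]
                grad = np.sum((1.0 - ratio) * W[:, k])
                hess = np.sum(ratio / B[:, j] * W[:, k] ** 2)
                new = max(eps, H[k, j] - grad / max(hess, eps))
                B[:, j] += (new - H[k, j]) * W[:, k]
                H[k, j] = new
    return W, H
```

Because each scalar update changes only one row (or column) of W H, the sketch updates the reconstruction B incrementally rather than recomputing the full product.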

Local likelihood modeling of temporal text streams

Proceedings of the 25th international conference on Machine learning - ICML '08, 2008

... More recently, Forman (2006) studied the concept drift phenomenon in the context of information retrieval in large textual databases. Sharan and Neville (2007) consider the modeling of temporal changes in relational databases and its application to text classification. ...

Determining Placement of Intrusion Detectors for a Distributed Application through Bayesian Network Modeling

Lecture Notes in Computer Science, 2008

To secure today's computer systems, it is critical to embed different intrusion detection sensors in them. The complexity of distributed computer systems makes it difficult to determine the appropriate configuration of these detectors, i.e., their choice and placement. In this paper, we describe a method to evaluate the effect of the detector configuration on the accuracy and precision of determining security goals in the system. For this, we develop a Bayesian network model for the distributed system, from an attack graph representation of ...

A kernel smoothing approach to censored preference data

Contemporary Mathematics, 2010

Many real-world applications produce ranked data that are partially missing or tied. Heterogeneous patterns of ties and missing values require statistically sound techniques for capturing information across varied ranking types. We examine the application of kernel smoothing to such data, using a kernel that is the discrete analogue of the triangular kernel on the real line. We demonstrate the use of generating functions and an asymptotic approximation in computing the kernel smoothing estimator for ranked data with arbitrary missing values and tie structure.
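For complete rankings of a handful of items, kernel smoothing can be illustrated by brute-force enumeration of all permutations. The sketch below assumes Kendall's tau (the number of discordant item pairs) as the distance and a triangular-shaped weight max(0, 1 - d/h); both choices and all function names are hypothetical, and the generating-function machinery the paper uses for ties and missing values is not reproduced here.

```python
from itertools import combinations, permutations


def kendall_tau_distance(p, q):
    """Number of item pairs ordered differently by rankings p and q.
    Each ranking is a tuple listing items from most to least preferred."""
    pos_p = {item: r for r, item in enumerate(p)}
    pos_q = {item: r for r, item in enumerate(q)}
    return sum(
        1
        for a, b in combinations(p, 2)
        if (pos_p[a] - pos_p[b]) * (pos_q[a] - pos_q[b]) < 0
    )


def triangular_weight(d, h):
    """Assumed discrete triangular kernel: weight falls linearly to 0 at distance h."""
    return max(0.0, 1.0 - d / h)


def kernel_smoothed_estimate(observed, items, h):
    """Estimate p(pi) over all full rankings of `items` from observed full rankings.

    Kendall's tau is right-invariant, so every kernel has the same total mass;
    normalizing the summed weights once is therefore equivalent to averaging
    individually normalized kernels.
    """
    scores = {
        pi: sum(triangular_weight(kendall_tau_distance(pi, s), h) for s in observed)
        for pi in permutations(items)
    }
    total = sum(scores.values())
    return {pi: s / total for pi, s in scores.items()}
```

Enumerating all n! rankings is only feasible for very small n, which is one reason an analytic approach is needed for realistic data.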

A Linear Approximation to the χ² Kernel with Geometric Convergence

We propose a new analytical approximation to the χ² kernel that converges geometrically. The approximation is derived with elementary methods and adapts to the input distribution for an optimal convergence rate. Experiments show that the new approximation improves performance in image classification and semantic segmentation tasks when used in a random Fourier feature approximation of the exp(−χ²) kernel. In addition, out-of-core principal component analysis (PCA) methods are introduced to reduce the dimensionality of the approximation and achieve better performance at the cost of only an additional constant factor in the time complexity. Moreover, when PCA is performed jointly on the training and unlabeled testing data, further performance improvements can be obtained. Experiments on the PASCAL VOC 2010 segmentation and the ImageNet ILSVRC 2010 datasets show statistically significant improvements over alternative approximation methods.
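For reference, the exact kernels being approximated can be computed directly on nonnegative histograms. The sketch below assumes the common conventions χ²(x, y) = Σ (x_i - y_i)² / (x_i + y_i), the additive χ² kernel Σ 2·x_i·y_i / (x_i + y_i), and exp(−γ·χ²(x, y)) with a hypothetical bandwidth γ; it evaluates the kernels exactly and does not implement the paper's approximation.

```python
import numpy as np


def _safe_ratio(num, denom, eps=1e-12):
    """Elementwise num/denom, with bins where denom == 0 contributing 0."""
    safe = np.where(denom > eps, denom, 1.0)
    return np.where(denom > eps, num / safe, 0.0)


def chi2_distance(x, y):
    """sum_i (x_i - y_i)^2 / (x_i + y_i) for nonnegative histograms."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return float(np.sum(_safe_ratio((x - y) ** 2, x + y)))


def additive_chi2_kernel(x, y):
    """sum_i 2 x_i y_i / (x_i + y_i) for nonnegative histograms."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return float(np.sum(_safe_ratio(2.0 * x * y, x + y)))


def exp_chi2_kernel(x, y, gamma=1.0):
    """exp(-gamma * chi2(x, y)); gamma is an assumed bandwidth parameter."""
    return float(np.exp(-gamma * chi2_distance(x, y)))
```

Since (x - y)² = (x + y)² - 4xy, for L1-normalized histograms the two quantities are linked by χ²(x, y) = 2 - 2·k_add(x, y), which is a convenient sanity check for any approximation.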

Beyond Sentiment: The Manifold of Human Emotions

Sentiment analysis predicts the presence of positive or negative emotions in a text document. In this paper we consider higher-dimensional extensions of the sentiment concept that represent a richer set of human emotions. Our approach goes beyond previous work in that our model contains a continuous manifold rather than a finite set of human emotions. We investigate the resulting model, compare it to psychological observations, and explore its predictive capabilities. Besides obtaining significant improvements over a baseline without a manifold, we are also able to visualize different notions of positive sentiment in different domains.

Beyond

Beyond appeared in Church of Totem, an exhibit by Totem Shriver, at the Linfield Gallery at Linfield College in McMinnville, Oregon. This pen and ink drawing with acrylic paint measures 20 inches by 18 inches.

Asymptotic Analysis of Generative Semi-Supervised Learning

Computing Research Repository, 2010

Semi-supervised learning has emerged as a popular framework for improving modeling accuracy while controlling labeling cost. Based on an extension of stochastic composite likelihood, we quantify the asymptotic accuracy of generative semi-supervised learning. In doing so, we complement distribution-free analysis by providing an alternative framework to measure the value associated with different labeling policies and resolve the fundamental question of ...

Stochastic Composite Likelihood

Journal of Machine Learning Research, 2010

Beyond k-Anonymity: A Decision Theoretic Framework for Assessing Privacy Risk

Lecture Notes in Computer Science, 2006

An important issue that any organization or individual has to face when managing data containing sensitive information is the risk incurred when releasing such data. Even though data may be sanitized before being released, it is still possible for an adversary to reconstruct the original data by using additional information that may be available, for example, from other data sources. To date, however, no comprehensive approach exists to quantify such risks. In this paper we develop a framework, based on statistical decision theory, to assess the relationship between the disclosed data and the resulting privacy risk. We relate our framework to the k-anonymity disclosure method: we make the assumptions behind k-anonymity explicit, quantify them, and extend them in several natural directions.
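As background for the k-anonymity comparison, a released table is k-anonymous when every combination of quasi-identifier values is shared by at least k records. The minimal check below uses hypothetical column names and records, and its worst-case re-identification probability (1 divided by the smallest group size, for an adversary guessing uniformly within the matching group) is a simple illustrative risk measure, not the decision-theoretic framework of the paper.

```python
from collections import Counter


def equivalence_class_sizes(records, quasi_identifiers):
    """Count the records sharing each combination of quasi-identifier values."""
    return Counter(tuple(r[q] for q in quasi_identifiers) for r in records)


def is_k_anonymous(records, quasi_identifiers, k):
    """True if every quasi-identifier combination occurs at least k times."""
    sizes = equivalence_class_sizes(records, quasi_identifiers)
    return all(size >= k for size in sizes.values())


def max_reidentification_probability(records, quasi_identifiers):
    """Worst-case chance of singling out a record when the adversary knows its
    quasi-identifiers and guesses uniformly within the matching group."""
    sizes = equivalence_class_sizes(records, quasi_identifiers)
    return 1.0 / min(sizes.values())


# Hypothetical generalized records: zip and age are quasi-identifiers.
records = [
    {"zip": "123**", "age": "20-29", "diagnosis": "flu"},
    {"zip": "123**", "age": "20-29", "diagnosis": "cold"},
    {"zip": "123**", "age": "30-39", "diagnosis": "flu"},
    {"zip": "123**", "age": "30-39", "diagnosis": "asthma"},
    {"zip": "124**", "age": "30-39", "diagnosis": "flu"},
]
```

Here the last record is unique on (zip, age), so the table is only 1-anonymous; dropping or further generalizing it would make the table 2-anonymous.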

Composite Statistical Inference for Semantic Segmentation

2013 IEEE Conference on Computer Vision and Pattern Recognition, 2013

In this paper we present an inference procedure for the semantic segmentation of images. Unlike many CRF approaches that rely on dependencies modeled with unary and pairwise pixel or superpixel potentials, our method is based entirely on estimates of the overlap between each of a set of mid-level object segmentation proposals and the objects present in the image. We define continuous latent variables on superpixels obtained by multiple intersections of segments, then output the optimal segments from the inferred superpixel statistics. The algorithm can recombine and refine initial mid-level proposals, as well as handle multiple interacting objects, even from the same class, all in a consistent joint inference framework that maximizes the composite likelihood of the underlying statistical model using an EM algorithm. In the PASCAL VOC segmentation challenge, the proposed approach obtains high accuracy and successfully handles images with complex object interactions.
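One plausible reading of "superpixels obtained by multiple intersections of segments" is that pixels covered by exactly the same subset of proposal masks form one region. The sketch below computes such regions by grouping pixels with identical membership signatures; the function name and the boolean-mask input format are assumptions, and the EM-based composite likelihood inference is not shown.

```python
import numpy as np


def intersection_superpixels(masks):
    """Label each pixel by the set of proposals covering it (illustrative sketch).

    masks: boolean array of shape (num_proposals, H, W).
    Returns an (H, W) integer label map in which pixels with identical
    membership across all proposals share a label. Regions are defined by
    membership only, so one label may cover spatially disconnected pixels.
    """
    masks = np.asarray(masks, dtype=bool)
    p, h, w = masks.shape
    signatures = masks.reshape(p, h * w).T  # one membership row per pixel
    _, labels = np.unique(signatures, axis=0, return_inverse=True)
    return labels.reshape(h, w)  # reshape also normalizes the inverse's shape
```

With P proposals there are at most 2^P distinct signatures, but in practice only the combinations that actually occur in the image produce regions.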

Chebyshev approximations to the histogram χ² kernel

2012 IEEE Conference on Computer Vision and Pattern Recognition, 2012

The random Fourier embedding methodology can be used to approximate the performance of nonlinear kernel classifiers in time linear in the number of training examples. However, a non-trivial performance gap still exists between the approximation and the nonlinear models, especially for the exponential χ² kernel, one of the most powerful models for histograms. Based on analogies with Chebyshev polynomials, we propose an asymptotically convergent analytic series of the χ² measure. The new series removes the need for the periodic approximations to the χ² function typical of previous methods, and improves classification accuracy when used in the random Fourier approximation of the exponential χ² kernel. In addition, out-of-core principal component analysis (PCA) methods are introduced to reduce the dimensionality of the approximation and achieve better performance at the cost of only an additional constant factor in the time complexity. Moreover, when PCA is performed jointly on the training and unlabeled testing data, further performance improvements can be obtained. The proposed approaches are tested on the PASCAL VOC 2010 segmentation and the ImageNet ILSVRC 2010 datasets and shown to give statistically significant improvements over alternative approximation methods.
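The random Fourier embedding mentioned above is easiest to see for the Gaussian kernel exp(-||x - y||² / (2σ²)): drawing ω ~ N(0, σ⁻²I) and b ~ Uniform[0, 2π] and mapping z(x) = sqrt(2/D)·cos(ωᵀx + b) gives z(x)·z(y) ≈ k(x, y). The sketch below shows only this standard Gaussian construction with a hypothetical σ and feature count D; it does not implement the paper's Chebyshev series for the χ² kernel.

```python
import numpy as np


def random_fourier_features(X, num_features, sigma=1.0, seed=0):
    """Map rows of X to features whose inner products approximate
    the Gaussian kernel exp(-||x - y||^2 / (2 sigma^2))."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    omega = rng.normal(scale=1.0 / sigma, size=(d, num_features))  # spectral samples
    b = rng.uniform(0.0, 2.0 * np.pi, size=num_features)            # random phases
    return np.sqrt(2.0 / num_features) * np.cos(X @ omega + b)


def gaussian_kernel(X, Y, sigma=1.0):
    """Exact Gaussian kernel matrix, used here only to check the approximation."""
    sq = np.sum(X**2, axis=1)[:, None] + np.sum(Y**2, axis=1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma**2))
```

A linear classifier trained on z(x) then costs time linear in the number of examples, and the approximation error shrinks roughly as 1/sqrt(D).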

Hyperplane margin classifiers on the multinomial manifold

Twenty-first international conference on Machine learning - ICML '04, 2004

Learning multiple-question decision trees for cold-start recommendation

Proceedings of the sixth ACM international conference on Web search and data mining - WSDM '13, 2013

For cold-start recommendation, it is important to rapidly profile new users and generate a good initial set of recommendations through an interview process: users should be queried adaptively in a sequential fashion, and multiple items should be offered for opinion solicitation at each trial. In this work, we propose a novel algorithm that learns to conduct the interview process guided by a decision tree with multiple questions at each split. The splits, represented as sparse weight vectors, are learned through an L1-constrained optimization framework. Users are directed to child nodes according to the inner product of their responses and the corresponding weight vector. More importantly, to account for the variety of responses reaching a node, a linear regressor is learned within each node, using all previously obtained answers as input to predict item ratings. A user study, preliminary but the first of its kind in cold-start recommendation, is conducted to explore the most efficient number and format of questions to ask in a recommendation survey so as to minimize users' cognitive effort. Quantitative experiments also show that the proposed algorithm outperforms state-of-the-art approaches in terms of both prediction accuracy and user cognitive effort.
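The routing and prediction rules described above can be sketched for hand-set parameters: a node's sparse weight vector selects the questions asked there, the user is routed by the sign of the inner product with the response vector, and a per-node linear regressor predicts item ratings from the answers so far. The class, the zero threshold, the two-way split, and the response encoding (+1 like, -1 dislike, 0 not asked) are all hypothetical; learning the L1-constrained splits is not shown.

```python
from dataclasses import dataclass
from typing import Optional

import numpy as np


@dataclass
class InterviewNode:
    """One node of a multiple-question interview tree (illustrative only)."""
    split_weights: Optional[np.ndarray] = None   # sparse; nonzeros = questions asked here
    regressor_coef: Optional[np.ndarray] = None  # (num_questions, num_items)
    regressor_bias: Optional[np.ndarray] = None  # (num_items,)
    left: Optional["InterviewNode"] = None
    right: Optional["InterviewNode"] = None

    def questions(self):
        return np.flatnonzero(self.split_weights)

    def route(self, responses):
        # Assumed rule: positive inner product goes right, otherwise left.
        return self.right if float(responses @ self.split_weights) > 0 else self.left

    def predict(self, responses):
        return responses @ self.regressor_coef + self.regressor_bias


def interview(root, responses):
    """Walk from the root to a leaf, then predict ratings with the leaf's regressor."""
    node = root
    while node.left is not None and node.right is not None:
        node = node.route(responses)
    return node.predict(responses)


# Hypothetical tree: 4 candidate questions, 2 items whose ratings are predicted.
leaf_like = InterviewNode(regressor_coef=np.full((4, 2), 0.5), regressor_bias=np.array([3.0, 3.0]))
leaf_dislike = InterviewNode(regressor_coef=np.zeros((4, 2)), regressor_bias=np.array([2.0, 2.0]))
root = InterviewNode(split_weights=np.array([0.0, 1.0, 0.0, -0.5]), left=leaf_dislike, right=leaf_like)
```

Because the split weights are sparse, only the questions with nonzero weight (here questions 1 and 3) need to be asked at the root.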

CSI: Composite Statistical Inference for Semantic Segmentation

CSI: Composite Statistical Inference Techniques for Semantic Segmentation

Linear Regression

Unsupervised Supervised Learning II: Training Margin Based Classifiers without Labels

Many popular linear classifiers, such as logistic regression, boosting, or SVM, are trained by optimizing a margin-based risk function. Traditionally, these risk functions are computed based on a labeled dataset. We develop a novel technique for estimating such ...
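For a linear classifier with weight vector w and labels y ∈ {-1, +1}, a margin-based risk averages a loss φ of the margin m = y·(w·x); the standard losses are logistic log(1 + e^(-m)), exponential e^(-m) (boosting), and hinge max(0, 1 - m) (SVM). The sketch below computes the traditional labeled empirical risk, i.e. the quantity the paper sets out to estimate without labels; the label-free estimator itself is not reproduced, and the dictionary and function names are hypothetical.

```python
import numpy as np

LOSSES = {
    "logistic": lambda m: np.logaddexp(0.0, -m),   # log(1 + exp(-m)), computed stably
    "exponential": lambda m: np.exp(-m),           # loss minimized by AdaBoost
    "hinge": lambda m: np.maximum(0.0, 1.0 - m),   # SVM loss
}


def margin_risk(w, X, y, loss="logistic"):
    """Labeled empirical risk (1/n) * sum_i phi(y_i * <w, x_i>), y_i in {-1, +1}."""
    margins = y * (X @ w)
    return float(np.mean(LOSSES[loss](margins)))
```

All three losses depend on the data only through the margins, which is what makes margin-based risks a natural target for estimation from the distribution of w·x.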

Local Space-Time Smoothing for Versioned Documents

Proceedings of the 21st ACM international conference on Information and knowledge management
