Guy Lebanon - Academia.edu
Papers by Guy Lebanon
Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining - KDD '12, 2012
Non-negative matrix factorization (NMF) provides a lower rank approximation of a matrix. Due to the nonnegativity imposed on the factors, it yields a latent structure that is often more physically meaningful than other lower rank approximations such as the singular value decomposition (SVD). Most algorithms proposed in the literature for NMF minimize the Frobenius norm, partly because the Frobenius-norm minimization problem allows much more flexibility in algebraic manipulation than other divergences. In this paper we propose a fast NMF algorithm that is applicable to general Bregman divergences. Through a Taylor series expansion of the Bregman divergences, we reveal a relationship between Bregman divergences and Euclidean distance. Combined with the scalar block coordinate descent method, this key relationship provides a new direction for NMF algorithms with general Bregman divergences. The proposed algorithm generalizes several recently proposed methods for computing NMF with Bregman divergences and is computationally faster than existing alternatives. We demonstrate the effectiveness of our approach with experiments on artificial as well as real-world data.
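As background for the factorization discussed above, here is a minimal Frobenius-norm NMF using the standard Lee–Seung multiplicative updates. This is ordinary textbook NMF, not the paper's Bregman-divergence algorithm; the matrix sizes, rank, and iteration count are arbitrary illustrative choices.

```python
import numpy as np

def nmf(V, rank, n_iter=200, eps=1e-10):
    """Frobenius-norm NMF via Lee-Seung multiplicative updates (background sketch)."""
    rng = np.random.default_rng(0)
    m, n = V.shape
    W = rng.random((m, rank))
    H = rng.random((rank, n))
    for _ in range(n_iter):
        # Multiplicative updates preserve nonnegativity of W and H.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

rng = np.random.default_rng(1)
V = rng.random((20, 15))          # hypothetical nonnegative data matrix
W, H = nmf(V, rank=5)
err = np.linalg.norm(V - W @ H)   # reconstruction error in Frobenius norm
```

The multiplicative form of the updates is what keeps the factors nonnegative without an explicit projection step.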
Proceedings of the 25th international conference on Machine learning - ICML '08, 2008
... More recently, Forman (2006) studied the concept drift phenomenon in the context of information retrieval in large textual databases. Sharan and Neville (2007) consider the modeling of temporal changes in relational databases and its application to text classification. ...
Lecture Notes in Computer Science, 2008
To secure today's computer systems, it is critical to have different intrusion detection sensors embedded in them. The complexity of distributed computer systems makes it difficult to determine the appropriate configuration of these detectors, i.e., their choice and placement. In this paper, we describe a method to evaluate the effect of the detector configuration on the accuracy and precision of determining security goals in the system. For this, we develop a Bayesian network model for the distributed system, from an attack graph representation of ...
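To illustrate the kind of inference such a Bayesian network model supports, a two-node sketch relating a hidden attack state to a detector alert via Bayes' rule. All probabilities here are hypothetical, chosen only to show the posterior computation; the paper's model is far richer.

```python
# Hypothetical detector characteristics (not from the paper).
p_attack = 0.01               # prior probability of an attack
p_alert_given_attack = 0.9    # detector true-positive rate
p_alert_given_benign = 0.05   # detector false-positive rate

# Marginal probability of an alert, then the posterior via Bayes' rule.
p_alert = (p_alert_given_attack * p_attack
           + p_alert_given_benign * (1 - p_attack))
p_attack_given_alert = p_alert_given_attack * p_attack / p_alert
```

Even a reliable detector yields a modest posterior here because attacks are rare, which is exactly why detector choice and placement matter.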
Contemporary Mathematics, 2010
Many real world applications produce ranked data which are partially missing or tied. Heterogeneous patterns of ties and missing values require the development of statistically sound techniques for capturing information in varied ranking types. We examine the application of kernel smoothing to such data, with a kernel that is the discrete analogue of the triangular kernel on the real line. We demonstrate the use of generating functions and an asymptotic approximation in computing the kernel smoothing estimator for ranked data with arbitrary missing values and tie structure.
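For context on the continuous analogue mentioned above, a minimal kernel density estimate on the real line using the triangular kernel. The sample points and bandwidth are arbitrary illustrative choices; the paper's contribution is the discrete analogue on rankings, which this does not implement.

```python
import numpy as np

def triangular_kernel(u):
    """Triangular kernel: support [-1, 1], integrates to 1."""
    return np.maximum(1.0 - np.abs(u), 0.0)

def kde(x_grid, samples, h):
    """Kernel density estimate on a grid with bandwidth h."""
    u = (x_grid[:, None] - samples[None, :]) / h
    return triangular_kernel(u).mean(axis=1) / h

samples = np.array([0.0, 0.2, 0.5, 0.9, 1.1])  # hypothetical observations
x = np.linspace(-1.0, 2.0, 601)
density = kde(x, samples, h=0.4)
```

The estimate is nonnegative everywhere and integrates to one, which is what makes the smoothed object a proper density.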
We propose a new analytical approximation to the χ² kernel that converges geometrically. The analytical approximation is derived with elementary methods and adapts to the input distribution for an optimal convergence rate. Experiments show the new approximation leads to improved performance in image classification and semantic segmentation tasks using a random Fourier feature approximation of the exp-χ² kernel. In addition, out-of-core principal component analysis (PCA) methods are introduced to reduce the dimensionality of the approximation and achieve better performance at the expense of only an additional constant factor in the time complexity. Moreover, when PCA is performed jointly on the training and unlabeled testing data, further performance improvements can be obtained. Experiments conducted on the PASCAL VOC 2010 segmentation and ImageNet ILSVRC 2010 datasets show statistically significant improvements over alternative approximation methods.
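As background on the random Fourier feature machinery the abstract refers to, a minimal Rahimi–Recht approximation of the Gaussian (RBF) kernel — not the paper's analytic χ² series. Dimensions, bandwidth, and the test points are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
d, D = 5, 4000                       # input dimension, number of random features

# Frequencies sampled from the Fourier transform of exp(-||x-y||^2 / 2).
W = rng.normal(size=(D, d))
b = rng.uniform(0.0, 2.0 * np.pi, size=D)

def phi(x):
    """Random Fourier feature map: phi(x).phi(y) approximates the RBF kernel."""
    return np.sqrt(2.0 / D) * np.cos(W @ x + b)

x = rng.normal(size=d) * 0.3
y = rng.normal(size=d) * 0.3
exact = np.exp(-np.linalg.norm(x - y) ** 2 / 2.0)
approx = phi(x) @ phi(y)
```

With the feature map in hand, a linear classifier on phi(x) stands in for the nonlinear kernel machine, which is the source of the linear-time training cost.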
Sentiment analysis predicts the presence of positive or negative emotions in a text document. In this paper we consider higher dimensional extensions of the sentiment concept, which represent a richer set of human emotions. Our approach goes beyond previous work in that our model contains a continuous manifold rather than a finite set of human emotions. We investigate the resulting model, compare it to psychological observations, and explore its predictive capabilities. Besides obtaining significant improvements over a baseline without the manifold, we are also able to visualize different notions of positive sentiment in different domains.
Beyond appeared in Church of Totem, an exhibit by Totem Shriver, at the Linfield Gallery at Linfield College in McMinnville, Oregon. This pen and ink drawing with acrylic paint measures 20 inches by 18 inches.
Computing Research Repository, 2010
Semisupervised learning has emerged as a popular framework for improving modeling accuracy while controlling labeling cost. Based on an extension of stochastic composite likelihood, we quantify the asymptotic accuracy of generative semi-supervised learning. In doing so, we complement distribution-free analysis by providing an alternative framework to measure the value associated with different labeling policies and resolve the fundamental question of ...
Journal of Machine Learning Research, 2010
Lecture Notes in Computer Science, 2006
An important issue any organization or individual has to face when managing data containing sensitive information is the risk incurred when releasing such data. Even though data may be sanitized before being released, it is still possible for an adversary to reconstruct the original data using additional information that may be available, for example, from other data sources. To date, however, no comprehensive approach exists to quantify such risks. In this paper we develop a framework, based on statistical decision theory, to assess the relationship between the disclosed data and the resulting privacy risk. We relate our framework to the k-anonymity disclosure method; we make the assumptions behind k-anonymity explicit, quantify them, and extend them in several natural directions.
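For readers unfamiliar with the k-anonymity method the abstract relates to, a minimal check of its defining condition: every combination of quasi-identifier values must occur at least k times in the released table. The records and attribute names below are hypothetical.

```python
from collections import Counter

def is_k_anonymous(records, quasi_ids, k):
    """True iff every quasi-identifier combination appears at least k times."""
    groups = Counter(tuple(r[a] for a in quasi_ids) for r in records)
    return min(groups.values()) >= k

# Hypothetical sanitized table: zip codes and ages already generalized.
records = [
    {"zip": "306**", "age": "20-30", "disease": "flu"},
    {"zip": "306**", "age": "20-30", "disease": "cold"},
    {"zip": "303**", "age": "30-40", "disease": "flu"},
    {"zip": "303**", "age": "30-40", "disease": "asthma"},
]
ok = is_k_anonymous(records, quasi_ids=("zip", "age"), k=2)       # True
not_ok = is_k_anonymous(records, quasi_ids=("zip", "age"), k=3)   # False
```

The paper's point is that satisfying this syntactic condition does not by itself quantify the adversary's reconstruction risk, which is what the decision-theoretic framework addresses.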
2013 IEEE Conference on Computer Vision and Pattern Recognition, 2013
In this paper we present an inference procedure for the semantic segmentation of images. Different from many CRF approaches that rely on dependencies modeled with unary and pairwise pixel or superpixel potentials, our method is entirely based on estimates of the overlap between each of a set of mid-level object segmentation proposals and the objects present in the image. We define continuous latent variables on superpixels obtained by multiple intersections of segments, then output the optimal segments from the inferred superpixel statistics. The algorithm can recombine and refine initial mid-level proposals, as well as handle multiple interacting objects, even from the same class, all in a consistent joint inference framework by maximizing the composite likelihood of the underlying statistical model using an EM algorithm. In the PASCAL VOC segmentation challenge, the proposed approach obtains high accuracy and successfully handles images of complex object interactions.
2012 IEEE Conference on Computer Vision and Pattern Recognition, 2012
The random Fourier embedding methodology can be used to approximate the performance of non-linear kernel classifiers in time linear in the number of training examples. However, there still exists a non-trivial performance gap between the approximation and the nonlinear models, especially for the exponential χ² kernel, one of the most powerful models for histograms. Based on analogies with Chebyshev polynomials, we propose an asymptotically convergent analytic series for the χ² measure. The new series removes the need for the periodic approximations to the χ² function that are typical in previous methods, and improves the classification accuracy when used in the random Fourier approximation of the exponential χ² kernel. In addition, out-of-core principal component analysis (PCA) methods are introduced to reduce the dimensionality of the approximation and achieve better performance at the expense of only an additional constant factor in the time complexity. Moreover, when PCA is performed jointly on the training and unlabeled testing data, further performance improvements can be obtained. The proposed approaches are tested on the PASCAL VOC 2010 segmentation and ImageNet ILSVRC 2010 datasets, and are shown to give statistically significant improvements over alternative approximation methods.
Twenty-first international conference on Machine learning - ICML '04, 2004
Proceedings of the sixth ACM international conference on Web search and data mining - WSDM '13, 2013
For cold-start recommendation, it is important to rapidly profile new users and generate a good initial set of recommendations through an interview process: users should be queried adaptively in a sequential fashion, and multiple items should be offered for opinion solicitation at each trial. In this work, we propose a novel algorithm that learns to conduct the interview process guided by a decision tree with multiple questions at each split. The splits, represented as sparse weight vectors, are learned through an L_1-constrained optimization framework. Users are directed to child nodes according to the inner product of their responses and the corresponding weight vector. More importantly, to account for the variety of responses arriving at a node, a linear regressor is learned within each node, using all previously obtained answers as input to predict item ratings. A user study, preliminary but the first of its kind in cold-start recommendation, is conducted to explore the number and format of questions that minimize user cognitive effort in a recommendation survey. Quantitative experimental validation also shows that the proposed algorithm outperforms state-of-the-art approaches in terms of both prediction accuracy and user cognitive effort.
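The routing rule described above can be sketched in a few lines: a node holds a sparse weight vector over its questions, and the sign of the inner product with the user's responses selects the child. The weights, response coding, and threshold below are hypothetical, and the L_1-constrained learning of the splits is omitted.

```python
import numpy as np

def route(responses, weights, threshold=0.0):
    """Send the user left or right based on the inner product of
    interview responses with the node's sparse weight vector."""
    return "left" if responses @ weights <= threshold else "right"

# Hypothetical node: sparse weights, only two questions are active.
weights = np.array([0.0, 1.5, 0.0, -0.8])
# Hypothetical user: like/dislike responses coded as +1 / -1.
responses = np.array([1.0, 1.0, -1.0, 1.0])

child = route(responses, weights)  # inner product 1.5 - 0.8 = 0.7 > 0
```

In the full algorithm each node would additionally fit a linear regressor on all answers gathered so far to predict item ratings for users stopping at that node.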
Many popular linear classifiers, such as logistic regression, boosting, or SVM, are trained by optimizing a margin-based risk function. Traditionally, these risk functions are computed based on a labeled dataset. We develop a novel technique for estimating such ...
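The margin-based risk referred to above, instantiated for logistic regression: the empirical risk is the average of log(1 + exp(-y·⟨w, x⟩)) over labeled examples. The data matrix, labels, and weight vector below are hypothetical.

```python
import numpy as np

def logistic_risk(w, X, y):
    """Empirical margin-based risk for logistic regression:
    mean of log(1 + exp(-y * <w, x>)) over the labeled examples."""
    margins = y * (X @ w)
    return float(np.mean(np.log1p(np.exp(-margins))))

X = np.array([[1.0, 2.0],
              [-1.0, 0.5],
              [0.5, -1.0]])
y = np.array([1.0, -1.0, 1.0])
w = np.array([0.5, 0.25])

risk = logistic_risk(w, X, y)
```

Swapping the loss on the margin (hinge for SVM, exponential for boosting) yields the other classifiers' risks in the same mold, which is why the abstract treats them uniformly.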