Dongryeol Lee | Georgia Institute of Technology

Papers by Dongryeol Lee

Fast Mean Shift with Accurate and Stable Convergence

Mean shift is a powerful but computationally expensive method for nonparametric clustering and optimization. It iteratively moves each data point to its local mean until convergence. We introduce a fast algorithm for computing mean shift based on the dual-tree. Unlike ...
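The iteration described above can be sketched directly. This is a naive O(N²)-per-iteration reference version with a Gaussian kernel, not the paper's dual-tree algorithm; the bandwidth and tolerance values below are illustrative assumptions:

```python
import numpy as np

def mean_shift(points, bandwidth, max_iter=100, tol=1e-5):
    """Naive mean shift: move each point to its Gaussian-weighted local mean."""
    shifted = points.copy()
    for _ in range(max_iter):
        moved = 0.0
        for i in range(len(shifted)):
            x = shifted[i]
            # Gaussian weights of every original data point relative to x
            d2 = np.sum((points - x) ** 2, axis=1)
            w = np.exp(-d2 / (2 * bandwidth ** 2))
            new_x = w @ points / w.sum()   # local weighted mean
            moved = max(moved, np.linalg.norm(new_x - x))
            shifted[i] = new_x
        if moved < tol:                    # every point has converged
            break
    return shifted

# Two well-separated 1-D clusters collapse to their two modes
data = np.array([[0.0], [0.1], [0.2], [5.0], [5.1], [5.2]])
modes = mean_shift(data, bandwidth=0.5)
```

The dual-tree contribution is to avoid the inner O(N) weight computation; the fixed-point iteration itself is unchanged.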

Rank-Approximate Nearest Neighbor Search: Retaining Meaning and Speed in High Dimensions

The long-standing problem of efficient nearest-neighbor (NN) search has ubiquitous applications ranging from astrophysics to MP3 fingerprinting to bioinformatics to movie recommendations. As the dimensionality of the dataset increases, exact NN search becomes computationally prohibitive; (1+ε) distance-approximate NN search can provide large speedups but risks losing the meaning of NN search present in the ranks (ordering) of the distances. This paper presents a simple, practical algorithm allowing the user, for the first time, to directly control the true accuracy of NN search (in terms of ranks) while still achieving large speedups over exact NN search. Experiments on high-dimensional datasets show that our algorithm often achieves faster and more accurate results than the best-known distance-approximate method, with much more stable behavior.
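The rank guarantee can be illustrated with the simplest sampling argument (a toy sketch, not the paper's full algorithm, which combines such guarantees with tree search): if each uniform draw misses the top τ ranks with probability at most 1 − τ/N, then n draws with (1 − τ/N)^n ≤ δ suffice to land in the top τ with probability at least 1 − δ.

```python
import math
import random

def rank_approx_nn(query, data, tau, delta, dist):
    """Toy rank-approximate NN: with probability >= 1 - delta, the returned
    point is among the tau nearest neighbors of `query` in `data`."""
    N = len(data)
    # smallest n with (1 - tau/N)^n <= delta (probability all n draws miss the top tau)
    n = min(N, math.ceil(math.log(delta) / math.log(1.0 - tau / N)))
    sample = random.sample(data, n)
    return min(sample, key=lambda p: dist(query, p))

random.seed(0)
result = rank_approx_nn(0, list(range(100)), tau=10, delta=0.01,
                        dist=lambda a, b: abs(a - b))
```

Note that the required sample size depends only on the rank fraction τ/N and confidence δ, not on the dimensionality of the data.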

Linear-time Algorithms for Pairwise Statistical Problems

Several key computational bottlenecks in machine learning involve pairwise distance computations, including all-nearest-neighbors (finding the nearest neighbor(s) for each point, e.g. in manifold learning) and kernel summations (e.g. in kernel density estimation or kernel machines). We consider the general, bichromatic case for these problems, in addition to the scientific problem of N-body simulation. In this paper we show, for the first time, O(N) worst-case runtimes for practical algorithms for these problems based on the cover tree data structure [1].

Minimax Multi-Task Learning and a Generalized Loss-Compositional Paradigm for MTL

Since its inception, the modus operandi of multi-task learning (MTL) has been to minimize the task-wise mean of the empirical risks. We introduce a generalized loss-compositional paradigm for MTL that includes a spectrum of formulations as a subfamily. One endpoint of this spectrum is minimax MTL: a new MTL formulation that minimizes the maximum of the tasks' empirical risks. Via a certain relaxation of minimax MTL, we obtain a continuum of MTL formulations spanning minimax MTL and classical MTL. The full paradigm itself is loss-compositional, operating on the vector of empirical risks. It incorporates minimax MTL, its relaxations, and many new MTL formulations as special cases. We show theoretically that minimax MTL tends to avoid worst-case outcomes on newly drawn test tasks in the learning-to-learn (LTL) test setting. The results of several MTL formulations on synthetic and real problems in the MTL and LTL test settings are encouraging.

Faster Gaussian Summation: Theory and Experiment

UAI, 2006

We provide faster algorithms for the problem of Gaussian summation, which occurs in many machine learning methods. We develop two new extensions (an O(D^p) Taylor expansion for the Gaussian kernel with rigorous error bounds, and a new error control scheme integrating any arbitrary approximation method) within the best discrete algorithmic framework using adaptive hierarchical data structures. We rigorously evaluate these techniques empirically in the context of optimal bandwidth selection in kernel density estimation, revealing the strengths and weaknesses of current state-of-the-art approaches for the first time. Our results demonstrate that the new error control scheme yields improved performance, whereas the series expansion approach is only effective in low dimensions (five or less).

1 Fast Gaussian Summation. Kernel summations occur ubiquitously in both old and new machine learning algorithms, including kernel density estimation, kernel regression, radial basis function networks, spectral clustering, and kernel PCA (Gray & Moore, 2001; de Freitas et al., 2006). This paper focuses on the most common form G(x_q) = Σ_{r=1}^{N} w_r K(δ_qr), in which we desire the sum at M different query points x_q, each using N reference points x_r weighted by w_r > 0, where K(δ_qr) = exp(−δ_qr² / (2h²)).
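For reference, the sum being accelerated, G(x_q) = Σ_{r=1}^{N} w_r exp(−δ_qr² / (2h²)), can be evaluated directly in O(MN) time; the fast methods approximate exactly this quantity with controlled error. A minimal NumPy sketch of the direct evaluation:

```python
import numpy as np

def gauss_sum(queries, refs, weights, h):
    """Direct O(MN) Gaussian summation: G(x_q) = sum_r w_r exp(-||x_q - x_r||^2 / (2 h^2))."""
    # pairwise squared distances, shape (M, N)
    d2 = np.sum((queries[:, None, :] - refs[None, :, :]) ** 2, axis=2)
    return np.exp(-d2 / (2 * h ** 2)) @ weights

refs = np.array([[0.0], [1.0]])
w = np.array([1.0, 1.0])
q = np.array([[0.0]])
g = gauss_sum(q, refs, w, 1.0)   # exp(0) + exp(-0.5) at h = 1
```

The cost of the pairwise distance matrix is what hierarchical data structures and series expansions are designed to avoid.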

Plug-and-play dual-tree algorithm runtime analysis

Numerous machine learning algorithms contain pairwise statistical problems at their core, that is, tasks that require computations over all pairs of input points if implemented naively. Often, tree structures are used to solve these problems efficiently. Dual-tree algorithms can efficiently solve or approximate many of these problems. Using cover trees, rigorous worst-case runtime guarantees have been proven for some of these algorithms. In this paper, we present a problem-independent runtime guarantee for any dual-tree algorithm using the cover tree, separating out the problem-dependent and the problem-independent elements. This allows us to simply plug in bounds for the problem-dependent elements to obtain runtime guarantees for dual-tree algorithms for any pairwise statistical problem, without re-deriving the entire proof. We demonstrate this plug-and-play procedure for nearest-neighbor search and approximate kernel density estimation to obtain improved runtime guarantees. Under mild assumptions, we also present the first linear runtime guarantee for dual-tree based range search.

Dual-Tree Fast Gauss Transforms

NIPS, 2005

Kernel density estimation (KDE) is a popular statistical technique for estimating the underlying density distribution with minimal assumptions. Although KDE estimates can be shown to achieve asymptotic estimation optimality for any input distribution, cross-validating for an optimal parameter requires significant computation dominated by kernel summations. In this paper we present an improvement to the dual-tree algorithm, the first practical kernel summation algorithm for general dimension. Our extension is based on the series expansion for the Gaussian kernel used by the fast Gauss transform. First, we derive two additional pieces of analytical machinery for extending the original algorithm to utilize a hierarchical data structure, demonstrating the first truly hierarchical fast Gauss transform. Second, we show how to integrate the series-expansion approximation within the dual-tree approach to compute kernel summations with a user-controllable relative error bound. We evaluate our algorithm on real-world datasets in the context of optimal bandwidth selection in kernel density estimation. Our results demonstrate that our new algorithm is the only one that guarantees a hard relative error bound and offers fast performance across the wide range of bandwidths evaluated in cross-validation procedures.

Fast DBMS-Resident Machine Learning and Statistics: Extending the Sloan Digital Sky Survey SQL Server Database

Our work is a step towards the goal of embedding machine learning (ML) in Microsoft products. We introduce the most successful spatial data algorithms in the literature as extensions to Microsoft SQL Server. Our library is extensible and works in place, sidestepping the cost of data transfer and transformation. Our library runs within SQL CLR, and is thus immediately available to all existing SQL Server users.

Nearest-Neighbor Search on a Time Budget via Max-Margin Trees

Proceedings of the 2012 SIAM International Conference on Data Mining, 2012

Many high-profile applications pose high-dimensional nearest-neighbor search problems. Yet it remains difficult to achieve fast query times with state-of-the-art approaches that use multidimensional trees for either exact or approximate search, possibly in combination with hashing. Moreover, a number of these applications have only a limited amount of time to answer nearest-neighbor queries. We observe empirically that the correct neighbor is often found early in the tree-search process, while the bulk of the time is spent verifying its correctness. Motivated by this, we propose an algorithm for finding the best neighbor within any particular time limit, and develop a new data structure, the max-margin tree, to achieve accurate results even with small time budgets. Max-margin trees perform better in the limited-time setting than commonly used data structures such as the kd-tree and more recently developed structures like the RP-tree.
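The time-budget idea reduces to keeping the best candidate seen so far and stopping when the deadline passes. The sketch below scans candidates in an arbitrary order; the max-margin tree's contribution is precisely to order the search so the true neighbor tends to appear early:

```python
import time

def nn_within_budget(query, candidates, budget_s, dist):
    """Best-so-far NN search that stops when the time budget expires."""
    deadline = time.monotonic() + budget_s
    best, best_d = None, float("inf")
    for p in candidates:
        d = dist(query, p)
        if d < best_d:
            best, best_d = p, d
        if time.monotonic() >= deadline:
            break  # budget exhausted: return the best candidate found so far
    return best, best_d

best, best_d = nn_within_budget(0, [5, 3, 1, 4], budget_s=1.0,
                                dist=lambda a, b: abs(a - b))
```

If the candidate ordering is good, the answer returned at an early deadline is already the exact neighbor, and the remaining time would have been spent only on verification.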

A distributed kernel summation framework for general-dimension machine learning

Statistical Analysis and Data Mining, 2013

Kernel summations are a ubiquitous key computational bottleneck in many data analysis methods. In this paper, we attempt to marry, for the first time, the best relevant techniques in parallel computing, where kernel summations are in low dimensions, with the best general-dimension algorithms from the machine learning literature. We provide the first distributed implementation of a kernel summation framework that can utilize: 1) various types of deterministic and probabilistic approximations that may be suitable for low and high- ...

Multitree Algorithms for Large-Scale Astrostatistics

Chapman & Hall/CRC Data Mining and Knowledge Discovery Series, 2012

Optimizing the computation of n-point correlations on large-scale astronomical data

2012 International Conference for High Performance Computing, Networking, Storage and Analysis, 2012

The n-point correlation functions (npcf) are powerful statistics that are widely used for data analyses in astronomy and other fields. These statistics have played a crucial role in fundamental physical breakthroughs, including the discovery of dark energy. Unfortunately, directly computing the npcf at a single value requires O(N^n) time for N points and values of n of 2, 3, 4, or even larger. Astronomical data sets can contain billions of points, and the next generation of surveys will generate terabytes of data per night. To meet these computational demands, we present a highly tuned npcf computation code that shows an order-of-magnitude speedup over the current state-of-the-art. This enables a much larger 3-point correlation computation on the galaxy distribution than was previously possible. We show a detailed performance evaluation on many different architectures.
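At its core, the 2-point member of this family is just a pair count within a distance bin; the naive version below makes the O(N^n) scaling concrete (here n = 2, so O(N²), and the bin edges are illustrative):

```python
import itertools

def two_point_count(points, r_min, r_max, dist):
    """Naive O(N^2) count of pairs whose separation falls in [r_min, r_max).
    This raw pair count is the quantity behind the 2-point correlation function;
    the n-point analogue sums over n-tuples at O(N^n) cost, which is why
    tree-based pruning is essential at astronomical scales."""
    return sum(1 for p, q in itertools.combinations(points, 2)
               if r_min <= dist(p, q) < r_max)

count = two_point_count([0.0, 1.0, 2.0, 10.0], 0.5, 2.5,
                        lambda p, q: abs(p - q))
```

Tree-based npcf codes prune whole groups of tuples whose bounding distances fall entirely inside or outside the bin, rather than testing each tuple.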

Gaussian process regression flow for analysis of motion trajectories

2011 International Conference on Computer Vision, 2011

Recognition of motions and activities of objects in videos requires effective representations for analysis and matching of motion trajectories. In this paper, we introduce a new representation specifically aimed at matching motion trajectories. We model a trajectory as a continuous dense flow field from a sparse set of vector sequences using Gaussian process regression. Furthermore, we introduce a random sampling strategy for learning stable classes of motions from limited data. Our representation allows for incrementally predicting possible paths and detecting anomalous events from online trajectories. It also supports matching of complex motions with acceleration changes and pauses or stops within a trajectory. We use the proposed approach for classifying and predicting motion trajectories in traffic monitoring domains and test it on several data sets. We show that our approach works well on various types of complete and incomplete trajectories from a variety of video data sets with different frame rates.
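The building block that turns sparse trajectory samples into a dense field is ordinary GP regression with an RBF kernel. A minimal sketch (the length scale and noise level are illustrative assumptions, and this is the generic posterior mean, not the paper's full flow-field construction):

```python
import numpy as np

def gp_predict(X, y, X_new, length=1.0, noise=1e-6):
    """Posterior mean of GP regression with an RBF kernel."""
    def k(A, B):
        # squared distances between every row of A and every row of B
        d2 = np.sum((A[:, None, :] - B[None, :, :]) ** 2, axis=2)
        return np.exp(-d2 / (2.0 * length ** 2))
    K = k(X, X) + noise * np.eye(len(X))      # regularized training kernel
    return k(X_new, X) @ np.linalg.solve(K, y)

X = np.array([[0.0], [1.0], [2.0]])
y = np.array([0.0, 1.0, 0.0])
pred = gp_predict(X, y, np.array([[1.0]]))    # near a training input, mean ≈ y
```

Applied per velocity component over trajectory positions, the same machinery yields a smooth vector field that can be queried at any point.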

Detecting regions of interest in dynamic scenes with camera motions

2012 IEEE Conference on Computer Vision and Pattern Recognition, 2012

We present a method to detect the regions of interest in moving-camera views of dynamic scenes with multiple moving objects. We start by extracting a global motion tendency that reflects the scene context by tracking movements of objects in the scene. We then use Gaussian process regression to represent the extracted motion tendency as a stochastic vector field. The generated stochastic field is robust to noise and can handle video from an uncalibrated moving camera. We use the stochastic field for predicting ...

Multibody multipole methods

Journal of Computational Physics, 2012

A three-body potential function can account for interactions among triples of particles which are not captured by pairwise interaction functions such as Coulombic or Lennard-Jones potentials. Likewise, a multibody potential of order n can account for interactions among n-tuples of particles not captured by interaction functions of lower orders. To date, the computation of multibody potential functions for a large number of particles has not been possible due to its O(N^n) scaling cost. In this paper we describe a fast tree code for efficiently approximating multibody potentials that can be factorized as products of functions of pairwise distances. For the first time, we show how to derive a Barnes-Hut-type algorithm for handling interactions among more than two particles. Our algorithm uses two approximation schemes: 1) a deterministic series-expansion-based method; 2) a Monte Carlo-based approximation based on the central limit theorem. Our approach guarantees a user-specified bound on the absolute or relative error in the computed potential with an asymptotic probability guarantee. We provide speedup results on a three-body dispersion potential, the Axilrod-Teller potential.
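The quantity being approximated is a sum of a potential over all unordered triples; the direct O(N³) evaluation below (with a toy potential that factorizes over pairwise distances, standing in for the full Axilrod-Teller form) shows what the tree code avoids computing:

```python
import itertools

def three_body_sum(points, phi):
    """Direct O(N^3) sum of a three-body potential over all unordered triples."""
    return sum(phi(a, b, c) for a, b, c in itertools.combinations(points, 3))

# Toy potential that factorizes as a product of functions of pairwise
# distances (1-D points for brevity)
def inv_dist_product(a, b, c):
    return 1.0 / (abs(a - b) * abs(b - c) * abs(a - c))

total = three_body_sum([0.0, 1.0, 2.0], inv_dist_product)
```

The factorized form matters: it is what lets a Barnes-Hut-style scheme bound each pairwise factor between node pairs and prune whole blocks of triples at once.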

Fast high-dimensional kernel summations using the Monte Carlo multipole method

Advances in Neural Information Processing Systems, 2008

We propose a new fast Gaussian summation algorithm for high-dimensional datasets with high accuracy. First, we extend the original fast multipole-type methods to use approximation schemes with both hard and probabilistic error. Second, we utilize a new data structure called ...

Dual-tree fast Gauss transforms

arXiv preprint arXiv:1102.2878, 2011

In previous work we presented an efficient approach to computing kernel summations which arise in many machine learning methods such as kernel density estimation. This approach, dual-tree recursion with finite-difference approximation, generalized existing methods for similar problems arising in computational physics in two ways appropriate for statistical problems: toward distribution sensitivity and general dimension, partly by avoiding series expansions. While this proved to be the fastest practical method for multivariate kernel density estimation at the optimal bandwidth, it is much less efficient at larger-than-optimal bandwidths. In this work, we explore the extent to which the dual-tree approach can be integrated with multipole-like Hermite expansions in order to achieve reasonable efficiency across all bandwidth scales, though only for low dimensionalities. In the process, we derive and demonstrate the first truly hierarchical fast Gauss transforms, effectively combining the best tools from discrete algorithms and continuous approximation theory.

¹ These include the Barnes-Hut algorithm [2], the Fast Multipole Method [8], Appel's algorithm [1], and the WSPD [4]: the dual-tree method is a node-node algorithm (it considers query regions rather than points), is fully recursive, can use distribution-sensitive data structures such as kd-trees, and is bichromatic (it can specialize for differing query and reference sets).

Research paper thumbnail of Fast Mean Shift with Accurate and Stable Convergence

Mean shift is a powerful but computationally expensive method for nonparametric cluster-ing and o... more Mean shift is a powerful but computationally expensive method for nonparametric cluster-ing and optimization. It iteratively moves each data point to its local mean until con-vergence. We introduce a fast algorithm for computing mean shift based on the dual-tree. Unlike ...

Research paper thumbnail of Rank-Approximate Nearest Neighbor Search: Retaining Meaning and Speed in High Dimensions

The long-standing problem of efficient nearest-neighbor (NN) search has ubiquitous applications r... more The long-standing problem of efficient nearest-neighbor (NN) search has ubiquitous applications ranging from astrophysics to MP3 fingerprinting to bioinformatics to movie recommendations. As the dimensionality of the dataset increases, exact NN search becomes computationally prohibitive; (1+) distance-approximate NN search can provide large speedups but risks losing the meaning of NN search present in the ranks (ordering) of the distances. This paper presents a simple, practical algorithm allowing the user to, for the first time, directly control the true accuracy of NN search (in terms of ranks) while still achieving the large speedups over exact NN. Experiments on high-dimensional datasets show that our algorithm often achieves faster and more accurate results than the best-known distance-approximate method, with much more stable behavior.

Research paper thumbnail of Linear-time Algorithms for Pairwise Statistical Problems

Several key computational bottlenecks in machine learning involve pairwise distance computations,... more Several key computational bottlenecks in machine learning involve pairwise distance computations, including all-nearest-neighbors (finding the nearest neighbor(s) for each point, e.g. in manifold learning) and kernel summations (e.g. in kernel density estimation or kernel machines). We consider the general, bichromatic case for these problems, in addition to the scientific problem of N-body simulation. In this paper we show for the first time O() worst case runtimes for practical algorithms for these problems based on the cover tree data structure [1].

Research paper thumbnail of Minimax Multi-Task Learning and a Generalized Loss-CompositionalParadigm for MTL

Since its inception, the modus operandi of multi-task learning (MTL) has been to minimize the tas... more Since its inception, the modus operandi of multi-task learning (MTL) has been to minimize the task-wise mean of the empirical risks. We introduce a generalized loss-compositional paradigm for MTL that includes a spectrum of formulations as a subfamily. One endpoint of this spectrum is minimax MTL: a new MTL formulation that minimizes the maximum of the tasks' empirical risks. Via a certain relaxation of minimax MTL, we obtain a continuum of MTL formulations spanning minimax MTL and classical MTL. The full paradigm itself is loss-compositional, operating on the vector of empirical risks. It incorporates minimax MTL, its relaxations, and many new MTL formulations as special cases. We show theoretically that minimax MTL tends to avoid worst case outcomes on newly drawn test tasks in the learning to learn (LTL) test setting. The results of several MTL formulations on synthetic and real problems in the MTL and LTL test settings are encouraging.

Research paper thumbnail of Faster Gaussian Summation: Theory and Experiment

Uai, 2006

We provide faster algorithms for the problem of Gaussian summation, which occurs in many machine ... more We provide faster algorithms for the problem of Gaussian summation, which occurs in many machine learning methods. We develop two new extensions-an O(D p) Taylor expansion for the Gaussian kernel with rigorous error bounds and a new error control scheme integrating any arbitrary approximation method-within the best discretealgorithmic framework using adaptive hierarchical data structures. We rigorously evaluate these techniques empirically in the context of optimal bandwidth selection in kernel density estimation, revealing the strengths and weaknesses of current state-of-the-art approaches for the first time. Our results demonstrate that the new error control scheme yields improved performance, whereas the series expansion approach is only effective in low dimensions (five or less). 1 Fast Gaussian Summation Kernel summations occur ubiquitously in both old and new machine learning algorithms, including kernel density estimation, kernel regression, radial basis function networks, spectral clustering, and kernel PCA (Gray & Moore, 2001; de Freitas et al., 2006). This paper will focus on the most common form G(x q) = N r=1 w r K(δ qr) in which we desire the sum for M different query points x q 's, each using N reference points x r 's weighted by w r > 0. K(δ qr) = e −δ 2 qr 2h 2

Research paper thumbnail of Plug-and-play dual-tree algorithm runtime analysis

Numerous machine learning algorithms contain pairwise statistical problems at their core---that i... more Numerous machine learning algorithms contain pairwise statistical problems at their core---that is, tasks that require computations over all pairs of input points if implemented naively. Often, tree structures are used to solve these problems efficiently. Dual-tree algorithms can efficiently solve or approximate many of these problems. Using cover trees, rigorous worst-case runtime guarantees have been proven for some of these algorithms. In this paper, we present a problem-independent runtime guarantee for any dual-tree algorithm using the cover tree, separating out the problem-dependent and the problem-independent elements. This allows us to just plug in bounds for the problem-dependent elements to get runtime guarantees for dual-tree algorithms for any pairwise statistical problem without re-deriving the entire proof. We demonstrate this plug-and-play procedure for nearest-neighbor search and approximate kernel density estimation to get improved runtime guarantees. Under mild assumptions, we also present the first linear runtime guarantee for dual-tree based range search.

Research paper thumbnail of Dual-Tree Fast Gauss Transforms

Nips, 2005

Kernel density estimation (KDE) is a popular statistical technique for estimating the underlying ... more Kernel density estimation (KDE) is a popular statistical technique for estimating the underlying density distribution with minimal assumptions. Although they can be shown to achieve asymptotic estimation optimality for any input distribution, cross-validating for an optimal parameter requires significant computation dominated by kernel summations. In this paper we present an improvement to the dual-tree algorithm, the first practical kernel summation algorithm for general dimension. Our extension is based on the series-expansion for the Gaussian kernel used by fast Gauss transform. First, we derive two additional analytical machinery for extending the original algorithm to utilize a hierarchical data structure, demonstrating the first truly hierarchical fast Gauss transform. Second, we show how to integrate the series-expansion approximation within the dual-tree approach to compute kernel summations with a user-controllable relative error bound. We evaluate our algorithm on real-world datasets in the context of optimal bandwidth selection in kernel density estimation. Our results demonstrate that our new algorithm is the only one that guarantees a hard relative error bound and offers fast performance across a wide range of bandwidths evaluated in cross validation procedures.

Research paper thumbnail of Fast DBMS-Resident Machine Learning and Statistics: Extending the Sloan Digital Sky Survey SQL Server Database

Our work is a step towards the goal of embedding machine learning (ML) in Microsoft products. We ... more Our work is a step towards the goal of embedding machine learning (ML) in Microsoft products. We introduce the most successful spatial data algorithms in literature as extensions to Microsoft SQL Server. Our library is extensible and works in-place, sidestepping cost of data transfer and transformation. Our library runs within SQL CLR, and is thus immediately available to all existing SQL Server users.

Research paper thumbnail of Nearest-Neighbor Search on a Time Budget via Max-Margin Trees

Proceedings of the 2012 SIAM International Conference on Data Mining, 2012

Many high-profile applications pose high-dimensional nearest-neighbor search problems. Yet, it st... more Many high-profile applications pose high-dimensional nearest-neighbor search problems. Yet, it still remains difficult to achieve fast query times for state-of-the-art approaches which use multidimensional trees for either exact or approximate search, possibly in combination with hashing approaches. Moreover, a number of these applications only have a limited amount of time to answer nearest-neighbor queries. However, we observe empirically that the correct neighbor is often found early within the tree-search process, while the bulk of the time is spent on verifying its correctness. Motivated by this, we propose an algorithm for finding the best neighbor given any particular time limit, and develop a new data structure, the max-margin tree, to achieve accurate results even with small time budgets. Max-margin trees perform better in the limited-time setting than current commonly-used data structures such as the kd-tree and more recently developed data structures like the RP-tree.

Research paper thumbnail of A distributed kernel summation framework for general-dimension machine learning

Statistical Analysis and Data Mining, 2013

Abstract Kernel summations are a ubiquitous key computational bottleneck in many data analysis me... more Abstract Kernel summations are a ubiquitous key computational bottleneck in many data analysis methods. In this paper, we attempt to marry, for the first time, the best relevant techniques in parallel computing, where kernel summations are in low dimensions, with the best general-dimension algorithms from the machine learning literature. We provide the first distributed implementation of kernel summation framework that can utilize: 1) various types of deterministic and probabilistic approximations that may be suitable for low and high- ...

Research paper thumbnail of Multitree Algorithms for Large-Scale Astrostatistics

Chapman & Hall/CRC Data Mining and Knowledge Discovery Series, 2012

Research paper thumbnail of Plug-and-play dual-tree algorithm runtime analysis

Numerous machine learning algorithms contain pairwise statistical problems at their corethat is, ... more Numerous machine learning algorithms contain pairwise statistical problems at their corethat is, tasks that require computations over all pairs of input points if implemented naively. Often, tree structures are used to solve these problems efficiently. Dual-tree algorithms can efficiently solve or approximate many of these problems. Using cover trees, rigorous worstcase runtime guarantees have been proven for some of these algorithms. In this paper, we present a problem-independent runtime guarantee for any dual-tree algorithm using the cover tree, separating out the problem-dependent and the problem-independent elements. This allows us to just plug in bounds for the problem-dependent elements to get runtime guarantees for dual-tree algorithms for any pairwise statistical problem without re-deriving the entire proof. We demonstrate this plug-and-play procedure for nearest-neighbor search and approximate kernel density estimation to get improved runtime guarantees. Under mild assumptions, we also present the first linear runtime guarantee for dual-tree based range search.

Research paper thumbnail of Optimizing the computation of n-point correlations on large-scale astronomical data

2012 International Conference for High Performance Computing, Networking, Storage and Analysis, 2012

The n-point correlation functions (npcf) are powerful statistics that are widely used for data analyses in astronomy and other fields. These statistics have played a crucial role in fundamental physical breakthroughs, including the discovery of dark energy. Unfortunately, directly computing the npcf at a single value requires O(N^n) time for N points and values of n of 2, 3, 4, or even larger. Astronomical data sets can contain billions of points, and the next generation of surveys will generate terabytes of data per night. To meet these computational demands, we present a highly-tuned npcf computation code that shows an order-of-magnitude speedup over the current state of the art. This enables a much larger 3-point correlation computation on the galaxy distribution than was previously possible. We show a detailed performance evaluation on many different architectures.
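For concreteness, the naive O(N^n) baseline that the tuned code accelerates looks as follows for n = 2. This is a sketch only: the function name, the bin convention [r_min, r_max), and the toy 2-D points are all illustrative assumptions.

```python
import itertools
import math

def two_point_count(points, r_min, r_max):
    """Brute-force O(N^2) count of pairs whose separation falls in
    [r_min, r_max) -- the raw pair count behind a 2-point correlation
    estimate. The paper's tree-based code prunes whole regions of pairs
    at once instead of enumerating them."""
    count = 0
    for (x1, y1), (x2, y2) in itertools.combinations(points, 2):
        d = math.hypot(x2 - x1, y2 - y1)
        if r_min <= d < r_max:
            count += 1
    return count

pts = [(0, 0), (1, 0), (0, 1), (3, 4)]
print(two_point_count(pts, 0.5, 1.5))  # 3 pairs fall in this separation bin
```

Replacing `combinations(points, 2)` with `combinations(points, n)` gives the O(N^n) scaling for higher-order correlations that makes the direct approach infeasible on billion-point catalogs.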

Research paper thumbnail of Gaussian process regression flow for analysis of motion trajectories

2011 International Conference on Computer Vision, 2011

Recognition of motions and activities of objects in videos requires effective representations for analysis and matching of motion trajectories. In this paper, we introduce a new representation specifically aimed at matching motion trajectories. We model a trajectory as a continuous dense flow field from a sparse set of vector sequences using Gaussian Process Regression. Furthermore, we introduce a random sampling strategy for learning stable classes of motions from limited data. Our representation allows for incrementally predicting possible paths and detecting anomalous events from online trajectories. This representation also supports matching of complex motions with acceleration changes and pauses or stops within a trajectory. We use the proposed approach for classifying and predicting motion trajectories in traffic monitoring domains and test on several data sets. We show that our approach works well on various types of complete and incomplete trajectories from a variety of video data sets with different frame rates.
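As a rough illustration of the underlying machinery (not the paper's actual multi-dimensional flow-field model), a minimal Gaussian process regression in pure Python can interpolate velocities observed along a 1-D trajectory. Everything here is an assumption of the sketch: `gp_predict`, the RBF length scale, the noise level, and the toy observations.

```python
import math

def solve(A, b):
    # Naive Gaussian elimination with partial pivoting (stdlib-only
    # stand-in for a proper linear solver).
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def rbf(a, b, ell=1.0):
    # Squared-exponential kernel with length scale ell.
    return math.exp(-((a - b) ** 2) / (2 * ell ** 2))

def gp_predict(xs, ys, x_star, noise=1e-6):
    # Standard GP regression mean: k_*^T (K + noise * I)^{-1} y.
    K = [[rbf(xi, xj) + (noise if i == j else 0.0) for j, xj in enumerate(xs)]
         for i, xi in enumerate(xs)]
    alpha = solve(K, ys)
    return sum(rbf(x_star, xi) * a for xi, a in zip(xs, alpha))

# Hypothetical 1-D "flow": velocities observed at positions along a trajectory.
xs = [0.0, 1.0, 2.0, 3.0]
vs = [0.0, 1.0, 0.0, -1.0]
print(round(gp_predict(xs, vs, 1.0), 3))  # reproduces the training value ~1.0
```

The paper fits a field over 2-D image coordinates with vector-valued outputs; the scalar 1-D version above only shows the interpolation mechanism that makes sparse trajectory samples into a dense flow.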

Research paper thumbnail of Detecting regions of interest in dynamic scenes with camera motions

2012 IEEE Conference on Computer Vision and Pattern Recognition, 2012

We present a method to detect regions of interest in moving camera views of dynamic scenes with multiple moving objects. We start by extracting a global motion tendency that reflects the scene context by tracking movements of objects in the scene. We then use Gaussian process regression to represent the extracted motion tendency as a stochastic vector field. The generated stochastic field is robust to noise and can handle a video from an uncalibrated moving camera. We use the stochastic field for predicting ...

Research paper thumbnail of Multibody multipole methods

Journal of Computational Physics, 2012

A three-body potential function can account for interactions among triples of particles which are uncaptured by pairwise interaction functions such as Coulombic or Lennard-Jones potentials. Likewise, a multibody potential of order n can account for interactions among n-tuples of particles uncaptured by interaction functions of lower orders. To date, the computation of multibody potential functions for a large number of particles has not been possible due to its O(N^n) scaling cost. In this paper we describe a fast tree-code for efficiently approximating multibody potentials that can be factorized as products of functions of pairwise distances. For the first time, we show how to derive a Barnes-Hut type algorithm for handling interactions among more than two particles. Our algorithm uses two approximation schemes: 1) a deterministic series expansion-based method; 2) a Monte Carlo-based approximation based on the central limit theorem. Our approach guarantees a user-specified bound on the absolute or relative error in the computed potential with an asymptotic probability guarantee. We provide speedup results on a three-body dispersion potential, the Axilrod-Teller potential.
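The three-body dispersion potential mentioned at the end can be written down directly, and a naive triple loop makes the O(N^3) cost explicit. This sketch implements the standard Axilrod-Teller triple-dipole form (the function name and the constant C are illustrative; the paper's contribution is avoiding this enumeration).

```python
import itertools
import math

def axilrod_teller(points, C=1.0):
    """Naive O(N^3) Axilrod-Teller sum over all particle triples:
    E = C * sum (1 + 3 cos a cos b cos c) / (r_ij * r_jk * r_ik)^3,
    where a, b, c are the interior angles of the triangle (i, j, k)."""
    total = 0.0
    for i, j, k in itertools.combinations(range(len(points)), 3):
        rij = math.dist(points[i], points[j])
        rjk = math.dist(points[j], points[k])
        rik = math.dist(points[i], points[k])
        # Interior angles via the law of cosines.
        cos_i = (rij**2 + rik**2 - rjk**2) / (2 * rij * rik)
        cos_j = (rij**2 + rjk**2 - rik**2) / (2 * rij * rjk)
        cos_k = (rik**2 + rjk**2 - rij**2) / (2 * rik * rjk)
        total += C * (1 + 3 * cos_i * cos_j * cos_k) / (rij * rjk * rik) ** 3
    return total

# Sanity check: an equilateral triangle with unit sides has all angles 60
# degrees, so E = 1 + 3 * (1/2)^3 = 1.375.
tri = [(0.0, 0.0), (1.0, 0.0), (0.5, math.sqrt(3) / 2)]
print(axilrod_teller(tri))
```

Note that each term factors into functions of the three pairwise distances, which is exactly the factorization structure the tree-code exploits.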

Research paper thumbnail of Fast high-dimensional kernel summations using the monte carlo multipole method

Advances in Neural Information Processing Systems, 2008

We propose a new fast Gaussian summation algorithm for high-dimensional datasets with high accuracy. First, we extend the original fast multipole-type methods to use approximation schemes with both hard and probabilistic error. Second, we utilize a new data structure called ...
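The probabilistic-error ingredient can be sketched in one dimension: sample reference points, scale the sample mean up to the full kernel sum, and report a normal-theory (CLT-based) confidence half-width. This is a simplified stand-in for the paper's scheme, not its actual algorithm; the function name and parameters are assumptions.

```python
import math
import random

def mc_kernel_sum(query, refs, h, m, z=1.96):
    """Monte Carlo estimate of sum_i exp(-(q - x_i)^2 / h^2) from m samples
    drawn with replacement, plus a CLT-based ~95% confidence half-width."""
    sample = [random.choice(refs) for _ in range(m)]
    vals = [math.exp(-((query - x) ** 2) / h**2) for x in sample]
    mean = sum(vals) / m
    var = sum((v - mean) ** 2 for v in vals) / (m - 1)
    est = len(refs) * mean                       # scale mean up to the sum
    half_width = z * len(refs) * math.sqrt(var / m)
    return est, half_width

random.seed(0)
refs = [random.gauss(0, 1) for _ in range(10000)]
est, hw = mc_kernel_sum(0.0, refs, h=1.0, m=500)
exact = sum(math.exp(-x * x) for x in refs)
print(est, "+/-", hw, "vs exact", exact)
```

The appeal in high dimensions is that the sample size m needed for a given relative error depends on the variance of the kernel values, not on the dimension directly, which is why such bounds complement hard series-expansion error bounds.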

Research paper thumbnail of Rank-approximate nearest neighbor search: Retaining meaning and speed in high dimensions

The long-standing problem of efficient nearest-neighbor (NN) search has ubiquitous applications ranging from astrophysics to MP3 fingerprinting to bioinformatics to movie recommendations. As the dimensionality of the dataset increases, exact NN search becomes computationally prohibitive; (1+ε) distance-approximate NN search can provide large speedups but risks losing the meaning of NN search present in the ranks (ordering) of the distances. This paper presents a simple, practical algorithm allowing the user to, for the first time, directly control the true accuracy of NN search (in terms of ranks) while still achieving the large speedups over exact NN. Experiments on high-dimensional datasets show that our algorithm often achieves faster and more accurate results than the best-known distance-approximate method, with much more stable behavior.
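The core intuition behind rank approximation can be shown with uniform sampling: one draw lands in the true top-τ ranks with probability τ/N, so m draws miss all of them with probability (1 − τ/N)^m, and solving (1 − τ/N)^m ≤ δ gives the required sample size. The sketch below shows this flat-sampling idea only; the actual algorithm combines it with tree search, and the names here are illustrative.

```python
import math
import random

def sample_size(N, tau, delta):
    """Smallest m with (1 - tau/N)^m <= delta: with probability >= 1 - delta,
    at least one of m uniform draws lies among the true tau nearest
    neighbors, so the best sampled point has rank <= tau."""
    return math.ceil(math.log(delta) / math.log(1.0 - tau / N))

def rank_approx_nn(query, points, tau, delta):
    # Sampling without replacement only improves the hit probability.
    m = min(len(points), sample_size(len(points), tau, delta))
    sample = random.sample(points, m)
    return min(sample, key=lambda p: abs(p - query))

# ~300 samples suffice for rank-100 accuracy at 95% confidence on 10,000
# points, instead of scanning all of them.
print(sample_size(10000, tau=100, delta=0.05))

random.seed(1)
nn = rank_approx_nn(500.0, list(range(1000)), tau=50, delta=0.01)
```

Crucially, the required sample size depends on the rank tolerance τ and confidence δ rather than on the dimension, which is what lets the rank guarantee survive in high dimensions where (1+ε) distance bounds degrade.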

Research paper thumbnail of Linear-time algorithms for pairwise statistical problems

Several key computational bottlenecks in machine learning involve pairwise distance computations, including all-nearest-neighbors (finding the nearest neighbor(s) for each point, e.g. in manifold learning) and kernel summations (e.g. in kernel density estimation or kernel machines). We consider the general, bichromatic case for these problems, in addition to the scientific problem of N-body simulation. In this paper we show for the first time O(N) worst case runtimes for practical algorithms for these problems based on the cover tree data structure [1].
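The bichromatic all-nearest-neighbors problem referenced here is easy to state as an O(|Q||R|) brute-force baseline, shown in 1-D for brevity; the cover-tree dual-tree algorithms in the paper reduce the whole batch to linear time under expansion-constant assumptions. The function name and toy sets are illustrative.

```python
def all_nearest_neighbors(queries, references):
    """Brute-force bichromatic all-nearest-neighbors: for every query point,
    scan every reference point, costing O(|Q| * |R|) distance computations.
    This quadratic batch cost is what the cover-tree algorithms avoid."""
    return {q: min(references, key=lambda r: abs(q - r)) for q in queries}

result = all_nearest_neighbors([0.0, 5.0], [1.0, 4.0, 9.0])
print(result)  # each query mapped to its closest reference
```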

Research paper thumbnail of Dual-tree fast gauss transforms

arXiv preprint arXiv:1102.2878, 2011

In previous work we presented an efficient approach to computing kernel summations which arise in many machine learning methods such as kernel density estimation. This approach, dual-tree recursion with finite-difference approximation, generalized existing methods for similar problems arising in computational physics in two ways appropriate for statistical problems: toward distribution sensitivity and general dimension, partly by avoiding series expansions. While this proved to be the fastest practical method for multivariate kernel density estimation at the optimal bandwidth, it is much less efficient at larger-than-optimal bandwidths. In this work, we explore the extent to which the dual-tree approach can be integrated with multipole-like Hermite expansions in order to achieve reasonable efficiency across all bandwidth scales, though only for low dimensionalities. In the process, we derive and demonstrate the first truly hierarchical fast Gauss transforms, effectively combining the best tools from discrete algorithms and continuous approximation theory. These existing methods include the Barnes-Hut algorithm [2], the Fast Multipole Method [8], Appel's algorithm [1], and the WSPD [4]; the dual-tree method is a node-node algorithm (considers query regions rather than points), is fully recursive, can use distribution-sensitive data structures such as kd-trees, and is bichromatic (can specialize for differing query and reference sets).
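The Hermite-expansion ingredient can be sketched in one dimension. Using the classical far-field series e^{-(y-x)^2/h^2} = sum_n (1/n!) ((x-c)/h)^n h_n((y-c)/h), where h_n(t) = (-1)^n d^n/dt^n e^{-t^2}, a node's sources are compressed into moments once and then evaluated at any query. The function names and truncation order below are assumptions of this sketch; the paper treats the multivariate case and couples the expansions with dual-tree recursion.

```python
import math

def hermite_functions(t, order):
    """h_n(t) = (-1)^n d^n/dt^n exp(-t^2), for n = 0..order, via the
    recurrence h_{n+1}(t) = 2 t h_n(t) - 2 n h_{n-1}(t)."""
    h = [math.exp(-t * t)]
    if order > 0:
        h.append(2 * t * math.exp(-t * t))
    for n in range(1, order):
        h.append(2 * t * h[n] - 2 * n * h[n - 1])
    return h

def hermite_gauss_transform(y, sources, h_bw, center, order):
    """Truncated 1-D Hermite expansion of G(y) = sum_i exp(-(y - x_i)^2 / h^2)
    about `center`. The moments A_n summarize all sources at once, so after
    this O(N * order) pass, each query costs only O(order) to evaluate."""
    A = [sum(((x - center) / h_bw) ** n for x in sources) / math.factorial(n)
         for n in range(order + 1)]
    hv = hermite_functions((y - center) / h_bw, order)
    return sum(A[n] * hv[n] for n in range(order + 1))

# Sources clustered near the expansion center converge quickly; the series
# degrades as sources spread relative to the bandwidth.
sources = [0.1, -0.05, 0.2]
approx = hermite_gauss_transform(2.0, sources, h_bw=1.0, center=0.0, order=12)
print(approx)
```

This trade-off matches the abstract's observation: large bandwidths make the expansions accurate at low truncation orders, while at small (near-optimal) bandwidths the finite-difference dual-tree approach remains preferable.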