Michael Lindenbaum - Academia.edu

Papers by Michael Lindenbaum

Efficient machine learning method for spatio-temporal water surface waves reconstruction from polarimetric images

Measurement Science and Technology, Feb 6, 2023

Accurate and cost-effective sea state measurements, in terms of the spatio-temporal distribution of water surface elevation (water waves), are of great interest for scientific research and various engineering, industrial, and recreational applications. To this end, numerous measurement techniques have been developed over the years. None of these techniques, however, is universally applicable across various ocean and laboratory conditions, and none provides near-real-time data. We utilized the latest advances in polarimetric imaging to develop a new remote sensing method, based on machine learning methodology and polarimetric reflection measurements, for inferring surface wave elevation and slope. The method utilizes a newly available, inexpensive polarimetric camera providing images of the water surface at high spatio-temporal resolution at several linear polarization angles. Algorithms based on artificial neural networks (ANNs) are then trained to obtain high-resolution reconstructions of the water surface slope from those images. The ANNs are trained on laboratory-collected supervised datasets of prescribed, mechanically generated monochromatic wave trains and tested on a stochastic wave field of JONSWAP spectral shape. The proposed method, based on inferring the surface slope from polarimetric images, provides a dense estimate of the water surface. The results of this study pave the way for the development of accurate and cost-effective near-real-time remote sensing tools for both laboratory and open sea wave measurements.
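As a rough illustration of the kind of mapping such a method learns, here is a minimal stand-in (not the paper's ANN): a linear least-squares fit from synthetic per-pixel intensities at several polarization angles to a surface slope value. The coefficients, angle count, and noise level are fabricated for the sketch.

```python
import numpy as np

# Hypothetical linear stand-in for the paper's ANN regressor: map per-pixel
# intensities at 4 polarization angles to a surface slope.
rng = np.random.default_rng(3)
true_map = np.array([0.8, -0.5, 0.3, 0.1])      # fabricated ground-truth map
X = rng.uniform(0, 1, size=(500, 4))            # intensities at 4 angles
y = X @ true_map + 0.01 * rng.normal(size=500)  # noisy "measured" slopes

coef, *_ = np.linalg.lstsq(X, y, rcond=None)    # fit the linear map
pred = X @ coef
print(np.abs(pred - y).mean() < 0.02)  # True: residuals at the noise level
```

The real method replaces this linear map with a trained neural network and predicts a dense slope field per frame, but the supervised regression structure (polarimetric intensities in, slope out) is the same.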

An integrated model for evaluating the amount of data required for reliable recognition

Lecture Notes in Computer Science, 1996

Many recognition procedures rely on the consistency of a subset of data features with a hypothesis as sufficient evidence for the presence of the corresponding object. We analyze here the performance of such procedures, using a probabilistic model, and provide expressions for the sufficient size of such data subsets that, if consistent, guarantee the validity of the hypotheses with arbitrary confidence. We focus on 2D objects and the affine transformation class, and provide, for the first time, an integrated model which takes into account the shape of the objects involved, the accuracy of the data collected, the clutter present in the scene, the class of the transformations involved, the accuracy of the localization, and the confidence we would like to have in our hypotheses. Interestingly, it turns out that most of these factors can be quantified cumulatively by one parameter, denoted "effective similarity," which largely determines the sufficient subset size. The analysis is based on representing the class of instances corresponding to a model object and a group of transformations as members of a metric space, and quantifying the variation of the instances by a metric cover.

Learning distributions by their density levels — A paradigm for learning without a teacher

Lecture Notes in Computer Science, 1995

We propose a mathematical model for learning the high-density areas of an unknown distribution from (unlabeled) random points drawn according to this distribution. While this type of learning task has not been previously addressed in the computational learnability literature, we believe that it is a rather basic problem that appears in many practical learning scenarios. From a statistical theory standpoint, our model may be viewed as a restricted instance of the fundamental issue of inferring information about a probability distribution from the random samples it generates. From a computational learning angle, what we propose is a new framework of unsupervised concept learning. The examples provided to the learner in our model are not labeled (and are not necessarily all positive or all negative). The only information about their membership is indirectly disclosed to the student through the sampling distribution. We investigate the basic features of the proposed model and provide lower and upper bounds on the sample complexity of such learning tasks. Our main result is that the learnability of a class of distributions in this setting is equivalent to the finiteness of the VC-dimension of the class of the high-density areas of these distributions. One direction of the proof involves a reduction of density-level learnability to p-concepts learnability, while the sufficiency condition is proved through the introduction of a generic learning algorithm.
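The setting can be illustrated with a simple sketch (not the paper's algorithm): estimate a high-density region of an unknown one-dimensional distribution from unlabeled samples by thresholding a kernel density estimate at a chosen density level. The bandwidth and level values here are arbitrary choices for the illustration.

```python
import numpy as np

def high_density_region(samples, grid, bandwidth=0.3, level=0.2):
    """Sketch of density-level learning from unlabeled samples: a Gaussian
    kernel density estimate on a grid, thresholded at a density level."""
    d = grid[:, None] - samples[None, :]
    kde = np.exp(-0.5 * (d / bandwidth) ** 2).mean(axis=1)
    kde /= bandwidth * np.sqrt(2 * np.pi)      # normalize the kernel
    return grid[kde >= level]

rng = np.random.default_rng(2)
samples = rng.normal(0.0, 1.0, size=2000)      # "unknown" distribution: N(0, 1)
grid = np.linspace(-4, 4, 801)
region = high_density_region(samples, grid)
# The recovered region concentrates around the mode at 0.
print(region.min() < 0 < region.max())  # True
```

Note that no sample is ever labeled: membership in the high-density concept is disclosed only through where the samples happen to fall, which is exactly the paper's learning-without-a-teacher setting.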

2009 IEEE 12th International Conference on Computer Vision (ICCV)

Nonnegative Matrix Factorization with Earth Mover's Distance metric

2009 IEEE Conference on Computer Vision and Pattern Recognition, Jun 1, 2009

Nonnegative Matrix Factorization (NMF) approximates a given data matrix as a product of two low-rank nonnegative matrices, usually by minimizing the L2 or the KL distance between the data matrix and the matrix product. This factorization was shown to be useful for several important computer vision applications. We propose here a new NMF algorithm that minimizes the Earth Mover's Distance (EMD) error between the data and the matrix product. We propose an iterative NMF algorithm (EMD NMF) and prove its convergence. The algorithm is based on linear programming. We discuss the numerical difficulties of EMD NMF and propose an efficient approximation. Naturally, the matrices obtained with EMD NMF are different from those obtained with L2 NMF. We discuss these differences in the context of two challenging computer vision tasks, texture classification and face recognition, and demonstrate the advantages of the proposed method.
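The motivation for replacing the L2 error with EMD can be seen on a toy example: in one dimension, EMD between normalized histograms reduces to the L1 distance between their cumulative sums, so it grows with how far mass has moved, while L2 only registers that it moved. A minimal sketch (the paper's actual algorithm solves this inside an NMF iteration via linear programming):

```python
import numpy as np

def emd_1d(p, q):
    """Earth Mover's Distance between two 1D histograms with unit ground
    distance between adjacent bins: the L1 distance of their CDFs."""
    return np.abs(np.cumsum(p) - np.cumsum(q)).sum()

# Two normalized histograms: b and c are shifted copies of a.
a = np.array([1.0, 0.0, 0.0, 0.0])
b = np.array([0.0, 0.0, 0.0, 1.0])  # mass moved 3 bins away
c = np.array([0.0, 1.0, 0.0, 0.0])  # mass moved 1 bin away

# L2 cannot tell the two shifts apart; EMD grows with the shift.
l2_ab = np.linalg.norm(a - b)
l2_ac = np.linalg.norm(a - c)
print(l2_ab == l2_ac)              # True: both are sqrt(2)
print(emd_1d(a, b), emd_1d(a, c))  # 3.0 1.0
```

This shift sensitivity is what makes EMD attractive for histogram-like vision data such as textures, at the cost of the harder optimization the paper addresses.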

Guest editors' introduction to the special section on perceptual organization in computer vision

IEEE Transactions on Pattern Analysis and Machine Intelligence, Jun 1, 2003

Nesti-Net: Normal Estimation for Unstructured 3D Point Clouds using Convolutional Neural Networks

arXiv (Cornell University), Dec 3, 2018

In this paper, we propose a normal estimation method for unstructured 3D point clouds. This method, called Nesti-Net, builds on a new local point cloud representation which consists of multi-scale point statistics (MuPS), estimated on a local coarse Gaussian grid. This representation is a suitable input to a CNN architecture. The normals are estimated using a mixture-of-experts (MoE) architecture, which relies on a data-driven approach for selecting the optimal scale around each point and encourages sub-network specialization. Interesting insights into the network's resource distribution are provided. The scale prediction significantly improves robustness to different noise levels, point density variations, and different levels of detail. We achieve state-of-the-art results on a benchmark synthetic dataset and present qualitative results on real scanned scenes.
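For context, the classical single-scale baseline that learned methods like this improve on fits a plane to a fixed-size neighborhood and takes the plane normal. A minimal PCA sketch of that baseline (not Nesti-Net itself):

```python
import numpy as np

def pca_normal(neighborhood):
    """Classical single-scale normal estimate: the eigenvector of the local
    covariance matrix associated with the smallest eigenvalue."""
    centered = neighborhood - neighborhood.mean(axis=0)
    cov = centered.T @ centered / len(neighborhood)
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
    return eigvecs[:, 0]                    # direction of least variance

# Points sampled on the z = 0 plane; the expected normal is +/- (0, 0, 1).
rng = np.random.default_rng(0)
pts = np.zeros((50, 3))
pts[:, :2] = rng.uniform(-1, 1, size=(50, 2))
n = pca_normal(pts)
print(np.abs(n))  # [0. 0. 1.]
```

The weakness of this baseline is that the neighborhood size is fixed by hand; choosing it per point, as the paper's MoE scale prediction does, is precisely what improves robustness to noise and density variation.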

Seeing Things in Random-Dot Videos

Lecture Notes in Computer Science, 2020

Humans possess an intricate and powerful visual system for perceiving and understanding the surrounding world. Human perception can effortlessly detect and correctly group features in visual data, and can even interpret random-dot videos induced by imaging natural dynamic scenes with highly noisy sensors such as ultrasound imaging. Remarkably, this happens even though perception completely fails when the same information is presented frame by frame rather than as a video sequence. We study this property of surprising dynamic perception, with the first goal of proposing a new detection and spatiotemporal grouping algorithm for such signals when, per frame, the information on objects is both random and sparse and embedded in random noise. The algorithm is based on a succession of temporal integration and spatial statistical tests of unlikeliness (the a contrario framework). The algorithm not only manages to handle such signals; the striking similarity of its performance to that of human observers, as witnessed by a series of psychophysical experiments on image and video data, leads us to see in it a simple computational Gestalt model of human perception with only two parameters: the time integration window and the visual angle of candidate shapes to be detected.

3DmFV: Three-Dimensional Point Cloud Classification in Real-Time Using Convolutional Neural Networks

IEEE Robotics and Automation Letters, Oct 1, 2018

Modern robotic systems are often equipped with a direct three-dimensional (3-D) data acquisition device, e.g., LiDAR, which provides a rich 3-D point cloud representation of the surroundings. This representation is commonly used for obstacle avoidance and mapping. Here, we propose a new approach for using point clouds for another critical robotic capability, semantic understanding of the environment (i.e., object classification). Convolutional neural networks (CNNs), which perform extremely well for object classification in 2-D images, are not easily extended to 3-D point cloud analysis due to point clouds' irregular format and varying number of points. The common solution of transforming the point cloud data into a 3-D voxel grid needs to address severe accuracy versus memory size tradeoffs. In this letter, we propose a novel, intuitively interpretable, 3-D point cloud representation called 3-D modified Fisher vectors. Our representation is hybrid as it combines a coarse discrete grid structure with continuous generalized Fisher vectors. Using the grid enables us to design a new CNN architecture for real-time point cloud classification. In a series of performance analysis experiments, we demonstrate results competitive with, or better than, the state of the art on challenging benchmark datasets while maintaining robustness to various data corruptions.
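A rough sketch of the flavor of such a hybrid representation (a simplification, not the paper's actual 3-D modified Fisher vectors): soft-assign points to Gaussians on a coarse grid and collect fixed-size per-Gaussian statistics, so clouds with different point counts map to equal-length vectors that a CNN can consume.

```python
import numpy as np

def grid_soft_stats(points, centers, sigma=0.5):
    """Simplified grid-based point-cloud descriptor: soft-assign each point
    to fixed Gaussians on a coarse grid and collect, per Gaussian, the total
    soft weight and the weighted mean offset (rough analogues of 0th- and
    1st-order Fisher vector statistics)."""
    d2 = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(-1)  # (N, K)
    w = np.exp(-d2 / (2 * sigma**2))
    w /= w.sum(axis=1, keepdims=True)              # soft assignment per point
    weight = w.sum(axis=0)                         # 0th order: soft counts
    offsets = points[:, None, :] - centers[None, :, :]
    mean_off = (w[..., None] * offsets).sum(0) / (weight[:, None] + 1e-9)
    return np.concatenate([weight, mean_off.ravel()])  # fixed-size vector

# A 2x2x2 grid of Gaussian centers in the unit cube, 100 random points.
g = np.linspace(0.25, 0.75, 2)
centers = np.array(np.meshgrid(g, g, g)).reshape(3, -1).T  # (8, 3)
pts = np.random.default_rng(1).uniform(0, 1, size=(100, 3))
desc = grid_soft_stats(pts, centers)
print(desc.shape)  # (32,) regardless of the number of input points
```

Because the statistics live on a regular grid, the descriptor can be reshaped into a volumetric tensor and fed to ordinary 3-D convolutions, which is the key design move the paper exploits for real-time classification.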

On the Minimal Recognizable Image Patch

arXiv (Cornell University), Oct 12, 2020

In contrast to human vision, common recognition algorithms often fail on partially occluded images. We propose characterizing, empirically, the algorithmic limits by finding a minimal recognizable patch (MRP) that is by itself sufficient to recognize the image. A specialized deep network allows us to find the most informative patches of a given size, and serves as an experimental tool. A human vision study recently characterized related (but different) minimally recognizable configurations (MIRCs) [1], for which we specify computational analogues (denoted cMIRCs). The drop in human decision accuracy associated with size reduction of these MIRCs is substantial and sharp. Interestingly, such sharp reductions were also found for the computational versions we specified.

Ground from figure discrimination

This paper proposes a new, efficient, figure-from-ground method. At every stage the data features are classified into either "background" or "unknown yet" classes, thus emphasizing the background detection task and implying the name of the method. The sequential application of such classification stages creates a bootstrap mechanism which improves performance in very cluttered scenes. This method can be applied to many perceptual grouping cues, and an application to smoothness-based classification of edge points is given. A fast implementation using a k-d tree allows the method to work on large, realistic images.

Parallel strategies for geometric probing

Bittracker—A Bitmap Tracker for Visual Tracking under Very General Conditions

IEEE Transactions on Pattern Analysis and Machine Intelligence, Sep 1, 2008

This paper addresses the problem of visual tracking under very general conditions: a possibly nonrigid target whose appearance may drastically change over time; general camera motion; a 3D scene; and no a priori information except initialization. This is in contrast to the vast majority of trackers, which rely on some limited model in which, for example, the target's appearance is known a priori or restricted, the scene is planar, or a pan-tilt-zoom camera is used. Their goal is to achieve speed and robustness, but their limited context may cause them to fail in the more general case. The proposed tracker works by approximating, in each frame, a PDF (probability distribution function) of the target's bitmap and then estimating the maximum a posteriori bitmap. The PDF is marginalized over all possible motions per pixel, thus avoiding the stage in which optical flow is determined. This is an advantage over other general-context trackers that do not use the motion cue at all or rely on the error-prone calculation of optical flow. Using a Gibbs distribution with a first-order neighborhood system yields a bitmap PDF whose maximization may be transformed into that of a quadratic pseudo-Boolean function, the maximum of which is approximated via a reduction to a maximum-flow problem. Many experiments were conducted to demonstrate that the tracker is able to track under the aforementioned general conditions.

Reconstructing a convex polygon from binary perspective projections

Pattern Recognition, 1990

DPDist: Comparing Point Clouds Using Deep Point Cloud Distance

Lecture Notes in Computer Science, 2020

We introduce a new deep learning method for point cloud comparison. Our approach, named Deep Point Cloud Distance (DPDist), measures the distance between the points in one cloud and the estimated surface from which the other point cloud is sampled. The surface is estimated locally using the 3D modified Fisher vector representation. The local representation reduces the complexity of the surface, enabling effective learning, which generalizes well between object categories. We test the proposed distance in challenging tasks, such as similar object comparison and registration, and show that it provides significant improvements over commonly used distances such as Chamfer distance, Earth Mover's Distance, and others.
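For comparison, the commonly used Chamfer distance that DPDist is evaluated against can be stated in a few lines (a reference implementation of the baseline, not of DPDist):

```python
import numpy as np

def chamfer(a, b):
    """Symmetric Chamfer distance between two point sets: for each point, the
    squared distance to its nearest neighbor in the other set, averaged over
    each set and summed."""
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)  # pairwise sq. dists
    return d2.min(axis=1).mean() + d2.min(axis=0).mean()

a = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
b = np.array([[0.0, 0.0, 0.0], [1.0, 0.1, 0.0]])
print(chamfer(a, a))              # 0.0
print(round(chamfer(a, b), 4))    # 0.01
```

Chamfer compares points to points; DPDist's point-to-estimated-surface formulation is designed to avoid the sampling sensitivity this point-to-point matching suffers from.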

3D Point Cloud Classification and Segmentation using 3D Modified Fisher Vector Representation for Convolutional Neural Networks

arXiv (Cornell University), Nov 22, 2017

The point cloud is gaining prominence as a method for representing 3D shapes, but its irregular format poses a challenge for deep learning methods. The common solution of transforming the data into a 3D voxel grid introduces its own challenges, mainly large memory size. In this paper we propose a novel 3D point cloud representation called 3D Modified Fisher Vectors (3DmFV). Our representation is hybrid as it combines the discrete structure of a grid with a continuous generalization of Fisher vectors, in a compact and computationally efficient way. Using the grid enables us to design a new CNN architecture for point cloud classification and part segmentation. In a series of experiments we demonstrate performance competitive with, or better than, the state of the art on challenging benchmark datasets.
• We design a new deep ConvNet architecture (3DmFV-Net) based on this representation and use it for point cloud classification, obtaining state-of-the-art results.
• We conduct a thorough empirical analysis of the stability of our method.

Dense Mirroring Surface Recovery from 1D Homographies and Sparse Correspondences

IEEE Transactions on Pattern Analysis and Machine Intelligence, Feb 1, 2011

Ground from Figure Discrimination

Computer Vision and Image Understanding, Oct 1, 1999

This paper proposes a new, efficient, figure-from-ground method. At every stage the data features are classified into either "background" or "unknown yet" classes, thus emphasizing the background detection task (and implying the name of the method). The sequential application of such classification stages creates a bootstrap mechanism which improves performance in very cluttered scenes. This method can be applied to many perceptual grouping cues, and an application to smoothness-based classification of edge points is given. A fast implementation using a k-d tree allows the method to work on large, realistic images.

On the Distribution of Saliency

IEEE Transactions on Pattern Analysis and Machine Intelligence, Dec 1, 2006

Detecting salient structures is a basic task in perceptual organization. Saliency algorithms typically mark edge-points with some saliency measure, which grows with the length and smoothness of the curve on which these edge-points lie. Here, we propose a modified saliency estimation mechanism that is based on probabilistically specified grouping cues and on curve length distributions. In this framework, the Shashua and Ullman saliency mechanism may be interpreted as a process for detecting the curve with maximal expected length. Generalized types of saliency naturally follow. We propose several specific generalizations (e.g., gray-level-based saliency) and rigorously derive the limitations on generalized saliency types. We then carry out a probabilistic analysis of expected length saliencies. Using ergodicity and asymptotic analysis, we derive the saliency distributions associated with the main curves and with the rest of the image. We then extend this analysis to finite-length curves. Using the derived distributions, we derive the optimal threshold on the saliency for discriminating between figure and background and bound the saliency-based figure-from-ground performance.

Research paper thumbnail of Efficient machine learning method for spatio-temporal water surface waves reconstruction from polarimetric images

Measurement Science and Technology, Feb 6, 2023

Accurate and cost-effective sea state measurements, in terms of spatio-temporal distribution of w... more Accurate and cost-effective sea state measurements, in terms of spatio-temporal distribution of water surface elevation (water waves), is of great interest for scientific research and various engineering, industrial, and recreational applications. To this end, numerous measurement techniques have been developed over the years. None of these techniques, however, are universally applicable across various ocean and laboratory conditions and none provide near-real-time data. We utilized the latest advances in polarimetric imaging to develop a new remote sensing method based on machine learning methodology and polarimetric reflection measurements for inferring surface waves elevation and slope. The method utilizes a newly available, inexpensive polarimetric camera providing images of the water surface in a high spatio-temporal resolution at several linear polarization angles. Algorithms based on artificial neural networks ( ANN s) are then trained to obtain high-resolution reconstructions of the water surface slope state from those images. The ANN s are trained on laboratory-collected supervised datasets of prescribed mechanically generated monochromatic wave trains and tested on a stochastic wave field of JONSWAP spectral shape. The proposed method, based on inferring the surface slope from polarimetric images, provides a dense estimate of the water surface. The results of this study pave the way for the development of accurate and cost-effective near-real-time remote sensing tools for both laboratory and open sea wave measurements.

Research paper thumbnail of An integrated model for evaluating the amount of data required for reliable recognition

Lecture Notes in Computer Science, 1996

Many recognition procedures rely on the consistency of a subset of data features with a hypothesi... more Many recognition procedures rely on the consistency of a subset of data features with a hypothesis as the sufficient evidence to the presence of the corresponding object. We analyze here the performance of such procedures, using a probabilistic model, and provide expressions for the sufficient size of such data subsets, that, if consistent, guarantee the validity of the hypotheses with arbitrary confidence. We focus on 2D objects and the affine transformation class, and provide, for the first time, an integrated model which takes into account the shape of the objects involved, the accuracy of the data collected, the clutter present in the scene, the class of the transformations involved, the accuracy of the localization, and the confidence we would like to have in our hypotheses. Interestingly, it turns out that most of these factors can be quantified cumulatively by one parameter, denoted "effective similarity," which largely determines the sufficient subset size. The analysis is based on representing the class of instances corresponding to a model object and a group of transformations, as members of a metric space, and quantifying the variation of the instances by a metric cover.

Research paper thumbnail of Learning distributions by their density levels — A paradigm for learning without a teacher

Lecture Notes in Computer Science, 1995

We propose a mathematical model for learning the high-density areas of an unknown distribution fr... more We propose a mathematical model for learning the high-density areas of an unknown distribution from (unlabeled) random points drawn according to this distribution. While this type of a learning task has not been previously addressed in the Computational Learnability literature, we b e l i e v e that this it a rather basic problem that appears in many practical learning scenarios. From a statistical theory standpoint, our model may be viewed as a restricted instance of the fundamental issue of inferring information about a probability distribution from the random samples it generates. From a computational learning angle, what we propose is a new framework of un-supervised concept learning. The examples provided to the learner in our model are not labeled (and are not necessarily all positive or all negative). The only information about their membership is indirectly disclosed to the student through the sampling distribution. We i n vestigate the basic features of the proposed model and provide lower and upper bounds on the sample complexity of such learning tasks. Our main result is that the learnability of a class of distributions in this setting is equivalent to the niteness of the VC-dimension of the class of the high-density areas of these distributions. One direction of the proof involves a reduction of the density-level-learnability to p-concepts learnability, while the su ciency condition is proved through the introduction of a generic learning algorithm.

Research paper thumbnail of 2009 IEEE 12th International Conference on Computer Vision (ICCV)

Research paper thumbnail of Nonnegative Matrix Factorization with Earth Mover's Distance metric

2009 IEEE Conference on Computer Vision and Pattern Recognition, Jun 1, 2009

Nonnegative Matrix Factorization (NMF) approximates a given data matrix as a product of two low r... more Nonnegative Matrix Factorization (NMF) approximates a given data matrix as a product of two low rank nonnegative matrices, usually by minimizing the L 2 or the KL distance between the data matrix and the matrix product. This factorization was shown to be useful for several important computer vision applications. We propose here a new NMF algorithm that minimizes the Earth Mover's Distance (EMD) error between the data and the matrix product. We propose an iterative NMF algorithm (EMD NMF) and prove its convergence. The algorithm is based on linear programming. We discuss the numerical difficulties of the EMD NMF and propose an efficient approximation. Naturally, the matrices obtained with EMD NMF are different from those obtained with L 2 NMF. We discuss these differences in the context of two challenging computer vision tasks-texture classification and face recognition-and demonstrate the advantages of the proposed method.

Research paper thumbnail of Guest editors' introduction to the special section on perceptual organization in computer vision

IEEE Transactions on Pattern Analysis and Machine Intelligence, Jun 1, 2003

Research paper thumbnail of Guest editors' introduction to the special section on perceptual organization in computer vision

IEEE Transactions on Pattern Analysis and Machine Intelligence, Jun 1, 2003

Research paper thumbnail of Nesti-Net: Normal Estimation for Unstructured 3D Point Clouds using Convolutional Neural Networks

arXiv (Cornell University), Dec 3, 2018

In this paper, we propose a normal estimation method for unstructured 3D point clouds. This metho... more In this paper, we propose a normal estimation method for unstructured 3D point clouds. This method, called Nesti-Net, builds on a new local point cloud representation which consists of multi-scale point statistics (MuPS), estimated on a local coarse Gaussian grid. This representation is a suitable input to a CNN architecture. The normals are estimated using a mixtureof-experts (MoE) architecture, which relies on a datadriven approach for selecting the optimal scale around each point and encourages sub-network specialization. Interesting insights into the network's resource distribution are provided. The scale prediction significantly improves robustness to different noise levels, point density variations and different levels of detail. We achieve state-of-the-art results on a benchmark synthetic dataset and present qualitative results on real scanned scenes.

Research paper thumbnail of Seeing Things in Random-Dot Videos

Lecture Notes in Computer Science, 2020

Humans possess an intricate and powerful visual system in order to perceive and understand the en... more Humans possess an intricate and powerful visual system in order to perceive and understand the environing world. Human perception can effortlessly detect and correctly group features in visual data and can even interpret random-dot videos induced by imaging natural dynamic scenes with highly noisy sensors such as ultrasound imaging. Remarkably, this happens even if perception completely fails when the same information is presented frame by frame rather than in a video sequence. We study this property of surprising dynamic perception with the first goal of proposing a new detection and spatiotemporal grouping algorithm for such signals when, per frame, the information on objects is both random and sparse and embedded in random noise. The algorithm is based on the succession of temporal integration and spatial statistical tests of unlikeliness, the a contrario framework. The algorithm not only manages to handle such signals but the striking similarity in its performance to the perception by human observers, as witnessed by a series of psychophysical experiments on image and video data, leads us to see in it a simple computational Gestalt model of human perception with only two parameters: the time integration and the visual angle for candidate shapes to be detected.

Research paper thumbnail of 3DmFV: Three-Dimensional Point Cloud Classification in Real-Time Using Convolutional Neural Networks

IEEE robotics and automation letters, Oct 1, 2018

Modern robotic systems are often equipped with a direct three-dimensional (3-D) data acquisition ... more Modern robotic systems are often equipped with a direct three-dimensional (3-D) data acquisition device, e.g., LiDAR, which provides a rich 3-D point cloud representation of the surroundings. This representation is commonly used for obstacle avoidance and mapping. Here, we propose a new approach for using point clouds for another critical robotic capability, semantic understanding of the environment (i.e., object classification). Convolutional neural networks (CNNs), that perform extremely well for object classification in 2-D images, are not easily extendible to 3-D point clouds analysis. It is not straightforward due to point clouds’ irregular format and a varying number of points. The common solution of transforming the point cloud data into a 3-D voxel grid needs to address severe accuracy versus memory size tradeoffs. In this letter, we propose a novel, intuitively interpretable, 3-D point cloud representation called 3-D modified Fisher vectors. Our representation is hybrid as it combines a coarse discrete grid structure with continuous generalized Fisher vectors. Using the grid enables us to design a new CNN architecture for real-time point cloud classification. In a series of performance analysis experiments, we demonstrate competitive results or even better than state of the art on challenging benchmark datasets while maintaining robustness to various data corruptions.

Research paper thumbnail of On the Minimal Recognizable Image Patch

arXiv (Cornell University), Oct 12, 2020

In contrast to human vision, common recognition algorithms often fail on partially occluded image... more In contrast to human vision, common recognition algorithms often fail on partially occluded images. We propose characterizing, empirically, the algorithmic limits by finding a minimal recognizable patch (MRP) that is by itself sufficient to recognize the image. A specialized deep network allows us to find the most informative patches of a given size, and serves as an experimental tool. A human vision study recently characterized related (but different) minimally recognizable configurations (MIRCs) [1], for which we specify computational analogues (denoted cMIRCs). The drop in human decision accuracy associated with size reduction of these MIRCs is substantial and sharp. Interestingly, such sharp reductions were also found for the computational versions we specified.

Research paper thumbnail of Parallel strategies for geometric probing

Research paper thumbnail of Bittracker—A Bitmap Tracker for Visual Tracking under Very General Conditions

IEEE Transactions on Pattern Analysis and Machine Intelligence, Sep 1, 2008

This paper addresses the problem of visual tracking under very general conditions: a possibly nonrigid target whose appearance may drastically change over time, general camera motion, a 3D scene, and no a priori information except initialization. This is in contrast to the vast majority of trackers, which rely on some limited model in which, for example, the target's appearance is known a priori or restricted, the scene is planar, or a pan-tilt-zoom camera is used. Such trackers aim for speed and robustness, but their limited context may cause them to fail in the more general case. The proposed tracker works by approximating, in each frame, a PDF (probability distribution function) of the target's bitmap and then estimating the maximum a posteriori bitmap. The PDF is marginalized over all possible motions per pixel, thus avoiding the stage in which optical flow is determined. This is an advantage over other general-context trackers that do not use the motion cue at all or rely on the error-prone calculation of optical flow. Using a Gibbs distribution with a first-order neighborhood system yields a bitmap PDF whose maximization may be transformed into that of a quadratic pseudo-Boolean function, the maximum of which is approximated via a reduction to a maximum-flow problem. Many experiments were conducted to demonstrate that the tracker is able to track under the aforementioned general context.
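The reduction from binary labeling to a maximum-flow/minimum-cut problem can be sketched as follows. This is the standard construction for a submodular pairwise energy, written here as minimization of a Potts-style energy rather than the paper's exact quadratic pseudo-Boolean maximization; the Edmonds-Karp solver and the three-pixel example are illustrative.

```python
from collections import deque

def _max_flow(cap, s, t):
    """Edmonds-Karp max-flow on a dict-of-dicts residual capacity graph."""
    flow = 0
    while True:
        # BFS for an augmenting path in the residual graph.
        parent = {s: None}
        q = deque([s])
        while q and t not in parent:
            u = q.popleft()
            for v, c in cap[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    q.append(v)
        if t not in parent:
            return flow, set(parent)  # reached nodes = source side of min cut
        # Find the bottleneck capacity along the path, then augment.
        bottleneck, v = float('inf'), t
        while parent[v] is not None:
            bottleneck = min(bottleneck, cap[parent[v]][v])
            v = parent[v]
        v = t
        while parent[v] is not None:
            u = parent[v]
            cap[u][v] -= bottleneck
            cap[v][u] = cap[v].get(u, 0) + bottleneck
            v = u
        flow += bottleneck

def binary_labels_via_mincut(unary, pairs, lam):
    """Minimize sum_i unary[i][x_i] + lam * sum_(i,j) [x_i != x_j] over
    binary labels x via s-t min-cut: pixels left on the source side of
    the cut take label 1."""
    cap = {'s': {}, 't': {}}
    for i, (c0, c1) in enumerate(unary):
        cap[i] = {}
        cap['s'][i] = c0       # cutting s->i assigns label 0, cost c0
        cap[i]['t'] = c1       # cutting i->t assigns label 1, cost c1
    for i, j in pairs:
        cap[i][j] = cap[i].get(j, 0) + lam   # discontinuity penalty
        cap[j][i] = cap[j].get(i, 0) + lam
    energy, src_side = _max_flow(cap, 's', 't')
    labels = [1 if i in src_side else 0 for i in range(len(unary))]
    return labels, energy

# Three pixels in a chain; strong smoothness (lam=10) forces one label.
labels, energy = binary_labels_via_mincut(
    [(0, 5), (5, 0), (5, 0)], pairs=[(0, 1), (1, 2)], lam=10)
```

The min-cut value equals the minimal energy, which is why the optimal bitmap can be recovered exactly for this energy family.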

Research paper thumbnail of Reconstructing a convex polygon from binary perspective projections

Pattern Recognition, 1990

Research paper thumbnail of DPDist: Comparing Point Clouds Using Deep Point Cloud Distance

Lecture Notes in Computer Science, 2020

We introduce a new deep learning method for point cloud comparison. Our approach, named Deep Point Cloud Distance (DPDist), measures the distance between the points in one cloud and the estimated surface from which the other point cloud is sampled. The surface is estimated locally using the 3D modified Fisher vector representation. The local representation reduces the complexity of the surface, enabling effective learning, which generalizes well between object categories. We test the proposed distance in challenging tasks, such as similar object comparison and registration, and show that it provides significant improvements over commonly used distances such as Chamfer distance, Earth mover's distance, and others.
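Chamfer distance, one of the baselines DPDist is compared against, can be sketched in a few lines. The (N, 3)/(M, 3) array convention and the symmetric squared-distance variant are assumptions for illustration; several normalizations of Chamfer distance exist.

```python
import numpy as np

def chamfer_distance(a, b):
    """Symmetric Chamfer distance between point clouds a (N,3) and b (M,3):
    mean squared distance from each point to its nearest neighbor in the
    other cloud, summed over both directions."""
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)  # (N, M) pairwise
    return d2.min(axis=1).mean() + d2.min(axis=0).mean()

a = np.array([[0., 0., 0.], [1., 0., 0.]])
b = np.array([[0., 0., 0.], [1., 1., 0.]])
cd = chamfer_distance(a, b)
```

Note that Chamfer distance compares points to points; DPDist's point-to-estimated-surface formulation is precisely what this baseline lacks.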

Research paper thumbnail of 3D Point Cloud Classification and Segmentation using 3D Modified Fisher Vector Representation for Convolutional Neural Networks

arXiv (Cornell University), Nov 22, 2017

The point cloud is gaining prominence as a method for representing 3D shapes, but its irregular format poses a challenge for deep learning methods. The common solution of transforming the data into a 3D voxel grid introduces its own challenges, mainly large memory size. In this paper we propose a novel 3D point cloud representation called 3D Modified Fisher Vectors (3DmFV). Our representation is hybrid as it combines the discrete structure of a grid with a continuous generalization of Fisher vectors, in a compact and computationally efficient way. Using the grid enables us to design a new CNN architecture for point cloud classification and part segmentation. In a series of experiments we demonstrate performance competitive with or better than the state of the art on challenging benchmark datasets.

• We design a new deep ConvNet architecture (3DmFV-Net) based on this representation and use it for point cloud classification, obtaining state-of-the-art results.
• We conduct a thorough empirical analysis of the stability of our method.
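The grid-of-Gaussians idea can be sketched as follows. This toy version aggregates only posterior-weighted point counts over a uniform k^3 grid of isotropic Gaussians; the actual 3DmFV representation also includes derivative statistics with respect to the mixture parameters and max/min aggregations, so treat this as an assumption-laden simplification.

```python
import numpy as np

def grid_soft_stats(points, k=2, sigma=0.5):
    """Simplified 3DmFV-style feature: place a uniform k^3 grid of
    isotropic Gaussians in [-1,1]^3, soft-assign each point to the
    Gaussians, and return the posterior-weighted point count per cell."""
    c = np.linspace(-1 + 1 / k, 1 - 1 / k, k)            # grid centers per axis
    centers = np.stack(np.meshgrid(c, c, c), -1).reshape(-1, 3)  # (k^3, 3)
    d2 = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    w = np.exp(-d2 / (2 * sigma ** 2))
    w /= w.sum(axis=1, keepdims=True)                    # posteriors, (N, k^3)
    return w.sum(axis=0)                                 # soft count per cell

pts = np.random.default_rng(0).uniform(-1, 1, size=(100, 3))
feat = grid_soft_stats(pts)
```

Because the output lives on a fixed grid regardless of the number of input points, a standard 3D CNN can consume it, which is the property the 3DmFV-Net architecture exploits.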

Research paper thumbnail of Dense Mirroring Surface Recovery from 1D Homographies and Sparse Correspondences

IEEE Transactions on Pattern Analysis and Machine Intelligence, Feb 1, 2011

Research paper thumbnail of Ground from Figure Discrimination

Computer Vision and Image Understanding, Oct 1, 1999

This paper proposes a new, efficient, figure-from-ground method. At every stage the data features are classified into either "background" or "unknown yet" classes, thus emphasizing the background detection task (and implying the name of the method). The sequential application of such classification stages creates a bootstrap mechanism which improves performance in very cluttered scenes. This method can be applied to many perceptual grouping cues, and an application to smoothness-based classification of edge points is given. A fast implementation using a k-d tree allows the method to work on large, realistic images.

Research paper thumbnail of On the Distribution of Saliency

IEEE Transactions on Pattern Analysis and Machine Intelligence, Dec 1, 2006

Detecting salient structures is a basic task in perceptual organization. Saliency algorithms typi... more Detecting salient structures is a basic task in perceptual organization. Saliency algorithms typically mark edge-points with some saliency measure, which grows with the length and smoothness of the curve on which these edge-points lie. Here, we propose a modified saliency estimation mechanism that is based on probabilistically specified grouping cues and on curve length distributions. In this framework, the Shashua and Ullman saliency mechanism may be interpreted as a process for detecting the curve with maximal expected length. Generalized types of saliency naturally follow. We propose several specific generalizations (e.g., gray-level-based saliency) and rigorously derive the limitations on generalized saliency types. We then carry out a probabilistic analysis of expected length saliencies. Using ergodicity and asymptotic analysis, we derive the saliency distributions associated with the main curves and with the rest of the image. We then extend this analysis to finite-length curves. Using the derived distributions, we derive the optimal threshold on the saliency for discriminating between figure and background and bound the saliency-based figure-from-ground performance.
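The Shashua-Ullman mechanism that the paper reinterprets can be sketched as a synchronous iteration in which each element's saliency accumulates support from the best curve passing through it. The decay factor `rho`, the coupling matrix, and the three-element chain example are illustrative assumptions, not the paper's probabilistic generalization.

```python
import numpy as np

def saliency_iterations(local, neighbors, coupling, rho=0.9, n_iter=50):
    """Shashua-Ullman-style saliency: repeatedly update
    E_i <- local_i + rho * max_j coupling[i][j] * E_j over neighbors j,
    so saliency grows with the length/smoothness of the best curve
    through element i."""
    E = np.array(local, dtype=float)
    for _ in range(n_iter):
        E = np.array([
            local[i] + rho * max((coupling[i][j] * E[j] for j in neighbors[i]),
                                 default=0.0)
            for i in range(len(local))
        ])
    return E

# Three collinear edge elements with unit local response and perfect
# coupling; saliency converges toward local/(1 - rho) along the chain.
E = saliency_iterations([1.0, 1.0, 1.0],
                        {0: [1], 1: [0, 2], 2: [1]},
                        [[0, 1, 0], [1, 0, 1], [0, 1, 0]])
```

In the expected-length reading proposed by the paper, the converged value plays the role of the expected length of the curve through each element, which is what the derived threshold then discriminates on.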