Robert Pless - Academia.edu

Papers by Robert Pless

Independent motion: the importance of history

Proceedings. 1999 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (Cat. No PR00149)

We consider a problem central to aerial visual surveillance applications: detection and tracking of small, independently moving objects in long and noisy video sequences. We directly use spatiotemporal image intensity gradient measurements to compute an exact model of background motion. This allows the creation of accurate mosaics over many frames and the definition of a constraint violation function which acts as an indicator of independent motion. A novel temporal integration method maintains confidence measures over long subsequences without computing the optic flow, requiring object models, or using a Kalman filter. The mosaic acts as a stable feature frame, allowing precise localization of the independently moving objects. We present a statistical analysis of the effects of image noise on the constraint violation measure and find a good match between the predicted probability distribution function and the measured sample frequencies in a test sequence.
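To make the core mechanism concrete, here is a minimal numpy sketch of the constraint-violation idea, assuming the background flow (u, v) has already been computed by the global motion model; the function name and the noise threshold are illustrative, not taken from the paper.

```python
import numpy as np

def constraint_violation(Ix, Iy, It, u, v):
    """Brightness-constancy residual Ix*u + Iy*v + It under the background
    motion (u, v) predicted by the global model; pixels with a large
    residual are candidates for independent motion."""
    return Ix * u + Iy * v + It

# Toy usage: flag pixels whose residual exceeds a noise-derived threshold.
H, W = 240, 320
Ix, Iy, It = (np.random.randn(H, W) for _ in range(3))
u, v = np.full((H, W), 1.5), np.zeros((H, W))   # hypothetical background flow
r = constraint_violation(Ix, Iy, It, u, v)
independent = np.abs(r) > 3.0 * r.std()         # threshold is an assumption
```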

On Unusual Pixel Shapes and Image Motion

We introduce the integral-pixel camera model, where measurements integrate over large and potentially overlapping parts of the visual field. This models a wide variety of novel camera designs, including omnidirectional cameras, compressive sensing cameras, and novel programmable-pixel imaging chips. We explore the relationship of integral-pixel measurements with image motion and find (a) that direct motion estimation using integral-pixels is possible and in some cases quite good, (b) standard compressive-sensing reconstructions are not good for estimating motion, and (c) when we design image reconstruction algorithms that explicitly reason about image motion, they outperform standard compressive-sensing video reconstruction. We show experimental results for a variety of simulated cases, and have preliminary results showing a prototype camera with integral-pixels whose design makes direct motion estimation possible.
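As a rough illustration of the camera model, the sketch below simulates integral-pixel measurements as a single linear operator applied to the image; the random overlapping masks are a stand-in for whatever physical pixel shapes a real design would use.

```python
import numpy as np

def integral_pixel_measure(image, masks):
    """Integral-pixel measurements: each 'pixel' integrates the image over a
    large, possibly overlapping support. Flattening turns the whole camera
    into one linear operator, y = A @ x.

    image : H x W array; masks : K x H x W array of measurement supports."""
    A = masks.reshape(len(masks), -1)
    return A @ image.ravel()

# Toy usage with random overlapping binary masks standing in for a design.
rng = np.random.default_rng(0)
y = integral_pixel_measure(rng.random((32, 32)),
                           (rng.random((64, 32, 32)) > 0.5).astype(float))
```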

Single Image Camera Calibration with Lenticular Arrays for Augmented Reality

2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016

We consider the problem of camera pose estimation for a scenario where the camera may have continuous and unknown changes in its focal length. Understanding frame-by-frame changes in camera focal length is vital to accurately estimating camera pose and vital to accurately rendering virtual objects in a scene with the correct perspective. However, most approaches to camera calibration require geometric constraints from many frames or the observation of a 3D calibration object, both of which may not be feasible in augmented reality settings. This paper introduces a calibration object based on a flat lenticular array that creates a color-coded light field whose observed color changes depending on the angle from which it is viewed. We derive an approach to estimate the focal length of the camera and the relative pose of an object from a single image. We characterize the performance of camera calibration across various focal lengths and camera models, and we demonstrate the advantages of the focal length estimation in rendering a virtual object in a video with constant zooming.

Evaluation of local models of dynamic backgrounds

2003 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2003. Proceedings.

Background subtraction is the first step of many video surveillance applications. What is considered background varies by application, and may include regular, systematic, or complex motions. This paper explores the use of several different local spatio-temporal models of a background, defined at each pixel in the image. We present experiments with real image data and conclude that appropriate local representations are sufficient to make background models of complicated real-world motions. Empirical studies illustrate, for example, that an optical flow-based model is able to detect emergency vehicles whose motion is different from those typically observed in traffic scenes. We conclude that "different models are appropriate for different scenes", but give criteria by which one can choose which model will be best.
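One of the simplest local models in this family is a per-pixel running Gaussian; the sketch below (with assumed learning-rate and threshold values) shows the update-and-detect pattern such models share.

```python
import numpy as np

class PixelGaussianBackground:
    """A per-pixel running Gaussian, one simple member of the family of
    local background models compared in the paper. The learning rate and
    threshold below are assumed values, not the paper's."""
    def __init__(self, shape, alpha=0.02, k=2.5):
        self.mean = np.zeros(shape)
        self.var = np.ones(shape)
        self.alpha, self.k = alpha, k

    def update_and_detect(self, frame):
        d = frame - self.mean
        foreground = d * d > (self.k ** 2) * self.var  # per-pixel z-test
        self.mean += self.alpha * d
        self.var = (1 - self.alpha) * self.var + self.alpha * d * d
        return foreground
```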

Faster and more accurate face detection on mobile robots using geometric constraints

2007 IEEE/RSJ International Conference on Intelligent Robots and Systems, 2007

We develop a framework to allow generic object detection algorithms to exploit geometric information commonly available to robot vision systems. Robot systems take pictures with calibrated cameras from known positions and may simultaneously capture depth measurements in the scene. This allows known constraints on the 3D size and position of objects to be translated into constraints on potential locations and scales of objects in the image, eliminating potentially expensive image operations for geometrically infeasible object locations. We show this integration to be very natural in the context of face detection and find that the computational effort of the standard Viola-Jones face detector (as implemented in OpenCV) can be reduced by 85 percent with three times fewer false positives.
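The geometric pruning reduces, under a pinhole model, to projecting physical size and depth bounds into a range of plausible pixel sizes; the helper below is a hedged illustration with made-up face-size and depth bounds.

```python
def feasible_scale_range(f_pixels, min_size_m, max_size_m, min_depth_m, max_depth_m):
    """Project known 3D size and depth bounds into image-scale bounds.

    Under a pinhole model an object of physical size S at depth Z images to
    roughly f * S / Z pixels, so the detector only needs to search scales
    inside the returned range. The numbers in the usage line are assumptions."""
    return (f_pixels * min_size_m / max_depth_m,
            f_pixels * max_size_m / min_depth_m)

# e.g., faces 0.15-0.25 m tall, 0.5-5 m away, focal length 800 pixels:
lo, hi = feasible_scale_range(800, 0.15, 0.25, 0.5, 5.0)  # (24, 400) pixels
```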

A System for Rapid Interactive Training of Object Detectors

Lecture Notes in Computer Science, 2008

Machine learning approaches have become the de facto standard for creating object detectors (such as face and pedestrian detectors) which are robust to lighting, viewpoint, and pose. Generating sufficiently large labeled data sets to support accurate training is often the most challenging problem. To address this, the active learning paradigm suggests interactive user input, creating an initial classifier based on a few samples and refining that classifier by identifying errors and retraining. In this paper we seek to maximize the efficiency of the user input, minimizing both the number of labels the user must provide and the accuracy with which the user must identify the object. We propose, implement, and test a system that allows an untrained user to create high-quality classifiers in minutes for many different types of objects in arbitrary scenes.
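The interactive loop can be sketched generically as uncertainty-based active learning; the code below is a hypothetical stand-in (a logistic-regression classifier and a synthetic "oracle" array in place of the human), not the paper's actual system.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def interactive_round(clf, X_pool, y_oracle, labeled_idx, batch=5):
    """One round of the interactive loop: train on the current labels,
    surface the most uncertain unlabeled samples, and ask the 'user'
    (here a stored oracle array) to label them.
    Assumes labeled_idx already covers both classes."""
    clf.fit(X_pool[labeled_idx], y_oracle[labeled_idx])
    unlabeled = np.setdiff1d(np.arange(len(X_pool)), labeled_idx)
    margin = np.abs(clf.predict_proba(X_pool[unlabeled])[:, 1] - 0.5)
    queries = unlabeled[np.argsort(margin)[:batch]]  # least confident first
    return np.concatenate([labeled_idx, queries])

# Usage sketch: start from a handful of labels and iterate a few rounds.
# labeled = interactive_round(LogisticRegression(), X, y, labeled)
```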

A Survey of Manifold Learning for Images

IPSJ Transactions on Computer Vision and Applications, 2009

Many natural image sets are samples of a low-dimensional manifold in the space of all possible images. Understanding this manifold is a key first step in understanding many sets of images, and manifold learning approaches have recently been used within many application domains, including face recognition, medical image segmentation, gait recognition, and handwritten character recognition. This paper attempts to characterize the special features of manifold learning on image data sets, and to highlight the value and limitations of these approaches.
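As a minimal example of the kind of method the survey covers, the snippet below runs Isomap on a standard small image set, treating each image as a point in pixel space.

```python
from sklearn.datasets import load_digits
from sklearn.manifold import Isomap

# Each image is a point in 64-D pixel space; Isomap (one of the methods the
# survey covers) recovers a 2-D embedding of the underlying manifold.
images = load_digits().data                       # 1797 small digit images
embedding = Isomap(n_neighbors=10, n_components=2).fit_transform(images)
print(embedding.shape)                            # (1797, 2)
```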

Geolocating Static Cameras

2007 IEEE 11th International Conference on Computer Vision, 2007

A key problem in widely distributed camera networks is geolocating the cameras. This paper considers three scenarios for camera localization: localizing a camera in an unknown environment, adding a new camera in a region with many other cameras, and localizing a camera by finding correlations with satellite imagery. We find that simple summary statistics (the time course of principal component coefficients) are sufficient to geolocate cameras without determining correspondences between cameras or explicitly reasoning about weather in the scene. We present results from a database of images from 538 cameras collected over the course of a year. We find that for cameras that remain stationary and for which we have accurate image timestamps, we can localize most cameras to within 50 miles of the known location. In addition, we demonstrate the use of a distributed camera network in the construction of a map of weather conditions.
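A sketch of the summary statistic mentioned above, assuming a stack of time-ordered frames from one camera; correlating such signatures between cameras (or against expected daylight patterns) is the localization cue.

```python
import numpy as np
from sklearn.decomposition import PCA

def daily_signature(frames):
    """Time course of the top principal component coefficient for one
    camera's day of images (frames: T x H x W, time-ordered). The first
    component typically tracks overall scene brightness, whose phase
    encodes sunrise and sunset times."""
    X = frames.reshape(len(frames), -1)
    return PCA(n_components=1).fit_transform(X).ravel()

# Comparing signatures between cameras, e.g. via np.corrcoef, gives the
# correspondence-free localization cue described above.
```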

Toward Fully Automatic Geo-Location and Geo-Orientation of Static Outdoor Cameras

2008 IEEE Workshop on Applications of Computer Vision, 2008

Automating tools for geo-locating and geo-orienting static cameras is a key step in creating a useful global imaging network from cameras attached to the Internet. We present algorithms for partial camera calibration that rely on access to accurately time-stamped images captured over time from cameras that do not move. To support these algorithms we also offer a method of camera viewpoint change detection, or "tamper detection", which determines if a camera has moved in the challenging case when images are only captured every half hour. These algorithms are tested on a subset of the AMOS (Archive of Many Outdoor Scenes) database, and we present preliminary results that highlight the promise of these approaches.

Location-specific Transition Distributions for Tracking

2008 IEEE Workshop on Motion and Video Computing, 2008

Surveillance and tracking systems often observe the same scene over extended time periods. When object motion is constrained by the scene (for instance, cars on roads, or pedestrians on sidewalks), it is advantageous to characterize and use scene-specific and location-specific priors to aid the tracking algorithm. This paper develops and demonstrates a method for creating priors for tracking that are conditioned on the current location of the object in the scene. These priors can be naturally incorporated in a number of tracking algorithms to make tracking more efficient and more accurate. We present a novel method to sample from these priors and show performance improvements (in both efficiency and accuracy) for two different tracking algorithms in two different problem domains.
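A minimal way to realize location-conditioned transition priors is a per-grid-cell store of observed displacements that can be sampled during tracking; the grid granularity and fallback behavior below are illustrative choices, not the paper's.

```python
import numpy as np

class LocationTransitionPrior:
    """Displacement prior conditioned on a coarse grid cell: store the
    displacements observed at each cell during training, then sample them
    during tracking. Grid granularity and fallback are illustrative choices."""
    def __init__(self, grid_shape, seed=0):
        self.moves = {cell: [] for cell in np.ndindex(grid_shape)}
        self.rng = np.random.default_rng(seed)

    def observe(self, cell, dxdy):
        self.moves[cell].append(dxdy)            # accumulate training tracks

    def sample(self, cell):
        past = self.moves[cell]
        if not past:                             # unseen cell: assume no motion
            return (0.0, 0.0)
        return past[self.rng.integers(len(past))]
```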

An efficient system for vehicle tracking in multi-camera networks

2009 Third ACM/IEEE International Conference on Distributed Smart Cameras (ICDSC), 2009

The recent deployment of very large-scale camera networks has led to a unique version of the tracking problem whose goal is to detect and track every vehicle within a large urban area. To address this problem we exploit constraints inherent in urban environments (i.e., while there are often many vehicles, they follow relatively consistent paths) to create novel visual processing tools that are highly efficient at detecting cars in a fixed scene and at connecting these detections into partial tracks. We derive extensions to a network-flow-based probabilistic data association model to connect these tracks between cameras. Our real-time system is evaluated on a large set of ground-truthed traffic videos collected by a network of seven cameras in a dense urban scene.
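The cross-camera linking step can be approximated, for illustration, as a bipartite assignment between track endings and track beginnings; this is a simplified stand-in for the paper's network-flow model, with costs assumed to come from appearance and transit-time likelihoods.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def link_tracks(cost, max_cost=10.0):
    """Link track endings (rows) to track beginnings (columns) by minimum
    total cost, discarding implausibly expensive pairs. A simplified
    bipartite stand-in for the paper's network-flow data association;
    `cost` is assumed to combine appearance and transit-time likelihoods."""
    rows, cols = linear_sum_assignment(cost)
    return [(i, j) for i, j in zip(rows, cols) if cost[i, j] <= max_cost]

# Toy usage: three tracks ending, three beginning.
cost = np.array([[1.0, 9.0, 20.0],
                 [8.0, 2.0, 9.0],
                 [20.0, 9.0, 3.0]])
print(link_tracks(cost))   # [(0, 0), (1, 1), (2, 2)]
```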

Shape Background Modeling: The Shape of Things That Came

2007 IEEE Workshop on Motion and Video Computing (WMVC'07), 2007

Detecting, isolating, and tracking moving objects in an outdoor scene is a fundamental problem of visual surveillance. A key component of most approaches to this problem is the construction of a background model of intensity values. We propose extending background modeling to include learning a model of the expected shape of foreground objects. This paper describes our approach to shape description, shape space density estimation, and unsupervised model training. A key contribution is a description of properties of the joint distribution of object shape and image location. We show object segmentation and anomalous shape detection results on video captured from road intersections. Our results demonstrate the usefulness of building scene-specific and spatially-localized shape background models.

Real-time constant memory visual summaries for surveillance

Proceedings of the 4th ACM international workshop on Video surveillance and sensor networks, 2006

In surveillance applications there may be multiple time scales at which it is important to monitor a scene. This work develops online, real-time algorithms that maintain background models simultaneously at many time scales. This creates a novel temporal decomposition of a video sequence which can be used as a visualization tool for a human operator or an adaptive background model for classical anomaly detection and tracking algorithms. This paper solves the design problem of choosing appropriate time scales for the decomposition and derives the equations to approximately reconstruct the original video given only the temporal decomposition. We present two applications that highlight the potential of this video processing: first, a visualization tool that summarizes recent video behavior for a human operator in a single image; and second, a pre-processing tool to detect "left bags" in the challenging PETS 2006 dataset, which includes many occlusions of the left bag by pedestrians.
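The constant-memory multi-scale idea can be sketched with a ladder of exponentially weighted running averages, one per time scale; the specific rates below are an illustrative geometric ladder rather than the paper's derived design.

```python
import numpy as np

class MultiScaleBackground:
    """Constant-memory background models at several time scales, each an
    exponentially weighted running average B <- B + a * (I - B). The rates
    form an illustrative geometric ladder, not the paper's derived design."""
    def __init__(self, shape, rates=(0.5, 0.1, 0.02, 0.004)):
        self.rates = rates
        self.models = [np.zeros(shape) for _ in rates]

    def update(self, frame):
        for B, a in zip(self.models, self.rates):
            B += a * (frame - B)   # in-place update, one array per scale
        return self.models         # short-term to long-term summaries
```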

Characterizing Feature Matching Performance over Long Time Periods

2015 IEEE Winter Conference on Applications of Computer Vision, 2015

Many computer vision applications rely on matching features of a query image to reference data sets, but little work has explored how quickly data sets become out of date. In this paper we measure feature matching performance across 5 years of time-lapse data from 20 static cameras to empirically study how feature matching is affected by changing sunlight direction, seasons, weather, and structural changes over time in outdoor settings. We identify several trends that may be relevant in real-world applications: (1) features are much more likely to match within a few days of the reference data, (2) weather and sun direction have a large effect on feature matching, and (3) there is a slow decay over time due to physical changes in a scene, but this decay is much smaller than the effects of lighting direction and weather. These trends are consistent across standard choices for feature detection (DoG, MSER) and feature description (SIFT, SURF, and DAISY). Across all choices, analysis of the feature detection and matching pipeline highlights that performance decay is mostly due to failures in keypoint detection rather than feature description.
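A typical way to score matching performance in such a study is the fraction of features surviving Lowe's ratio test; the sketch below uses SIFT and OpenCV as stand-ins for the several detector/descriptor choices the paper compares.

```python
import cv2

def match_fraction(img_ref, img_query, ratio=0.7):
    """Fraction of reference SIFT features surviving Lowe's ratio test, a
    simple score for how well a query image matches older reference data.
    SIFT and the 0.7 ratio are common defaults, standing in for the several
    detector/descriptor combinations compared in the paper."""
    sift = cv2.SIFT_create()
    _, d1 = sift.detectAndCompute(img_ref, None)
    _, d2 = sift.detectAndCompute(img_query, None)
    if d1 is None or d2 is None:
        return 0.0
    pairs = cv2.BFMatcher().knnMatch(d1, d2, k=2)
    good = [p for p in pairs
            if len(p) == 2 and p[0].distance < ratio * p[1].distance]
    return len(good) / len(d1)
```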

Heliometric Stereo: Shape from Sun Position

Lecture Notes in Computer Science, 2012

In this work, we present a method to uncover shape from webcams "in the wild." We present a variant of photometric stereo which uses the sun as a distant light source, so that the lighting direction can be computed from known GPS coordinates and timestamps. We propose an iterative, non-linear optimization process that minimizes the error in reproducing all images from an extended time-lapse with an image formation model that accounts for ambient lighting, shadows, changing light color, dense surface normal maps, radiometric calibration, and exposure. Unlike many approaches to uncalibrated outdoor image analysis, this procedure is automatic, and we report quantitative results by comparing extracted surface normals to Google Earth 3D models. We evaluate this procedure on data from a varied set of scenes and emphasize the advantages of including imagery from many months.
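The enabling observation is that GPS plus a timestamp determine the sun's direction; a hedged sketch using the pysolar library (an assumed dependency, with its azimuth-from-north convention) is shown below, together with the Lambertian relation the optimization builds on.

```python
import numpy as np
from datetime import datetime, timezone
from pysolar.solar import get_altitude, get_azimuth  # assumed dependency

def sun_direction(lat, lon, when):
    """Unit lighting direction (east, north, up) from GPS and a timestamp;
    pysolar reports azimuth from north, so the decomposition below follows
    that convention."""
    alt = np.radians(get_altitude(lat, lon, when))
    az = np.radians(get_azimuth(lat, lon, when))
    return np.array([np.cos(alt) * np.sin(az),
                     np.cos(alt) * np.cos(az),
                     np.sin(alt)])

# A Lambertian pixel then satisfies I ~ ambient + albedo * max(0, n . l);
# stacking this relation over many timestamped images gives the stereo system.
l = sun_direction(38.63, -90.20, datetime(2012, 6, 1, 17, 0, tzinfo=timezone.utc))
```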

The Episolar Constraint: Monocular Shape from Shadow Correspondence

2013 IEEE Conference on Computer Vision and Pattern Recognition, 2013

Shadows encode a powerful geometric cue: if one pixel casts a shadow onto another, then the two pixels are collinear with the lighting direction. Given many images over many lighting directions, this constraint can be leveraged to recover the depth of a scene from a single viewpoint. For outdoor scenes with solar illumination, we term this the episolar constraint. It yields a convex optimization to solve for the sparse depth of a scene from shadow correspondences, a method to reduce the search space when finding shadow correspondences, and a method to geometrically calibrate a camera using shadow constraints. Our method constructs a dense network of nonlocal constraints which complements recent work on outdoor photometric stereo and cloud-based cues for 3D. We demonstrate results across a variety of time-lapse sequences from webcams "in the wild."
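The collinearity constraint translates directly into linear equations on the depths of a shadow-casting pixel and the pixel it shadows; the sketch below builds those rows, leaving out the paper's convex solver and correspondence search.

```python
import numpy as np

def episolar_rows(r_a, r_b, l):
    """Linear depth constraints from one shadow correspondence.

    r_a, r_b : unit viewing rays of the shadow-casting pixel and the
    shadowed pixel; l : sun direction. Collinearity means the vector
    d_a * r_a - d_b * r_b is parallel to l, so its cross product with l
    vanishes: cross(r_a, l) * d_a - cross(r_b, l) * d_b = 0.
    Returns a 3 x 2 matrix A with A @ [d_a, d_b] = 0 (two independent rows);
    stacking A over many correspondences gives the sparse depth system."""
    return np.stack([np.cross(r_a, l), -np.cross(r_b, l)], axis=1)
```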

LOST: Longterm Observation of Scenes (with Tracks)

2012 IEEE Workshop on the Applications of Computer Vision (WACV), 2012

We introduce the Longterm Observation of Scenes (with Tracks) dataset. This dataset comprises videos taken from streaming outdoor webcams, capturing the same half hour, each day, for over a year. LOST contains rich metadata, including geolocation, day-by-day weather annotation, object detections, and tracking results. We believe that sharing this dataset opens opportunities for computer vision research involving very long-term outdoor surveillance, robust anomaly detection, and scene analysis methods based on trajectories. Efficient analysis of changes in behavior in a scene at very long time scales requires features that summarize large amounts of trajectory data in an economical way. We describe a trajectory clustering algorithm, aggregate statistics about these exemplars through time, and show that these statistics exhibit strong correlations with external metadata, such as weather signals and day of the week.
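A simple concrete version of the trajectory summarization is to resample tracks to a fixed length and cluster them; k-means and the sizes below are illustrative stand-ins for the paper's clustering algorithm.

```python
import numpy as np
from sklearn.cluster import KMeans

def cluster_trajectories(tracks, n_points=16, n_clusters=20):
    """Resample each (x, y) track to a fixed length, flatten, and k-means
    cluster. Daily cluster-occupancy counts then give compact statistics
    to correlate with weather or day-of-week metadata."""
    def resample(t):
        t = np.asarray(t, float)                           # N x 2 points
        s = np.linspace(0, len(t) - 1, n_points)
        idx = np.arange(len(t))
        return np.c_[np.interp(s, idx, t[:, 0]),
                     np.interp(s, idx, t[:, 1])].ravel()
    X = np.stack([resample(t) for t in tracks])
    return KMeans(n_clusters=n_clusters, n_init=10).fit_predict(X)
```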

Chapter 16. Two-photon microscopy and multidimensional analysis of cell dynamics

Methods in Enzymology, 2009

Two-photon (2P) microscopy is a high-resolution imaging technique that was initially applied by neurobiologists and developmental cell biologists but has subsequently been broadly adopted by immunologists. The value of 2P microscopy is that it affords an unparalleled view of single-cell spatiotemporal dynamics deep within intact tissues and organs. As the technology develops and new transgenic mice and fluorescent probes become available, 2P microscopy will serve as an increasingly valuable tool for assessing cell function and probing molecular mechanisms. Here we discuss the technical aspects of 2P microscope design, explain in detail various tissue imaging preparations, and walk the reader through the often daunting process of analyzing multidimensional data sets and presenting the experimental results.

Structure from shadow motion

2014 IEEE International Conference on Computational Photography (ICCP), 2014

In outdoor images, cast shadows define 3D constraints between the sun, the points casting a shadow, and the surfaces onto which shadows are cast. This cast shadow structure provides a powerful cue for 3D reconstruction, but requires that shadows be tracked over time, and this is difficult because shadows have minimal texture. Thus, we develop a shadow tracking system that enforces geometric consistency for each track and then combines thousands of tracking results to create a 3D model of scene geometry. We demonstrate reconstruction results on a variety of outdoor scenes, including some that show the 3D structure of occluders never directly observed by the camera.

Using cloud shadows to infer scene structure and camera calibration

2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2010

We explore the use of clouds as a form of structured lighting to capture the 3D structure of outdoor scenes observed over time from a static camera. We derive two cues that relate 3D distances to changes in pixel intensity due to cloud shadows. The first cue is primarily spatial, works with low frame-rate time lapses, and supports estimating focal length and scene structure, up to a scale ambiguity. The second cue depends on cloud motion and has a more complex, but still linear, ambiguity. We describe a method that uses the spatial cue to estimate a depth map and a method that combines both cues. Results on time lapses of several outdoor scenes show that these cues enable estimating scene geometry and camera focal length.
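The spatial cue rests on the fact that nearby scene points are shadowed by the same clouds at the same times, so the temporal correlation of pixel intensities falls off with 3D distance; the sketch below computes only the raw correlation, not the calibrated distance model.

```python
import numpy as np

def shadow_correlation(intensity, i, j):
    """Temporal correlation between two pixels' intensity time series under
    passing cloud shadows (intensity: T x N array, one column per pixel).
    Nearby scene points are shadowed together, so this correlation decays
    with 3D distance; calibrating that decay is the paper's spatial cue."""
    a = intensity[:, i] - intensity[:, i].mean()
    b = intensity[:, j] - intensity[:, j].mean()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))
```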
