Contextual Object Detection with a Few Relevant Neighbors

Exploring the Bounds of the Utility of Context for Object Detection

2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

The recurring context in which objects appear holds valuable information that can be employed to predict their existence. This intuitive observation indeed led many researchers to endow appearance-based detectors with explicit reasoning about context. The underlying thesis suggests that stronger contextual relations would facilitate greater improvements in detection capacity. In practice, however, the observed improvement in many cases is modest at best, and often only marginal. In this work we seek to improve our understanding of this phenomenon, in part by pursuing an opposite approach. Instead of attempting to improve detection scores by employing context, we treat the utility of context as an optimization problem: to what extent can detection scores be improved by considering context or any other kind of additional information? With this approach we explore the bounds on improvement by using contextual relations between objects and provide a tool for identifying the most helpful ones. We show that simple co-occurrence relations can often provide large gains, while in other cases a significant improvement is simply impossible or impractical with either co-occurrence or more precise spatial relations. To better understand these results we then analyze the ability of context to handle different types of false detections, revealing that tested contextual information cannot ameliorate localization errors, severely limiting its gains. These and additional insights further our understanding of where and why utilization of context for object detection succeeds and fails.
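The co-occurrence rescoring discussed above can be illustrated with a minimal sketch. The linear mixing rule, the `alpha` weight, and all names below are illustrative assumptions, not the paper's formulation:

```python
import numpy as np

def cooccurrence_rescore(scores, labels, cooc, alpha=0.5):
    """Rescore each detection by how compatible its class is with the
    other classes detected in the same image.

    scores -- (N,) base detector confidences in [0, 1]
    labels -- (N,) integer class ids
    cooc   -- (C, C) class co-occurrence matrix with values in [0, 1]
    alpha  -- mixing weight between base score and context support
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels)
    out = np.empty_like(scores)
    for i in range(len(scores)):
        others = np.delete(np.arange(len(scores)), i)
        if len(others) == 0:
            support = scores[i]  # no context available: keep base score
        else:
            # score-weighted co-occurrence with the other detections
            w = scores[others]
            support = np.dot(w, cooc[labels[i], labels[others]]) / w.sum()
        out[i] = (1 - alpha) * scores[i] + alpha * support
    return out
```

Under this toy rule, a low-confidence detection gains support from a confident detection of a strongly co-occurring class, which is the kind of gain (and its limits) that the paper quantifies.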

Integrating visual context and object detection within a probabilistic framework

2009

Visual context provides cues about an object's presence, position and size within an observed scene, which are used to increase the performance of object detection techniques. However, state-of-the-art methods for context-aware object detection can decrease the initial performance. We discuss the reasons for failure and propose a concept that overcomes these limitations by introducing a novel technique for integrating visual context and object detection. To this end, we apply the prior probability function of an object detector, which maps the detector's output to probabilities. Together with an appropriate contextual weighting, a probabilistic framework is established. In addition, we present an extension of state-of-the-art methods to learn scale-dependent visual context information and show how this increases the initial performance. The standard methods and our proposed extensions are compared on a novel, demanding image data set. Results show that visual context facilitates object detection methods.
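A minimal sketch of the calibrate-then-weight idea: the logistic calibration parameters and the geometric weighting rule below are hypothetical stand-ins for the learned prior probability function and contextual weighting described above:

```python
import math

def calibrate(raw_score, a=-4.0, b=8.0):
    """Map a raw detector output in [0, 1] to a probability with a
    logistic function; a and b are hypothetical calibration parameters
    (the paper learns a prior probability function from data)."""
    return 1.0 / (1.0 + math.exp(-(a + b * raw_score)))

def combine(p_det, p_ctx, w=0.3):
    """Weighted geometric combination of the calibrated detector
    probability and a contextual prior; the weighting scheme is an
    illustrative choice, not the paper's exact formulation."""
    return (p_det ** (1 - w)) * (p_ctx ** w)
```

The point of calibrating first is that the contextual weight then operates on comparable probabilities rather than on arbitrary detector scores, which is one way to avoid the performance drops the abstract mentions.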

The Role of Context for Object Detection and Semantic Segmentation in the Wild

2014 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)

In this paper we study the role of context in existing state-of-the-art detection and segmentation approaches. Towards this goal, we label every pixel of the PASCAL VOC 2010 detection challenge with a semantic category. We believe this data will provide plenty of challenges to the community, as it contains 520 additional classes for semantic segmentation and object detection. Our analysis shows that nearest neighbor based approaches perform poorly on semantic segmentation of contextual classes, showing the variability of PASCAL imagery. Furthermore, the improvements of existing contextual models for detection are rather modest. In order to push forward the performance in this difficult scenario, we propose a novel deformable part-based model, which exploits both local context around each candidate detection as well as global context at the level of the scene. We show that this contextual reasoning significantly helps in detecting objects at all scales.

A Context Aware Deep Learning Architecture for Object Detection

2019

The utility of exploiting contextual information present in scenes to improve the overall performance of deep learning based object detectors is a well accepted fact in the computer vision community. In this work we propose an architecture aimed at learning contextual relationships and improving the precision of existing CNN-based object detectors. An off-the-shelf detector is modified to extract contextual cues present in scenes. We implement a fully convolutional architecture aimed at learning this information. A synthetic image generator is implemented that generates random images while implementing a series of predefined contextual rules, allowing the systematic training of such relationships. Finally, a series of experiments are carried out to evaluate the effectiveness of our design in recognizing such associations by measuring the improvement in average precision.

A Systematic Analysis of a Context Aware Deep Learning Architecture for Object Detection

Belgium-Netherlands Conference on Artificial Intelligence, 2019

The utility of exploiting contextual information present in scenes to improve the overall performance of deep learning based object detectors is a well accepted fact in the computer vision community. In this work we propose an architecture aimed at learning contextual relationships and improving the precision of existing CNN-based object detectors. An off-the-shelf detector is modified to extract contextual cues present in scenes. We implement a fully convolutional architecture aimed at learning this information. A synthetic image generator is implemented that generates random images while implementing a series of predefined contextual rules, allowing the systematic training of such relationships. Finally, a series of experiments are carried out to evaluate the effectiveness of our design in recognizing such associations by measuring the improvement in average precision.

Inner-Scene Similarities as a Contextual Cue for Object Detection

ArXiv, 2017

Using image context is an effective approach for improving object detection. Previously proposed methods used contextual cues that rely on semantic or spatial information. In this work, we explore a different kind of contextual information: inner-scene similarity. We present the CISS (Context by Inner Scene Similarity) algorithm, which is based on the observation that two visually similar sub-image patches are likely to share semantic identities, especially when both appear in the same image. CISS uses base-scores provided by a base detector and performs as a post-detection stage. For each candidate sub-image (denoted anchor), the CISS algorithm finds a few similar sub-images (denoted supporters), and, using them, calculates a new enhanced score for the anchor. This is done by utilizing the base-scores of the supporters and a pre-trained dependency model. The new scores are modeled as a linear function of the base scores of the anchor and the supporters and are estimated using a mini...
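The anchor/supporter rescoring described above can be sketched roughly as follows. Cosine similarity for finding supporters and the fixed linear weights `w` are illustrative assumptions standing in for the learned dependency model:

```python
import numpy as np

def ciss_rescore(base_scores, features, k=2, w=(0.0, 0.7, 0.3)):
    """Post-detection rescoring in the spirit of CISS: for each candidate
    (anchor), find its k most visually similar candidates (supporters)
    and form a new score as a linear function of the anchor's and the
    supporters' base scores."""
    X = np.asarray(features, dtype=float)
    X = X / np.linalg.norm(X, axis=1, keepdims=True)  # unit-normalize
    sim = X @ X.T                    # pairwise cosine similarity
    np.fill_diagonal(sim, -np.inf)   # an anchor cannot support itself
    b = np.asarray(base_scores, dtype=float)
    out = np.empty_like(b)
    for i in range(len(b)):
        supporters = np.argsort(sim[i])[-k:]  # k most similar candidates
        out[i] = w[0] + w[1] * b[i] + w[2] * b[supporters].mean()
    return out
```

In this sketch a weak anchor that closely resembles a confident supporter is pulled upward, mirroring the intuition that visually similar patches in one image tend to share identities.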

segDeepM: Exploiting Segmentation and Context in Deep Neural Networks for Object Detection

2015

In this paper, we propose an approach that exploits object segmentation in order to improve the accuracy of object detection. We frame the problem as inference in a Markov Random Field, in which each detection hypothesis scores object appearance as well as contextual information using Convolutional Neural Networks, and allows the hypothesis to choose and score a segment out of a large pool of accurate object segmentation proposals. This enables the detector to incorporate additional evidence when it is available and thus results in more accurate detections. Our experiments show an improvement of 4.1% in mAP over the R-CNN baseline on PASCAL VOC 2010, and 3.4% over the current state-of-the-art, demonstrating the power of our approach.

Local Context Priors for Object Proposal Generation

2012

State-of-the-art methods for object detection are mostly based on an expensive exhaustive search over the image at different scales. In order to reduce the computational time, one can perform a selective search to obtain a small subset of relevant object hypotheses that need to be evaluated by the detector. For that purpose, we employ a regression to predict possible object scales and locations by exploiting the local context of an image. Furthermore, we show how a priori information, if available, can be integrated to improve the prediction. The experimental results on three datasets, including the Caltech pedestrian and PASCAL VOC datasets, show that our method achieves the detection performance of an exhaustive search approach with much less computational load. Since we model the prior distribution over the proposals locally, it generalizes well and can be successfully applied across datasets.
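The context-to-location regression can be sketched as an ordinary least-squares fit; the feature and box parameterizations below are hypothetical, and the paper's regressor is more elaborate:

```python
import numpy as np

def fit_location_prior(context_feats, box_params):
    """Least-squares regression from local context features to box
    parameters (e.g. x, y, scale); a stand-in for the proposal
    regressor described above."""
    X = np.hstack([np.asarray(context_feats, dtype=float),
                   np.ones((len(context_feats), 1))])  # append bias term
    W, *_ = np.linalg.lstsq(X, np.asarray(box_params, dtype=float),
                            rcond=None)
    return W

def predict_proposal(W, context_feat):
    """Predict box parameters for one local-context feature vector."""
    return np.append(np.asarray(context_feat, dtype=float), 1.0) @ W
```

The appeal of such a regressor is speed: a single linear prediction per location replaces an exhaustive scan over positions and scales.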

Recovering Hard-to-Find Object Instances by Sampling Context-based Object Proposals

Computer Vision and Image Understanding, 2016

In this paper we focus on improving object detection performance in terms of recall. We propose a post-detection stage during which we explore the image with the objective of recovering missed detections. This exploration is performed by sampling object proposals in the image. We analyse four different strategies to perform this sampling, giving special attention to strategies that exploit spatial relations between objects. In addition, we propose a novel method to discover higher-order relations between groups of objects. Experiments on the challenging KITTI dataset show that our proposed relations-based proposal generation strategies can help improve recall at the cost of a relatively small number of object proposals.
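Sampling proposals from spatial relations can be sketched as follows. The offset pool here is a hypothetical input; in the paper such relations are learned between object classes:

```python
import random

def sample_context_proposals(detections, offsets, n=10, seed=0):
    """Sample extra proposals around already-detected boxes (x, y, w, h)
    using a pool of relative offsets (dx, dy, scale), e.g. 'a second car
    tends to appear one box-width to the side of a detected car'."""
    rng = random.Random(seed)
    proposals = []
    for (x, y, w, h) in detections:
        for _ in range(n):
            dx, dy, s = rng.choice(offsets)
            # place a scaled box at an offset relative to the source box
            proposals.append((x + dx * w, y + dy * h, w * s, h * s))
    return proposals
```

Because proposals are only drawn near confident detections, the extra budget stays small, matching the abstract's claim of improved recall at a low proposal cost.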

Context as Supervisory Signal: Discovering Objects with Predictable Context

Lecture Notes in Computer Science, 2014

This paper addresses the well-established problem of unsupervised object discovery with a novel method inspired by weakly-supervised approaches. In particular, the ability of an object patch to predict the rest of the object (its context) is used as supervisory signal to help discover visually consistent object clusters. The main contributions of this work are: 1) framing unsupervised clustering as a leave-one-out context prediction task; 2) evaluating the quality of context prediction by statistical hypothesis testing between thing and stuff appearance models; and 3) an iterative region prediction and context alignment approach that gradually discovers a visual object cluster together with a segmentation mask and fine-grained correspondences. The proposed method outperforms previous unsupervised as well as weakly-supervised object discovery approaches, and is shown to provide correspondences detailed enough to transfer keypoint annotations.