ReMOTS: Refining Multi-Object Tracking and Segmentation (1st Place Solution for MOTS 2020 Challenge 1)

ReMOTS: Self-Supervised Refining Multi-Object Tracking and Segmentation

ArXiv, 2020

We aim to improve the performance of Multiple Object Tracking and Segmentation (MOTS) by refinement. However, refining MOTS results remains challenging, largely because appearance features are not adapted to the target videos and because proper thresholds for discriminating identities are hard to find. To tackle this issue, we propose a self-supervised refining MOTS (i.e., ReMOTS) framework. ReMOTS takes four main steps to refine MOTS results from the data-association perspective: (1) training the appearance encoder using predicted masks; (2) associating observations across adjacent frames to form short-term tracklets; (3) re-training the appearance encoder using short-term tracklets as reliable pseudo labels; (4) merging short-term tracklets into long-term tracklets using the adapted appearance features and thresholds that are automatically obtained from statistical information. Using ReMOTS, we reached the 1st place on the CVPR 2020 MOTS Challenge 1, with an sMOTSA s...
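Step (4) above, merging short-term tracklets with a threshold derived from statistics, can be sketched as follows. This is a hypothetical illustration, not the paper's implementation: `tracklets` maps a tracklet id to its per-frame appearance vectors, and the percentile of within-tracklet frame-to-frame distances is a stand-in for whatever statistic the paper actually uses.

```python
import numpy as np

def merge_tracklets(tracklets, percentile=95):
    """Sketch of ReMOTS-style tracklet merging: tracklets whose mean
    appearance features are closer than a data-derived threshold are
    assigned the same long-term id (via union-find)."""
    # Threshold from statistics: a percentile of within-tracklet
    # frame-to-frame appearance distances (assumed stand-in statistic).
    intra = []
    for feats in tracklets.values():
        f = np.asarray(feats, dtype=float)
        if len(f) > 1:
            intra.extend(np.linalg.norm(np.diff(f, axis=0), axis=1))
    thresh = np.percentile(intra, percentile) if intra else 0.0

    ids = list(tracklets)
    means = {t: np.asarray(tracklets[t], dtype=float).mean(axis=0) for t in ids}
    parent = {t: t for t in ids}  # union-find over tracklet ids

    def find(t):
        while parent[t] != t:
            parent[t] = parent[parent[t]]
            t = parent[t]
        return t

    # Merge every pair of tracklets whose mean features fall under the
    # automatically derived threshold.
    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            if np.linalg.norm(means[a] - means[b]) <= thresh:
                parent[find(a)] = find(b)
    return {t: find(t) for t in ids}
```

Tracklets 1 and 2 below share nearly identical appearance and are merged, while tracklet 3 stays separate; the key point is that no hand-tuned threshold is supplied.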

Supplementary Material for MOTS: Multi-Object Tracking and Segmentation

2019

TrackR-CNN uses association scores based on vectors predicted by an association head to identify the same object across time. In our baseline model, we train this head using a batch-hard triplet loss proposed by Hermans et al. [3], which we state again here: Let D denote the set of detections for a video. Each detection d ∈ D has a corresponding association vector a_d and is assigned a ground-truth track id id_d determined by its overlap with the ground-truth objects (we only consider detections which sufficiently overlap with a ground-truth object here). For a video sequence of T time steps, the association loss in the batch-hard formulation with margin α is then given by
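The equation itself is cut off here, but the batch-hard formulation of Hermans et al. follows a standard form: for each anchor detection, take the hardest positive (the farthest vector with the same track id) and the hardest negative (the closest vector with a different id), and apply a hinge at the margin. A minimal NumPy sketch, not TrackR-CNN's actual implementation:

```python
import numpy as np

def batch_hard_triplet_loss(vectors, ids, margin=0.2):
    """Batch-hard triplet loss (Hermans et al.): for each anchor d,
    loss_d = max(margin + max_{same id} ||a_d - a_e||
                        - min_{diff id} ||a_d - a_e||, 0),
    averaged over all anchors with at least one negative."""
    vectors = np.asarray(vectors, dtype=float)
    ids = np.asarray(ids)
    # Pairwise Euclidean distances between association vectors.
    diff = vectors[:, None, :] - vectors[None, :, :]
    dist = np.sqrt((diff ** 2).sum(-1))
    same = ids[:, None] == ids[None, :]
    losses = []
    for i in range(len(vectors)):
        pos = dist[i][same[i]]    # same-id distances (includes self, 0)
        neg = dist[i][~same[i]]   # different-id distances
        if neg.size == 0:
            continue              # no negative available for this anchor
        losses.append(max(margin + pos.max() - neg.min(), 0.0))
    return float(np.mean(losses)) if losses else 0.0
```

With well-separated identity clusters the hinge is inactive and the loss is zero; with fully collapsed (identical) vectors it equals the margin.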

MOTChallenge 2015: Towards a Benchmark for Multi-Target Tracking

In the recent past, the computer vision community has developed centralized benchmarks for the performance evaluation of a variety of tasks, including generic object and pedestrian detection, 3D reconstruction, optical flow, single-object short-term tracking, and stereo estimation. Despite potential pitfalls of such benchmarks, they have proved to be extremely helpful to advance the state of the art in the respective area. Interestingly, there has been rather limited work on the standardization of quantitative benchmarks for multiple target tracking. One of the few exceptions is the well-known PETS dataset [20], targeted primarily at surveillance applications. Despite being widely used, it is often applied inconsistently, for example using different subsets of the available data, different ways of training the models, or differing evaluation scripts. This paper describes our work toward a novel multiple object tracking benchmark aimed to address such issues. We discuss the...

The Visual Object Tracking VOT2016 challenge results

The Visual Object Tracking challenge VOT2016 aims at comparing short-term single-object visual trackers that do not apply pre-learned models of object appearance. Results of 70 trackers are presented, with a large number of trackers having been published at major computer vision conferences and journals in recent years. The number of tested state-of-the-art trackers makes VOT2016 the largest and most challenging benchmark on short-term tracking to date. For each participating tracker, a short description is provided in the Appendix. VOT2016 goes beyond its predecessors by (i) introducing a new semi-automatic ground-truth bounding-box annotation methodology and (ii) extending the evaluation system with the no-reset experiment. The dataset, the evaluation kit, as well as the results are publicly available at the challenge website.

The Visual Object Tracking VOT2014 challenge results

Proceedings, European Conference on Computer Vision (ECCV) Visual Object Tracking Challenge Workshop, 2014

The Visual Object Tracking challenge 2014, VOT2014, aims at comparing short-term single-object visual trackers that do not apply pre-learned models of object appearance. Results of 38 trackers are presented. The number of tested trackers makes VOT2014 the largest benchmark on short-term tracking to date. For each participating tracker, a short description is provided in the appendix. Features of the VOT2014 challenge that go beyond its VOT2013 predecessor are introduced: (i) a new VOT2014 dataset with full annotation of targets by rotated bounding boxes and per-frame attributes, (ii) extensions of the VOT2013 evaluation methodology, (iii) a new unit for tracking speed assessment less dependent on the hardware, and (iv) the VOT2014 evaluation toolkit that significantly speeds up execution of experiments. The dataset, the evaluation kit, as well as the results are publicly available at the challenge website.