Unsupervised learning of supervoxel embeddings for video segmentation
Related papers
A STUDY OF ACTOR AND ACTION SEMANTIC RETENTION IN VIDEO SUPERVOXEL SEGMENTATION
International Journal of Semantic Computing, 2013
Existing methods in the semantic computer vision community seem unable to cope with the explosion and richness of modern, open-source and social video content. Although sophisticated methods such as object detection and bag-of-words models have been well studied, they typically operate on low-level features and ultimately suffer from either scalability issues or a lack of semantic meaning. On the other hand, video supervoxel segmentation has recently been established and applied to large-scale data processing, and potentially serves as an intermediate representation for high-level video semantic extraction. Supervoxels are rich decompositions of the video content: they capture object shape and motion well. However, it is not yet known whether supervoxel segmentation retains the semantics of the underlying video content. In this paper, we conduct a systematic study of how well actor and action semantics are retained in video supervoxel segmentation. In our study, human observers watch supervoxel segmentation videos and try to discriminate both the actor (human or animal) and the action (one of eight everyday actions). We gather and analyze a large set of 640 human perceptions over 96 videos at 3 different supervoxel scales. Furthermore, we conduct machine recognition experiments on a feature defined on the supervoxel segmentation, called the supervoxel shape context, which is inspired by higher-order processes in human perception. Our findings suggest that a significant amount of semantics is retained in video supervoxel segmentation and can be used for further video analysis.
SEEK: A Framework of Superpixel Learning with CNN Features for Unsupervised Segmentation
Electronics, 2020
Supervised semantic segmentation algorithms have been a hot area of exploration recently, but attention is now being drawn towards completely unsupervised semantic segmentation. In an unsupervised framework, neither the targets nor the ground-truth labels are provided to the network; the network is unaware of any class instance or object present in a given data sample. We therefore propose a convolutional neural network (CNN) based architecture for unsupervised segmentation. We use the squeeze-and-excitation network for its ability to capture feature interdependencies, which increases the network's sensitivity to the most salient features. We iteratively train our CNN architecture on the targets generated by a graph-based segmentation method, while simultaneously preventing the network from falling into the pit of over-segmentation. Along with this CNN architecture, image enhancement and refinement techniques are exploited to improve the seg...
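The abstract's iterative loop — a CNN predicts per-pixel classes, and a graph-based segmentation turns those predictions into spatially coherent pseudo-targets for the next round — can be sketched in miniature. The snippet below shows only the target-generation step, on flat toy arrays rather than images and without the CNN itself; `refine_targets` and its inputs are hypothetical stand-ins, not the paper's code.

```python
from collections import Counter

def refine_targets(pred, segments):
    """Majority-vote pseudo-target generation, a simplified sketch of the
    refinement step: every pixel inside one graph-based segment is forced
    to that segment's most frequent predicted class, so the CNN's next
    training target is spatially coherent.

    pred     : flat list of per-pixel class ids from the current CNN.
    segments : flat list of per-pixel segment ids from the graph method.
    """
    votes = {}
    for c, s in zip(pred, segments):
        votes.setdefault(s, Counter())[c] += 1
    # the winning class of each segment becomes the pseudo-label
    majority = {s: cnt.most_common(1)[0][0] for s, cnt in votes.items()}
    return [majority[s] for s in segments]
```

For example, a stray prediction inside an otherwise uniform segment is voted away: `refine_targets([0, 0, 1, 1, 1, 2], [0, 0, 0, 1, 1, 1])` returns `[0, 0, 0, 1, 1, 1]`.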
Super-Trajectory for Video Segmentation
2017 IEEE International Conference on Computer Vision (ICCV), 2017
We introduce a novel semi-supervised video segmentation approach based on an efficient video representation called the "super-trajectory". Each super-trajectory corresponds to a group of compact trajectories that exhibit consistent motion patterns, similar appearance and close spatiotemporal relationships. We generate trajectories using a probabilistic model, which handles occlusions and drifts in a robust and natural way. To reliably group trajectories, we adopt a modified version of the density-peaks-based clustering algorithm that captures rich spatiotemporal relations among trajectories in the clustering process. The presented video representation is discriminative enough to accurately propagate the initial annotations in the first frame onto the remaining video frames. Extensive experimental analysis on challenging benchmarks demonstrates that our method is capable of distinguishing the target objects from complex backgrounds and even re-identifying them after occlusions.
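The density-peaks clustering that this line of work adapts (Rodriguez & Laio's method) selects as cluster centers the points that combine high local density with a large distance to any denser point, then assigns the rest by following nearest denser neighbors. A minimal sketch on 2-D toy points stands in for the trajectory features; the paper's modified version differs in how distances and densities are defined.

```python
import math

def density_peaks(points, d_c, n_clusters):
    """Minimal density-peaks clustering sketch (Rodriguez & Laio).

    points     : list of (x, y) tuples, toy stand-ins for trajectory features.
    d_c        : cutoff distance used to count neighbours for local density.
    n_clusters : number of cluster centers to keep.
    """
    n = len(points)
    d = [[math.dist(points[i], points[j]) for j in range(n)] for i in range(n)]
    # local density: neighbours closer than the cutoff
    rho = [sum(1 for j in range(n) if j != i and d[i][j] < d_c) for i in range(n)]
    # strict total order on density (index breaks ties)
    order = sorted(range(n), key=lambda i: (rho[i], -i), reverse=True)
    rank = {i: r for r, i in enumerate(order)}
    # delta: distance to the nearest point of higher density
    nn_higher, delta = [-1] * n, [0.0] * n
    for i in range(n):
        higher = [j for j in range(n) if rank[j] < rank[i]]
        if higher:
            nn_higher[i] = min(higher, key=lambda j: d[i][j])
            delta[i] = d[i][nn_higher[i]]
        else:
            delta[i] = max(d[i])  # the single global density peak
    # centers maximise rho * delta; others inherit their denser neighbour's label
    centers = sorted(range(n), key=lambda i: rho[i] * delta[i], reverse=True)[:n_clusters]
    label = {c: k for k, c in enumerate(centers)}
    for i in order:
        if i not in label:
            label[i] = label[nn_higher[i]]
    return [label[i] for i in range(n)]
```

On two well-separated blobs, e.g. four points near the origin and three near (5, 5) with `d_c = 1.0` and `n_clusters = 2`, the function recovers the two groups.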
Video Object Segmentation via Super-trajectories
2017
We introduce a novel semi-supervised video segmentation approach based on an efficient video representation called the "super-trajectory". Each super-trajectory corresponds to a group of compact trajectories that exhibit consistent motion patterns, similar appearance and close spatiotemporal relationships. We generate trajectories using a probabilistic model, which handles occlusions and drifts in a robust and natural way. To reliably group trajectories, we adopt a modified version of the density-peaks-based clustering algorithm that captures rich spatiotemporal relations among trajectories in the clustering process. The presented video representation is discriminative enough to accurately propagate the initial annotations in the first frame onto the remaining video frames. Extensive experimental analysis on challenging benchmarks demonstrates that our method is capable of distinguishing the target objects from complex backgrounds and even re-identifying them after long-t...
Superframes, A Temporal Video Segmentation
2018 24th International Conference on Pattern Recognition (ICPR), 2018
The goal of video segmentation is to turn video data into a set of concrete motion clusters that can be easily interpreted as building blocks of the video. There is some work on related topics, such as detecting scene cuts in a video, but little specific research on clustering video data into a desired number of compact segments. It would be more intuitive, and more efficient, to work with perceptually meaningful entities obtained from a low-level grouping process, which we call 'superframes'. This paper presents a new, simple and efficient technique to detect superframes of similar content patterns in videos. We calculate content-motion similarity to obtain the strength of change between consecutive frames. With the help of an existing deep-model-based optical flow technique, the proposed method performs more accurate motion estimation efficiently. We also propose two criteria for measuring and comparing the performance of different algorithms on various databases. Experimental results on videos from benchmark databases demonstrate the effectiveness of the proposed method.
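The core idea — score the strength of change between consecutive frames, then cut where that score peaks — can be sketched independently of how the change signal is computed. Below, `change[i]` is an assumed precomputed dissimilarity between frames `i` and `i+1` (the paper derives it from deep optical flow); the peak-picking rule is a plausible simplification, not the paper's exact procedure.

```python
def detect_superframes(change, threshold):
    """Cut a frame sequence into 'superframes' at strong change peaks.

    change    : change[i] = dissimilarity between frame i and frame i+1,
                so there are len(change) + 1 frames in total.
    threshold : minimum change strength for a cut.
    Returns half-open (start, end) frame intervals covering the video.
    """
    # a cut goes after frame i where the change is a local maximum above threshold
    cuts = [i + 1 for i in range(len(change))
            if change[i] > threshold
            and (i == 0 or change[i] >= change[i - 1])
            and (i == len(change) - 1 or change[i] >= change[i + 1])]
    bounds = [0] + cuts + [len(change) + 1]
    return [(bounds[k], bounds[k + 1]) for k in range(len(bounds) - 1)]
```

For instance, with two sharp change peaks, `detect_superframes([0.1, 0.1, 0.9, 0.1, 0.1, 0.8, 0.1], 0.5)` yields three segments: `[(0, 3), (3, 6), (6, 8)]`.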
Learning contextual superpixel similarity for consistent image segmentation
Multimedia Tools and Applications
This paper addresses the problem of image segmentation by iterative region aggregations starting from an initial superpixel decomposition. Classical approaches for this task compute superpixel similarity using distance measures between superpixel descriptor vectors. This usually poses the well-known problem of the semantic gap and fails to properly aggregate visually non-homogeneous superpixels that belong to the same high-level object. This work proposes to use random forests to learn the merging probability between adjacent superpixels in order to overcome the aforementioned issues. Compared to existing works, this approach learns the fusion rules without explicit similarity measure computation. We also introduce a new superpixel context descriptor to strengthen the learned characteristics towards better similarity prediction. Image segmentation is then achieved by iteratively merging the most similar superpixel pairs selected using a similarity weighting objective function. Experimental results of our approach on four datasets including DAVIS 2017 and ISIC 2018 show its potential compared to state-of-the-art approaches.
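The iterative aggregation described above — score adjacent superpixel pairs, merge the most likely pair, repeat — can be sketched with a union-find structure. One simplification to flag: here the pair probabilities (a hypothetical stand-in for the random-forest output) are fixed up front and consumed in one greedy pass, whereas the paper re-evaluates similarities with its context descriptor as regions grow.

```python
def iterative_merge(n_superpixels, pair_probs, threshold=0.5):
    """Greedy sketch of learned superpixel aggregation.

    n_superpixels : number of initial superpixels, labelled 0..n-1.
    pair_probs    : {(i, j): p} merge probability for adjacent pairs,
                    standing in for a classifier's output.
    threshold     : pairs below this probability are never merged.
    Returns a representative label per superpixel.
    """
    parent = list(range(n_superpixels))

    def find(x):  # union-find with path halving
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # merge the most probable pairs first, stopping at the threshold
    for (i, j), p in sorted(pair_probs.items(), key=lambda kv: kv[1], reverse=True):
        if p < threshold:
            break
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[rj] = ri
    return [find(x) for x in range(n_superpixels)]
```

With `iterative_merge(4, {(0, 1): 0.9, (1, 2): 0.3, (2, 3): 0.7})`, superpixels 0–1 and 2–3 merge while the weak (1, 2) link keeps the two regions separate.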
Multiple Hypothesis Video Segmentation from Superpixel Flows
2010
Multiple Hypothesis Video Segmentation (MHVS) is a method for the unsupervised photometric segmentation of video sequences. MHVS segments arbitrarily long video streams by considering only a few frames at a time, and handles the automatic creation, continuation and termination of labels with no user initialization or supervision. The process begins by generating several pre-segmentations per frame and enumerating multiple possible trajectories of pixel regions within a short time window. After assigning each trajectory a score, we let the trajectories compete with each other to segment the sequence. We determine the solution of this segmentation problem as the MAP labeling of a higher-order random field. This framework allows MHVS to achieve spatial and temporal long-range label consistency while operating in an on-line manner. We test MHVS on several videos of natural scenes with arbitrary camera and object motion.
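The "competition" between scored trajectory hypotheses can be illustrated with a deliberately crude greedy rule: higher-scoring trajectories claim their (frame, region) slots first, and any later hypothesis touching a claimed slot is discarded. This is only an intuition aid; MHVS itself resolves the competition jointly, as MAP inference in a higher-order random field, not greedily.

```python
def compete(hypotheses):
    """Greedy stand-in for MHVS trajectory competition.

    hypotheses : list of (score, regions) where regions is a list of
                 (frame, region_id) slots the trajectory would label.
    Returns the surviving hypotheses, best score first.
    """
    claimed, winners = set(), []
    for score, regions in sorted(hypotheses, key=lambda h: h[0], reverse=True):
        # a hypothesis survives only if none of its slots are taken
        if not any(r in claimed for r in regions):
            winners.append((score, regions))
            claimed.update(regions)
    return winners
```

Two compatible trajectories survive while a weaker one overlapping frame 1's region 'a' is eliminated: given scores 0.9, 0.8, and a conflicting 0.5, `compete` keeps only the first two.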
Semi-Supervised Video Object Segmentation with Super-Trajectories
IEEE Transactions on Pattern Analysis and Machine Intelligence
We introduce a semi-supervised video segmentation approach based on an efficient video representation called the "super-trajectory". A super-trajectory corresponds to a group of compact point trajectories that exhibit consistent motion patterns, similar appearances, and close spatiotemporal relationships. We generate the compact trajectories using a probabilistic model, which handles occlusions and drifts effectively. To reliably group point trajectories, we adopt a modified version of the density-peaks-based clustering algorithm that captures rich spatiotemporal relations among trajectories in the clustering process. We incorporate two intuitive mechanisms, called reverse-tracking and object re-occurrence, for robustness and improved performance. Building on the proposed video representation, our segmentation method is discriminative enough to accurately propagate the initial annotations in the first frame onto the remaining frames. Our extensive experimental analyses on three challenging benchmarks demonstrate that, given the annotation in the first frame, our method is capable of extracting the target objects from complex backgrounds, and even re-identifying them after prolonged occlusions, producing high-quality video object segments.
Graph-based hierarchical video segmentation based on a simple dissimilarity measure
2014
Hierarchical video segmentation provides a region-oriented scale-space, i.e., a set of video segmentations at different detail levels in which the segmentations at finer levels are nested with respect to those at coarser levels. In this work, hierarchical video segmentation is transformed into a graph partitioning problem in which each part corresponds to one supervoxel of the video, and we present a new methodology for hierarchical video segmentation that computes a hierarchy of partitions by reweighting the original graph with a simple dissimilarity measure, from which a segmentation that is not too coarse can be easily inferred. We also provide an extensive comparative analysis, considering quantitative assessments of the accuracy, ease of use, and temporal coherence of our methods: p-HOScale, cp-HOScale and 2cp-HOScale. According to the experiments, the hierarchies inferred by our methods produce good quantitative results when applied to video segmentation. Moreover, unlike the other tested methods, the space and time cost of our methods is not influenced by the number of supervoxels to be computed.
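The nesting property of such a hierarchy is easy to demonstrate on a weighted graph: merging every edge whose dissimilarity falls below a threshold yields one partition per threshold, and raising the threshold can only coarsen the result. The sketch below illustrates that generic construction on a toy graph; the paper's specific reweighting and its p-HOScale variants are not reproduced here.

```python
def hierarchy_levels(n_nodes, edges, thresholds):
    """Nested partitions of a dissimilarity-weighted graph.

    n_nodes    : number of graph nodes (supervoxels in the video setting).
    edges      : list of (weight, u, v) dissimilarity edges.
    thresholds : increasing cut levels; one partition is built per level.
    Returns one label list per threshold; later levels are coarser.
    """
    levels = []
    for t in thresholds:
        parent = list(range(n_nodes))

        def find(x):  # union-find with path halving
            while parent[x] != x:
                parent[x] = parent[parent[x]]
                x = parent[x]
            return x

        # merge every edge cheaper than the current level's threshold
        for w, u, v in edges:
            if w < t:
                ru, rv = find(u), find(v)
                if ru != rv:
                    parent[rv] = ru
        levels.append([find(x) for x in range(n_nodes)])
    return levels
```

On a 4-node chain with edge weights 0.2, 0.5 and 0.8, thresholds 0.3, 0.6 and 0.9 produce three nested segmentations, from three regions down to one.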