Tinne Tuytelaars - Academia.edu

Papers by Tinne Tuytelaars

Research paper thumbnail of Mining Mid-level Features for Image Classification

Mid-level or semi-local features learnt using class-level information are potentially more distinctive than the traditional low-level local features constructed in a purely bottom-up fashion. At the same time they preserve some of the robustness properties with respect to occlusions and image clutter. In this paper we propose a new and effective scheme for extracting mid-level features for image classification, based on relevant pattern mining. In particular, we mine relevant patterns of local compositions of densely sampled low-level features. We refer to the new set of obtained patterns as Frequent Local Histograms or FLHs. During this process, we pay special attention to keeping all the local histogram information and to selecting the most relevant reduced set of FLH patterns for classification. The careful choice of the visual primitives and an extension to exploit both local and global spatial information allow us to build powerful bag-of-FLH-based image representations. We show that these bag-of-FLHs are more discriminative than traditional bag-of-words and yield state-of-the-art results on various image classification benchmarks, including Pascal VOC.
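For illustration, a toy Python sketch of the FLH idea as described in the abstract, not the authors' implementation: the dense visual-word grid, the neighbourhood radius and the plain frequency threshold are simplifying assumptions (the paper additionally selects patterns for relevance and non-redundancy).

```python
import numpy as np
from collections import Counter

def local_histograms(word_map, radius=1):
    """word_map: 2D array of visual-word ids on a dense sampling grid.
    Returns one local histogram (as a hashable pattern) per grid position,
    summarising the words in its (2*radius+1)^2 neighbourhood."""
    H, W = word_map.shape
    patterns = []
    for y in range(radius, H - radius):
        for x in range(radius, W - radius):
            patch = word_map[y - radius:y + radius + 1, x - radius:x + radius + 1]
            patterns.append(frozenset(Counter(patch.ravel().tolist()).items()))
    return patterns

def mine_frequent(per_image_patterns, min_support):
    """Keep the local-histogram patterns that occur in at least `min_support`
    images (a crude frequency criterion only)."""
    support = Counter()
    for patterns in per_image_patterns:
        for p in set(patterns):
            support[p] += 1
    return [p for p, s in support.items() if s >= min_support]

def bag_of_flh(patterns, flh_list):
    """Represent one image by how often each mined FLH pattern occurs in it."""
    index = {p: i for i, p in enumerate(flh_list)}
    vec = np.zeros(len(flh_list))
    for p in patterns:
        if p in index:
            vec[index[p]] += 1
    return vec
```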

Research paper thumbnail of Guided Long-Short Term Memory for Image Caption Generation

In this work we focus on the problem of image caption generation. We propose an extension of the long short-term memory (LSTM) model, which we coin Guided LSTM or G-LSTM for short. In particular, we add semantic information extracted from the image as extra input to each unit of the LSTM block, with the aim of guiding the model towards solutions that are more tightly coupled to the image content. Additionally, we explore different length normalization strategies for beam search in order to prevent it from favoring short sentences. On various benchmark datasets, we obtain results that are on par with or even outperform the current state-of-the-art.
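A minimal numpy sketch of the two mechanisms mentioned in the abstract, under assumed shapes and names rather than the authors' code: a guided LSTM step whose gates also receive a fixed semantic guidance vector extracted from the image, and a simple length-normalised beam-search score.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def guided_lstm_step(x_t, g, h_prev, c_prev, W, U, Q, b):
    """One guided-LSTM step. x_t: current word embedding; g: image guidance
    vector (constant over time); h_prev, c_prev: previous hidden and cell
    state. W, U, Q stack the input, recurrent and guidance weights for the
    input/forget/output gates and the cell update (4*hidden rows)."""
    z = W @ x_t + U @ h_prev + Q @ g + b
    i, f, o, u = np.split(z, 4)
    c = sigmoid(f) * c_prev + sigmoid(i) * np.tanh(u)
    h = sigmoid(o) * np.tanh(c)
    return h, c

def normalised_beam_score(log_prob_sum, length, alpha=0.7):
    """Divide the summed log-probability by length**alpha so beam search no
    longer systematically prefers short captions (alpha is an assumed
    hyper-parameter, one of several possible normalisations)."""
    return log_prob_sum / (length ** alpha)
```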

Research paper thumbnail of A relational distance-based framework for hierarchical image understanding

Jan 1, 2012

Research paper thumbnail of Not far away from home: a relational distance-based approach to understanding images of houses

Inductive Logic …, Jan 1, 2011

Augmenting vision systems with high-level knowledge and reasoning can improve lower-level vision processes, such as object detection, with richer and more structured information. In this paper we tackle the problem of delimiting conceptual elements of street views based on spatial relations between lower-level components, e.g. the element 'house' is composed of windows and a door in a spatial arrangement. We use structured data: each concept can be seen as a graph representing spatial relations between components, e.g. in terms of right, up, close. We employ distances between logical interpretations to match parts of images with known examples and describe experimental results.
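As a toy illustration of the relational setup (not the distance actually used in the paper), spatial configurations can be written as sets of ground facts and compared under the best one-to-one renaming of their components:

```python
from itertools import permutations

def rename(facts, mapping):
    return {(rel, mapping.get(a, a), mapping.get(b, b)) for rel, a, b in facts}

def interpretation_distance(facts_a, facts_b):
    """Size of the symmetric difference of two fact sets, minimised over
    injective renamings of the smaller object set into the larger one.
    Brute force; only meant for tiny hand-built examples."""
    objs_a = sorted({o for _, a, b in facts_a for o in (a, b)})
    objs_b = sorted({o for _, a, b in facts_b for o in (a, b)})
    if len(objs_a) > len(objs_b):
        facts_a, facts_b, objs_a, objs_b = facts_b, facts_a, objs_b, objs_a
    best = float("inf")
    for perm in permutations(objs_b, len(objs_a)):
        mapping = dict(zip(objs_a, perm))
        best = min(best, len(rename(facts_a, mapping) ^ facts_b))
    return best

# e.g. a hand-built 'house' prototype vs. an observed configuration
house = {("above", "window1", "door1"), ("right_of", "window2", "window1")}
observed = {("above", "w", "d"), ("right_of", "v", "w"), ("close", "v", "d")}
print(interpretation_distance(house, observed))  # 1: one unmatched fact
```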

Research paper thumbnail of Local Alignments for Fine-Grained Categorization

The aim of this paper is fine-grained categorization without human interaction. Different from prior work, which relies on detectors for specific object parts, we propose to localize distinctive details by roughly aligning the objects using just the overall shape. Then, one may proceed to the differential classification by examining the corresponding regions of the alignments. More specifically, the alignments are used to transfer part annotations from training images to unseen images (supervised alignment), or to blindly yet consistently segment the object into a number of regions (unsupervised alignment). We further argue that for the distinction of sub-classes, distribution-based features like color Fisher vectors are better suited for describing the localized appearance of fine-grained categories than popular matching-oriented intensity features, like HOG. They allow capturing the subtle local differences between subclasses, while at the same time being robust to misalignments between distinctive details. We evaluate the local alignments on the CUB-2011 and the Stanford Dogs datasets, composed of 200 bird and 120 dog species, respectively, that are visually very hard to distinguish. In our experiments we study and show the benefit of the color Fisher vector parameterization, the influence of the alignment partitioning, and the significance of object segmentation on fine-grained categorization. We furthermore show that by using object detectors not for detection but as voters to generate object confidence saliency maps, we arrive at fully unsupervised, yet highly accurate fine-grained categorization. The proposed local alignments set a new state-of-the-art on both the fine-grained birds and dogs datasets, even without any human intervention. What is more, the local alignments reveal what appearance details are most decisive per fine-grained object category.
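For reference, a minimal sketch of a first-order Fisher-vector encoding over a diagonal GMM, the general technique the abstract builds on; the colour descriptor, normalisations and parameters here are assumptions, not the paper's exact pipeline.

```python
import numpy as np

def fisher_vector(descriptors, weights, means, sigmas):
    """descriptors: (N, D) local colour descriptors from one aligned region.
    weights (K,), means (K, D), sigmas (K, D): a pre-trained diagonal GMM.
    Returns the K*D gradient statistics with respect to the GMM means."""
    N, _ = descriptors.shape
    # soft assignment gamma[n, k] of each descriptor to each Gaussian
    diff = descriptors[:, None, :] - means[None, :, :]                  # (N, K, D)
    log_p = (np.log(weights)[None, :]
             - 0.5 * np.sum(np.log(2 * np.pi * sigmas**2), axis=1)[None, :]
             - 0.5 * np.sum((diff / sigmas[None, :, :])**2, axis=2))    # (N, K)
    gamma = np.exp(log_p - log_p.max(axis=1, keepdims=True))
    gamma /= gamma.sum(axis=1, keepdims=True)
    # gradient w.r.t. the means, one D-dim block per Gaussian
    grad_mu = (gamma[:, :, None] * diff / sigmas[None, :, :]).sum(axis=0)
    grad_mu /= (N * np.sqrt(weights)[:, None])
    fv = grad_mu.ravel()
    # power + L2 normalisation, as is common for Fisher vectors
    fv = np.sign(fv) * np.sqrt(np.abs(fv))
    return fv / (np.linalg.norm(fv) + 1e-12)
```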

Research paper thumbnail of Towards Sign Language Recognition based on Body Parts Relations

Over the years, hand gesture recognition has mostly been addressed by considering hand trajectories in isolation. However, in most sign languages, hand gestures are defined in a particular context (body region). We propose a pipeline which models hand movements in the context of other parts of the body, captured in 3D space using the Kinect sensor. In addition, we perform sign recognition based on the different hand postures that occur during a sign. Our experiments show that considering different body parts brings improved performance compared with methods which only consider global hand trajectories. Finally, we demonstrate that combining hand posture features with hand gesture features helps to improve the prediction of a given sign.
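As a toy illustration of the contextual idea (with assumed joint names and feature choices, not the paper's pipeline), a hand trajectory can be re-expressed relative to other tracked body parts:

```python
import numpy as np

def relative_hand_features(skeleton_seq, hand="hand_right",
                           references=("head", "torso", "shoulder_left")):
    """skeleton_seq: list of dicts mapping joint names to 3D positions
    (e.g. from a Kinect skeleton stream). Returns a (T, 3*len(references))
    array of hand positions expressed relative to each reference joint,
    instead of the raw hand trajectory in isolation."""
    feats = []
    for frame in skeleton_seq:
        hand_pos = np.asarray(frame[hand])
        feats.append(np.concatenate(
            [hand_pos - np.asarray(frame[r]) for r in references]))
    return np.stack(feats)
```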

Research paper thumbnail of Wide Baseline Stereo Matching based on Local, Affinely Invariant Regions

Research paper thumbnail of Real-Time Vision-Based Pedestrian Detection in a Truck’s Blind Spot Zone Using a Warping Window Approach

Lecture Notes in Electrical Engineering, 2014

Research paper thumbnail of Local Invariant Feature Detectors: A Survey

Foundations and Trends® in Computer Graphics and Vision, 2007

Research paper thumbnail of HPAT Indexing for Fast Object/Scene Recognition Based on Local Appearance

Lecture Notes in Computer Science, 2003

Research paper thumbnail of The cascaded Hough transform as support for grouping and finding vanishing points and lines

Lecture Notes in Computer Science, 1997

Research paper thumbnail of Grouping via the Matching of Repeated Patterns

Lecture Notes in Computer Science, 2001

Research paper thumbnail of Local Features for Image Retrieval

Computational Imaging and Vision, 2001

Research paper thumbnail of Pedestrian Detection at Warp Speed: Exceeding 500 Detections per Second

2013 IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2013

Research paper thumbnail of Wide Baseline Stereo based on Local, Affinely Invariant Regions

British Machine Vision Conference, 2000

Research paper thumbnail of Fine-Grained Categorization by Alignments

2013 IEEE International Conference on Computer Vision, 2013

Research paper thumbnail of Is 2D Information Enough For Viewpoint Estimation?

Proceedings of the British Machine Vision Conference 2014, 2014

Research paper thumbnail of Towards Multi-View Object Class Detection

2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition - Volume 2 (CVPR'06), 2006

Research paper thumbnail of Camera-Based Fall Detection on Real World Data

Lecture Notes in Computer Science, 2012

Research paper thumbnail of An Efficient Dense and Scale-Invariant Spatio-Temporal Interest Point Detector

Lecture Notes in Computer Science, 2008
