Tinne Tuytelaars - Academia.edu
Uploads
Papers by Tinne Tuytelaars
Mid-level or semi-local features learnt using class-level information are potentially more distinctive than the traditional low-level local features constructed in a purely bottom-up fashion. At the same time they preserve some of the robustness properties with respect to occlusions and image clutter. In this paper we propose a new and effective scheme for extracting mid-level features for image classification, based on relevant pattern mining. In particular, we mine relevant patterns of local compositions of densely sampled low-level features. We refer to the new set of obtained patterns as Frequent Local Histograms or FLHs. During this process, we pay special attention to keeping all the local histogram information and to selecting the most relevant reduced set of FLH patterns for classification. The careful choice of the visual primitives and an extension to exploit both local and global spatial information allow us to build powerful bag-of-FLH-based image representations. We show that these bag-of-FLHs are more discriminative than traditional bag-of-words and yield state-of-the-art results on various image classification benchmarks, including Pascal VOC.
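The mining step described above can be sketched in miniature. This is a simplified toy version under stated assumptions, not the authors' implementation: images are assumed to already be grids of quantized visual-word indices, `local_histograms` and `frequent_local_histograms` are illustrative names, and "relevance" is reduced to a plain document-frequency support threshold.

```python
from collections import Counter

def local_histograms(word_grid, window=2, vocab=4):
    """Slide a window over a grid of visual-word indices and build one
    small histogram of word counts per window position."""
    hists = []
    rows, cols = len(word_grid), len(word_grid[0])
    for r in range(rows - window + 1):
        for c in range(cols - window + 1):
            h = [0] * vocab
            for dr in range(window):
                for dc in range(window):
                    h[word_grid[r + dr][c + dc]] += 1
            hists.append(tuple(h))  # tuple so the pattern is hashable
    return hists

def frequent_local_histograms(images, min_support=2, **kw):
    """Keep only the local-histogram patterns occurring in at least
    `min_support` images (document-frequency style support)."""
    df = Counter()
    for grid in images:
        df.update(set(local_histograms(grid, **kw)))
    return {h for h, n in df.items() if n >= min_support}
```

An image is then represented by counting how often each surviving FLH pattern fires in it (a "bag of FLHs"), analogous to a bag-of-words histogram.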
In this work we focus on the problem of image caption generation. We propose an extension of the ... more In this work we focus on the problem of image caption generation. We propose an extension of the long short term memory (LSTM) model, which we coin Guided LSTM or G-LSTM for short. In particular, we add semantic information extracted from the image as extra input to each unit of the LSTM block, with the aim of guiding the model towards solutions that are more tightly coupled to the image content. Additionally, we explore different length normalization strategies for beam search in order to prevent it from favoring short sentences. On various benchmark datasets, we obtain results that are on par with or even outperform the current state-of-the-art.
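The length-normalization issue mentioned above is easy to illustrate: summing log-probabilities penalizes every extra token, so an un-normalized beam prefers short captions. A minimal sketch of one common remedy, dividing by length raised to a power `alpha` (the function names and the specific normalizer are illustrative assumptions, not necessarily the strategy the paper settles on):

```python
def normalized_score(log_prob, length, alpha=0.7):
    """Divide the total log-probability by length**alpha so longer
    captions are not penalized merely for having more factors < 1."""
    return log_prob / (length ** alpha)

def rerank(candidates, alpha=0.7):
    """candidates: list of (tokens, total_log_prob) pairs from beam
    search. Return them best-first under the normalized score."""
    return sorted(candidates,
                  key=lambda c: normalized_score(c[1], len(c[0]), alpha),
                  reverse=True)
```

With `alpha=0` this degenerates to the raw log-probability (short captions win); with `alpha=1` it is the average per-token log-probability.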
status: …, Jan 1, 2012
Inductive Logic …, Jan 1, 2011
Augmenting vision systems with high-level knowledge and reasoning can improve lower-level vision processes, such as object detection, with richer and more structured information. In this paper we tackle the problem of delimiting conceptual elements of street views based on spatial relations between lower-level components, e.g. the element 'house' is composed of windows and a door in a spatial arrangement. We use structured data: each concept can be seen as a graph representing spatial relations between components, e.g. in terms of right, up, close. We employ distances between logical interpretations to match parts of images with known examples and describe experimental results.
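The graph-matching idea above can be sketched by treating each concept as a set of spatial-relation facts and using the size of the symmetric difference as a stand-in for a distance between logical interpretations. This is a deliberately simplified sketch: the fact triples, function names, and example concepts are illustrative assumptions, not the paper's actual distance.

```python
def interpretation_distance(g1, g2):
    """Symmetric-difference distance between two sets of spatial facts:
    0 for identical arrangements, larger for more mismatched relations."""
    return len(g1 ^ g2)

def nearest_concept(query, labelled_examples):
    """Label a new arrangement with the concept of its nearest example.
    labelled_examples: list of (label, frozenset_of_facts) pairs."""
    return min(labelled_examples,
               key=lambda ex: interpretation_distance(query, ex[1]))[0]
```

Each fact is a triple such as `("window", "left_of", "door")`; a query arrangement extracted from an image is then matched against labelled example graphs.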
The aim of this paper is fine-grained categorization without human interaction. Different from prior work, which relies on detectors for specific object parts, we propose to localize distinctive details by roughly aligning the objects using just the overall shape. Then, one may proceed to the differential classification by examining the corresponding regions of the alignments. More specifically, the alignments are used to transfer part annotations from training images to unseen images (supervised alignment), or to blindly yet consistently segment the object in a number of regions (unsupervised alignment). We further argue that for the distinction of subclasses, distribution-based features like color Fisher vectors are better suited for describing the localized appearance of fine-grained categories than popular matching-oriented intensity features, like HOG. They allow capturing the subtle local differences between subclasses, while at the same time being robust to misalignments between distinctive details. We evaluate the local alignments on the CUB-2011 and Stanford Dogs datasets, composed of 200 and 120 visually very hard to distinguish bird and dog species, respectively. In our experiments we study and show the benefit of the color Fisher vector parameterization, the influence of the alignment partitioning, and the significance of object segmentation on fine-grained categorization. We furthermore show that by using object detectors not for detection but as voters to generate object confidence saliency maps, we arrive at fully unsupervised, yet highly accurate fine-grained categorization. The proposed local alignments set a new state-of-the-art on both the fine-grained birds and dogs datasets, even without any human intervention. What is more, the local alignments reveal what appearance details are most decisive per fine-grained object category.
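The supervised-alignment step above, transferring part annotations from a training image to an unseen one, can be sketched in its simplest form. Assuming, purely for illustration, that the rough shape alignment reduces to an axis-aligned scale plus translation between the two object bounding boxes (the real alignment is richer), part coordinates map as:

```python
def transfer_parts(src_box, dst_box, src_parts):
    """Map part annotations (x, y) from a training image's object box
    to the corresponding location in an aligned test image's box.
    Boxes are (x0, y0, x1, y1); src_parts maps part name -> (x, y)."""
    sx0, sy0, sx1, sy1 = src_box
    dx0, dy0, dx1, dy1 = dst_box
    x_scale = (dx1 - dx0) / (sx1 - sx0)
    y_scale = (dy1 - dy0) / (sy1 - sy0)
    return {name: (dx0 + (x - sx0) * x_scale,
                   dy0 + (y - sy0) * y_scale)
            for name, (x, y) in src_parts.items()}
```

Local descriptors (e.g. color Fisher vectors) are then extracted around the transferred part locations and compared across subclasses.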
Over the years, hand gesture recognition has been mostly addressed considering hand trajectories in isolation. However, in most sign languages, hand gestures are defined in a particular context (body region). We propose a pipeline which models hand movements in the context of other parts of the body, captured in 3D space using the Kinect sensor. In addition, we perform sign recognition based on the different hand postures that occur during a sign. Our experiments show that considering different body parts brings improved performance compared with methods which only consider global hand trajectories. Finally, we demonstrate that combining hand posture features with hand gesture features helps to improve the prediction of a given sign.
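The final combination step, merging posture and gesture cues, can be sketched as a simple late fusion of per-sign classifier scores. The function name, score dictionaries, and equal default weight are illustrative assumptions; the paper's actual fusion scheme may differ.

```python
def fuse_scores(trajectory_scores, posture_scores, w=0.5):
    """Late fusion: per-sign weighted average of two classifiers'
    scores; returns the winning sign and the fused score table."""
    fused = {sign: w * trajectory_scores[sign]
                   + (1 - w) * posture_scores[sign]
             for sign in trajectory_scores}
    return max(fused, key=fused.get), fused
```

Even this naive average can flip a decision: a sign that is ambiguous from the trajectory alone may be resolved by the hand posture, which is exactly the effect the experiments report.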
Lecture Notes in Electrical Engineering, 2014
Foundations and Trends® in Computer Graphics and Vision, 2007
Lecture Notes in Computer Science, 2003
Lecture Notes in Computer Science, 1997
Lecture Notes in Computer Science, 2001
Computational Imaging and Vision, 2001
2013 IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2013
British Machine Vision Conference, 2000
2013 IEEE International Conference on Computer Vision, 2013
Proceedings of the British Machine Vision Conference 2014, 2014
2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition - Volume 2 (CVPR'06), 2006
Lecture Notes in Computer Science, 2012
Lecture Notes in Computer Science, 2008