Justus Piater

Papers by Justus Piater

Supervised Learning of Gesture-Action Associations for Human-Robot Collaboration

2017 12th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2017)

Autonomous skill-centric testing using deep learning

2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)

Software testing is an important tool to ensure software quality. This is a hard task in robotics due to dynamic environments and the expensive development and time-consuming execution of test cases. Most testing approaches use model-based and/or simulation-based testing to overcome these problems. We propose model-free skill-centric testing in which a robot autonomously executes skills in the real world and compares them to previous experiences. The skills are selected by maximising the expected information gain on the distribution of erroneous software functions. We use deep learning to model the sensor data observed during previous successful skill executions and to detect irregularities. Sensor data is connected to function call profiles such that certain misbehaviour can be related to specific functions. We evaluate our approach in simulation and in experiments with a KUKA LWR 4+ robot by purposefully introducing bugs to the software. We demonstrate that these bugs can be detected with high accuracy and without the need for the implementation of specific tests or task-specific models.
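
For readers who want to experiment, the skill-selection criterion described above can be approximated with a simple Bayesian model: maintain a distribution over which function is faulty, and pick the skill whose success/failure outcome is expected to reduce its entropy the most. The sketch below is a minimal illustration, not the paper's implementation; the Bernoulli observation model and all names (`expected_information_gain`, `select_skill`, `p_detect`, `p_false`) are assumptions.

```python
import numpy as np

def entropy(p):
    """Shannon entropy of a discrete distribution (0 log 0 := 0)."""
    p = p[p > 0]
    return -np.sum(p * np.log(p))

def expected_information_gain(prior, p_fail):
    """Expected entropy reduction over 'which function is faulty' from one
    skill execution. prior[f] = P(function f is faulty); p_fail[f] =
    P(this skill's execution fails | f is faulty)."""
    pf = float(prior @ p_fail)                      # marginal failure probability
    post_fail = prior * p_fail / pf if pf > 0 else prior
    post_ok = prior * (1 - p_fail) / (1 - pf) if pf < 1 else prior
    return entropy(prior) - (pf * entropy(post_fail) + (1 - pf) * entropy(post_ok))

def select_skill(prior, call_profiles, p_detect=0.9, p_false=0.05):
    """Pick the skill whose execution is expected to be most informative.
    call_profiles[s] is a boolean mask of the functions skill s exercises."""
    gains = [expected_information_gain(prior, np.where(mask, p_detect, p_false))
             for mask in call_profiles]
    return int(np.argmax(gains))

# Example: two skills covering disjoint halves of four candidate functions.
prior = np.full(4, 0.25)
profiles = [np.array([1, 1, 0, 0], bool), np.array([0, 0, 1, 1], bool)]
print(select_skill(prior, profiles))                # here both are equally informative
```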

Visual task outcome verification using deep learning

2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2017

Manipulation tasks requiring high precision are difficult for reasons such as imprecise calibration and perceptual inaccuracies. We present a method for visual task outcome verification that provides an assessment of the task status as well as information for the robot to improve this status. The final status of the task is assessed as success, failure or in progress. We propose a deep learning strategy to learn the task with a small number of training episodes and without requiring the robot. A probabilistic, appearance-based pose estimation method is used to learn the demonstrated task. For real-data efficiency, synthetic training images are created around the trajectory of the demonstrated task. We show that our method can estimate the task status with high accuracy in several instances of different tasks, and demonstrate the accuracy of a high-precision task on a real robot.
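
As a concrete starting point, the three-way status assessment (success / failure / in progress) can be cast as ordinary image classification. The PyTorch sketch below is purely illustrative and is not the appearance-based pose-estimation pipeline the paper uses; the architecture and all names are assumptions.

```python
import torch
import torch.nn as nn

# Illustrative three-class outcome classifier: success / failure / in progress.
class OutcomeNet(nn.Module):
    def __init__(self, n_classes=3):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),                 # global average pooling
        )
        self.head = nn.Linear(32, n_classes)

    def forward(self, x):                            # x: (B, 3, H, W) image batch
        return self.head(self.features(x).flatten(1))

logits = OutcomeNet()(torch.randn(1, 3, 64, 64))     # -> shape (1, 3)
```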

Initial State Prediction in Planning

While recent advances in offline reasoning techniques and online execution strategies have made planning under uncertainty more robust, the application of plans in partially-known environments is still a difficult and important topic. In this paper we present an approach for predicting new information about a partially-known initial state, represented as a multigraph, utilizing Maximum-Margin Multi-Valued Regression. We evaluate this approach in four different domains, demonstrating high recall and accuracy.

FlexRoP – flexible, assistive robots for customized production

Proceedings of the Austrian Robotics Workshop 2018, 2018

Flexible production assistants of the future are required to be skillful, universally applicable, safe and easy to program. State-of-the-art robot systems intended for human-robot collaboration sometimes require unintuitive, text-based programming and remain complicated, especially in combination with peripheral hardware such as external sensors or machine-vision algorithms. The FlexRoP project aims to overcome these limitations by developing and using a flexible, skill-based robot programming middleware and improved user-interface technologies. This paper introduces use cases, the intended system architecture, and a methodology for describing and training kinesthetic skills, as well as first application results and directions for future development.

A Block-based IDE Extension for the ESP32

Robotics with state-of-the-art microcontrollers leads to plenty of unique opportunities for computer science education at school. This paper introduces an ESP32 extension to Ardublockly specifically designed for educational purposes. It discusses the advantages of block-based programming for school education and presents the new key features within the IDE. To demonstrate the capabilities of the developed extension for computer science education in schools, an exercise in the field of swarm robotics was developed.

A Visual Intelligence Scheme for Hard Drive Disassembly in Automated Recycling Routines

Proceedings of the International Conference on Robotics, Computer Vision and Intelligent Systems, 2020

As state-of-the-art deep learning models are taking the leap to generalize and leverage automation, they are becoming useful in real-world tasks such as the disassembly of devices by robotic manipulation. We address the problem of analyzing visual scenes in industrial-grade tasks, for example, automated robotic recycling of a computer hard drive with small components and little space for manipulation. We implement a supervised learning architecture combining deep neural networks and standard point-cloud processing for detecting and recognizing hard drive parts, screws, and gaps. We evaluate the architecture on a custom hard drive dataset and achieve an accuracy above 75% for every component used in our pipeline. Additionally, we show that the pipeline can generalize to damaged hard drives. Our approach, combining several specialized modules, can provide a robust description of a device usable for manipulation by a robotic system. To our knowledge, we are the first to offer a complete scheme addressing the entire disassembly process of the chosen device. To facilitate the pursuit of this issue of global concern, we provide a taxonomy for the target device to be used in automated disassembly scenarios and publish our collected dataset and code.
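
As one hedged illustration of the "standard point-cloud processing" side of such a pipeline, RANSAC plane segmentation (here via Open3D) can separate a drive's flat top plate from protruding structure such as screws and connectors. The thresholds, file name, and the idea of using the off-plane residual are assumptions for the sketch, not details from the paper.

```python
import open3d as o3d

# Segment the dominant plane (e.g., the hard drive's top plate) and keep the
# points that protrude from it as candidate components (screws, connectors).
pcd = o3d.io.read_point_cloud("hard_drive_scan.ply")        # hypothetical scan
plane, inliers = pcd.segment_plane(distance_threshold=0.001,
                                   ransac_n=3, num_iterations=1000)
outliers = pcd.select_by_index(inliers, invert=True)        # off-plane structure
print(f"plane model: {plane}, {len(outliers.points)} protruding points")
```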

Action representations in robotics: A taxonomy and systematic classification

The International Journal of Robotics Research, 2019

Understanding and defining the meaning of “action” is substantial for robotics research. This becomes utterly evident when aiming at equipping autonomous robots with robust manipulation skills for action execution. Unfortunately, to this day we still lack both a clear understanding of the concept of an action and a set of established criteria that ultimately characterize an action. In this survey, we thus first review existing ideas and theories on the notion and meaning of action. Subsequently, we discuss the role of action in robotics and attempt to give a seminal definition of action in accordance with its use in robotics research. Given this definition we then introduce a taxonomy for categorizing action representations in robotics along various dimensions. Finally, we provide a meticulous literature survey on action representations in robotics where we categorize relevant literature along our taxonomy. After discussing the current state of the art we conclude with an outlook to...

Integration of Probabilistic Pose Estimates from Multiple Views

Lecture Notes in Computer Science, 2016

The Effects of Social Gaze in Human-Robot Collaborative Assembly

Social Robotics, 2015

In this paper we explore how social gaze in an assembly robot affects how naïve users interact with it. In a controlled experimental study, 30 participants instructed an industrial robot to fetch parts needed to assemble a wooden toolbox. Participants either interacted with a robot employing a simple gaze following the movements of its own arm, or with a robot that follows its own movements during tasks, but which also gazes at the participant between instructions. Our qualitative and quantitative analyses show that people in the social gaze condition are significantly quicker to engage the robot, smile significantly more often, and can better account for where the robot is looking. In addition, we find people in the social gaze condition to feel more responsible for the task performance. We conclude that social gaze in assembly scenarios fulfills floor management functions and provides an indicator for the robot's affordance, yet that it does not influence likability, mutual interest and suspected competence of the robot.

General Object Tip Detection and Pose Estimation for Robot Manipulation

Lecture Notes in Computer Science, 2015

Robot manipulation tasks like inserting screws and pegs into a hole or automatic screwing require precise tip pose estimation. We propose a novel method to detect the tip of elongated objects and estimate its pose. We demonstrate that our method can estimate tip pose to millimeter-level accuracy. We adopt a probabilistic, appearance-based object detection framework to detect pegs and bits for electric screwdrivers. Screws are difficult to detect with feature- or appearance-based methods due to their reflective characteristics. To overcome this we propose a novel adaptation of RANSAC with a parallel-line model. Subsequently, we employ image moments to detect the tip and its pose. We show that the proposed method allows a robot to perform object insertion with only two pairs of orthogonal views, without visual servoing.
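
To make the image-moments step concrete: given a binary mask of a detected elongated object, the second-order central moments yield its principal axis, and the extreme mask point along that axis is a tip candidate. The OpenCV sketch below is a generic illustration under these assumptions, not the paper's full pipeline (which adds probabilistic detection and the parallel-line RANSAC).

```python
import cv2
import numpy as np

def tip_from_mask(mask):
    """Principal axis of an elongated binary mask via image moments; the
    extreme mask point along that axis is a tip candidate. (Either end may
    be the true tip; disambiguation needs further cues.)"""
    m = cv2.moments(mask, binaryImage=True)
    cx, cy = m["m10"] / m["m00"], m["m01"] / m["m00"]               # centroid
    theta = 0.5 * np.arctan2(2 * m["mu11"], m["mu20"] - m["mu02"])  # axis angle
    axis = np.array([np.cos(theta), np.sin(theta)])
    ys, xs = np.nonzero(mask)
    pts = np.column_stack([xs, ys]).astype(float)
    proj = (pts - [cx, cy]) @ axis           # signed distance along the axis
    tip = pts[np.argmax(np.abs(proj))]       # farthest point from the centroid
    return tip, theta
```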

A multi-view hand gesture RGB-D dataset for human-robot interaction scenarios

2016 25th IEEE International Symposium on Robot and Human Interactive Communication (RO-MAN), 2016

Understanding semantic meaning from hand gestures is a challenging but essential task in human-robot interaction scenarios. In this paper we present a baseline evaluation of the Innsbruck Multi-View Hand Gesture (IMHG) dataset [1] recorded with two RGB-D cameras (Kinect). As a baseline, we adopt a probabilistic appearance-based framework [2] to detect a hand gesture and estimate its pose using two cameras. The dataset consists of two types of deictic gestures with the ground truth location of the target, two symbolic gestures, two manipulative gestures, and two interactional gestures. We discuss the effect of parallax due to the offset between head and hand while performing deictic gestures. Furthermore, we evaluate the proposed framework to estimate the potential referents on the Innsbruck Pointing at Objects (IPO) dataset [2].

Research paper thumbnail of \"OAGM/AAPR 2013 - The 37th Annual Workshop of the Austrian Association for Pattern Recognition

arXiv (Cornell University), May 25, 2013

The 37th Annual Workshop of the Austrian Association for Pattern Recognition took place May 23-24, 2013, in the Plenary Hall of the new City Hall of Innsbruck, Austria, under the motto Pattern Recognition and Computer Vision in Action, and was attended by 56 participants.

Evaluating the progress of deep learning for visual relational concepts

Journal of Vision, 2021

Convolutional Neural Networks (CNNs) have become the state-of-the-art method for image classification in the last ten years. Despite the fact that they achieve superhuman classification accuracy on many popular datasets, they often perform much worse on more abstract image classification tasks. We will show that these difficult tasks are linked to relational concepts from cognitive psychology and that despite progress over the last few years, such relational reasoning tasks still remain difficult for current neural network architectures. We will review deep learning research that is linked to relational concept learning, even if it was not originally presented from this angle. Reviewing the current literature, we will argue that some form of attention will be an important component of future systems to solve relational tasks. In addition, we will point out the shortcomings of currently used datasets, and we will recommend steps to make future datasets more relevant for testing systems on relational reasoning.

Semi-Autonomous 3rd-Hand Robot

We present the principles, current work and plans for the EU-FP7 Project Semi-Autonomous 3rd-Hand Robot. In this project, we pursue a breakthrough in flexible manufacturing by developing a symbiotic robot assistant that acts as a third hand of a human worker. It will be straightforward to instruct even by untrained workers and allow for efficient knowledge transfer between tasks. We demonstrate its efficiency in the collaborative assembly of furniture.

SCurV: A 3D descriptor for object classification

2015 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2015

3D object recognition is one of the big problems in Computer Vision and has a direct impact on Robotics. There have been great advances in the last decade thanks to point cloud descriptors. These descriptors do very well at recognizing object instances in a wide variety of situations. It is also of great interest to know how descriptors perform in object classification tasks. With that idea in mind, we introduce a descriptor designed for the representation of object classes. Our descriptor, named SCurV, exploits 3D shape information and is inspired by recent findings from neurophysiology. We compute and incorporate surface curvatures and distributions of local surface point projections that represent flatness, concavity and convexity in a 3D object-centered and view-dependent descriptor. These different sources of information are combined in a novel and simple, yet effective, way that improves classification results and can be extended to the combination of any type of descriptor. Our experimental setup compares SCurV with other recent descriptors on a large classification task. Using a large and heterogeneous database of 3D objects, we perform our experiments both on a classical, flat classification task and within a novel framework for hierarchical classification. On both tasks, the SCurV descriptor outperformed all other 3D descriptors tested.

Approximate Policy Iteration for Closed-Loop Learning of Visual Tasks

Lecture Notes in Computer Science, 2006

Approximate Policy Iteration (API) is a reinforcement learning paradigm that is able to solve high-dimensional, continuous control problems. We propose to exploit API for the closed-loop learning of mappings from images to actions. This approach requires a family of function approximators that maps visual percepts to a real-valued function. For this purpose, we use Regression Extra-Trees, a fast, yet accurate and versatile machine learning algorithm. The inputs of the Extra-Trees consist of a set of visual features that digest the informative patterns in the visual signal. We also show how to parallelize the Extra-Tree learning process to further reduce the computational expense, which is often essential in visual tasks. Experimental results on real-world images indicate that the combination of API with Extra-Trees is a promising framework for the interactive learning of visual tasks.
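
For intuition, the policy-evaluation step of API with Extra-Trees can be sketched as fitted Q-iteration over (features, action, reward, next features) tuples, using scikit-learn's ExtraTreesRegressor; its n_jobs parameter gives the kind of parallel tree construction the paper exploits. This is a simplified stand-in under those assumptions, not the authors' implementation, and the feature vectors stand in for the visual-feature "digest" described above.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor

def evaluate_policy(transitions, policy, gamma=0.95, n_iter=20):
    """Fitted policy evaluation: transitions are (features, action, reward,
    next_features) tuples; policy maps a feature vector to an action."""
    X = np.array([np.append(f, a) for f, a, r, fn in transitions])
    r = np.array([t[2] for t in transitions])
    fn = np.array([t[3] for t in transitions])
    an = np.array([policy(f) for f in fn])           # on-policy next actions
    Xn = np.column_stack([fn, an])
    q = ExtraTreesRegressor(n_estimators=50, n_jobs=-1).fit(X, r)
    for _ in range(n_iter):                          # repeated Bellman backups
        q = ExtraTreesRegressor(n_estimators=50, n_jobs=-1).fit(
            X, r + gamma * q.predict(Xn))
    return q                                         # approximates Q^policy
```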

Beyond Simple and Complex Neurons: Towards Intermediate-level Representations of Shapes and Objects

KI - Künstliche Intelligenz, 2014

Knowledge of the brain has advanced much since the concept of the neuron doctrine developed by Ramón y Cajal (R Trim Histol Norm Patol 1:33-49, 1888). Over the last six decades a wide range of functionalities of neurons in the visual cortex have been identified. These neurons can be hierarchically organized into areas, since neurons cluster according to structural properties and related function. The neurons in such areas can be characterized to a first-order approximation by their (static) receptive field function, viz. their filter characteristic implemented by their connection weights to neighboring cells. This paper aims to provide insights into the steps that computer models, in our opinion, must pursue in order to develop robust recognition mechanisms that mimic biological processing capabilities beyond the level of cells with classical simple and complex receptive field response properties. We stress the importance of intermediate-level representations for achieving higher-level object abstraction in the context of feature representations, and summarize two current approaches that we consider advances toward that goal.

Toward learning visual discrimination strategies

Proceedings. 1999 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (Cat. No PR00149)

Humans learn strategies for visual discrimination through interaction with their environment. Discrimination skills are refined as demanded by the task at hand, and are not a priori determined by any particular feature set. Tasks are typically incompletely specified and evolve continually. This work presents a general framework for learning visual discrimination that addresses some of these characteristics. It is based on an infinite combinatorial feature space consisting of primitive features such as oriented edgels and texture signatures, and compositions thereof. Features are progressively sampled from this space in a simple-to-complex manner. A simple recognition procedure queries learned features one by one and rules out candidate object classes that do not sufficiently exhibit the queried feature. Training images are presented sequentially to the learning system, which incrementally discovers features for recognition. Experimental results on two databases of geometric objects illustrate the applicability of the framework.
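
The query-and-eliminate recognition procedure described above can be summarized in a few lines. This sketch is illustrative only; the callable feature detectors, the `expects` table, and the threshold are assumptions, not the paper's data structures.

```python
def recognize(image, features, expects, candidates, threshold=0.5):
    """Query learned features one by one (in simple-to-complex order) and
    rule out candidate classes that do not sufficiently exhibit a feature
    actually found in the image. All names here are illustrative.

    features   : list of callables, feature(image) -> response in [0, 1]
    expects    : expects[i] is the set of classes known to exhibit feature i
    candidates : initial set of candidate class labels
    """
    candidates = set(candidates)
    for i, feature in enumerate(features):
        if feature(image) >= threshold:       # feature present in the image
            candidates &= expects[i]          # drop classes that lack it
        if len(candidates) <= 1:
            break                             # decision reached early
    return candidates
```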

3D Object Class Geometry Modeling with Spatial Latent Dirichlet Markov Random Fields

Lecture Notes in Computer Science, 2013

This paper presents a novel part-based geometry model for 3D object classes based on latent Dirichlet allocation (LDA). With all object instances of the same category aligned to a canonical pose, the bounding box is discretized to form a 3D space dictionary for LDA. To enhance the spatial coherence of each part during model learning, we extend LDA by strategically constructing a Markov random field (MRF) on the part labels, and adding an extra spatial parameter for each part. We refer to the improved model as spatial latent Dirichlet Markov random fields (SLDMRF). The experimental results demonstrate that SLDMRF exhibits superior semantic interpretation and discriminative ability in model classification to LDA and other related models.
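
One common way to express such a spatial coupling, shown here only as a hedged illustration (the paper's exact potentials and its extra per-part spatial parameter are not reproduced), is to multiply the LDA part-proportion term by a Potts-style smoothness term over neighboring cells of the discretized bounding box:

```latex
p(\mathbf{z} \mid \theta) \;\propto\;
\prod_{i} \theta_{z_i}
\,\exp\!\Big(\lambda \sum_{(i,j) \in \mathcal{E}} \mathbb{1}[z_i = z_j]\Big)
```

Here z_i is the part label of grid cell i, theta the per-object part proportions, E the set of neighboring cell pairs, and lambda > 0 the strength of the spatial-coherence reward.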
