Mona Köhler - Academia.edu
Papers by Mona Köhler
Pixel-accurate semantic segmentation forms the basis for comprehensive scene understanding. Semantic knowledge about the structure and layout of indoor scenes can be useful to mobile robots in various tasks. Among other things, it can support localization, obstacle avoidance, targeted navigation to semantic entities, and human-machine interaction. Efficient RGB methods have recently achieved good segmentation results. When depth data is additionally taken into account, segmentation performance can usually be improved further. This master's thesis therefore combines methods for efficient semantic segmentation with methods for RGBD segmentation. Based on a broad literature review of both topics, a custom, efficient deep-learning-based RGBD segmentation approach is developed. By means of extensive experiments on various ...
arXiv (Cornell University), Jul 10, 2022
Semantic scene understanding is essential for mobile agents acting in various environments. Although semantic segmentation already provides a lot of information, details about individual objects as well as the general scene are missing but required for many real-world applications. However, solving multiple tasks separately is expensive and cannot be accomplished in real time given limited computing and battery capabilities on a mobile platform. In this paper, we propose an efficient multi-task approach for RGB-D scene analysis (EMSANet) that simultaneously performs semantic and instance segmentation (panoptic segmentation), instance orientation estimation, and scene classification. We show that all tasks can be accomplished using a single neural network in real time on a mobile platform without diminishing performance; on the contrary, the individual tasks are able to benefit from each other. In order to evaluate our multi-task approach, we extend the annotations of the common RGB-D indoor datasets NYUv2 and SUNRGB-D for instance segmentation and orientation estimation. To the best of our knowledge, we are the first to provide results in such a comprehensive multi-task setting for indoor scene analysis on NYUv2 and SUNRGB-D.
2020 29th IEEE International Conference on Robot and Human Interactive Communication (RO-MAN), 2020
In order to deploy service robots in environments where they encounter and/or cooperate with persons, one key factor is human acceptance. Hence, the information on which upcoming actions of the robot are based has to be made transparent and understandable to the human. However, considering the restricted power resources of mobile robot platforms, systems for visualization not only have to be expressive but also energy efficient. In this paper, we applied the well-known technique of laser scanning on a mobile robot to create a novel system for intention visualization and human-robot interaction. We conducted user tests to compare our system to a low-power LED video projector solution in order to evaluate the suitability for mobile platforms and to get human impressions of both systems. We show that the presented system is preferred by most users in a dynamic test setup on a mobile platform.
2021 IEEE International Conference on Robotics and Automation (ICRA), 2021
Analyzing scenes thoroughly is crucial for mobile robots acting in different environments. Semantic segmentation can enhance various subsequent tasks, such as (semantically assisted) person perception, (semantic) free space detection, (semantic) mapping, and (semantic) navigation. In this paper, we propose an efficient and robust RGB-D segmentation approach that can be optimized to a high degree using NVIDIA TensorRT and, thus, is well suited as a common initial processing step in a complex system for scene analysis on mobile robots. We show that RGB-D segmentation is superior to processing RGB images solely and that it can still be performed in real time if the network architecture is carefully designed. We evaluate our proposed Efficient Scene Analysis Network (ESANet) on the common indoor datasets NYUv2 and SUNRGB-D and show that it reaches state-of-the-art performance when considering both segmentation performance and runtime. Furthermore, our evaluation on the outdoor dataset Cityscapes shows that our approach is suitable for other areas of application as well. Finally, instead of presenting benchmark results only, we show qualitative results in one of our indoor application scenarios.
ACM Computing Surveys
Deep Learning approaches have recently raised the bar in many fields, from Natural Language Processing to Computer Vision, by leveraging large amounts of data. However, they could fail when the retrieved information is not enough to fit the vast number of parameters, frequently resulting in overfitting and, therefore, in poor generalizability. Few-Shot Learning aims at designing models which can effectively operate in a scarce data regime, yielding learning strategies that only need few supervised examples to be trained. These procedures are of both practical and theoretical importance, as they are crucial for many real-life scenarios in which data is either costly or even impossible to retrieve. Moreover, they bridge the distance between current data-hungry models and human-like generalization capability. Computer Vision offers various tasks which can be few-shot inherent, such as person re-identification. This survey, which to the best of our knowledge is the first tackling this p...