Alen Smajic | Goethe-Universität Frankfurt am Main
Conference Papers by Alen Smajic
British Machine Vision Conference (BMVC 2022), Nov 23, 2022
Deep Neural Networks (DNNs) for perception in automated driving have been studied extensively and achieve strong detection performance on pre-annotated test sets. However, the literature lacks a systematic analysis of DNN behavior that investigates the factors contributing to their misbehavior. As part of DNN safety, we propose to analyze both DNN behavior in challenging scenarios and the respective factors that actually contribute to misbehavior. Although some of these factors have been studied individually, no thorough study has compared them all in a systematic manner to unveil the impact of each factor leading to DNN failures. In this paper, we propose an approach to evaluate DNN performance limiting factors (PLFs) and their contribution to DNN misbehavior. Accordingly, we analyze seventeen factors from the literature, introduce four novel factors, and assess the potential of each as a PLF. Furthermore, we evaluate our results on six state-of-the-art pedestrian detection DNNs across three detection tasks. For our experiments, we study both a synthetic and a real-world dataset for pedestrian detection. We show that there are various similarities and dissimilarities when comparing the PLFs of a synthetic dataset with those of a real one, and discuss the causes and effects of these relations. Furthermore, we provide an approach for analyzing factors common to real-world and synthetic datasets that might have similar effects on the performance of various DNNs.
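The PLF assessment described above can be illustrated with a toy sketch (not the paper's actual procedure): correlate per-instance values of a candidate factor, e.g. the occlusion ratio, with a per-instance detection score; a strong negative correlation marks the factor as a PLF candidate. All names and data below are synthetic, purely for illustration.

```python
import numpy as np

def plf_correlation(factor_values, detection_scores):
    """Pearson correlation between a candidate factor and detection performance.

    A strongly negative correlation suggests the factor limits performance
    (higher factor value -> lower detection score)."""
    f = np.asarray(factor_values, dtype=float)
    s = np.asarray(detection_scores, dtype=float)
    return float(np.corrcoef(f, s)[0, 1])

# Synthetic example: occlusion ratio vs. detection confidence.
rng = np.random.default_rng(0)
occlusion = rng.uniform(0.0, 1.0, size=200)
confidence = np.clip(1.0 - 0.8 * occlusion + rng.normal(0, 0.05, 200), 0, 1)

# A strongly negative r marks occlusion as a PLF candidate in this toy setup.
r = plf_correlation(occlusion, confidence)
```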
IEEE Intelligent Vehicles Symposium (IV 2022), Jun 7, 2022
Recent adversarial attacks with real-world applications are capable of deceiving deep neural networks (DNNs), often in the form of printed stickers applied to objects in the physical world. Though they achieve high success rates in lab tests and limited field tests, such attacks have not been tested on multiple DNN architectures with a standard setup to unveil the common robust and weak points of both the DNNs and the attacks. Furthermore, realistic-looking stickers applied by ordinary people as acts of vandalism have not been studied to discover their potential risks, nor has the risk of optimizing the location of such realistic stickers to achieve the maximum performance drop. In this paper, (a) we study the effects of realistic-looking sticker application on the performance of traffic sign detectors; (b) we use traffic sign image classification as our use case and train and attack 11 modern architectures for our analysis; (c) by considering factors like brightness, blurriness, and contrast of the training images in our sticker application procedure, we show that simple image processing techniques can help realistic-looking stickers blend into their background to mimic real-world tests; (d) by performing structured synthetic and real-world evaluations, we study how various traffic sign classes differ in terms of their crucial distinctive features across the tested DNNs.
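The sticker application step in (c) can be sketched roughly as follows: before pasting, the sticker's brightness (mean) and contrast (standard deviation) are matched to the target region so it blends into the sign. This is an illustrative NumPy approximation, not the paper's exact procedure; `match_and_paste` and its grayscale inputs are hypothetical.

```python
import numpy as np

def match_and_paste(sign, sticker, y, x):
    """Paste a grayscale sticker onto a sign crop after matching the sticker's
    brightness (mean) and contrast (std) to the target region (toy sketch)."""
    h, w = sticker.shape[:2]
    region = sign[y:y + h, x:x + w].astype(float)
    stk = sticker.astype(float)
    # Normalize the sticker, then shift/scale it toward the background stats.
    stk = (stk - stk.mean()) / (stk.std() + 1e-8)
    stk = stk * region.std() + region.mean()
    out = sign.astype(float)
    out[y:y + h, x:x + w] = np.clip(stk, 0, 255)
    return out.astype(np.uint8)
```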
German Conference on Artificial Intelligence (KI 2021), Oct 1, 2021
As global trends shift towards data-driven industries, the demand for automated algorithms that can convert images of scanned documents into machine-readable information is growing rapidly. Beyond digitization, there is a push towards automating processes that used to require manual inspection of documents. Although optical character recognition (OCR) technologies have mostly solved the task of converting human-readable characters from images, the task of extracting tables has received less attention. Table recognition consists of two sub-tasks: table detection and table structure recognition. Most prior work focuses on one of the two tasks without offering an end-to-end solution, or ignores real application conditions like rotated images or noise artefacts. Recent work shows a clear trend towards deep learning with transfer learning for table structure recognition, owing to the lack of sufficiently large datasets. We present a multistage pipeline named Multi-Type-TD-TSR, which offers an end-to-end solution for table recognition. It utilizes state-of-the-art deep learning models and differentiates between three types of tables based on their borders. For table structure recognition we use a deterministic, non-data-driven algorithm that works on all three types. In addition, we present an algorithm for non-bordered tables and one for bordered tables as the basis of our table structure detection algorithm. We evaluate Multi-Type-TD-TSR on a self-annotated subset of the ICDAR 2019 table structure recognition dataset [5] and achieve a new state of the art. Source code is available at https://github.com/Psarpei/Multi-Type-TD-TSR.
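The border-based distinction between the three table types might be sketched as follows, as a simplified stand-in for the pipeline's actual detector (which the abstract does not specify): count near-full-length dark ruling lines in a binarized table crop and classify it as bordered, partially bordered, or borderless.

```python
import numpy as np

def classify_table(binary):
    """Classify a binarized table crop by its ruling lines (toy sketch).

    binary: 2D array, 1 = dark (ink) pixel. A row or column counts as a
    ruling line when it is almost entirely dark."""
    h, w = binary.shape
    h_lines = int(np.sum(binary.sum(axis=1) > 0.9 * w))  # near-full-width rows
    v_lines = int(np.sum(binary.sum(axis=0) > 0.9 * h))  # near-full-height cols
    if h_lines >= 2 and v_lines >= 2:
        return "bordered"
    if h_lines or v_lines:
        return "partially bordered"
    return "borderless"
```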
Master Thesis by Alen Smajic
AI-based computer vision systems play a crucial role in environment perception for autonomous driving. Although the development of self-driving systems has been pursued for multiple decades, it is only recently that breakthroughs in Deep Neural Networks (DNNs) have led to their widespread application in increasingly sophisticated perception pipelines. With this rising trend, however, comes the need for a systematic safety analysis that evaluates DNN behavior in difficult scenarios and identifies the various factors that cause misbehavior in such systems. This work aims to deliver a crucial contribution to the sparse literature on the systematic analysis of Performance Limiting Factors (PLFs) for DNNs by investigating the task of pedestrian detection in urban traffic from a monocular camera mounted on an autonomous vehicle. To investigate the common factors that lead to DNN misbehavior, six commonly used state-of-the-art object detection architectures and three detection tasks are studied using a new large-scale synthetic dataset and a smaller real-world dataset for pedestrian detection. The systematic analysis includes 17 factors from the literature and four novel factors introduced as part of this work. Each of the 21 factors is assessed based on its influence on detection performance and on whether it can be considered a Performance Limiting Factor (PLF). To support the evaluation of detection performance, a novel, task-oriented Pedestrian Detection Safety Metric (PDSM) is introduced, specifically designed to aid in the identification of individual factors that contribute to DNN failure. This work further introduces a training approach for F1-score maximization whose purpose is to ensure that the DNNs are assessed at their highest performance.
Moreover, a new occlusion estimation model is introduced to supply the missing pedestrian occlusion annotations in the real-world dataset. Based on a qualitative analysis of correlation graphs that visualize the correlation between the PLFs and detection performance, this study identifies 16 of the initial 21 factors as PLFs for DNNs, of which entropy, occlusion ratio, boundary edge strength, and bounding box aspect ratio affect detection performance most severely. The findings highlight some of the most serious shortcomings of current DNNs and pave the way for future research to address these issues.
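The F1-score-maximization idea can be illustrated with a small threshold sweep (a generic sketch, not the thesis's training approach): choose the confidence threshold at which the detector's F1 score on hypothetical validation detections peaks, so the model is evaluated at its best operating point.

```python
def best_f1_threshold(scores, labels, thresholds):
    """Return the confidence threshold maximizing F1 (toy sketch).

    scores: per-detection confidences; labels: 1 = true positive match,
    0 = false positive; both refer to hypothetical validation detections."""
    best_t, best_f1 = None, -1.0
    for t in thresholds:
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        fn = sum(1 for s, y in zip(scores, labels) if s < t and y == 1)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        if f1 > best_f1:
            best_t, best_f1 = t, f1
    return best_t, best_f1
```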
Bachelor Thesis by Alen Smajic
The goal of this thesis is the realistic development of an interactive 3D city model tailored to local public transport (ÖPNV). Based on user input and a data source, the program automatically generates a three-dimensional visualization of the buildings and integrates the local public transport network. The public transport network of the city of Frankfurt served as the working example. The thesis addresses the challenges of collecting geoinformation and processing such complex data. It identifies which user groups gain value from this kind of 3D visualization and which new extension and usage potentials the model offers. In particular, the reader is given an insight into the generation of interactive 3D models from pure raw data. The game engine Unity was used as the development environment and proved to be a very capable and modern tool for building functional 3D visualizations. The OpenStreetMap project was used as the data source and is discussed in the thesis. Finally, for evaluation, the model was made available to various users and assessed by means of a questionnaire.
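A core step in generating such a model from raw OpenStreetMap data is projecting WGS84 coordinates into a flat local frame that a game engine like Unity can consume. A minimal sketch using an equirectangular approximation, adequate at city scale (`latlon_to_local` is a hypothetical helper, not code from the thesis):

```python
import math

def latlon_to_local(lat, lon, origin_lat, origin_lon):
    """Project WGS84 lat/lon to a flat local frame in metres around an origin
    (equirectangular approximation; accurate enough at city scale)."""
    R = 6371000.0  # mean Earth radius in metres
    x = math.radians(lon - origin_lon) * R * math.cos(math.radians(origin_lat))
    z = math.radians(lat - origin_lat) * R
    return x, z
```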
Technical Reports by Alen Smajic
Recent advances in Machine Learning are pushing the boundaries of real-world AI applications, which are becoming ever more integrated into our daily lives. As research advances, the long-term goal of autonomous driving is becoming reality rather than fiction, but it brings numerous challenges for our society and for system engineers. Topics like ethical AI, privacy, and transparency are becoming increasingly relevant and have to be considered when designing AI systems for real-world applications. In this project we present our first attempt at tackling the problem of traffic sign recognition by taking a systems engineering approach. We focus on the development of an explainable and transparent system using concept whitening layers inside our deep learning models, which allow us to train our models with handcrafted features and make predictions based on predefined concepts. We also introduce a simulation environment for traffic sign recognition, which can be used for model deployment, performance benchmarking, data generation, and further domain model research.
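At the heart of a concept whitening layer is a whitening transform that decorrelates activation channels so that individual axes can later be aligned with predefined concepts. A minimal ZCA-whitening sketch showing only that step, with the concept-alignment rotation and all training machinery omitted:

```python
import numpy as np

def zca_whiten(acts, eps=1e-5):
    """ZCA-whiten a batch of activations (rows = samples, cols = channels).

    After whitening, the channel covariance is (approximately) the identity,
    which is the decorrelation step a concept whitening layer builds on."""
    mu = acts.mean(axis=0)
    centered = acts - mu
    cov = centered.T @ centered / (len(acts) - 1)
    vals, vecs = np.linalg.eigh(cov)
    # ZCA transform: rotate to eigenbasis, rescale, rotate back.
    W = vecs @ np.diag(1.0 / np.sqrt(vals + eps)) @ vecs.T
    return centered @ W
```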
Datasets drive vision progress, yet existing driving datasets are limited in visual content, scene variation, richness of annotations, geographic distribution, and supported tasks for studying multitask learning for autonomous driving. In 2018, Yu et al. released BDD100K, the largest driving video dataset, with 100K videos and 10 tasks to evaluate the progress of image recognition algorithms on autonomous driving. The dataset possesses geographic, environmental, and weather diversity, which is useful for training models that are less likely to be surprised by new conditions. It provides 2D bounding box annotations on the reference frames of the 100K videos, i.e. 100,000 images, for 13 categories: "other vehicle", "pedestrian", "traffic light", "traffic sign", "truck", "train", "other person", "bus", "car", "rider", "motorcycle", "bicycle", and "trailer". The goal of our project is to detect and classify traffic objects in a video in real time using two approaches. We trained the two state-of-the-art models YOLO and Faster R-CNN on the Berkeley DeepDrive dataset to compare their performance and to achieve an mAP comparable to the current state of the art on BDD100K, which is 45.7 using a hybrid incremental net. We focus on the context of autonomous driving and compare the models' performance on a live video, measuring FPS and mAP.
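The mAP comparison rests on matching predicted boxes to ground truth via the Intersection over Union (IoU) criterion, which can be sketched in a few lines (a generic implementation, not tied to the project's code):

```python
def iou(a, b):
    """Intersection over Union of two axis-aligned boxes [x1, y1, x2, y2].

    This is the overlap criterion used when matching predictions to
    ground-truth boxes for mAP evaluation."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)
```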