Michal Hradis | Brno University of Technology

Papers by Michal Hradis

Implementation of the "Local Rank Differences" Image Feature Using SIMD Instructions of CPU

2008 Sixth Indian Conference on Computer Vision, Graphics & Image Processing, 2008

Usage of statistical classifiers, namely AdaBoost and its modifications, in object detection and pattern recognition is a contemporary and popular trend. The computational performance of these classifiers largely depends on the low-level image features they use: both on the amount of information the feature provides and on the execution time of its evaluation. Local Rank Differences (LRD) is an image feature that is an alternative to the commonly used Haar features. It is suitable for implementation in programmable (FPGA) or specialized (ASIC) hardware as well as graphics hardware (GPU). Additionally, as shown in this paper, it performs very well on common CPUs. The paper discusses the LRD features and their properties, describes an experimental implementation of LRD using the multimedia instruction set of current general-purpose processors, presents its empirical performance measures compared to alternative approaches, offers several notes on practical usage of LRD, and proposes directions for future work.

GP-GPU Implementation of the “Local Rank Differences” Image Feature

Lecture Notes in Computer Science, 2009

A currently popular trend in object detection and pattern recognition is the usage of statistical classifiers, namely AdaBoost and its modifications. The speed of these classifiers largely depends on the low-level image features they use: both on the amount of information the feature provides and on the processor time of its evaluation. Local Rank Differences (LRD) is an image feature that is an alternative to the commonly used Haar wavelets. It is suitable for implementation in programmable (FPGA) or specialized (ASIC) hardware but, as this paper shows, it also performs very well on graphics hardware (GPU) used in a general-purpose manner (GPGPU, namely CUDA in this case). The paper discusses the LRD features and their properties, describes an experimental implementation of LRD in graphics hardware using CUDA, presents its empirical performance measures compared to alternative approaches, offers several notes on practical usage of LRD, and proposes directions for future work.

Image features in music style recognition

Real-time Tracking of Participants in Meeting Video

Exploiting Neighbors for Faster Scanning Window Detection in Images

Lecture Notes in Computer Science, 2010

Detection of objects through scanning windows is a widely used and accepted method. Detectors traditionally do not make use of the information shared between neighboring image positions, which means that the traditional solutions are not optimal. Addressing this, we propose an efficient and computationally inexpensive approach to exploiting the shared information and thus increasing the speed of detection. The main idea is to predict the responses of the classifier in neighboring windows close to the ones already evaluated and to skip positions where the prediction is confident enough. To predict the responses, the proposed algorithm builds a new classifier which reuses the set of image features already exploited. The results show that the proposed approach can reduce scanning time up to four times with only a minor increase in error rate. The presented examples show that it is possible to reach less than one feature computed on average per image position. The paper presents the algorithm itself and the results of experiments on several data sets with different types of image features.

Technical Report: Image Captioning with Semantically Similar Images

This report presents our submission to the MS COCO Captioning Challenge 2015. The method uses Convolutional Neural Network activations as an embedding to find semantically similar images. From these images, the most typical caption is selected based on unigram frequencies. Although the method received low scores with automated evaluation metrics and in human-assessed average correctness, it is competitive in the ratio of captions which pass the Turing test and which are assessed as better than or equal to human captions.

Brno University of Technology at TRECVid 2010 SIN, CCD

Brno University of Technology at TRECVid 2009

In this paper we describe our experiments in the High-level feature extraction (HLF) and Search tasks of the 2009 TRECVid evaluation. This year, we have concentrated mainly on local (affine covariant) image features and their transformation into a searchable form, especially using indexing techniques. In brief, we have submitted the following runs. HLF: We used a training method based on support vector machines (SVM) with five types of global and local image features. Results were submitted in the BRNO_HLF_SI run. Search: We performed a fully automatic experiment based on the transformed local image features together with face detection and global features (color layout and texture features) in the BrnoUT_visual.2 run. The paper is organized as follows. In Section 1, a motivation and an overview of the work are presented. We dedicated Section 2 to the feature extraction task, which is used in common by the HLF and Search tasks. Details of the tasks we have sent are ...

Gaze and conversational engagement in multiparty video conversation

Proceedings of the 4th Workshop on Eye Gaze in Intelligent Human Machine Interaction - Gaze-In '12, 2012

When using a multiparty video-mediated system, interacting participants assume a range of roles and exhibit behaviors according to how engaged in the communication they are. In this paper we focus on the estimation of conversational engagement from the gaze signal. In particular, we present an annotation scheme for conversational engagement and a statistical analysis of gaze behavior across varying levels of engagement, and we classify vectors of computed eye-tracking measures. The results show that in 74% ...

“Local Rank Differences” Image Feature Implemented on GPU

Lecture Notes in Computer Science, 2008

A currently popular trend in object detection and pattern recognition is the usage of statistical classifiers, namely AdaBoost and its modifications. The speed of these classifiers largely depends on the low-level image features they use: both on the amount of information the feature provides and on the execution time of its evaluation. Local Rank Differences (LRD) is an image feature that is an alternative to the commonly used Haar wavelets. It is suitable for implementation in programmable (FPGA) or specialized (ASIC) hardware but, as this paper shows, it also performs very well on graphics hardware (GPU). The paper discusses the LRD features and their properties, describes an experimental implementation of LRD in graphics hardware, presents its empirical performance measures compared to alternative approaches, offers several notes on practical usage of LRD, and proposes directions for future work.

What do you want to do next

Proceedings of the Symposium on Eye Tracking Research and Applications - ETRA '12, 2012

Interaction intent prediction and the Midas touch have been longstanding challenges for eye-tracking researchers and users of gaze-based interaction. Inspired by machine learning approaches in biometric person authentication, we developed and tested an offline framework for task-independent prediction of interaction intents. We describe the principles of the method, the features extracted, normalization methods, and evaluation metrics. We systematically evaluated the proposed approach on an example dataset of gaze-augmented problem-solving sessions. We present results of three normalization methods, different feature sets, and fusion of multiple feature types. Our results show that accuracy of up to 76% can be achieved with Area Under Curve around 80%. We discuss the possibility of applying the results to an online system capable of interaction intent prediction.

High performance architecture for object detection in streamed video (abstract only)

Proceedings of the ACM/SIGDA international symposium on Field programmable gate arrays - FPGA '13, 2013

Object detection is one of the key tasks in computer vision. It is computationally intensive, and it is reasonable to accelerate it in hardware. The possible benefits of the acceleration are a reduced computational load on the host computer system, increased overall performance of the applications, and reduced power consumption. We present a novel architecture for multi-scale object detection in video streams. The architecture uses scanning window classifiers produced by the WaldBoost learning algorithm and simple image features. It employs a small image buffer for the data being processed and on-the-fly scaling units to enable detection of objects at multiple scales. The whole processing chain is pipelined, and thus several image windows are processed in parallel. We implemented the engine in a Spartan 6 FPGA and show that it can process 640x480 pixel video streams at over 160 frames per second without the need for external memory. The design takes only a fraction of the resources of similar state-of-the-art approaches.

Local Rank Patterns – Novel Features for Rapid Object Detection

Lecture Notes in Computer Science, 2009

This paper presents Local Rank Patterns (LRP), novel features for rapid object detection in images which are based on the existing Local Rank Differences (LRD) features. The performance of the novel features is thoroughly tested on a frontal face detection task and compared to the performance of the LRD and the traditionally used Haar-like features. The results show that the LRP surpass the LRD and the Haar-like features in the precision of detection and also in the average number of features needed for classification. Considering recent successful and efficient implementations of LRD on CPU, GPU and FPGA, the results suggest that LRP are a good choice for object detection and that they could replace the Haar-like features in some applications in the future.

Real-Time Algorithms of Object Detection Using Classifiers

Real-Time Systems, Architecture, Scheduling, and Application, 2012

Annotating Images with Suggestions — User Study of a Tagging System

Lecture Notes in Computer Science, 2012

Semantic Class Detectors in Video Genre Recognition

Platform for evaluation of image classifiers

Hardware acceleration of adaboost classifier

Local Rank Differences-Novel Features for Image Processing

A comparative study on distant free-hand pointing

Proceedings of the 10th European conference on Interactive tv and video - EuroiTV '12, 2012

In this paper we present a comparative study of free-hand pointing and an absolute remote pointing device. Unimanual and bimanual interaction were tested, as well as a static reference system (spatial coordinates are fixed in the space in front of the TV) and a novel body-aligned reference system (coordinates are bound to the current position of the user). We conducted a point-and-click experiment with 12 participants. We have identified the preferred interaction areas for left- and right-handed users in terms of hand preference and preferred spatial areas of the interaction. In bimanual interaction, the users relied more on the dominant hand, switching hands only when necessary. Even though the remote pointing device was faster than free-hand pointing, it was less accepted, probably due to its low precision.
