Michal Hradis | Brno University of Technology
Papers by Michal Hradis
2008 Sixth Indian Conference on Computer Vision, Graphics & Image Processing, 2008
The use of statistical classifiers, namely AdaBoost and its modifications, in object detection and pattern recognition is a popular contemporary trend. The computational performance of these classifiers depends largely on the low-level image features they use: both on the amount of information each feature provides and on the execution time of its evaluation. Local Rank Difference (LRD) is an image feature that is an alternative to the commonly used Haar features. It is suitable for implementation in programmable (FPGA) or specialized (ASIC) hardware as well as graphics hardware (GPU). Additionally, as shown in this paper, it performs very well on common CPUs. The paper discusses the LRD features and their properties, describes an experimental implementation of LRD using the multimedia instruction set of current general-purpose processors, presents empirical performance measurements compared to alternative approaches, offers several notes on the practical usage of LRD, and proposes directions for future work.
Lecture Notes in Computer Science, 2009
A currently popular trend in object detection and pattern recognition is the use of statistical classifiers, namely AdaBoost and its modifications. The speed of these classifiers depends largely on the low-level image features they use: both on the amount of information each feature provides and on the processor time of its evaluation. Local Rank Differences (LRD) is an image feature that is an alternative to the commonly used Haar wavelets. It is suitable for implementation in programmable (FPGA) or specialized (ASIC) hardware, but, as this paper shows, it also performs very well on graphics hardware (GPU) used in a general-purpose manner (GPGPU, namely CUDA in this case). The paper discusses the LRD features and their properties, describes an experimental implementation of LRD in graphics hardware using CUDA, presents empirical performance measurements compared to alternative approaches, offers several notes on the practical usage of LRD, and proposes directions for future work.
Lecture Notes in Computer Science, 2010
Detection of objects through scanning windows is a widely used and accepted method. Detectors traditionally do not make use of the information shared between neighboring image positions, which means that the traditional solutions are not optimal. To address this, we propose an efficient and computationally inexpensive approach that exploits the shared information and thus increases the speed of detection. The main idea is to predict the responses of the classifier in windows neighboring those already evaluated and to skip positions where the prediction is confident enough. To predict the responses, the proposed algorithm builds a new classifier that reuses the set of image features already used. The results show that the proposed approach can reduce scanning time by up to four times with only a minor increase in error rate. The presented examples show that it is possible to compute less than one feature on average per image position. The paper presents the algorithm itself as well as the results of experiments on several data sets with different types of image features.
This report presents our submission to the MS COCO Captioning Challenge 2015. The method uses Convolutional Neural Network activations as an embedding to find semantically similar images. From these images, the most typical caption is selected based on unigram frequencies. Although the method received low scores on automated evaluation metrics and in human-assessed average correctness, it is competitive in the ratio of captions that pass the Turing test and that are assessed as better than or equal to human captions.
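The retrieval-and-selection idea in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the cosine similarity, the neighbour count `k`, the scoring rule (mean relative unigram frequency of a caption's words among the retrieved captions), and the names `cosine` and `most_typical_caption` are all assumptions made for illustration, and the toy vectors stand in for real CNN activations.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (assumed metric)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def most_typical_caption(query, database, k=3):
    """Pick a caption for `query` from its k most similar database images.

    `database` is a non-empty list of (embedding, captions) pairs.
    The scoring rule below is a hypothetical choice, not taken from the report.
    """
    # Retrieve the k images whose embeddings are closest to the query.
    neighbours = sorted(
        database, key=lambda item: cosine(query, item[0]), reverse=True
    )[:k]
    candidates = [c for _, captions in neighbours for c in captions]

    # Unigram frequencies over all candidate captions.
    counts = Counter(w for c in candidates for w in c.lower().split())
    total = sum(counts.values())

    def score(caption):
        words = caption.lower().split()
        if not words:
            return 0.0
        # Mean relative frequency of the caption's words.
        return sum(counts[w] / total for w in words) / len(words)

    return max(candidates, key=score)


# Toy data: 2-D vectors stand in for CNN activations.
toy_database = [
    ([1.0, 0.0], ["a dog runs on grass", "a dog plays outside"]),
    ([0.9, 0.1], ["a dog runs in a park"]),
    ([0.0, 1.0], ["a plate of food"]),
]
chosen = most_typical_caption([1.0, 0.05], toy_database, k=2)
```

With this scoring rule, a caption made mostly of words that recur across the retrieved captions is preferred over one with rare words; other normalizations would be equally plausible readings of "most typical".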
In this paper we describe our experiments in the High-level feature extraction (HLF) and Search tasks of the 2009 TRECVid evaluation. This year, we have concentrated mainly on local (affine covariant) image features and their transformation into a searchable form, especially using indexing techniques. In brief, we have submitted the following runs. HLF: We have used a training method based on support vector machines (SVM) with five types of global and local image features. Results were submitted in the BRNO_HLF_SI run. Search: We have performed a fully automatic experiment based on the transformed local image features together with face detection and global features (color layout and texture features) in the BrnoUT_visual.2 run. The paper is organized as follows. In Section 1, a motivation and an overview of the work are presented. We dedicated Section 2 to the feature extraction task, which is shared by the HLF and Search tasks. Details of the tasks we have sent are ...
Proceedings of the 4th Workshop on Eye Gaze in Intelligent Human Machine Interaction - Gaze-In '12, 2012
When using a multiparty video-mediated system, interacting participants assume a range of roles and exhibit behaviors according to how engaged in the communication they are. In this paper we focus on the estimation of conversational engagement from the gaze signal. In particular, we present an annotation scheme for conversational engagement and a statistical analysis of gaze behavior across varying levels of engagement, and we classify vectors of computed eye tracking measures. The results show that in 74% ...
Lecture Notes in Computer Science, 2008
A currently popular trend in object detection and pattern recognition is the use of statistical classifiers, namely AdaBoost and its modifications. The speed of these classifiers depends largely on the low-level image features they use: both on the amount of information each feature provides and on the execution time of its evaluation. Local Rank Differences (LRD) is an image feature that is an alternative to the commonly used Haar wavelets. It is suitable for implementation in programmable (FPGA) or specialized (ASIC) hardware, but, as this paper shows, it also performs very well on graphics hardware (GPU). The paper discusses the LRD features and their properties, describes an experimental implementation of LRD in graphics hardware, presents empirical performance measurements compared to alternative approaches, offers several notes on the practical usage of LRD, and proposes directions for future work.
Proceedings of the Symposium on Eye Tracking Research and Applications - ETRA '12, 2012
Interaction intent prediction and the Midas touch have been longstanding challenges for eye-tracking researchers and users of gaze-based interaction. Inspired by machine learning approaches in biometric person authentication, we developed and tested an offline framework for task-independent prediction of interaction intents. We describe the principles of the method, the features extracted, normalization methods, and evaluation metrics. We systematically evaluated the proposed approach on an example dataset of gaze-augmented problem-solving sessions. We present results for three normalization methods, different feature sets, and fusion of multiple feature types. Our results show that accuracy of up to 76% can be achieved with Area Under Curve around 80%. We discuss the possibility of applying the results in an online system capable of interaction intent prediction.
Proceedings of the ACM/SIGDA international symposium on Field programmable gate arrays - FPGA '13, 2013
Object detection is one of the key tasks in computer vision. It is computationally intensive, and it is reasonable to accelerate it in hardware. The possible benefits of the acceleration are a reduced computational load on the host computer system, increased overall performance of the applications, and reduced power consumption. We present a novel architecture for multi-scale object detection in video streams. The architecture uses scanning window classifiers produced by the WaldBoost learning algorithm and simple image features. It employs a small image buffer for the data being processed and on-the-fly scaling units to enable detection of objects at multiple scales. The whole processing chain is pipelined, so multiple image windows are processed in parallel. We implemented the engine in a Spartan 6 FPGA and we show that it can process 640x480 pixel video streams at over 160 frames per second without the need for external memory. The design takes only a fraction of the resources compared to similar state-of-the-art approaches.
Lecture Notes in Computer Science, 2009
This paper presents Local Rank Patterns (LRP), novel features for rapid object detection in images that are based on the existing Local Rank Differences (LRD) features. The performance of the novel features is thoroughly tested on a frontal face detection task and compared to the performance of the LRD and the traditionally used Haar-like features. The results show that the LRP surpass the LRD and the Haar-like features in the precision of detection and also in the average number of features needed for classification. Considering recent successful and efficient implementations of LRD on CPU, GPU and FPGA, the results suggest that LRP are a good choice for object detection and that they could replace the Haar-like features in some applications in the future.
Real-Time Systems, Architecture, Scheduling, and Application, 2012
Lecture Notes in Computer Science, 2012
Proceedings of the 10th European conference on Interactive tv and video - EuroiTV '12, 2012
In this paper we present a comparative study of free-hand pointing and an absolute remote pointing device. Unimanual and bimanual interaction were tested, as well as a static reference system (spatial coordinates are fixed in the space in front of the TV) and a novel body-aligned reference system (coordinates are bound to the current position of the user). We conducted a point-and-click experiment with 12 participants. We have identified the preferred interaction areas for left- and right-handed users in terms of hand preference and preferred spatial areas of the interaction. In bimanual interaction, the users relied more on the dominant hand, switching hands only when necessary. Even though the remote pointing device was faster than free-hand pointing, it was less accepted, probably due to its low precision.