Wladyslaw Skarbek | Warsaw University of Technology

Papers by Wladyslaw Skarbek

WebGL and web audio software lightweight components for multimedia education

Proceedings of SPIE, Aug 7, 2017

The paper presents the results of our recent work on the development of DC2, a contemporary computing platform for multimedia education using WebGL and Web Audio, the W3C standards. The WEBSA educational tools were developed using the literate programming paradigm. They offer the user (student) access to an expandable collection of WebGL shaders and Web Audio scripts. The unique feature of DC2 is the option of literate programming, offered to both the author and the reader in order to improve interactivity with lightweight WebGL and Web Audio components. For instance, users can define source audio nodes (including synthetic sources), destination audio nodes, and audio processing nodes for sound wave shaping, spectral band filtering, convolution-based modification, etc. In the case of WebGL, besides classic graphics effects based on mesh and fractal definitions, novel shader-based image processing and analysis is offered, such as nonlinear filtering, histograms of gradients, and Bayesian classifiers.

Virtual reality for spherical images

Proceedings of SPIE, Aug 7, 2017

The paper presents a virtual reality application framework and an application concept for mobile devices. The framework uses the Google Cardboard library for the Android operating system. It allows creating a virtual reality 360 video player using standard OpenGL ES rendering methods. The framework provides network methods for connecting to a web server that acts as the application resource provider. Resources are delivered as JSON responses to HTTP requests. The web server also uses the Socket.IO library for synchronous communication between the application and the server. The framework implements methods for event-driven rendering of additional content based on the video timestamp and the virtual reality head point of view.

Deep convolutional networks in the multimedia techniques curriculum of the Television Division, Warsaw University of Technology (Polish original: Konwolucyjne sieci głębokie w programie nauczania technik multimedialnych w Zakładzie Telewizji Politechniki Warszawskiej)

Analysis of the feasibility of transcoding an MPEG-2 video stream to MPEG-4 AVC/H.264 in the transform-coefficient domain (Polish original: Analiza możliwości transkodowania strumienia MPEG-2 video do MPEG-4 AVC/H.264 w dziedzinie współczynników transformaty)

Elektronika : konstrukcje, technologie, zastosowania, 2009

Color transfer by fitting clouds of color points

Proceedings of SPIE, Aug 7, 2017

Color transfer methods can alter the color appearance of an input image by borrowing color statistics from a reference image. In this paper we present a novel color transfer method in which we consider both the input and reference images as three-dimensional sets of data samples, where each color-based component can be represented as a 3D cloud of data points. Our goal is to fit the position, orientation and scale of the color-component clouds from the reference to the input image by finding a proper geometric transformation. Besides the global processing approach, we also present a local color transfer method, obtained by applying our proposed algorithm to color-segmented parts of images. We use pixel clustering for image segmentation to find groups of dominant-color pixels in both the input and reference images. Experimental results and comparisons with other methods confirm the validity and usefulness of the presented method.
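
The idea of fitting one color cloud to another can be sketched as an affine map that matches the mean (position) and the covariance (which encodes orientation and scale) of the two 3D point sets. This is only an illustration under stated assumptions: the abstract does not specify the geometric transformation, so the Cholesky-based map below and all function names are hypothetical, not the authors' algorithm.

```python
import math

def mean_and_cov(points):
    """Mean vector and 3x3 sample covariance of a list of (r, g, b) points."""
    n = len(points)
    mu = [sum(p[k] for p in points) / n for k in range(3)]
    cov = [[sum((p[i] - mu[i]) * (p[j] - mu[j]) for p in points) / (n - 1)
            for j in range(3)] for i in range(3)]
    return mu, cov

def cholesky(a):
    """Lower-triangular L with L * L^T == a, for a 3x3 positive-definite a."""
    low = [[0.0] * 3 for _ in range(3)]
    for i in range(3):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(s) if i == j else s / low[j][j]
    return low

def forward_solve(low, b):
    """Solve low * z = b for lower-triangular low."""
    z = [0.0] * 3
    for i in range(3):
        z[i] = (b[i] - sum(low[i][k] * z[k] for k in range(i))) / low[i][i]
    return z

def fit_color_cloud(input_pixels, reference_pixels):
    """Affinely map input colors so their mean and covariance match the reference.

    Assumption: both clouds have full-rank covariance; output is not clipped
    to a valid color range.
    """
    mu_x, cov_x = mean_and_cov(input_pixels)
    mu_r, cov_r = mean_and_cov(reference_pixels)
    low_x, low_r = cholesky(cov_x), cholesky(cov_r)
    out = []
    for p in input_pixels:
        # Whiten with the input cloud, then re-color with the reference cloud.
        z = forward_solve(low_x, [p[k] - mu_x[k] for k in range(3)])
        out.append(tuple(mu_r[i] + sum(low_r[i][k] * z[k] for k in range(i + 1))
                         for i in range(3)))
    return out
```

A local variant in the spirit of the abstract would apply the same function separately to each pair of matched color segments.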

Two step appearance-based approach for fast and reliable face localization

SPIE Proceedings, Mar 6, 2006

Among the many face detection methods, the appearance-based ones have proved to be the most accurate, and in particular the AdaBoost cascade algorithm is both accurate and very fast. The high speed of the detector is crucial in many applications of face detection, e.g. face recognition. In this paper a two-step detector is presented, which is a serial connection of a cascade of extended weak classifiers and an AdaBoost cascade. The cascade of extended weak classifiers is a novel concept that greatly accelerates detection. The second novelty is the verification of the detection results with another AdaBoost cascade, which pushes the number of false acceptances down to an extremely low level.
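
The serial connection of two cascades can be illustrated with a generic early-rejection loop: a cheap first cascade discards most windows, and only the survivors reach the second cascade. The sketch below is an assumption-laden toy, not the paper's detector; the stage representation (a score function plus a threshold) and all names are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

# A stage is a (score function, threshold) pair; both are placeholders here.
Stage = Tuple[Callable[[Sequence[float]], float], float]

def cascade_accepts(window: Sequence[float], stages: List[Stage]) -> bool:
    """Accept a window only if it passes every stage; stop at the first failure."""
    for score, threshold in stages:
        if score(window) < threshold:
            return False
    return True

def two_step_detect(windows, first_cascade: List[Stage], second_cascade: List[Stage]):
    """Run the fast first cascade, then pass only its survivors to the second one."""
    candidates = [w for w in windows if cascade_accepts(w, first_cascade)]
    return [w for w in candidates if cascade_accepts(w, second_cascade)]
```

Because most windows are rejected by the first stages, the average cost per window stays close to the cost of the cheapest tests, which is the usual motivation for cascaded detectors.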

Concept of an interdisciplinary multimedia curriculum at the master's level: the Norwegian project (Polish original: Koncepcja interdyscyplinarnego programu nauczania multimediów na poziomie magisterskim - projekt norweski)

From face identification to emotion recognition

Photonics Applications in Astronomy, Communications, Industry, and High-Energy Physics Experiments 2019, Nov 6, 2019

This paper explores the practicality of transfer learning for the emotion recognition task. We show the superior performance of transfer learning from face identification, compared with feed-forward deep neural networks trained from scratch and with general transfer learning from object classification. We illustrate that better adaptation of the source domain can help with the initialization of the network, providing more efficient learning from the target training samples. In this way even a network with a complex architecture can overcome over-fitting problems and thus achieve better results than other solutions given the same amount of training data. We discuss detailed training strategies for getting the best performance from such transfer learning, using fine-tuning mechanisms on the classical VGG-16 architecture and the publicly accessible FER2013 emotion database.

Online 3D face reconstruction with incremental Structure From Motion and a regressor cascade

Proceedings of SPIE, Dec 2, 2014

In this paper we present a method for online 3D face reconstruction from a video sequence. The face landmarks in a given frame are detected and used to create a 3D shape estimate. The resulting 3D shape is an approximate, sparse representation of the subject’s face. Our reconstruction step is based on a revised version of incremental Structure From Motion, where we use a novel 4D subspace tracking procedure followed by scaled deflation against a vector of ones. Facial landmark detection is built upon a regressor cascade scheme where each subsequent regressor updates the initial shape obtained from the preceding frame.

Image copy detection using local features (Polish original: Detekcja kopii obrazu metodą cech lokalnych)

Elektronika : konstrukcje, technologie, zastosowania, 2009

Facial expression recognition by animated motion of the Candide 3D model (listed in Japanese, marked as a JST and Kyoto University machine translation: カンジダ3Dモデルのアニメーション運動による顔表情認識【JST・京大機械翻訳】)

Proceedings of SPIE, 2018

Multimodal emotion classification by streaming fixed time segments for speaker movies

The approach to Video-Audio Emotion Recognition takes advantage of the additional information gained from multiple modalities. Since the target features are time related but not strictly aligned in time, video-audio features become simply video features and audio features. Working toward this goal, the spectrogram is selected as an outstanding vocal feature for a neural network solution, in order to benefit from convolution filters. Inspired by LSTM-based image captioning, where embedded word information and image information are spatially aligned, we embed the audio spectrogram and the image sequences, since time information is converted to spatial information in a spectrogram. We propose both an architecture and a framework that optimize the alignment of these temporal features, and we provide an analysis of the significant performance improvement along with a discussion of general Video-Audio Emotion Recognition tasks.

Proceedings of the 9th International Conference on Computer Analysis of Images and Patterns

Multi-modal Residual Perceptron Network for Audio-Video Emotion Recognition

arXiv (Cornell University), Jul 21, 2021

Image steganography for increasing security of OTP authentication

Verification of the customer in web-based banking systems is a significant issue these days, when transactions are carried out over the insecure Internet. This advanced communication medium is exposed to many threats. Picture identification and a One Time Password (OTP) have commonly been used to authenticate the customer in many banking systems. In most cases they are sent separately, which is often vulnerable. To solve this issue, this paper proposes a method that uses both an image with hidden customer information and an OTP sent by SMS to the user's mobile phone. The Personal Identification Number (PIN) provided by the bank at registration is used to activate the process of image steganography and sending the OTP to the user. The user has to know the image that was chosen at registration. The OTP, which is used for decrypting the information hidden in the image, has to be entered on a virtual keypad with randomly arranged keys to prevent key logging. The image and the hidden information should match the information in the database, which then opens a session for the customer.

Deep alignment network: from MIMD to SIMD platform

The paper considers the following software engineering problem for digital media: given a software tool for processing tensor signals, such as a Deep Neural Network (DNN), defined for a MIMD architecture (Multiple Instruction, Multiple Data), redefine the algorithm for a SIMD architecture (Single Instruction, Multiple Data). While the standard signal processing approach is applied for mapping multiple instructions, 2D RGBA textures (Red, Green, Blue, and Alpha channels) are used as the target data structure for mapping tensors of any dimensionality. To illustrate the tensor mapping concept, the Deep Alignment Network (DAN), an important contemporary application for human-computer interfacing, is selected and its efficiency analyzed. The testbed for comparing DAN's MIMD and SIMD architectures was based on the JavaScript (MIMD) and WebGL (SIMD) software platforms. The expected speed-up of the SIMD versus the MIMD architecture (checked on commodity personal computers) appears to be at a reasonable level: 350 image frames per minute versus seven image frames per minute.
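
One way to picture the mapping of a tensor of any dimensionality onto a 2D RGBA texture is to flatten the tensor in row-major order and store four consecutive values per texel. The sketch below assumes exactly that layout, which the abstract does not confirm; `row_major_offset`, `pack_rgba` and `texel_address` are hypothetical helper names.

```python
import math

def row_major_offset(index, shape):
    """Flat row-major offset of a multi-dimensional index."""
    offset = 0
    for i, n in zip(index, shape):
        offset = offset * n + i
    return offset

def pack_rgba(values, width):
    """Pack a flat list of floats into rows of RGBA texels (4 values per texel).

    The tail of the last row is zero-padded so the texture is rectangular.
    """
    texels_needed = math.ceil(len(values) / 4)
    height = math.ceil(texels_needed / width)
    padded = list(values) + [0.0] * (width * height * 4 - len(values))
    return [[tuple(padded[4 * (row * width + col): 4 * (row * width + col) + 4])
             for col in range(width)] for row in range(height)]

def texel_address(offset, width):
    """Return (row, column, channel) of the texel holding a flat offset."""
    texel, channel = divmod(offset, 4)
    row, col = divmod(texel, width)
    return row, col, channel
```

For example, a 2x3x5 tensor has 30 values and fits into a 4x2 texture of RGBA texels with two padded channels. As a side note on the reported figures, 350 frames per minute against seven is a 50-fold speed-up.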

Color correction by color mapping using color temperature constraints

Color correction methods have gained a lot of attention in the past few years as a way to counter the color degradation that may occur when images are acquired under various light sources, whose conditions can be described simply by the color temperature parameter. In color theory, Color Temperature (CT) is defined as the temperature of a blackbody radiator whose chromaticity point is closest to the chromaticity point of the non-Planckian light source. In simple words, CT describes whether the light source in a given scene is more bluish, neutral or reddish. In this paper, we present a color correction method that combines local color mapping, based on selected color samples from both the input and reference images, with a user-specified color temperature constraint to formulate an optimization problem whose result is a simple linear transformation matrix. We also present experimental results and comparisons with other color correction methods to validate the performance of the proposed method.
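
To make the notion of color temperature concrete, the sketch below estimates the correlated color temperature of a light source from its CIE 1931 (x, y) chromaticity using McCamy's well-known cubic approximation. This formula is brought in for illustration only; the abstract does not say how the authors compute or constrain CT, and the function name is hypothetical.

```python
def mccamy_cct(x: float, y: float) -> float:
    """Approximate correlated color temperature (kelvin) from CIE 1931 xy.

    McCamy's approximation; reasonable for chromaticities near the
    Planckian locus, roughly in the range of common illuminants.
    """
    n = (x - 0.3320) / (0.1858 - y)
    return 449.0 * n ** 3 + 3525.0 * n ** 2 + 6823.3 * n + 5520.33

# Standard illuminant chromaticities (CIE 1931, 2-degree observer).
daylight_d65 = mccamy_cct(0.3127, 0.3290)      # bluish-neutral daylight
incandescent_a = mccamy_cct(0.44757, 0.40745)  # reddish tungsten light
```

A higher value corresponds to a more bluish source and a lower value to a more reddish one, which matches the informal description in the abstract.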

Multi-objective noisy-based deep feature loss for speech enhancement

Photonics Applications in Astronomy, Communications, Industry, and High-Energy Physics Experiments 2019, Nov 6, 2019

Deep neural networks have become a great tool for denoising the speech signal, improving intelligibility, speech quality and signal-to-noise ratio. An important element in training deep speech networks is the use of an appropriate loss function that allows improvement of the subjective and objective measures. In our work, we used a loss function based on a well-trained deep network that classifies whether the signal is noisy or clean. As a result, the deep network responsible for denoising is trained by minimizing the difference between the deep features of the clean and the enhanced signal. Our work shows that using only deep features in the loss function allows a significant improvement in measures of speech signal quality. Another novelty is the feature extractor, which has been trained as a multi-objective noise classifier. We believe that deep-feature loss could help in the optimization of functions that are difficult to differentiate.

Smile detectors correlation

Proceedings of SPIE, Aug 7, 2017

A novel smile recognition algorithm is presented, based on the extraction of 68 facial salient points (fp68) using an ensemble of regression trees. The smile detector uses a linear Support Vector Machine (SVM) model. It is trained on a few hundred exemplar images by the SVM algorithm working in a 136-dimensional space. Strict statistical data analysis shows that such a geometric detector strongly depends on the geometry of the mouth opening area, measured by triangulation of the outer lip contour. To this end, two Bayesian detectors were developed and compared with the SVM detector. The first uses the mouth area in the 2D image, while the second refers to the mouth area in a 3D animated face model. The 3D modeling is based on the Candide-3 model and is performed in real time along with the three smile detectors and statistics estimators. The mouth area/Bayesian detectors exhibit high correlation with the fp68/SVM detector, in the range [0.8, 1.0], depending mainly on light conditions and individual features, with an advantage for the 3D technique, especially in hard light conditions.
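
Measuring an area by triangulating a closed contour can be sketched with a simple triangle fan: the polygon is split into triangles sharing the first vertex, and their signed areas are summed. This is a minimal 2D illustration; the function name is hypothetical, and the abstract does not state which triangulation the authors use or which landmark indices form the outer lip contour.

```python
from typing import List, Tuple

def contour_area_by_triangulation(contour: List[Tuple[float, float]]) -> float:
    """Area of a simple polygon given as an ordered list of (x, y) vertices.

    Uses a triangle fan anchored at the first vertex; summing signed
    triangle areas keeps the result correct for non-convex contours too.
    """
    x0, y0 = contour[0]
    total = 0.0
    for (x1, y1), (x2, y2) in zip(contour[1:-1], contour[2:]):
        # Signed area of triangle (v0, v1, v2) from the 2D cross product.
        total += 0.5 * ((x1 - x0) * (y2 - y0) - (x2 - x0) * (y1 - y0))
    return abs(total)
```

In a detector like the one described, the contour would be the ordered outer-lip landmarks of one frame, and the resulting area would serve as the single feature of a mouth-area classifier.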

Tuning deep learning algorithms for face alignment and pose estimation

In this paper, deep learning algorithms are tuned for the face alignment and pose estimation problems. For pose estimation, the classical indirect method (from fp68 landmarks via the Candide model to pose) is compared with the direct method, in which both the landmarks and the pose are obtained by regressive deep neural network (DNN) algorithms of the VGG type. The indirect method appeared slightly more accurate than the direct one with respect to the inter-ocular, inter-pupil, and box-diagonal measures. We also analyzed both the indirect and direct DNN algorithms in two scenarios of resolution reduction for convolved data tensors: via max-pooling and via striding of the convolution operations. The striding algorithms have a relatively low number of parameters (compression to around 10 percent of the max-pooling version) traded for a slight loss of accuracy.
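
The two resolution-reduction scenarios can be compared on a toy single-channel example: a stride-1 convolution followed by 2x2 max-pooling, and a stride-2 convolution with no pooling. The sketch below only shows that both halve the spatial resolution; it does not reproduce the networks, the parameter counts, or the accuracy figures from the abstract, and the helper names are hypothetical.

```python
def conv2d_valid(image, kernel, stride=1):
    """Plain 2D correlation without padding, applied with the given stride."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = (len(image) - kh) // stride + 1
    out_w = (len(image[0]) - kw) // stride + 1
    return [[sum(image[r * stride + i][c * stride + j] * kernel[i][j]
                 for i in range(kh) for j in range(kw))
             for c in range(out_w)] for r in range(out_h)]

def max_pool(feature, size=2):
    """Non-overlapping max-pooling with a size x size window."""
    return [[max(feature[r * size + i][c * size + j]
                 for i in range(size) for j in range(size))
             for c in range(len(feature[0]) // size)]
            for r in range(len(feature) // size)]

image = [[r * 8 + c for c in range(8)] for r in range(8)]  # 8x8 test ramp
kernel = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]                 # 3x3 box filter

dense = conv2d_valid(image, kernel, stride=1)    # 6x6 feature map
pooled = max_pool(dense, size=2)                 # 3x3 after max-pooling
strided = conv2d_valid(image, kernel, stride=2)  # 3x3 directly
```

The strided output equals the dense output sampled at every second position, so striding skips the pooling step and evaluates only a quarter of the filter positions, whereas max-pooling keeps the largest response in each 2x2 block.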
