Zoran Ivanovski | Ss. Cyril & Methodius University in Skopje

Papers by Zoran Ivanovski

An efficient, SELective, Perceptual-based super-resolution estimator

In this paper, a selective perceptual-based (SELP) framework is presented to reduce the complexity of popular super-resolution (SR) algorithms while maintaining the desired quality of the enhanced images/video. A perceptual human visual system model is proposed to compute local contrast sensitivity thresholds. The obtained thresholds are used to select which pixels are super-resolved, based on the perceived visibility of local edges. Processing only a set of perceptually significant pixels significantly reduces the computational complexity of SR algorithms without losing the achievable visual quality. The proposed SELP framework is integrated into a maximum a posteriori-based SR algorithm as well as a fast two-stage fusion-restoration SR estimator. Simulation results show a significant average reduction in computational complexity with comparable signal-to-noise ratio gains and visual quality.

Segmentation of Shipping Bags in RGB-D Images

A System for Differentiation of Schizophrenia and Bipolar Disorder based on rsfMRI

2023 30th International Conference on Systems, Signals and Image Processing (IWSSIP)

Deep Learning Based Multimodal Information Fusion for Near-Miss Event Detection in Intelligent Traffic Monitoring Systems

Studies in Systems, Decision and Control, 2022

Towards a system for automatic traffic sound event detection

2020 28th Telecommunications Forum (TELFOR), 2020

Intelligent Traffic Surveillance systems have helped improve road safety by ensuring a timely response to events such as traffic accidents and congestion. Our aim is to devise a robust system capable of detecting traffic audio events in a real-life environment. At the core of this system is a deep learning model capable of detecting anomalous events and classifying them based on their acoustic waveform. We present the results of a series of experiments designed to optimize the architecture of this model based on different algorithms for audio processing. The results show that the designed model performs competitively with approaches published in the literature.

Automated and Computationally Inexpensive Exposure Fusion for Mobile Devices

Journal of Electrical Engineering and Information Technologies, 2017

Real-Time Vehicle Detection Based on Wavelet Decomposition and CNN

Gaussian Power flow Orientation Coefficients for noise-robust speech recognition

European Signal Processing Conference, Sep 1, 2014

Spectro-temporal features have shown great promise for improving the noise robustness of Automatic Speech Recognition (ASR) systems. The common approach uses a bank of 2D Gabor filters to process the speech signal spectrogram and generate the output feature vector. This approach generates a large number of coefficients, thus necessitating feature dimensionality reduction. The proposed Gaussian Power flow Orientation Coefficients (GPOCs) use an alternative approach in which only the largest coefficients output by a bank of 2D Gaussian kernels are used to describe the spectro-temporal patterns of power flow in the auditory spectrogram. While reducing the size of the feature vectors, the algorithm was shown to outperform traditional feature extraction methods, and even a reference spectro-temporal approach, for low SNRs. Its performance for high SNRs is comparable to, but lower than, that of traditional ASR front ends, and it falls behind state-of-the-art algorithms in all noise scenarios.

The future of wavelets in medical image processing

Overview of Feature Selection for Automatic Speech Recognition

Journal of the Audio Engineering Society, Apr 26, 2012

Content-Based Indoor/Outdoor Video Classification System For A Mobile Platform

Organization of video databases is becoming a difficult task as the amount of video content increases. Video classification based on the content of videos can significantly increase the speed of tasks such as browsing and searching for a particular video in a database. In this paper, a content-based video classification system for the classes indoor and outdoor is presented. The system is intended to be used on a mobile platform with modest resources. The algorithm makes use of the temporal redundancy in videos, which allows the use of an uncomplicated classification model while still achieving reasonable accuracy. Training and evaluation were done on a video database of 443 videos downloaded from a video sharing service. A total accuracy of 87.36% was achieved.
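One way to picture how temporal redundancy can compensate for a simple per-frame model is to pool frame-level decisions over the whole video. The sketch below is an illustration only: the abstract does not say how the frames are combined, so the majority vote, the function name `classify_video` and the label strings are all assumptions rather than the authors' method.

```python
from collections import Counter


def classify_video(frame_labels):
    """Pool per-frame labels into one video label by majority vote.

    Hypothetical helper: `frame_labels` is a list such as
    ["indoor", "outdoor", ...] produced by any per-frame classifier.
    Returns the winning label and the fraction of frames that voted for it.
    """
    if not frame_labels:
        raise ValueError("at least one frame label is required")
    counts = Counter(frame_labels)
    # most_common(1) returns a one-element list holding the top (label, count) pair
    label, votes = counts.most_common(1)[0]
    return label, votes / len(frame_labels)
```

Because neighbouring frames are highly correlated, a few misclassified frames are outvoted by the rest, which is why a weak frame-level model can still give a usable video-level decision.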

Delay based optimisation of an integrated online call recording speaker diarisation and identification system

IEEE EUROCON 2017 - 17th International Conference on Smart Technologies, 2017

The design of speaker diarisation and recognition systems is a mature research area and their deployment in the real world has gained momentum. There are still a number of parameters of these systems that have to be tuned and optimised for the application scenario at hand. An online call recording diarisation system is designed with integrated speaker identification of the call-centre operators. The parameters of the speaker diarisation and identification algorithms are cross-tuned using a testbench database. The system performance, as assessed by the true positive rate (TPR), is optimised with respect to the delay introduced by the system. As the system is designed to be used online, the TPR-delay trade-off is crucial to its deployment. The finalised system is flexible in that it allows the user to choose the delay or accuracy needed for on-site deployment.

SCarrie: A Real-Time System for Sound Event Detection for Assisted Living

2023 30th International Conference on Systems, Signals and Image Processing (IWSSIP)

Special Issue on Electronic, Telecommunications, Automation, and Informatics with Computer Science (ETAI) Preface

Macedonian Speech Synthesis for Assistive Technology Applications

Cornell University - arXiv, May 18, 2022

Speech technology is becoming ever more ubiquitous with the advance of speech-enabled devices and services. The use of speech synthesis in Augmentative and Alternative Communication tools has facilitated the inclusion of individuals with speech impediments, allowing them to communicate with their surroundings using speech. Although there are numerous speech synthesis systems for the most widely spoken world languages, the offering for smaller languages is still limited. We propose and compare three models for Macedonian, built using parametric and deep learning techniques and trained on a newly recorded corpus. We target low-resource edge deployment for Augmentative and Alternative Communication and assistive technologies, such as communication boards and screen readers. The listening test results show that parametric speech synthesis performs on par with the more advanced deep learning models. Since it also requires fewer resources and offers full speech rate and pitch control, it is the preferred choice for building a Macedonian TTS system for this application scenario.

Linear Prediction Reflection Coefficients, Mel-Cepstral Cepstral Coefficients

Today's Automatic Speech Recognition (ASR) systems are intensively deployed in real-world application scenarios, which are often characterized by suboptimal operating conditions. Their noise robustness has thus become a crucial parameter when assessing ASR in-field performance. The paper examines the noise robustness of traditional ASR feature sets as applied to a Voice Dialing Application built for Macedonian. The analysis focused on the following features:

High quality exposure fusion for mobile platforms

IEEE EUROCON 2017 - 17th International Conference on Smart Technologies, 2017

In this paper, a new approach for high quality automated exposure fusion on mobile or handheld devices is presented. The use of the video feed from the device's viewfinder screen is proposed in order to increase the overall performance of the exposure fusion, both in static scenes and in scenes with moving objects. The introduced novelties are computationally inexpensive, since the preview video has a low frame resolution. The proposed extensions are embedded in an existing exposure fusion algorithm, and the experimental tests show that the extended algorithm is better than its predecessor, both visually and in terms of objective quality measures.

Motion estimation for Super-resolution based on recognition of error artifacts

The work presents an effective approach to subpixel motion estimation for Super-resolution (SR). The objective is to improve the quality of the estimated SR image by increasing the accuracy of the motion vectors used in the SR procedure. The correction of the motion vectors is based on the appearance of error artifacts in the SR image, introduced by registration errors. First, SR is performed using full-pixel-accuracy motion vectors obtained with a full search block matching algorithm (FS-BMA). Then, a machine-learning-based method is applied to the resulting images in order to detect and classify artifacts introduced by the missing subpixel components of the motion vectors. The outcome of the classification is a subpixel component of the motion vector. In the final step, the SR process is repeated using the corrected (subpixel-accuracy) motion vectors.
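The first step of this pipeline, full-pixel motion estimation by full search block matching, can be sketched in a few lines. This is a generic textbook version and not the authors' implementation: the sum of absolute differences (SAD) as the matching cost, the function names, and the default block size and search radius are all assumptions made for illustration. Frames are plain lists of rows of grey-level values.

```python
def sad(ref, cur, bx, by, dx, dy, size):
    """Sum of absolute differences between the block at (bx, by) in `cur`
    and the block displaced by (dx, dy) in `ref`."""
    total = 0
    for y in range(size):
        for x in range(size):
            total += abs(cur[by + y][bx + x] - ref[by + y + dy][bx + x + dx])
    return total


def full_search(ref, cur, bx, by, size=4, radius=2):
    """Try every integer displacement within +/- radius and keep the one
    with the lowest SAD. Returns ((dx, dy), cost); cost is None if no
    candidate block fits inside the reference frame."""
    height, width = len(ref), len(ref[0])
    best_vector, best_cost = (0, 0), None
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            # Skip candidates whose block would fall outside the reference frame
            if by + dy < 0 or by + dy + size > height:
                continue
            if bx + dx < 0 or bx + dx + size > width:
                continue
            cost = sad(ref, cur, bx, by, dx, dy, size)
            if best_cost is None or cost < best_cost:
                best_vector, best_cost = (dx, dy), cost
    return best_vector, best_cost
```

The vectors found this way have integer precision only, which is why the approach described above needs a later stage to recover the missing subpixel component.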

No-reference quality assessment of highly compressed video sequences

2013 IEEE 15th International Workshop on Multimedia Signal Processing (MMSP), 2013

In this paper we present a no-reference quality assessment algorithm for highly compressed H.264 videos. By analyzing the spatio-temporal artifacts and their effect on perceived visual quality, we produced an array of viable predictors of quality. Using a feature selection method, a small subset of features, each applied to a specific artifact domain, was selected for optimal quality estimation. The features are mapped into video quality scores using a simple linear function, which is computationally efficient and minimizes the effect of learning the content. The resulting algorithm is evaluated on content-independent sets of highly compressed H.264 video sequences from the LIVE video database, where it shows high correlation with the subjective scores.

Use of Gaussian Mixture Models in Macedonian forensic speaker identification

2012 20th Telecommunications Forum (TELFOR), 2012
