Jan Holub | Czech Technical University in Prague
Papers by Jan Holub
Measurement: Sensors, Dec 1, 2021
AHFE international, 2023
Perceiving transmitted speech places a certain cognitive load on the human brain. The degree of this load depends on several factors, e.g., the loudness of the perceived speech, the type and intensity of background noise, the quality and accent of the speech, and familiarity with the topic of the message. The load also differs between the listener's native and non-native languages. Over longer workloads (e.g., a work shift), different levels of such load manifest as different levels of overall fatigue, which in turn affects the worker's error rate in actions or decisions when performing other concurrent tasks (the so-called parallel-task paradigm). For technologies used in speech transmission or synthesis, e.g., in telecommunications, radio communications, and machine-to-human communications, this implies a strong need to optimize the coding of human (or synthetic) voice to minimize listening effort during communication. Listening effort (LE) can be assessed by subjective tests following, e.g., ITU-T Recommendation P.800, along with listening quality (LQ) as specified in the same Recommendation. A natural (but nowhere explicitly stated) requirement is that male and female voices are transmitted with similar LQ and LE parameters; in other words, the transmission technology, including coding algorithms, frequency filters, and sampling rates, should not favor one gender over the other, so that working conditions and opportunities remain similar for all. The subjective test laboratory has performed a gender analysis for all subjective test projects since 2018 to see how (im)balanced the transmission quality between male and female speakers is.
The identified imbalance can affect many professionals who rely on remote voice communication in their daily duties, such as female airport approach control dispatchers or other professionals (e.g., policewomen), who are inherently disadvantaged by technological aspects of their job: worse voice transmission quality means that higher listening effort is needed, which may lead to (subconscious) discomfort for their communication partners, or even to intelligibility issues. Of course, this finding is not surprising for narrow-band or old analog AM transmissions (as still used in AIRCOM); it can only serve as an argument for upgrading communication means to a suitable digital format. Unfortunately, some contemporary wide-band and even full-band digital communications also show statistically significant differences between the quality of transmitted male and female voices. The detailed results will be presented, including interesting systematic language dependencies (English, German, Mandarin). In the conclusions, suggestions are proposed for future codec designs that take human-centric, gender-balanced requirements into account. These include the minimum frequency response of future coders, the granularity of perceptual frequency scaling, etc. Suggestions for the gender neutrality of the original (studio-quality) recordings used to prepare speech samples for subjective tests are also included.
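Checking whether male and female voices receive different quality scores comes down to comparing two sets of ratings. The sketch below is a minimal illustration that assumes per-sample scores (e.g., MOS values on the P.800 five-point scale) are already available for each gender. It uses a two-sided permutation test, which is not necessarily the statistical procedure used by the laboratory. The function names and the ratings in the usage example are hypothetical.

```python
import random
from statistics import mean


def mean_gap(male, female):
    """Difference in mean score: male minus female."""
    return mean(male) - mean(female)


def permutation_test(male, female, n_perm=10_000, seed=0):
    """Two-sided permutation test for a male/female difference in mean scores.

    Returns (observed_gap, p_value). The gender labels are shuffled
    n_perm times; the p-value is the share of shuffles whose absolute gap
    is at least as large as the observed one (with +1 smoothing).
    """
    rng = random.Random(seed)
    observed = mean_gap(male, female)
    pooled = list(male) + list(female)
    n_male = len(male)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        gap = mean_gap(pooled[:n_male], pooled[n_male:])
        # Small tolerance so float rounding does not hide exact ties.
        if abs(gap) >= abs(observed) - 1e-12:
            extreme += 1
    return observed, (extreme + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Hypothetical per-sample MOS values for one codec condition.
    male = [4.2, 4.0, 4.4, 4.1, 4.3, 4.0]
    female = [3.6, 3.8, 3.5, 3.7, 3.9, 3.6]
    gap, p = permutation_test(male, female)
    print(f"gap = {gap:.3f} MOS, p = {p:.4f}")
```

A permutation test avoids assuming normally distributed ratings, which suits the small, bounded samples typical of per-condition MOS data.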
IEEE Transactions on Instrumentation and Measurement, 2023
Systems based on visual detection are the most commonly used to automatically detect dangerous situations in public areas. In certain situations, however, an automatic acoustic detection system may be preferable for detecting a source of danger. This paper presents a smart acoustic sensor system that automatically detects acoustic pulse events, such as gunshots, in public places and localizes the acoustic source. The tested firearms were 9 mm, 6.35 mm, and .22 short guns with various subsonic and supersonic ammunition. Other sounds, such as door slams and human screams, represent false alarms.
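Localizing an impulsive source such as a gunshot is commonly based on the time difference of arrival (TDOA) between microphones. The sketch below is a minimal illustration and not the system described in the paper: it assumes two synchronized microphones, a far-field source, and a speed of sound of 343 m/s. It estimates the inter-microphone delay by brute-force cross-correlation and converts it into a bearing. The function names are hypothetical.

```python
import math


def estimate_delay(ref, sig, max_lag):
    """Return the lag (in samples) at which `sig` best matches `ref`.

    A positive lag means the pulse reaches `sig` later than `ref`.
    Brute-force cross-correlation over lags in [-max_lag, max_lag].
    """
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i, r in enumerate(ref):
            j = i + lag
            if 0 <= j < len(sig):
                score += r * sig[j]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def bearing_deg(lag, fs, mic_spacing, c=343.0):
    """Far-field bearing (degrees from broadside) for a two-microphone pair.

    lag: delay in samples, fs: sampling rate in Hz, mic_spacing in metres.
    Positive angles point toward the `ref` microphone (it hears the pulse first).
    """
    tau = lag / fs
    # Clamp: noise can push c*tau/d slightly outside [-1, 1].
    x = max(-1.0, min(1.0, c * tau / mic_spacing))
    return math.degrees(math.asin(x))


if __name__ == "__main__":
    # Synthetic pulse: the second microphone hears it 7 samples later.
    ref = [0.0] * 100
    ref[30:33] = [1.0, 0.6, 0.2]
    sig = [0.0] * 100
    sig[37:40] = [1.0, 0.6, 0.2]
    lag = estimate_delay(ref, sig, max_lag=20)
    print(lag, bearing_deg(lag, fs=48_000, mic_spacing=0.5))
```

A practical detector would typically use FFT-based or generalized cross-correlation (e.g., GCC-PHAT) and more than two microphones to obtain a 2-D position rather than a single bearing.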
Journal of physics, Nov 1, 2016
Journal of physics, Nov 1, 2016
Numerous techniques and algorithms are dedicated to extracting emotions from input data. Our investigation found that emotion-detection approaches can be classified into three types: keyword-based (lexical), learning-based, and hybrid. The most commonly used techniques, such as keyword spotting, Support Vector Machines, Naive Bayes classifiers, Hidden Markov Models, and hybrid algorithms, achieve impressive results in this area and can reach an accuracy of more than 90%.
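The keyword-based (lexical) approach can be illustrated in a few lines. This is a minimal sketch that assumes a tiny hand-made lexicon; `LEXICON` and `detect_emotion` are hypothetical. Practical systems rely on curated emotion lexicons and handle negation, intensifiers, and context, all of which this sketch ignores.

```python
import re
from collections import Counter

# Hypothetical toy lexicon mapping each emotion to trigger keywords.
LEXICON = {
    "joy": {"happy", "glad", "delighted", "great"},
    "anger": {"angry", "furious", "annoyed", "hate"},
    "sadness": {"sad", "unhappy", "miserable", "cry"},
    "fear": {"afraid", "scared", "terrified", "worried"},
}


def detect_emotion(text):
    """Return the emotion whose keywords occur most often, or None if none match."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for token in tokens:
        for emotion, words in LEXICON.items():
            if token in words:
                counts[emotion] += 1
    if not counts:
        return None
    return counts.most_common(1)[0][0]


if __name__ == "__main__":
    print(detect_emotion("I am so happy and glad today!"))
```

Keyword spotting is transparent and needs no training data, which is why it is often the baseline against which learning-based and hybrid methods are compared.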
Journal of The Audio Engineering Society, May 12, 2020
This article deals with subjective tests of speech intelligibility. A set of samples in the Czech language, recorded by four different narrators, was distorted with different noise levels and encoded by a low bit-rate encoder. The subjective test consisted of two parts. The first part (45 participants) proceeded according to ITU-T Recommendation P.807, Subjective test methodology for assessing speech intelligibility. The second part (70 participants) included an additional (parallel) psychomotor task using a laser-shooting simulator, in which subjects had the roles of "shooters" and "counters". The purpose of the parallel task is to bring the testing closer to real-world use of the technology. Significant differences were found in the intelligibility results for samples from different speakers. There were also differences between evaluations with and without the parallel task. Samples from male narrators have a significantly higher intelligibility score in the standard laboratory test, but also show a greater decrease in intelligibility once the parallel task is engaged.
Measurement: Sensors, Aug 1, 2022
IEEE Transactions on Instrumentation and Measurement, Oct 1, 2020
The distribution of indoor illuminance affects the health and performance of humans and is, therefore, subject to audits. Audits are performed after each modification of the light sources, and the currently adopted approach relies on a human operator to perform the measurements manually. This process is time-consuming, and we therefore aim to automate it with a mobile platform. To this end, we propose an online algorithm for selecting the control points. The algorithm is iterative and takes previous measurements into account in order to select the control points adaptively. We show how the proposed method can be combined with a mobile platform to perform the audit autonomously. First, the mobile platform, equipped with a laser range finder, builds the floor plan of the room. This floor plan is then used to restrict the area within which the control points can be selected. The whole measurement process is then executed without the presence or intervention of a human operator, which reduces the time needed for the audit. We have performed simulated and real experiments to demonstrate the performance of the proposed approach.
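The adaptive selection of control points can be sketched as a greedy loop that scores every free cell of the floor plan and measures the best one next. The scoring rule below is an assumption made for illustration, not the algorithm proposed in the paper: it prefers cells far from previous measurements and inflates the score where nearby measured illuminance values disagree. `next_control_point`, `k`, and `alpha` are hypothetical names.

```python
import math


def next_control_point(free_cells, measured, k=3, alpha=0.01):
    """Pick the next control point from the free floor-plan cells.

    free_cells: iterable of (x, y) positions inside the floor plan.
    measured: dict mapping (x, y) -> measured illuminance in lux.
    Score = distance to the nearest measurement, scaled up where the k
    nearest measurements disagree (a proxy for steep illuminance change).
    """
    if not measured:
        raise ValueError("at least one measurement is required")
    best, best_score = None, -1.0
    for cell in free_cells:
        if cell in measured:
            continue
        nearest = sorted(measured, key=lambda p: math.dist(cell, p))[:k]
        values = [measured[p] for p in nearest]
        spread = max(values) - min(values)
        score = math.dist(cell, nearest[0]) * (1.0 + alpha * spread)
        if score > best_score:  # strict '>' keeps the first cell on ties
            best, best_score = cell, score
    return best


if __name__ == "__main__":
    # Hypothetical 5 x 5 room grid with an obstacle at (2, 2).
    free = [(x, y) for x in range(5) for y in range(5) if (x, y) != (2, 2)]
    measured = {(0, 0): 300.0}
    for _ in range(3):
        cell = next_control_point(free, measured)
        measured[cell] = 300.0  # placeholder for a real lux reading
        print(cell)
```

Restricting `free_cells` to the mapped floor plan mirrors the role the laser-range-finder map plays in the abstract: candidate points outside the room are never considered.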
Transmission delay is an unwanted effect present in everyday telecommunication systems. It influences the overall quality of a telephone connection as perceived by end users. This article summarizes known subjective experiments testing the influence of delay on perceived quality. To find a possible explanation for the differences in the results of those experiments, a new conversational test
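One established way to model how one-way delay degrades perceived quality is the delay impairment factor Idd of the ITU-T G.107 E-model, which is subtracted from the transmission rating factor R. The sketch below implements that formula assuming the default threshold of 100 ms (Idd is zero up to the threshold and then grows with the logarithm of the absolute delay Ta). It is not taken from the article, and the function name is hypothetical.

```python
import math


def delay_impairment(ta_ms, threshold_ms=100.0):
    """Delay impairment factor Idd of the E-model (ITU-T G.107).

    ta_ms: absolute one-way delay Ta in milliseconds.
    Returns 0 for delays up to the threshold (100 ms by default).
    """
    if ta_ms <= threshold_ms:
        return 0.0
    x = math.log2(ta_ms / threshold_ms)
    return 25.0 * (
        (1.0 + x ** 6) ** (1.0 / 6.0)
        - 3.0 * (1.0 + (x / 3.0) ** 6) ** (1.0 / 6.0)
        + 2.0
    )


if __name__ == "__main__":
    for ta in (100, 150, 200, 300, 400, 600):
        print(f"Ta = {ta:3d} ms -> Idd = {delay_impairment(ta):5.2f}")
```

The E-model is a planning tool; subjective conversational tests, like the ones discussed in the article, are what such curves are ultimately checked against.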
International Journal of Speech Technology, 2016
2008 Wireless Telecommunications Symposium, 2008