Speech processing, NLP, quality and usability Research Papers

Text can be analysed by splitting it and extracting keywords. The results may be presented as summaries, tables, graphs, or images. The large amount of information available in textual form has motivated research into extracting text and transforming it from an unstructured to a structured format. The paper presents the importance of Natural Language Processing (NLP) and two applications of it written in Python: 1. Automatic text summarization [Domain: Newspaper Articles] 2. Text to Graph Conversion [Domain: Stock news]. The main challenge in NLP is natural language understanding, i.e. deriving meaning from human or natural language input, which is done here using regular expressions, artificial intelligence, and database concepts. The Automatic Summarization tool converts newspaper articles into summaries on the basis of word frequency in the text. The Text to Graph Converter takes a stock article as input, tokenizes it on index values (points and percent) and time, and then maps the tokens to a graph. The paper proposes this as a business solution that helps users manage their time effectively.
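The frequency-based summarization described in this abstract can be illustrated with a minimal sketch. It is not the paper's tool: the function name `summarize`, the tiny stop-word list, and the choice to average word frequencies per sentence are all assumptions made for illustration.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption; a real tool would use a fuller one).
STOP_WORDS = {"the", "a", "an", "of", "in", "on", "and", "to", "is", "are", "for", "by", "as", "it"}


def content_words(text):
    """Lower-case the text and return its words, minus stop words."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOP_WORDS]


def summarize(text, max_sentences=2):
    """Return the highest-scoring sentences, kept in their original order."""
    # Split into sentences at '.', '!' or '?' followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    # Count how often each content word occurs in the whole text.
    freq = Counter(content_words(text))

    def score(sentence):
        tokens = content_words(sentence)
        # Average frequency, so long sentences are not favoured automatically.
        return sum(freq[w] for w in tokens) / len(tokens) if tokens else 0.0

    # Rank sentence indices by score, keep the best ones, restore document order.
    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    keep = sorted(ranked[:max_sentences])
    return " ".join(sentences[i] for i in keep)


article = "Stocks rose today. Stocks fell in Asia. The weather was nice."
print(summarize(article, max_sentences=2))
```

Averaging rather than summing the frequencies is one possible design choice; summing would favour longer sentences, which may or may not be desirable for newspaper text.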

This paper investigates the impact of frequent, small playout delay adjustments (time-shifting) of 30 ms or less, introduced into silence periods by Voice over IP (VoIP) jitter buffer strategies, on the listening quality perceived by the end user. In particular, the quality impact is assessed using both a subjective method (quality scores obtained from a subjective listening test) and an objective method based on perceptual modelling. Two different objective methods are used: PESQ (Perceptual Evaluation of Speech Quality, ITU-T Recommendation P.862) and POLQA (Perceptual Objective Listening Quality Assessment, ITU-T Recommendation P.863). Moreover, the relative accuracy of both objective models is assessed by comparing their predictions with subjective assessments. The results show that the impact of the investigated playout delay adjustments on subjective listening quality scores is negligible. On the other hand, a significant impact is reported for the objective listening quality scores predicted by the PESQ model, i.e. the PESQ model fails to correctly predict quality scores for this kind of degradation. Finally, the POLQA model is shown to perform significantly better than PESQ. We conclude the paper by identifying further related research that arises from this study.
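The abstract does not say which statistics were used to compare model predictions with subjective scores. As a hedged illustration, the sketch below computes two measures commonly used for this kind of comparison, the Pearson correlation coefficient and the root-mean-square error (RMSE). The function names and all score values are made-up placeholders, not data from the paper.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equally long score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def rmse(predicted, reference):
    """Root-mean-square error of predictions against reference scores."""
    squared = [(p - r) ** 2 for p, r in zip(predicted, reference)]
    return math.sqrt(sum(squared) / len(squared))


# Placeholder values for illustration only (NOT results from the paper).
subjective_mos = [4.1, 4.0, 4.2, 3.9]
model_a_scores = [3.2, 3.9, 3.0, 3.8]   # hypothetical objective model A
model_b_scores = [4.0, 4.0, 4.1, 3.8]   # hypothetical objective model B

for name, scores in [("model A", model_a_scores), ("model B", model_b_scores)]:
    print(name, "r =", round(pearson(scores, subjective_mos), 3),
          "RMSE =", round(rmse(scores, subjective_mos), 3))
```

A model that tracks subjective quality well would show a correlation close to 1 and a small RMSE across the test conditions.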

Speech transmission quality measurement in cellular networks is a major indicator of performance for end-to-end quality of service standards. Many approaches have been proposed in previous studies, but the correlation of their results with subjective experiments still needs further optimization, especially for quality determination using languages with unique phonetic structures, i.e. clicking sounds. Moreover, the evaluation test data is always a key element in obtaining representative and consistent results.

This paper compares the effect of send-side music and environmental noise as background noise in telephone communication. The study focuses on the quality experienced by the end user in the context of NB, WB and SWB mobile speech communication. The subjective test procedure defined in ITU-T Rec. P.835 is followed in this study. The results show that music as background noise in a telephone conversation deteriorates the overall quality experienced by the end user. Moreover, the impact of music background noise on quality is similar to that of environmental noise. Furthermore, the music background noise seems to be slightly less intrusive than the environmental noise, especially at lower SNRs.
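The abstract does not describe how the noisy test stimuli were prepared. As a hedged illustration of what a given SNR means, the sketch below scales a background signal (music or environmental noise) so that the ratio of speech power to noise power matches a target value in dB, then adds it to the speech. The function name `mix_at_snr` is hypothetical, and plain mean power is used as a simplification; real test preparation would typically measure the active speech level instead.

```python
import math


def mix_at_snr(speech, noise, snr_db):
    """Scale `noise` to the target SNR (in dB) and add it to `speech`.

    Returns the mixed samples and the gain that was applied to the noise.
    Assumes both inputs are equally long lists of float samples.
    """
    power_speech = sum(s * s for s in speech) / len(speech)
    power_noise = sum(n * n for n in noise) / len(noise)
    # SNR_dB = 10 * log10(power_speech / (gain^2 * power_noise)), solved for gain.
    gain = math.sqrt(power_speech / (power_noise * 10 ** (snr_db / 10)))
    mixed = [s + gain * n for s, n in zip(speech, noise)]
    return mixed, gain


# Toy signals for illustration only.
speech = [1.0, -1.0, 1.0, -1.0]
background = [0.5, 0.5, -0.5, -0.5]

for target in (20, 0):
    _, gain = mix_at_snr(speech, background, target)
    print(f"target SNR {target} dB -> noise gain {gain:.2f}")
```

A lower target SNR yields a larger noise gain, which is the condition under which the abstract reports the largest difference between music and environmental noise.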

Voice user interface and speech quality are normally assessed using subjective user experience testing methods and/or objective instrumental techniques. However, recent advances in neurophysiological tools have allowed useful human behavioral constructs, such as emotion, perception, preferences and task performance, to be measured in real time. Electroencephalography (EEG) and functional near-infrared spectroscopy (fNIRS) are well-received neuroimaging tools that are being used in a variety of domains, such as health science, neuromarketing, user experience (UX) research and multimedia quality of experience (QoE). Therefore, this paper describes the impact of natural and text-to-speech (TTS) signals on a user's affective state (valence and arousal) and preferences, using neuroimaging tools (EEG and fNIRS) and a subjective user study. The EEG results showed that natural and high-quality TTS speech generate “positive valence”, which was inferred from higher asymmetric EEG activation in the frontal head region. The fNIRS results showed increased activation in the Orbito-Frontal Cortex (OFC) region during decision making in favor of natural and high-quality TTS speech signals. However, natural and TTS signals have significantly different arousal levels.