Matej Rojc | University of Maribor

Papers by Matej Rojc

Coverbal Synchrony in Human-Machine Interaction

CRC Press eBooks, Oct 25, 2013

Embodied conversational agents (ECAs) and speech-based human–machine interfaces can together enable more advanced and more natural human–machine interaction. Fusing the two is a challenging agenda in both research and production. An important goal of human–machine interfaces is to provide content or functionality in the form of a dialog resembling face-to-face conversation. All natural interfaces strive to exploit different communication strategies that provide additional meaning to the content, whether they are human–machine interfaces for controlling an application or ECA-based interfaces directly simulating face-to-face conversation.

A Distributed Architecture for Real-Time Dialogue and On-Task Learning of Efficient Co-Operative Turn-Taking

CRC Press eBooks, Oct 16, 2013

From Annotation to Multimodal Behavior

Coverbal Synchrony in Human-Machine Interaction, 2013

Representation of large lexica using finite-state transducers for the multilingual text-to-speech synthesis systems

Developing Multimodal Web Interfaces by Encapsulating Their Content and Functionality within a Multimodal Shell

Lecture Notes in Computer Science, 2011

Development of a Repository of Virtual 3D Conversational Gestures and Expressions

Lecture Notes in Electrical Engineering, 2019

This paper outlines a novel framework designed to create a repository of "gestures" for embodied conversational agents. By utilizing it, virtual agents can sculpt conversational expressions incorporating both verbal and non-verbal cues. The 3D representations of gestures are captured in the EVA Corpus and then stored as a repository of motor skills in the form of expressively tunable templates.
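
The abstract gives no implementation detail; as a minimal sketch, assuming a template is parameterized by amplitude and tempo, the Python below illustrates what an "expressively tunable" gesture template in a motor-skill repository could look like. All names (GestureTemplate, Keyframe, tune) are hypothetical and not taken from the EVA framework.

```python
from dataclasses import dataclass, field

@dataclass
class Keyframe:
    """One 3D pose sample for a set of joints (hypothetical schema)."""
    time: float  # normalized position within the gesture, in [0, 1]
    joint_rotations: dict[str, tuple[float, float, float]]  # joint -> Euler angles

@dataclass
class GestureTemplate:
    """An expressively tunable gesture template, loosely following the
    'repository of motor skills' idea from the abstract."""
    name: str
    semiotic_class: str  # e.g. "emblem", "deictic" (assumed labels)
    duration: float      # seconds at neutral tempo
    keyframes: list[Keyframe] = field(default_factory=list)

    def tune(self, amplitude: float = 1.0, tempo: float = 1.0) -> "GestureTemplate":
        """Return a copy with scaled joint rotations and duration."""
        tuned = [
            Keyframe(k.time,
                     {j: tuple(a * amplitude for a in rot)
                      for j, rot in k.joint_rotations.items()})
            for k in self.keyframes
        ]
        return GestureTemplate(self.name, self.semiotic_class,
                               self.duration / tempo, tuned)

# Usage: a repository keyed by template name, plus a tuned variant.
repository: dict[str, GestureTemplate] = {}
wave = GestureTemplate("wave_right_hand", "emblem", 1.2,
                       [Keyframe(0.0, {"r_wrist": (0.0, 0.0, 0.0)}),
                        Keyframe(1.0, {"r_wrist": (0.0, 0.0, 45.0)})])
repository[wave.name] = wave
subtle_fast_wave = wave.tune(amplitude=0.5, tempo=1.5)
```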

Finite-state machine based distributed framework DATA for intelligent ambience systems

Platform for flexible integration of multimodal technologies into web application domain

Form-Oriented Annotation for Building a Functionally Independent Dictionary of Synthetic Movement

Lecture Notes in Computer Science, 2012

A new fuzzy unit selection cost function optimized by relaxed gradient descent algorithm

Expert Systems With Applications, Nov 1, 2020

Towards ECA’s Animation of Expressive Complex Behaviour

Lecture Notes in Computer Science, 2011

Multilingual Chatbots to Collect Patient-Reported Outcomes

IntechOpen eBooks, Jul 7, 2023

An End-to-End Framework for Extracting Observable Cues of Depression from Diary Recordings (Preprint)

BACKGROUND: Given the prevalence of depression, its often chronic course, recurrences, and associated disability, early detection and non-intrusive monitoring are crucial for timely diagnosis and treatment, remission of depression, prevention of relapse, and thus for limiting its impact on quality of life and well-being. Existing successful attempts to exploit artificial intelligence for early classification of depression are mostly data-driven and therefore non-transparent, and they lack effective means of dealing with uncertainty.

OBJECTIVE: We present an approach to designing an explainable, knowledge-based artificial intelligence for the classification of symptoms of depression. The aim of the study was to define an end-to-end framework for extracting observable depression cues from diary recordings, to evaluate the framework, and to explore the potential of the pipeline as a feasible solution for detecting symptoms of depression automatically from observable behavior cues.

METHODS: First, we defined an end-to-end framework for extracting depression cues (i.e., facial, speech, and language features) and storing them as a digital patient resource (i.e., a Fast Healthcare Interoperability Resources, FHIR, resource). Second, we extracted these cues from 28 video recordings from the SymptomMedia dataset (14 simulating a variety of diagnoses of depression and 14 simulating other mental health-related diagnoses) and 27 recordings from the DAIC-WOZ dataset (12 classified as having moderate or severe symptoms of depression and 15 without any depressive symptoms), and compared the presence of the extracted features between recordings of individuals with depressive disorder and those without.

RESULTS: Across both datasets we identified several cues consistent with previous studies in distinguishing individuals with and without depressive disorders, among language features (i.e., use of first-person singular pronouns, use of negatively valenced words, explicit mentions of treatment of depression, some features of language complexity), speech features (i.e., speaking rate, voiced speech and pauses, low articulation rate, monotonous speech), and facial cues (i.e., rotational energy of head movements). Other defined cues require further research.

CONCLUSIONS: The nature and context of the discourse, the impact of other disorders and of physical or psychological stress, as well as the quality and resolution of the recordings, play an important role in aligning digital features with the relevant background. The work presented in this paper provides a novel approach to extracting a wide array of cues relevant for depression classification and opens up new opportunities for further research.
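
The abstract states that extracted cues are stored as FHIR resources. As a hedged illustration only, the sketch below packages two hypothetical speech cues as components of a FHIR R4 Observation; the cue names and component texts are invented and no official LOINC/SNOMED codes are claimed, but the resourceType/status/subject/component layout follows the public FHIR Observation structure.

```python
import json

def cues_to_fhir_observation(patient_id: str, cues: dict[str, float]) -> dict:
    """Wrap extracted behavior cues in a FHIR R4 Observation resource.
    Component entries use plain-text codes as illustrative placeholders."""
    return {
        "resourceType": "Observation",
        "status": "final",
        "code": {"text": "Observable behavior cues (illustrative)"},
        "subject": {"reference": f"Patient/{patient_id}"},
        "component": [
            {"code": {"text": name}, "valueQuantity": {"value": value}}
            for name, value in cues.items()
        ],
    }

# Usage with two hypothetical speech features from one diary recording:
observation = cues_to_fhir_observation(
    "example-patient-1",
    {"speaking_rate_syllables_per_s": 3.1, "pause_ratio": 0.42},
)
print(json.dumps(observation, indent=2))
```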

A New Unit Selection Optimisation Algorithm for Corpus-Based TTS Systems Using the RBF-Based Data Compression Technique

A speech-based distributed architecture platform for an intelligent ambience

Computers & Electrical Engineering, Oct 1, 2018

Annotating discourse markers in spontaneous speech corpora on an example for the Slovenian language

Language Resources and Evaluation, Aug 3, 2007

Recreation of spontaneous non-verbal behavior on a synthetic agent EVA

International Conference on Artificial Intelligence, Feb 22, 2012

A New Distributed Platform for Client-Side Fusion of Web Applications and Natural Modalities—A Multimodal Web Platform

Applied Artificial Intelligence, Aug 9, 2013

Multilingual and Multimodal Corpus-Based Text-to-Speech System - PLATTOS

InTech eBooks, Jun 21, 2011

Deliverable D4.3: Alpha version of the sensing network

Zenodo (CERN European Organization for Nuclear Research), Feb 26, 2021

Towards Visual and Auditory Representation of Information for the Next Generation of Conversational Interfaces

In order to engage with a human user on a more personal level, natural HCI is starting to virtualize itself and to exploit the potential of entities that resemble human collocutors in interaction. Through their human-likeness in particular, these entities embody multimodal interaction models that can adapt to the user's context and support the conversational situation not only via modalities such as speech, but also through the visual representation of information and through social cues, e.g. one's personality, emotion, and attitude. The extension of conversational interfaces (CAs) with embodied conversational agents (ECAs) therefore seems only natural. In this chapter we outline a highly modular framework for the generation and realization of multimodal natural communication incorporating embodied conversational agents, male or female. The personification of the machine's responses is achieved through the automatic generation and visualization of co-verbal behavior, based on a behavior generation engine, and through its physical realization via behavior realization engines built on the Unity game engine's core.
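
As a rough sketch of the generation/realization split described above (not the actual EVA or Unity implementation; every name here is hypothetical), a behavior generation step can emit timed behavior events that a separate realization process consumes:

```python
from dataclasses import dataclass
import queue
import threading

@dataclass
class BehaviorEvent:
    """One planned co-verbal behavior unit (hypothetical schema)."""
    start: float    # seconds, relative to utterance start
    body_part: str  # e.g. "head", "r_arm"
    gesture: str    # template name from a gesture repository

def generate_behavior(text: str) -> list[BehaviorEvent]:
    """Toy generator: a real engine would use linguistic and prosodic
    analysis of the text; here we simply nod at utterance start."""
    return [BehaviorEvent(0.0, "head", "nod")]

def realization_worker(events: queue.Queue) -> None:
    """Stand-in for a realization engine (e.g. a Unity-based renderer);
    here it only logs what would be animated."""
    while True:
        ev = events.get()
        if ev is None:  # sentinel: no more events
            break
        print(f"animate {ev.body_part}: {ev.gesture} at t={ev.start:.2f}s")

events: queue.Queue = queue.Queue()
worker = threading.Thread(target=realization_worker, args=(events,))
worker.start()
for ev in generate_behavior("Hello there!"):
    events.put(ev)
events.put(None)
worker.join()
```

Decoupling generation from realization through a queue mirrors the modularity the chapter emphasizes: the generator can be swapped or distributed without touching the renderer.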

Coverbal synchrony in human-machine interaction

This book provides novel insights into the research, development, and design of advanced techniques and methods for more natural multimodal human-machine interfaces for use within desktop and pervasive computing environments. The book consists of 15 chapters, structured so as to provide an in-depth investigation of novel approaches from both theoretical and practical perspectives. Humans tend to interact using several modalities and communication channels in a highly complex, yet synergetic, manner.

An Expressive Conversational-Behavior Generation Model for Advanced Interaction within Multimodal User Interfaces

The aim of the book is to present a flexible and efficient algorithm and a novel system for the planning, generation, and realization of conversational behavior (co-verbal behavior). Such behavior is best described as a set of meaningful movements of body parts that, in terms of prosody, are synchronized with the accompanying speech. The movements and shapes generated as co-verbal behavior represent a contextual link between a repertoire of independent motor skills (shapes, movements, and poses that a conversational agent can reproduce and execute) and the intent/meaning of spoken sequences (context). The actual intent/meaning of spoken content is identified through language-dependent linguistic markers and prosody. The knowledge databases used to determine the intent/meaning of text are based on linguistic analysis and on the classification of the text into semiotic classes and subclasses, achieved through the annotation of multimodal corpora based on the proposed EVA annotation scheme. The scheme allows features to be captured at a functional (context-dependent) as well as at a descriptive (context-independent) level. The functional level captures high-level features that describe the correlation between speech and co-verbal behavior, whereas the descriptive level allows us to capture and define body poses and shapes independently of verbal content and in high resolution. The annotation scheme therefore not only interlinks speech and gesture at a semiotic level, but also serves as a basis for the creation of a context-independent repertoire of movements and shapes.
The process of generating co-verbal behavior is, in this book, divided into two phases. The first phase deals with the classification of intent and its synchronization with the verbal content and prosody. The second phase then transforms the planned and synchronized behavior into a co-verbal animation performed by an embodied conversational agent (ECA). In order to extrapolate intent from arbitrary text sequences, the behavior formulation algorithm deduces meaning/intent with regard to the semiotic intent. Furthermore, the algorithm considers the linguistic features of arbitrary, un-annotated text and selects primitive gestures based on semiotic nuclei, as identified by semiotic classification and further modeled by the predicted prosodic features of the speech to be generated by a general text-to-speech (TTS) system. The output of the behavior formulation phase is represented as a hierarchical procedure encoded in XML format, together with a speech sequence generated by the TTS (a sketch of such a procedural description follows below). The procedural description is event-oriented and represents a well-defined structure of consecutive movements of body parts, as well as of body parts moving in parallel. The second phase of the novel architecture transforms the procedural descriptions into a series of coherent animations of individual parts of the articulated embodied conversational agent. In this regard a novel ECA-based realization framework, named the EVA framework, is also presented. It supports the real-time realization of procedural animation descriptions and plans on multi-part mesh-based models, using skeletal animation, blend-shape animation, and the animation of predefined (pre-recorded) animated segments. The book therefore covers the complete design and implementation of an expressive model for the generation of co-verbal behavior, which is able to transform un-annotated text into a speech-synchronized series of animated sequences.
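
Since the EVA encoding itself is not reproduced in the abstract, the sketch below uses made-up element names purely to illustrate the idea of an event-oriented XML procedure with sequential and parallel body-part movements:

```python
import xml.etree.ElementTree as ET

# Illustrative event-oriented procedural description: a <sequence> of
# steps, one of which moves two body parts in <parallel>. Element and
# attribute names are hypothetical, not the EVA schema.
proc = ET.Element("behavior", utterance="Hello there!")
seq = ET.SubElement(proc, "sequence")
ET.SubElement(seq, "move", part="head", gesture="nod", start="0.0", duration="0.4")
par = ET.SubElement(seq, "parallel")
ET.SubElement(par, "move", part="r_arm", gesture="beat", start="0.5", duration="0.6")
ET.SubElement(par, "move", part="face", gesture="smile", start="0.5", duration="0.8")

print(ET.tostring(proc, encoding="unicode"))
```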

Meettell 1.0.4: An advanced new system for managing discussions at meetings, conferences and other large events.

Version 1.0.4 released.
Version 1.0.4 of the Meettell mobile application introduces moderator functionality. By entering the moderator session code, any participant can become the moderator of a discussion and manage it efficiently from his/her mobile phone. The Meettell mobile application now supports all the discussion-management functionalities available on the Meettell session server.

Meettell: An advanced new system for managing discussions at meetings, conferences and other large events.

The Meettell system consists of a Meettell session server and mobile-phone clients. The session server offers several functionalities for efficiently managing a meeting, controlling the discussion, enabling participants to register for the meeting, and letting them take part in discussions through the meeting's audio system using their mobile phones.
Meettell allows for the efficient management of discussions at any meeting. It brings new quality to meetings regarding discussion management and enables efficient control of all discussion elements via participants' smartphones.