Matej Rojc | University of Maribor

Papers by Matej Rojc

Coverbal Synchrony in Human-Machine Interaction

CRC Press eBooks, Oct 25, 2013

Embodied conversational agents (ECAs) and speech-based human–machine interfaces can together enable more advanced and more natural human–machine interaction. Fusing the two is a challenging agenda in both research and production. An important goal of human–machine interfaces is to provide content or functionality in the form of a dialog resembling face-to-face conversation. All natural interfaces strive to exploit different communication strategies that provide additional meaning to the content, whether they are human–machine interfaces for controlling an application or ECA-based interfaces directly simulating face-to-face conversation.

A Distributed Architecture for Real-Time Dialogue and On-Task Learning of Efficient Co-Operative Turn-Taking

CRC Press eBooks, Oct 16, 2013

From Annotation to Multimodal Behavior

Coverbal Synchrony in Human-Machine Interaction, 2013

Representation of large lexica using finite-state transducers for the multilingual text-to-speech synthesis systems

Developing Multimodal Web Interfaces by Encapsulating Their Content and Functionality within a Multimodal Shell

Lecture Notes in Computer Science, 2011

Development of a Repository of Virtual 3D Conversational Gestures and Expressions

Lecture Notes in Electrical Engineering, 2019

This paper outlines a novel framework designed to create a repository of "gestures" for embodied conversational agents. By utilizing it, virtual agents can sculpt conversational expressions incorporating both verbal and non-verbal cues. The 3D representations of gestures are captured in the EVA Corpus and then stored as a repository of motor skills in the form of expressively tunable templates.
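
The abstract gives no implementation detail; as a minimal sketch, assuming a template is parameterized by amplitude and tempo, the Python below illustrates what an "expressively tunable" gesture template in a motor-skill repository could look like. All names (GestureTemplate, Keyframe, tune) are hypothetical and not taken from the EVA framework.

```python
from dataclasses import dataclass, field

@dataclass
class Keyframe:
    """One 3D pose sample for a set of joints (hypothetical schema)."""
    time: float  # normalized position within the gesture, in [0, 1]
    joint_rotations: dict[str, tuple[float, float, float]]  # joint -> Euler angles

@dataclass
class GestureTemplate:
    """An expressively tunable gesture template, loosely following the
    'repository of motor skills' idea from the abstract."""
    name: str
    semiotic_class: str  # e.g. "emblem", "deictic" (assumed labels)
    duration: float      # seconds at neutral tempo
    keyframes: list[Keyframe] = field(default_factory=list)

    def tune(self, amplitude: float = 1.0, tempo: float = 1.0) -> "GestureTemplate":
        """Return a copy with scaled joint rotations and duration."""
        tuned = [
            Keyframe(k.time,
                     {j: tuple(a * amplitude for a in rot)
                      for j, rot in k.joint_rotations.items()})
            for k in self.keyframes
        ]
        return GestureTemplate(self.name, self.semiotic_class,
                               self.duration / tempo, tuned)

# Usage: a repository keyed by template name, plus a tuned variant.
repository: dict[str, GestureTemplate] = {}
wave = GestureTemplate("wave_right_hand", "emblem", 1.2,
                       [Keyframe(0.0, {"r_wrist": (0.0, 0.0, 0.0)}),
                        Keyframe(1.0, {"r_wrist": (0.0, 0.0, 45.0)})])
repository[wave.name] = wave
subtle_fast_wave = wave.tune(amplitude=0.5, tempo=1.5)
```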

Finite-state machine based distributed framework DATA for intelligent ambience systems

Platform for flexible integration of multimodal technologies into web application domain

Form-Oriented Annotation for Building a Functionally Independent Dictionary of Synthetic Movement

Lecture Notes in Computer Science, 2012

A new fuzzy unit selection cost function optimized by relaxed gradient descent algorithm

Expert Systems With Applications, Nov 1, 2020

Towards ECA’s Animation of Expressive Complex Behaviour

Lecture Notes in Computer Science, 2011

Multilingual Chatbots to Collect Patient-Reported Outcomes

IntechOpen eBooks, Jul 7, 2023

An End-to-End Framework for Extracting Observable Cues of Depression from Diary Recordings (Preprint)

BACKGROUND: Given the prevalence of depression, its often chronic course, recurrences, and associated disability, early detection and non-intrusive monitoring are crucial for timely diagnosis and treatment, remission of depression, prevention of relapse, and thus for limiting its impact on quality of life and well-being. Existing successful attempts to exploit artificial intelligence for early classification of depression are mostly data-driven and therefore non-transparent, and they lack effective means of dealing with uncertainty.

OBJECTIVE: We present an approach to designing an explainable, knowledge-based artificial intelligence for the classification of symptoms of depression. The aim of the study was to define an end-to-end framework for extracting observable depression cues from diary recordings, to evaluate the framework, and to explore the potential of the pipeline as a feasible solution for detecting symptoms of depression automatically from observable behavior cues.

METHODS: First, we defined an end-to-end framework for extracting depression cues (i.e., facial, speech, and language features) and storing them as a digital patient resource (i.e., a Fast Healthcare Interoperability Resources, FHIR, resource). Second, we extracted these cues from 28 video recordings from the SymptomMedia dataset (14 simulating a variety of diagnoses of depression and 14 simulating other mental health-related diagnoses) and 27 recordings from the DAIC-WOZ dataset (12 classified as having moderate or severe symptoms of depression and 15 without any depressive symptoms), and compared the presence of the extracted features between recordings of individuals with depressive disorder and those without.

RESULTS: Across both datasets we identified several cues consistent with previous studies in distinguishing individuals with and without depressive disorders, among language features (i.e., use of first-person singular pronouns, use of negatively valenced words, explicit mentions of treatment of depression, some features of language complexity), speech features (i.e., speaking rate, voiced speech and pauses, low articulation rate, monotonous speech), and facial cues (i.e., rotational energy of head movements). Other defined cues require further research.

CONCLUSIONS: The nature and context of the discourse, the impact of other disorders and of physical or psychological stress, as well as the quality and resolution of the recordings, play an important role in aligning digital features with the relevant background. The work presented in this paper provides a novel approach to extracting a wide array of cues relevant for depression classification and opens up new opportunities for further research.
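
The abstract states that extracted cues are stored as FHIR resources. As a hedged illustration only, the sketch below packages two hypothetical speech cues as components of a FHIR R4 Observation; the cue names and component texts are invented and no official LOINC/SNOMED codes are claimed, but the resourceType/status/subject/component layout follows the public FHIR Observation structure.

```python
import json

def cues_to_fhir_observation(patient_id: str, cues: dict[str, float]) -> dict:
    """Wrap extracted behavior cues in a FHIR R4 Observation resource.
    Component entries use plain-text codes as illustrative placeholders."""
    return {
        "resourceType": "Observation",
        "status": "final",
        "code": {"text": "Observable behavior cues (illustrative)"},
        "subject": {"reference": f"Patient/{patient_id}"},
        "component": [
            {"code": {"text": name}, "valueQuantity": {"value": value}}
            for name, value in cues.items()
        ],
    }

# Usage with two hypothetical speech features from one diary recording:
observation = cues_to_fhir_observation(
    "example-patient-1",
    {"speaking_rate_syllables_per_s": 3.1, "pause_ratio": 0.42},
)
print(json.dumps(observation, indent=2))
```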

A New Unit Selection Optimisation Algorithm for Corpus-Based TTS Systems Using the RBF-Based Data Compression Technique

A speech-based distributed architecture platform for an intelligent ambience

Computers & Electrical Engineering, Oct 1, 2018

Annotating discourse markers in spontaneous speech corpora on an example for the Slovenian language

Language Resources and Evaluation, Aug 3, 2007

Recreation of spontaneous non-verbal behavior on a synthetic agent EVA

International Conference on Artificial Intelligence, Feb 22, 2012

A New Distributed Platform for Client-Side Fusion of Web Applications and Natural Modalities—A Multimodal Web Platform

Applied Artificial Intelligence, Aug 9, 2013

Multilingual and Multimodal Corpus-Based Text-to-Speech System - PLATTOS

InTech eBooks, Jun 21, 2011

Deliverable D4.3: Alpha version of the sensing network

Zenodo (CERN European Organization for Nuclear Research), Feb 26, 2021

Towards Visual and Auditory Representation of Information for the Next Generation of Conversational Interfaces

In order to engage with a human user on a more personal level, natural HCI is starting to virtualize itself and to exploit the potential of entities that resemble human collocutors in interaction. Through their human-likeness in particular, these entities embody multimodal interaction models that can adapt to the user's context and support the conversational situation not only via modalities such as speech, but also through the visual representation of information and through social cues, e.g. one's personality, emotion, and attitude. The extension of conversational interfaces (CAs) with embodied conversational agents (ECAs) therefore seems only natural. In this chapter we outline a highly modular framework for the generation and realization of multimodal natural communication incorporating embodied conversational agents, male or female. The personification of the machine's responses is achieved through the automatic generation and visualization of co-verbal behavior, based on a behavior generation engine, and through its physical realization via behavior realization engines built on the Unity game engine's core.
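
As a rough sketch of the generation/realization split described above (not the actual EVA or Unity implementation; every name here is hypothetical), a behavior generation step can emit timed behavior events that a separate realization process consumes:

```python
from dataclasses import dataclass
import queue
import threading

@dataclass
class BehaviorEvent:
    """One planned co-verbal behavior unit (hypothetical schema)."""
    start: float    # seconds, relative to utterance start
    body_part: str  # e.g. "head", "r_arm"
    gesture: str    # template name from a gesture repository

def generate_behavior(text: str) -> list[BehaviorEvent]:
    """Toy generator: a real engine would use linguistic and prosodic
    analysis of the text; here we simply nod at utterance start."""
    return [BehaviorEvent(0.0, "head", "nod")]

def realization_worker(events: queue.Queue) -> None:
    """Stand-in for a realization engine (e.g. a Unity-based renderer);
    here it only logs what would be animated."""
    while True:
        ev = events.get()
        if ev is None:  # sentinel: no more events
            break
        print(f"animate {ev.body_part}: {ev.gesture} at t={ev.start:.2f}s")

events: queue.Queue = queue.Queue()
worker = threading.Thread(target=realization_worker, args=(events,))
worker.start()
for ev in generate_behavior("Hello there!"):
    events.put(ev)
events.put(None)
worker.join()
```

Decoupling generation from realization through a queue mirrors the modularity the chapter emphasizes: the generator can be swapped or distributed without touching the renderer.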

Coverbal synchrony in human-machine interaction

This book provides novel insights into the research, development, and design of advanced techniques and methods for more natural multimodal human-machine interfaces for use within desktop and pervasive computing environments. The book consists of 15 chapters, structured so as to provide an in-depth investigation of novel approaches from both theoretical and practical perspectives. Humans tend to interact using several modalities and communication channels in a highly complex, yet synergetic, manner.

An Expressive Conversational-Behavior Generation Model for Advanced Interaction within Multimodal User Interfaces

The aim of the book is to present a flexible and efficient algorithm and a novel system for the planning, generation, and realization of conversational behavior (co-verbal behavior). Such behavior is best described as a set of meaningful movements of body parts that, in terms of prosody, are synchronized with the accompanying speech. The movements and shapes generated as co-verbal behavior represent a contextual link between a repertoire of independent motor skills (shapes, movements, and poses that a conversational agent can reproduce and execute) and the intent/meaning of spoken sequences (context). The actual intent/meaning of spoken content is identified through language-dependent linguistic markers and prosody. The knowledge databases used to determine the intent/meaning of text are based on linguistic analysis and on the classification of the text into semiotic classes and subclasses, achieved through the annotation of multimodal corpora based on the proposed EVA annotation scheme. The scheme allows features to be captured at a functional (context-dependent) as well as at a descriptive (context-independent) level. The functional level captures high-level features that describe the correlation between speech and co-verbal behavior, whereas the descriptive level allows us to capture and define body poses and shapes independently of verbal content and in high resolution. The annotation scheme therefore not only interlinks speech and gesture at a semiotic level, but also serves as a basis for the creation of a context-independent repertoire of movements and shapes.
The process of generating co-verbal behavior is, in this book, divided into two phases. The first phase deals with the classification of intent and its synchronization with the verbal content and prosody. The second phase then transforms the planned and synchronized behavior into a co-verbal animation performed by an embodied conversational agent (ECA). In order to extrapolate intent from arbitrary text sequences, the behavior formulation algorithm deduces meaning/intent with regard to the semiotic intent. Furthermore, the algorithm considers the linguistic features of arbitrary, un-annotated text and selects primitive gestures based on semiotic nuclei, as identified by semiotic classification and further modeled by the predicted prosodic features of the speech to be generated by a general text-to-speech (TTS) system. The output of the behavior formulation phase is represented as a hierarchical procedure encoded in XML format, together with a speech sequence generated by the TTS (a sketch of such a procedural description follows below). The procedural description is event-oriented and represents a well-defined structure of consecutive movements of body parts, as well as of body parts moving in parallel. The second phase of the novel architecture transforms the procedural descriptions into a series of coherent animations of individual parts of the articulated embodied conversational agent. In this regard a novel ECA-based realization framework, named the EVA framework, is also presented. It supports the real-time realization of procedural animation descriptions and plans on multi-part mesh-based models, using skeletal animation, blend-shape animation, and the animation of predefined (pre-recorded) animated segments. The book therefore covers the complete design and implementation of an expressive model for the generation of co-verbal behavior, which is able to transform un-annotated text into a speech-synchronized series of animated sequences.
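
Since the EVA encoding itself is not reproduced in the abstract, the sketch below uses made-up element names purely to illustrate the idea of an event-oriented XML procedure with sequential and parallel body-part movements:

```python
import xml.etree.ElementTree as ET

# Illustrative event-oriented procedural description: a <sequence> of
# steps, one of which moves two body parts in <parallel>. Element and
# attribute names are hypothetical, not the EVA schema.
proc = ET.Element("behavior", utterance="Hello there!")
seq = ET.SubElement(proc, "sequence")
ET.SubElement(seq, "move", part="head", gesture="nod", start="0.0", duration="0.4")
par = ET.SubElement(seq, "parallel")
ET.SubElement(par, "move", part="r_arm", gesture="beat", start="0.5", duration="0.6")
ET.SubElement(par, "move", part="face", gesture="smile", start="0.5", duration="0.8")

print(ET.tostring(proc, encoding="unicode"))
```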

Meettell 1.0.4: An advanced new system for managing discussions at meetings, conferences and other large events.

Version 1.0.4 released.
Version 1.0.4 of the Meettell mobile application introduces moderator functionality. By entering the moderator session code, any participant can become the moderator of a discussion and manage it efficiently from his/her mobile phone. The Meettell mobile application now supports all the discussion-management functionalities available on the Meettell session server.

Meettell: An advanced new system for managing discussions at meetings, conferences and other large events.

The Meettell system consists of a Meettell session server and mobile-phone clients. The session server offers several functionalities for efficiently managing a meeting, controlling the discussion, enabling participants to register for the meeting, and letting them take part in discussions through the meeting's audio system using their mobile phones.
Meettell allows for the efficient management of discussions at any meeting. It brings new quality to meetings regarding discussion management and enables efficient control of all discussion elements via participants' smartphones.