Nikolaos Mavridis | New York University Abu Dhabi

Papers by Nikolaos Mavridis

Acquisition of grounded models of adjectival modifiers supporting semantic composition and transfer to a physical interactive robot

2015 International Conference on Advanced Robotics (ICAR), 2015

Compositionality is a property of natural language which is of prime importance: It enables humans to form and conceptualize potentially novel and complex ideas, by combining words. On the other hand, the symbol grounding problem examines the way meaning is anchored to entities external to language, such as sensory percepts and sensory-motor routines. In this paper we aim towards the exploration of the intersection of compositionality and symbol grounding. We thus propose a methodology for constructing empirically derived models of grounded meaning, which afford composition of grounded semantics. We illustrate our methodology for the case of adjectival modifiers. Grounded models of adjectively modified and unmodified colors are acquired through a specially designed procedure with 134 participants, and then computational models of the modifiers “dark” and “light” are derived. The generalization ability of these learnt models is quantitatively evaluated, and their usage is demonstrated in a real-world physical humanoid robot. We regard this as an important step towards extending empirical approaches for symbol grounding so that they can accommodate compositionality: a necessary step towards the deep understanding of natural language for situated embodied agents, such as sensor-enabled ambient intelligence and interactive robots.
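
As a toy illustration of what "composable grounded models" can look like, the sketch below represents each color term by a mean-RGB prototype and learns a modifier such as "dark" as an affine map from unmodified to modified prototypes, so that the learned modifier can then be composed with a new color term. The data, the RGB representation, and the affine form are illustrative assumptions; they are not the models acquired in the paper.

```python
# Minimal, illustrative sketch: grounded color prototypes in RGB space, and an
# adjectival modifier ("dark") learned as an affine map from unmodified to
# modified prototypes. All data, names, and the model form are assumptions,
# not the models acquired in the paper.
import numpy as np

def fit_modifier(pairs):
    """Learn an affine transform y ~ W^T [x; 1] from (unmodified, modified) prototype pairs."""
    X = np.array([np.append(u, 1.0) for u, _ in pairs])   # augment with a bias term
    Y = np.array([m for _, m in pairs])
    W, *_ = np.linalg.lstsq(X, Y, rcond=None)              # least-squares fit
    return lambda x: np.append(x, 1.0) @ W                 # a composable modifier function

# Hypothetical mean-RGB prototypes, as if estimated from participants' samples:
red, dark_red     = np.array([200, 40, 40]),  np.array([120, 20, 20])
green, dark_green = np.array([60, 180, 60]),  np.array([30, 100, 30])

# With only two training pairs the fit is underdetermined; a real model would be
# estimated from many participant samples per color term.
dark = fit_modifier([(red, dark_red), (green, dark_green)])
blue = np.array([50, 60, 200])
print("composed 'dark blue' prototype:", dark(blue))
```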

CLIC: A Framework for Distributed, On-Demand, Human-Machine Cognitive Systems

Traditional Artificial Cognitive Systems (for example, intelligent robots) share a number of common limitations. First, they are usually made up only of machine components; humans only play the role of user or supervisor. And yet, there are tasks in which the current state of the art of AI has much worse performance or is more expensive than humans: thus, it would be highly beneficial to have a systematic way of creating systems with both human and machine components, possibly with remote non-expert humans providing snippets of some seconds of their capacities in real time. Second, their components are specific and dedicated to one and only one system, and are often underutilized for significant fractions of their lifetime. Third, there is no inherent support for robust, fault-tolerant operation, and if a new component becomes available, with better performance and/or cheaper cost, one cannot easily replace the old component. Fourth, and quite importantly in terms of their economics, they are viewed as a resource that needs to be developed and owned, not as a utility; i.e. not as a service provided on demand.
Motivated by the above state of affairs, in this paper we present CLIC: a framework for constructing cognitive systems that overcome the above-mentioned limitations. With the four-layer software architecture of CLIC, we provide specific yet extensible mechanisms that enable the creation and operation of distributed cognitive systems that fulfill the following desiderata: First, they are distributed yet situated, interacting with the physical world through sensing and actuation services, and they combine services provided by humans as well as services implemented by machines. Second, they are made up of components that are time-shared and re-usable across systems. Third, they provide increased robustness through self-repair mechanisms. Fourth, they are constructed and reconstructed on the fly, with components that dynamically enter and exit the system, while the system is in operation, on the basis of availability, pricing, and need. Quite importantly, fifth, the cognitive systems created and operated by CLIC do not need to be owned and can be provided on demand, as a utility – thus transforming human-machine situated intelligence into a service, and opening up numerous interesting research directions and application opportunities.
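
Since the abstract stays at the architecture level, a toy sketch may help convey one of its ideas: components (human or machine) that register and deregister at run time, with the system choosing a provider on demand. The registry class, service names, and price-based selection below are invented for illustration and are not the actual CLIC architecture or API.

```python
# Toy sketch of on-demand composition from a pool of human/machine services.
# All classes, services, and the price-based selection rule are illustrative
# assumptions, not the actual CLIC framework.
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Service:
    name: str
    capability: str          # e.g. "image_labeling"
    provider: str            # "machine" or "human"
    price_per_call: float
    call: Callable[[str], str]

class Registry:
    def __init__(self):
        self._services: List[Service] = []

    def register(self, s: Service):           # components may enter at any time...
        self._services.append(s)

    def deregister(self, name: str):          # ...and leave while the system keeps running
        self._services = [s for s in self._services if s.name != name]

    def cheapest(self, capability: str) -> Service:
        candidates = [s for s in self._services if s.capability == capability]
        if not candidates:
            raise LookupError(f"no provider for {capability!r}")
        return min(candidates, key=lambda s: s.price_per_call)

registry = Registry()
registry.register(Service("cnn_labeler", "image_labeling", "machine", 0.001,
                          lambda img: "cat (p=0.62)"))
registry.register(Service("crowd_worker_17", "image_labeling", "human", 0.050,
                          lambda img: "tabby cat on a sofa"))

# The assembled "system" picks whichever provider is available and cheapest at call time.
print(registry.cheapest("image_labeling").call("frame_0042.jpg"))
```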

Towards a Framework for Social Semiotic Mining

Although the theory of semiotics arguably has ancient beginnings and came to the forefront with the seminal work of Peirce in the 20th century, and despite the growth of social media and the direct relevance of semiotics, no framework has so far been provided which not only enables the re-examination of social content and tagging in the light of semiotics, but can also be used to analyze data mining and clustering algorithms utilized on social data. In this paper we provide the motivation and the outline of such a framework, and demonstrate how it can be applied not only to analyze specific algorithms, but also to structure the general space of potential algorithms for clustering data derived from social media.

Towards a Model for Grounding Semantic Composition

Compositionality is widely accepted as a fundamental principle in linguistics and is also acknowledged as a key cognitive capacity. However, despite the prime importance of compositionality towards explaining the nature of meaning and concepts in cognition, and despite the need for computational models which are able to process the composition of grounded meaning, there is little existing research. Thus, we aim to create computational models that concern the semantic composition of grounded meaning, and that can be applied to embodied intelligent agents (such as cognitive robots), in order to make them capable of creating and processing grounded perceptual-semantic associations, and most importantly their compositions, taking into account syntactic, pragmatic as well as semantic considerations. Here we first introduce the problem, and then review related work across multiple directions. Finally, we propose a set of concrete desiderata that a computational theory of grounded semantic composition for embodied agents should satisfy, thus paving a clear avenue for the next steps towards the wider application of grounded semantics in intelligent embodied entities.

QTC3D: Extending the qualitative trajectory calculus to three dimensions

Spatial interactions between agents (humans, animals, or machines) carry information of high value to human or electronic observers. However, not all the information contained in a pair of continuous trajectories is important, and thus the need for qualitative descriptions of interaction trajectories arises. The Qualitative Trajectory Calculus (QTC) (Van de Weghe, 2004) is a promising development towards this goal. Numerous variants of QTC have been proposed in the past and QTC has been applied towards analyzing various interaction domains. However, an inherent limitation of those QTC variations that deal with lateral movements is that they are limited to two-dimensional motion; therefore, complex three-dimensional interactions, such as those occurring between flying planes or birds, cannot be captured. Towards that purpose, in this paper QTC3D is presented: a novel qualitative trajectory calculus that can deal with full three-dimensional interactions. QTC3D is based on transformations of the Frenet–Serret frames accompanying the trajectories of the moving objects. Apart from the theoretical exposition, including definition and properties, as well as computational aspects, we also present an application of QTC3D towards modeling bird flight. Thus, the power of QTC is now extended to the full dimensionality of physical space, enabling succinct yet rich representations of spatial interactions between agents.
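
For readers unfamiliar with Frenet–Serret frames, the sketch below estimates them numerically from a sampled 3D trajectory; QTC3D's qualitative symbols are then derived from relations between the two agents' frames, which the paper defines and this generic sketch does not reproduce.

```python
# Numerically estimate Frenet-Serret frames (tangent, normal, binormal) along a
# sampled 3D trajectory. Generic illustration only; QTC3D's qualitative symbols
# are then derived from relations between the frames of two such trajectories.
import numpy as np

def frenet_frames(points):
    """points: (N, 3) sampled positions. Returns unit T, N, B arrays, each of shape (N, 3)."""
    points = np.asarray(points, dtype=float)
    d1 = np.gradient(points, axis=0)                       # velocity estimate
    d2 = np.gradient(d1, axis=0)                           # acceleration estimate
    T = d1 / np.linalg.norm(d1, axis=1, keepdims=True)     # unit tangent
    # Remove the tangential component of acceleration to get the normal direction.
    a_perp = d2 - np.sum(d2 * T, axis=1, keepdims=True) * T
    N = a_perp / np.maximum(np.linalg.norm(a_perp, axis=1, keepdims=True), 1e-12)
    B = np.cross(T, N)                                     # binormal completes the frame
    return T, N, B

# Example: a helix sampled at discrete times (e.g. a climbing, circling bird).
t = np.linspace(0, 4 * np.pi, 200)
helix = np.stack([np.cos(t), np.sin(t), 0.1 * t], axis=1)
T, N, B = frenet_frames(helix)
print(T[50], N[50], B[50])
```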

From Sequence to Trajectory and Vice Versa: Solving the Inverse QTC Problem and Coping with Real-World Trajectories

Spatial interactions between agents carry information of high value to human observers, as exemplified by the high-level interpretations that humans make when watching the Heider and Simmel movie, or other such videos which just contain motions of simple objects, such as points, lines and triangles. However, not all the information contained in a pair of continuous trajectories is important, and thus the need for qualitative descriptions of interaction trajectories arises. Towards that purpose, the Qualitative Trajectory Calculus (QTC) has been proposed in (Van de Weghe 2004). However, the original definition of QTC handles uncorrupted continuous-time trajectories, while real-world signals are noisy and sampled in discrete time. Also, although QTC presents a method for transforming trajectories to qualitative descriptions, the inverse problem has not yet been studied. Thus, in this paper, after discussing several aspects of the transition from ideal QTC to discrete-time noisy QTC, we introduce a novel algorithm for solving the QTC inverse problem; i.e. transforming qualitative descriptions to archetypal trajectories that satisfy them. Both of these problems are particularly important for the successful application of qualitative trajectory calculus to Human-Robot Interaction.
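
To make the forward (trajectory-to-sequence) direction concrete, here is a minimal sketch that extracts one basic QTC-style relation, whether each agent is approaching, receding from, or stable with respect to the other, from noisy, discretely sampled trajectories, using a dead-band threshold to absorb noise. The threshold and the reduction to this single relation are illustrative assumptions; the paper's treatment, and its inverse algorithm, are richer.

```python
# Sketch: extract the "approaching / stable / receding" QTC-style sign for each
# agent from noisy, discrete-time 2D trajectories. The dead-band epsilon is an
# illustrative way of coping with noise, not the paper's exact procedure.
import numpy as np

def qtc_distance_signs(traj_k, traj_l, eps=1e-3):
    """traj_k, traj_l: (N, 2) sampled positions. Returns a list of (sign_k, sign_l) per step."""
    traj_k, traj_l = np.asarray(traj_k, float), np.asarray(traj_l, float)
    signs = []
    for t in range(1, len(traj_k)):
        d_prev = np.linalg.norm(traj_k[t - 1] - traj_l[t - 1])
        # Sign for k: did k's movement bring it closer to where l was? (and vice versa)
        dk = np.linalg.norm(traj_k[t] - traj_l[t - 1]) - d_prev
        dl = np.linalg.norm(traj_l[t] - traj_k[t - 1]) - d_prev
        to_sym = lambda d: '-' if d < -eps else ('+' if d > eps else '0')
        signs.append((to_sym(dk), to_sym(dl)))
    return signs

# Two agents walking towards each other, with a little sensor noise.
steps = np.arange(20)[:, None]
a = np.hstack([0.1 * steps, np.zeros_like(steps, dtype=float)]) + 0.005 * np.random.randn(20, 2)
b = np.hstack([2.0 - 0.1 * steps, np.zeros_like(steps, dtype=float)]) + 0.005 * np.random.randn(20, 2)
print(qtc_distance_signs(a, b)[:5])   # expect mostly ('-', '-'): both approaching
```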

A review of verbal and non-verbal human–robot interactive communication

In this paper, an overview of human–robot interactive communication is presented, covering verbal as well as non-verbal aspects. Following a historical introduction, and motivation towards fluid human–robot communication, ten desiderata are proposed, which provide an organizational axis for both recent and future research on human–robot communication. Then, the ten desiderata are examined in detail, culminating in a unifying discussion and a forward-looking conclusion.

Why e-government projects fail? An analysis of the Healthcare.gov website

Electronic government or e-government project failure has been widely discussed in the literature. Some of the common reasons cited for project failure are design-reality gaps, ineffective project management and unrealistic planning. Research shows that more than half of e-government projects result in total or partial failure with regard to the initially set standards, schedules or budgets, while even more fail to meet end users' expectations. This paper focuses on the factors that lead to e-government project failures. It explores the context of project failure and investigates the launch of the U.S. Healthcare.gov website, a highly public e-government project failure in which gaps between political agendas and planning are identified through an examination of media sources and an analysis of Twitter discussions. The findings indicate that e-government users react against failures, and that e-government projects attract the attention of opinion makers who in turn influence audience behavior. This research provides classifications of the reasons for and sources of e-government project failure. A further contribution is the beginnings of a typology of social media activity against e-government project failures.

Contextual object category recognition for RGB-D scene labeling

Elsevier Journal of Robotics and Autonomous Systems, Volume 62, Issue 2, February 2014, Pages 241-256

Recent advances in computer vision on the one hand, and imaging technologies on the other hand, have opened up a number of interesting possibilities for robust 3D scene labeling. This paper presents contributions in several directions to improve the state-of-the-art in RGB-D scene labeling. First, we present a novel combination of depth and color features to recognize different object categories in isolation. Then, we use a context model that exploits detection results of other objects in the scene to jointly optimize labels of co-occurring objects in the scene. Finally, we investigate the use of social media mining to develop the context model, and provide an investigation of its convergence. We perform thorough experimentation on both the publicly available RGB-D Dataset from the University of Washington as well as on the NYU scene dataset. An analysis of the results shows interesting insights about contextual object category recognition, and its benefits.
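
As a toy illustration of contextual re-scoring, the sketch below adjusts one object's per-category probabilities using a co-occurrence matrix with the other objects detected in the scene. The categories, scores, and the simple multiplicative update are assumptions for illustration, not the paper's context model.

```python
# Toy context re-scoring: boost categories that co-occur with the other detected
# objects in the scene. Categories, scores, and the update rule are illustrative.
import numpy as np

categories = ["mug", "keyboard", "soda_can"]
# Hypothetical co-occurrence weights learned elsewhere (e.g. from social media mining):
cooccur = np.array([[1.0, 0.6, 0.9],    # mug      with mug / keyboard / soda_can
                    [0.6, 1.0, 0.7],    # keyboard ...
                    [0.9, 0.7, 1.0]])   # soda_can ...

def rescore(unary_probs, other_labels):
    """Multiply isolated-detector probabilities by context support, then renormalize."""
    support = np.ones(len(categories))
    for lbl in other_labels:
        support *= cooccur[:, categories.index(lbl)]
    scores = np.asarray(unary_probs) * support
    return scores / scores.sum()

# An ambiguous object (mug vs soda_can) seen next to a detected keyboard:
print(dict(zip(categories, rescore([0.40, 0.05, 0.55], ["keyboard"]))))
```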

Joint Origin Identification of Articulated Robots with Marker-based Multi-Camera Optical Tracking Systems

Elsevier Journal of Robotics and Autonomous Systems, Volume 61, Issue 6, June 2013, Pages 580–592

Marker-based multi-camera optical tracking systems are being used in the robotics field to track robots for validation, verification and calibration of their kinematic and dynamic models. These tracking systems estimate the pose of tracking bodies attached to objects within a tracking volume. In this work we explore the case of tracking the origins of joints of articulated robots when the tracking bodies are mounted on limbs or structures relative to the joints. This configuration leads to an unknown relative pose between the tracking body and the joint origin. The identification of this relative pose is essential for an accurate representation of the kinematic model. We propose an approach for the identification of the origin of joints relative to tracking bodies by using state-of-the-art Center of Rotation (CoR) and Axes of Rotation (AoR) estimation methods. The applicability and effectiveness of our approach is demonstrated in two successful case studies: (i) the verification of the upper body kinematics of DLR’s humanoid Rollin’ Justin and (ii) the identification of the kinematic parameters of an ST Robot arm relative to its environment for the embodiment of a situated conversational assistant.
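
One standard way to estimate a Center of Rotation from tracked marker positions is a linear least-squares sphere fit: markers rigidly attached to a limb rotating about a fixed point stay at a constant distance from that point. The sketch below shows this generic formulation (a single revolute joint would instead call for a circle/axis fit, i.e. AoR estimation); it is not necessarily the exact method used in the paper.

```python
# Generic linear least-squares sphere fit for Center-of-Rotation estimation:
# marker positions p_i sampled while a limb rotates about a fixed point satisfy
# ||p_i - c||^2 = r^2, which is linear in (c, r^2 - ||c||^2). Illustration only.
import numpy as np

def fit_center_of_rotation(points):
    """points: (N, 3) marker positions. Returns the estimated CoR c and radius r."""
    P = np.asarray(points, dtype=float)
    A = np.hstack([2.0 * P, np.ones((len(P), 1))])
    b = np.sum(P ** 2, axis=1)
    w, *_ = np.linalg.lstsq(A, b, rcond=None)
    c = w[:3]
    r = np.sqrt(w[3] + c @ c)
    return c, r

# Synthetic test: a marker 0.3 m from a joint at (1, 2, 0.5), seen over a range
# of orientations (spherical motion, e.g. a ball joint), with small tracking noise.
true_c = np.array([1.0, 2.0, 0.5])
az, el = np.meshgrid(np.linspace(0, np.pi, 10), np.linspace(-0.5, 0.5, 8))
dirs = np.stack([np.cos(el) * np.cos(az), np.cos(el) * np.sin(az), np.sin(el)], axis=-1).reshape(-1, 3)
pts = true_c + 0.3 * dirs + 0.001 * np.random.randn(len(dirs), 3)
c_hat, r_hat = fit_center_of_rotation(pts)
print(c_hat, r_hat)   # should be close to (1, 2, 0.5) and 0.3
```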

One-Day Long Statistical Analysis of Parking Demand by Using Single-Camera Vacancy Detection

Many studies have focused on parking demand to gain information for traffic management recommendations and decision-making, for which real-world car park statistics are of great importance. This paper presents a one-day long statistical analysis of a multipurpose off-street parking space in downtown Abu Dhabi, using a single-camera vacancy detection system. The proposed methodology for collecting one-day long statistics uses pattern recognition to determine occupancy states based on visual features extracted from parking spots. This vacancy detection system has two major advantages. First, it relies on only a few pixels per spot compared with other methods, and is thus able to cover more than 150 parking spots within a single camera frame. Second, the system works well in both nighttime and daytime – it is robust to changing light conditions. The accuracy is 99.9% for occupied spots and 97.9% for empty spots for the period of study. This study also proposes a better indication of parking demand when the car park is near full capacity, since the utilization rate does not capture the parking demand from motorists who fail to find parking spaces.
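
A minimal sketch of the kind of per-spot pattern recognition described: a small patch per parking spot, a few visual features, and a decision rule. The features, the threshold, and the synthetic patches are illustrative stand-ins for the trained classifier and real imagery used in the study.

```python
# Sketch: decide per-spot occupancy from a small image patch using simple visual
# features and a threshold; the features and fixed threshold are illustrative
# stand-ins for the trained classifier the paper describes.
import numpy as np

def spot_features(patch):
    """patch: (H, W, 3) array for one parking spot. Returns (edge_energy, brightness)."""
    gray = patch.astype(float).mean(axis=2)
    grad_r, grad_c = np.gradient(gray)
    edge_energy = np.mean(np.hypot(grad_r, grad_c))   # cars add edges and texture
    brightness = gray.mean() / 255.0                   # useful under day/night changes
    return edge_energy, brightness

def is_occupied(patch, edge_threshold=10.0):
    """Toy decision rule; a real system would train a classifier on labeled spots."""
    edge_energy, _ = spot_features(patch)
    return edge_energy > edge_threshold

# Synthetic patches: flat asphalt vs. a textured, car-like region.
rng = np.random.default_rng(0)
asphalt = rng.integers(95, 105, size=(24, 12, 3))
car = rng.integers(0, 255, size=(24, 12, 3))
print(is_occupied(asphalt), is_occupied(car))   # expected: False True
```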

Grounded situation models for robots: Where words and percepts meet

IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Jan 1, 2006

Our long-term objective is to develop robots that engage in natural language-mediated cooperative tasks with humans. To support this goal, we are developing an amodal representation and associated processes called a grounded situation model (GSM). We are also developing a modular architecture in which the GSM resides in a centrally located module, around which there are language, perception, and action-related modules. The GSM acts as a sensor-updated “structured blackboard” that serves as a workspace with contents similar to a “theatrical stage” in the robot’s “mind”, which might be filled in with present, past or imagined situations. Two main desiderata drive the design of the GSM: first, “parsing” situations into ontological types and relations that reflect human language semantics, and second, allowing bidirectional translation between sensory-derived data/expectations and linguistic descriptions. We present an implemented system that allows a range of conversational and assistive behavior by a manipulator robot. The robot updates beliefs (held in the GSM) about its physical environment, the human user, and itself, based on a mixture of linguistic, visual and proprioceptive evidence. It can answer basic questions about the present or past and also perform actions through verbal interaction. Most importantly, a novel contribution of our approach is the robot’s ability for seamless integration of both language- and sensor-derived information about the situation: for example, the system can acquire parts of situations either by seeing them or by “imagining” them through descriptions given by the user: “There is a red ball at the left”. These situations can later be used to create mental imagery and sensory expectations, thus enabling the aforementioned bidirectionality.
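
To make the "structured blackboard" idea concrete, here is a toy data structure in which each property of a situation entry records a value together with its provenance (sensed vs. described vs. imagined), which is what lets language-derived and sensor-derived information coexist and be queried later. The fields and tags are illustrative, not the paper's implementation.

```python
# Toy grounded-situation-model entry: each property keeps a value plus how it
# was acquired ("sensed" vs "described" vs "imagined"), so linguistic and
# perceptual evidence can coexist and later drive expectations. Illustrative only.
from dataclasses import dataclass, field
from typing import Any, Dict

@dataclass
class Property:
    value: Any
    provenance: str          # "sensed" | "described" | "imagined"
    confidence: float = 1.0

@dataclass
class GSMObject:
    name: str
    properties: Dict[str, Property] = field(default_factory=dict)

situation: Dict[str, GSMObject] = {}

# Vision reports a ball; language ("There is a red ball at the left") fills in more.
ball = GSMObject("ball_1")
ball.properties["position"] = Property(value=(0.42, 0.10), provenance="sensed", confidence=0.9)
ball.properties["color"] = Property(value="red", provenance="described")
situation["ball_1"] = ball

# A later query can distinguish what was seen from what was only described.
print({k: (p.value, p.provenance) for k, p in situation["ball_1"].properties.items()})
```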

Mental imagery for a conversational robot

IEEE Transactions on Systems, Man, and Cybernetics Part B, Vol. 34(3), pp. 1374-1383, Jan 1, 2004

To build robots that engage in fluid face-to-face spoken conversations with people, robots must have ways to connect what they say to what they see. A critical aspect of how language connects to vision is that language encodes points of view. The meaning of "my left" and "your left" differs due to an implied shift of visual perspective. The connection of language to vision also relies on object permanence. We can talk about things that are not in view. For a robot to participate in situated spoken dialog, it must have the capacity to imagine shifts of perspective, and it must maintain object permanence. We present a set of representations and procedures that enable a robotic manipulator to maintain a "mental model" of its physical environment by coupling active vision to physical simulation. Within this model, "imagined" views can be generated from arbitrary perspectives, providing the basis for situated language comprehension and production. An initial application of mental imagery for spatial language understanding for an interactive robot is described.
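
The "my left" vs. "your left" point reduces to re-expressing an object's position in another agent's reference frame. The sketch below does this for a simple 2D case with made-up poses; it illustrates the geometric idea only, not the paper's mental-imagery system.

```python
# Re-express an object's position in another agent's reference frame, so that
# "left of me" vs "left of you" can be evaluated. 2D toy example, made-up poses.
import numpy as np

def to_agent_frame(point_world, agent_pos, agent_heading_rad):
    """Transform a world-frame point into an agent-centric frame (x forward, y left)."""
    c, s = np.cos(agent_heading_rad), np.sin(agent_heading_rad)
    R_world_to_agent = np.array([[c, s], [-s, c]])           # inverse of the agent's rotation
    return R_world_to_agent @ (np.asarray(point_world) - np.asarray(agent_pos))

ball = [1.0, 0.5]
robot_pos, robot_heading = [0.0, 0.0], 0.0                    # robot facing +x
human_pos, human_heading = [2.0, 0.0], np.pi                  # human facing the robot

print("left of robot?", to_agent_frame(ball, robot_pos, robot_heading)[1] > 0)   # True
print("left of human?", to_agent_frame(ball, human_pos, human_heading)[1] > 0)   # False
```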

VISPEC: A graphical tool for elicitation of MTL requirements

One of the main barriers preventing widespread use of formal methods is the elicitation of formal specifications. Formal specifications facilitate the testing and verification process for safety-critical robotic systems. However, handling the intricacies of formal languages is difficult and requires a high level of expertise in formal logics that many system developers do not have. In this work, we present a graphical tool designed for the development and visualization of formal specifications by people who do not have training in formal logic. The tool enables users to develop specifications using a graphical formalism, which is then automatically translated to Metric Temporal Logic (MTL). In order to evaluate the effectiveness of our tool, we designed and conducted a usability study with cohorts from the academic student community and industry. Our results indicate that both groups were able to define formal requirements with high levels of accuracy. Finally, we present applications of our tool for defining specifications for the operation of robotic surgery and safe autonomous quadcopter operation.
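
For readers unfamiliar with MTL, a typical bounded-response requirement of the kind such a tool targets might read as follows (an illustrative formula, not one taken from the paper's case studies):

```latex
% Illustrative MTL bounded-response specification (not from the paper):
% "Whenever an obstacle is detected, the quadcopter must start hovering within
%  2 seconds and keep hovering for at least 5 seconds."
\[
\Box \big( \mathit{obstacle} \rightarrow \Diamond_{[0,2]} \, \Box_{[0,5]} \, \mathit{hover} \big)
\]
```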

Grounded situation models for situated conversational assistants

A Situated Conversational Assistant (SCA) is any system with sensing, acting and speech abilities, which engages in physically situated natural language conversation with human partners and assists them in tasks. Towards such assistants, a computational model of embodied agents is presented, which produces systems that are capable of a core set of situated natural language skills, and which provides concrete leverage for numerous extensions. The central idea is to endow agents with a sensor-updated set of structures and processes called a Grounded Situation Model (GSM), which is closely related to the cognitive psychology notion of situation models. The GSM contains descriptions of physical & mental aspects of past, current, or imagined situations, enabling bidirectional translation between linguistic descriptions and perceptual data/expectations. The power of the GSM proposal is demonstrated through the real-world example of a manipulator robot with speech and vision, with abilities comparable to those required by a normally-developing child in order to pass the Token Test, a standard psychological test for three-year-old children.

A Novel Evaluation Framework for Teleoperation and a Case Study on Natural Human-Arm-Imitation Through Motion Capture

Springer International Journal of Social Robotics, November 2012, Volume 4 (1S), pp. 5-18

Although tele-operation has a long history, when it comes to tuning, comparison, and evaluation of tele-operation systems, no standard framework exists which can fulfill desiderata such as: concisely modeling multiple aspects of the system as a whole, i.e. timing, accuracy, and event transitions, while also providing for separation of user-, feedback-, as well as learning-dependent components. On the other hand, real-time remote tele-operation of robotic arms, either industrial or humanoid, is highly suitable for a number of applications, especially in difficult or inaccessible environments, and thus such an evaluation framework would be desirable. Usually, teleoperation is driven by buttons, joysticks, haptic controllers, or slave arms, providing an interface which can be quite cumbersome and unnatural, especially when operating robots with multiple degrees of freedom. Thus, in this paper, we present a two-fold contribution: (a) a task-based teleoperation evaluation framework which can achieve the desiderata described above, as well as (b) a system for teleoperation of an industrial arm commanded through human-arm motion capture, which is used as a case study, and also serves to illustrate the effectiveness of the evaluation framework that we are introducing. In our system the desired trajectory of a remote robotic arm is easily and naturally controlled through imitation of simple movements of the operator’s physical arm, obtained through motion capture. Furthermore, an extensive real-world evaluation is provided, based on our proposed probabilistic framework, which contains an inter-subject quantitative study with 23 subjects, a longitudinal study with 6 subjects, as well as a study of opinions and attitudes towards tele-operation. The results illustrate the strengths of the proposed evaluation framework, by enabling the quick production of multiple task-, user-, system-, as well as learning-centric results, as well as the benefits of our natural imitation-based approach towards teleoperation. Furthermore, an interesting ordering of preferences towards different potential application areas of teleoperation is indicated by our data. Finally, after illustrating their effectiveness, we discuss how both the evaluation framework and the teleoperation system presented are not only applicable in a wide variety of teleoperation domains, but are also directly extensible in many beneficial ways.

Subjective Difficulty and Indicators of Performance of Joystick-based Robot Arm Teleoperation with Auditory Feedback

Joystick-based teleoperation is a dominant method for remotely controlling various types of robots, such as excavators, cranes, and space telerobotics. Our ultimate goal is to create effective methods for training and assessing human operators of joystick-controlled robots. Towards that goal, an extensive study consisting of a total of 38 experimental subjects on both a simulated and a physical robot, using either no feedback or auditory feedback, has been performed. In this paper, we present the complete experimental setup and report only on the 18 experimental subjects teleoperating the simulated robot. Multiple observables were recorded, including not only joystick and robot angles and timings, but also subjective measures of difficulty, personality and usability data, and automated analysis of facial expressions and blink rate of the subjects. Our initial results indicate: first, that the subjective difficulty of teleoperation with auditory feedback has smaller variance as compared to teleoperation without feedback; and second, that the subjective difficulty of a task is linearly related with the logarithm of task completion time. Third, we introduce two important indicators of operator performance, namely the Average Velocity of Robot Joints (AVRJ) and the Correct-to-Wrong Joystick Direction Ratio (CWJR), and we show how these relate to accumulated user experience and to task time. We conclude with a forward-looking discussion including future steps.
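
The two performance indicators and the reported log-time relationship can be written compactly; the definitions below are plausible reconstructions from the indicator names and are not quoted from the paper.

```latex
% Plausible reconstructions (assumed from the indicator names, not quoted from the paper):
% Average Velocity of Robot Joints over a task of duration T, with joint angles q_j(t):
\[
\mathrm{AVRJ} = \frac{1}{T} \int_{0}^{T} \frac{1}{J} \sum_{j=1}^{J} \left| \dot{q}_j(t) \right| \, dt
\]
% Correct-to-Wrong Joystick Direction Ratio over sampled joystick commands:
\[
\mathrm{CWJR} = \frac{\#\{\text{joystick samples in the correct direction}\}}{\#\{\text{joystick samples in the wrong direction}\}}
\]
% Reported relation between subjective difficulty D and task completion time T_c:
\[
D \approx a \log T_c + b
\]
```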

On the subjective difficulty of Joystick-based robot arm teleoperation with auditory feedback

Joystick-based teleoperation is a dominant method for remotely controlling various types of robots, such as excavators, cranes, and space telerobotics. Our ultimate goal is to create effective methods for training and assessing human operators of joystick-controlled robots. Towards that goal, in this paper we present an extensive study consisting of 18 experimental subjects controlling a simulated robot, using either no feedback or auditory feedback. Multiple observables were recorded, including not only joystick and robot angles and timings, but also subjective measures of difficulty, personality and usability data, and automated analysis of facial expressions and blink rate of the subjects. Our initial results indicate: first, that the subjective difficulty of teleoperation with auditory feedback has smaller variance as compared to teleoperation without feedback; and second, that the subjective difficulty of a task is linearly related with the logarithm of task completion time. We conclude with a forward-looking discussion including future steps.

Muscle Force Distribution for Adaptive Control of a Humanoid Robot Arm with Redundant Bi-articular and Mono-articular Muscle Mechanism

Springer International Journal of Artificial Life and Robotics, full-length paper, accepted, Vol. 18, 2013

Robot arms driven by bi-articular and mono-articular muscles have numerous advantages. If one muscle fails, the functionality of the arm is not affected. In addition, each joint torque is distributed over numerous muscles, and thus the load on each muscle can be relatively small. This paper addresses the problem of muscle control for this kind of robot arm. A relatively mature control method (sliding mode control) was chosen to obtain the joint torques first, and then the joint torques were distributed to muscle forces. The muscle forces were computed based on a Jacobian matrix between joint torque space and muscle force space. In addition, internal forces were used to optimize the computed muscle forces in the following manner: not only to make sure that each muscle force stays within its force bounds, but also to make the muscles work in the middle of their working range, which is considered best in terms of fatigue. Furthermore, all the dynamic parameters were updated in real time. Compared with previous work, a novel method was proposed that uses the prediction error to accelerate the convergence speed of the parameter estimates. We empirically evaluated our method for the case of bending-stretching movements. The results clearly illustrate the effectiveness of our method towards achieving the desired kinetic as well as load distribution characteristics.
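
The torque-to-muscle-force distribution step can be summarized with the standard pseudoinverse-plus-null-space formulation; the notation below is a generic reconstruction, not the paper's exact equations.

```latex
% Generic reconstruction (not the paper's exact notation):
% Joint torques tau relate to muscle forces f through a moment-arm (Jacobian) matrix A,
% with more muscles than joints (redundancy):
\[
\tau = A\, f , \qquad A \in \mathbb{R}^{n \times m},\ m > n
\]
% A particular solution plus a null-space (internal-force) term distributes the load:
\[
f = A^{+} \tau + \big( I - A^{+} A \big)\, f_{0}
\]
% where f_0 is chosen so that each muscle force stays within its bounds and near
% the middle of its working range.
```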
