When the Inference Meets the Explicitness or Why Multimodality can Make us Forget About the Perfect Predictor
1 Introduction
Since the term robot appeared more than a century ago in the play R.U.R. (Rossum's Universal Robots), these automatons have evolved from the ideas of science fiction novels to a palpable reality. In the beginning, they performed small, routine and repetitive tasks with the aim of relieving humans of this burden. As technology has progressed and more precise and computationally powerful hardware has become available, these robots have been able to carry out more complex tasks such as navigating an urban environment [1] or choosing the right tool [2]. At the same time, no longer being caged in controlled environments, robots have begun to explore the world around them.
This last change has led robots to start interacting with humans, giving rise to the discipline of Human-Robot Interaction (HRI) and the need to model human behaviour [3, 4]. In order to make this interaction as safe, useful and similar to how two humans interact as possible, both theoretical models of how we humans behave [5] and multiple predictors of our next actions have been developed to try to infer human intention with increasing success rates [6, 7]. While their non-negligible error rates are often blamed on limitations in computational capacity or a lack of sufficient training data, the fact that humans can model the same perceived information in multiple ways, so that two humans can represent the same environment differently, raises the question of whether we will ever have a perfect predictor.
This very question has led some authors [8,9,10,11] to consider it necessary to combine inference engines with communication systems that make it possible to obtain the human’s intention explicitly. In this way, they hope to achieve robots that do not just look like humans but act like them, displaying one of the many typically human behaviours: asking for information when what they have does not allow them to make decisions with sufficient confidence [12].
Thus, this work arises as a continuation of our previous work [11], where, in a collaborative transportation task, we compared, on the one hand, an inference system that obtains the human’s intention from the instantaneous force they exert and, on the other, a button-based communication system that allows the human to express their intention explicitly. That work proved that allowing humans to express themselves directly can achieve the same improvement in multiple aspects of an effective HRI as using an intention predictor. However, it also left unanswered questions, such as what happens if both systems are combined, and postulated, without demonstrating it, that we should not continue looking for technical improvements in the predictor used, since these may go unnoticed by the human, but should instead pivot towards methods that improve human-robot communication.
Thus, this work is motivated by the aim of addressing research questions such as: To what extent can humans perceive technical improvements in the success rate of a predictor? (RQ1); if a choice must be made, do humans prefer a more robust or a more natural method of communication with a robot? (RQ2); and, building on the findings of our previous work, do humans prefer a robot to independently predict their behaviour, or would they rather communicate their intentions directly? (RQ3). To address these questions, two intention prediction algorithms and two explicit communication methods are utilized and evaluated across three rounds of experiments, including corresponding user studies, to analyse human preferences when collaborating with a robot on tasks involving rapid physical interactions, such as the collaborative transport of objects. We believe that, beyond attempting to answer these questions, merely posing them can provide significant insights for the HRI and social robotics research communities.
Thus, our contributions are as follows. In the first round of experiments, we compare two predictors with different success rates and find that, once the failure rate is reduced to an acceptable value, the human no longer perceives any improvement. In the second round, we compare two direct communication systems and find that the human prefers the more natural one even though it is technically inferior in terms of response delay and failure rate. Finally, as our third contribution, we compare the system preferred by the human in each of the first two rounds, as well as the combination of both, and verify that this combination is the most accepted by the human, as it offers more freedom to collaborate with the robot.
In the rest of the document, we present the related work in Sect. 2. Sect. 3.1 includes an explanation of the task selected for this study, the collaborative transport of objects, as well as all the relevant details of all the systems employed in this work. Sect. 3.2 presents the hypotheses to be tested, the setup and methodology employed as well as the distribution of participants who performed the experiments. Sect. 4 presents the results obtained in each round of experiments. Sect. 5 shows a short discussion of these results. Finally, Sect. 6 contains the conclusions.
2 Related Work
The task of human-robot cooperative object transport involves at least one human and one robot working together to relocate objects from point A to point B. This task can be executed over short distances, requiring only arm movement, or over longer distances necessitating full-body movement from both participants. Traditionally, methods to solve this task emphasize the robot adapting to the human [13,14,15]. Bussy et al. [13] explore the manner in which two humans accomplish this task, using motion primitives to allow the robot to adjust to human intentions via an impedance controller. Similarly, Rozo et al. [14] employ an impedance controller, with its parameters derived through Learning from Demonstration (LfD), which can be altered to align with the human’s desired behaviour. Lanini et al. [15] extend this concept by using a classifier to identify the human’s intended subtask (such as initiating, halting, accelerating, turning, etc.), ensuring that the robot synchronizes with the human’s actions as quickly as possible.
Due to the versatility of the task, being able to be executed in smaller or larger spaces with variable durations, it can be used to investigate other facets [16, 17]. Mörtl et al. [16] focus on identifying the human’s role to allow the robot to alter its role between leader, follower, and collaborator. This adjustment ensures the robot not only aligns with the human but also takes the lead if necessary. Losey et al. [17] examine the interplay between intention and the assignment of roles within joint transport by leveraging an arbitration concept. Their study posits that accurately interpreting human intentions is crucial for developing shared-control strategies [18].
Speaking of human intention, the two strategies discussed in the previous section to better understand humans, inference engines and direct communication systems, fall within what has been known for more than a decade as Social Signal Processing (SSP) [19, 20]. That is, combining psychology, computer vision, speech processing and artificial intelligence to give machines the ability to recognise and interpret human social signals, such as gestures, facial expressions and vocal behaviour, to improve the interaction between humans and robots [21,22,23].
While this has been known for more than a decade and multimodality, i.e. the joint analysis of multiple of these behavioural cues, was postulated as one of the possible future challenges, it is not so common to find works that attempt to combine several sources of information. If we focus on the task at hand, collaborative transport, the first of the two strategies above (the use of inference engines) is relatively abundant in the literature [24,25,26,27]. Most of these works use different architectures based on Gaussian Mixture Models (GMM) or some kind of Artificial Neural Network (ANN) to obtain a prediction of the human’s intention, whether this is the trajectory they are going to follow, the movement they are going to make with their hand or the next object they are going to pick up. Applied to collaborative transportation tasks or, in general, tasks with physical contact, it has been common to use control techniques to make the robot adapt to the human’s wishes using both impedance and admittance controllers [28,29,30,31]. More recent works, on the other hand, also include some kind of predictor, either of the desired trajectory for the object [32], of its velocity profile [33] or even of the force that the human is going to exert on the object in the short term [34]. However, none of these works contemplates the possibility of the human communicating with the robot to reduce uncertainty or resolve problems generated by an error in the robot’s inference.
This second strategy, a direct communication system either by voice, gestures or any other established code, is less common. In [35] the authors design a smartphone application with which human and robot can communicate bidirectionally over long distances. This app is applied in [36] to a collaborative search in an urban scenario with multiple walls, columns and other types of occlusions, as well as environmental noise, making it impossible to use visual or audible cues for communication between the two agents. Thanks to this app, the robot can know both the route the human is following and the one they want to follow despite the multiple occlusions that prevent the robot from tracking the human. This makes it possible to minimize the overlap of the areas explored by each agent. Another example where the human is allowed to explicitly communicate with the robot is [37]. In this, the robot infers the goal of the task that the human wants the robot to perform but, before executing it, asks for confirmation from the human through a multimodal system combining augmented reality (AR) for passive visualization and haptic wristbands for active feedback. More specifically, the robot displays the inferred beliefs about human goals and its confidence through the AR system and uses the wristband to actively alert users when it needs clarification or specific teaching inputs. In [38] both strategies are combined to improve object manipulation between two robots. To this end, both robots communicate their plans both implicitly through the force they exert on the object and explicitly by exchanging wireless messages. Finally, [39] allows the human to use a combination of gestures and voice commands to tell the robot which object to pick up and where to take it.
Finally, if we focus not on verbal cues in general as indicated in [19, 20], but specifically on voice command recognition applied to robotics, although there are older works that tried to allow the human to transmit simple movement commands to a mobile robot [40, 41], it was not until the proliferation of Artificial Neural Networks (ANNs) and the emergence of large datasets containing lists of typical commands [42] that satisfactory results began to be achieved, not in understanding verbal cues in general but in detecting finite lists of commands [43, 44]. With the exception of [37], none of the above works compares its communication system with another one. Moreover, all of them assume that the human is willing to use their system. In this article, we do not take that for granted and perform both the comparison of multiple systems and that verification.
Thus, the differentiating factor of this article with respect to all the previous ones is that we start by combining two strategies to obtain the human’s intention (inference and direct communication), taking inspiration from [37, 38] but, unlike them, using a human-robot setup instead of the robot-robot setup of [38] and without the need for expensive gadgets such as the AR glasses of [37]. Additionally, within each of these strategies, we use several methods to see if there is any difference between them for the human and, if so, to discover the advantages and disadvantages of each method as well as the user’s preferences. We believe that the results offered by this article can be of great value to the HRI community.
3 Methods
This section explains the task used as a testbed as well as the different algorithms employed. The experimental setup and the methodology, including the hypotheses to be tested and the distribution of participants, are also described.
3.1 Collaborative Object Transportation
Collaborative object transportation is a task in which a human and a robot collaboratively transport an object from a starting point to a destination point that may or may not be predefined in advance. It is therefore a task that is performed in close proximity and in which there is an exchange of physical forces between the two agents. These two characteristics mean that the robot’s movements cannot be abrupt, so as not to harm the human next to it, and that the robot’s response speed must be high in order to be able to adapt to the rapid changes that the human may make.
Additionally, this task is typically limited to moving an object over a short distance using only a robotic arm. In this work, we will transport an object \(5{-}7\,m\) through a scenario with multiple obstacles, so that there are multiple routes to the predefined goal, enabling route changes in real time. This implies that the robot must move its platform through the scenario, adapting to the route desired by the human, and that the human must make multiple decisions along this route (see Fig. 1 - Top).
Fig. 1
Experiments setup. top left - human-robot pair collaboratively transporting an aluminium bar. Goal marked with a chequered flag. top right - scheme of the designed setup. At least eight routes to the goal. Control desk on the right with researcher managing the experiment and camera recording the point of view on the left. bottom left - handle for better ergonomics with meaning of each button next to it. Only three buttons are used to tell the robot which route the human wants. bottom right - RODE wireless GO microphone used for voice command recognition
In our case, the object being transported is a \(50\,cm\) long aluminium bar. The choice of this object is due to two fundamental reasons. First, it is a rigid object, which allows us to eliminate from the equations the terms corresponding to possible flexibilities or deformations that would not be negligible for a plastic object (transport would have to be modelled in a completely different way altogether for fully deformable objects such as fabrics [45, 46]). Since this work focuses on analysing the implications of different methods of communicating and inferring human intention and not on technical advances, we consider it more appropriate to reduce the complexity of the task modelling. Secondly, it is a lightweight object compared to bulkier ones such as boxes or those made of denser materials such as steel. This minimizes the inertia that can be generated during transport, making the task safer for the human. In any case, it is not the transported object that is relevant in this work, but the effects of the different methods of communication between the human and the robot.
To enable the robot to perform this task, we start from the implementation in [47]. In it, taking advantage of the fact that this is a task in which the exchange of information is mainly done through forces, the environment is modelled by means of virtual forces: a repulsive force for each obstacle detected by the robot and an attractive force for the partial goals that the human-robot pair should follow to reach the location where the object is to be placed. These partial goals are obtained from the waypoints generated by a global planner (in our case, the Robot Operating System (ROS) default global planner).Footnote 1 The force resulting from modelling the environment is then combined with the force exerted by the human at one end of the transported object, which is measured using a force sensor located on the robot’s wrist, where the opposite end of the object is secured. This combined force is sent to a controller that generates the robot’s speed commands.
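A minimal sketch of this virtual-force scheme follows. All gains, the influence radius and the admittance-style velocity update are illustrative assumptions, not the actual controller of [47]:

```python
import numpy as np

def virtual_environment_force(obstacles, goal, robot_pos,
                              k_rep=1.0, k_att=0.5, d0=1.5):
    """Sum a repulsive force per detected obstacle and an attractive
    force towards the current partial goal (waypoint). Gains are
    illustrative, not the values used in the real system."""
    f = k_att * (goal - robot_pos)                 # attraction to waypoint
    for obs in obstacles:
        diff = robot_pos - obs
        d = np.linalg.norm(diff)
        if d < d0:                                 # inside influence radius
            f += k_rep * (1.0 / d - 1.0 / d0) * diff / d**3
    return f

def velocity_command(f_env, f_human, v_prev, dt=0.05, mass=10.0, damping=5.0):
    """Admittance-style update: the combined environment + human force
    drives a virtual damped mass whose velocity is the speed command."""
    f_total = f_env + f_human
    acc = (f_total - damping * v_prev) / mass
    return v_prev + acc * dt
```

The key property, as in the text, is that the human's measured wrist force enters the same summation as the virtual environment forces, so both influence the commanded velocity symmetrically.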
Additionally, this system is conditioned using the human intention. For this purpose, we start from the distinction between implicit and explicit intention shown in [47, 48]. Based on these two articles, we define implicit intention as that which can be inferred from the actions of the other agent using some inference mechanism, and explicit intention as that which is expressed directly using a direct communication channel between the two agents and a code known and agreed upon by them.
3.1.1 Inference of Implicit Intention
Two similar inference systems, representative of the trend in the literature towards using predictors, will be used to obtain the implicit intention of the human. First, a force predictor which, from the previous values of the force exerted by the human, the velocity of the human-robot pair, the representative force of the environment and the robot’s LiDAR readings, generates a prediction of the force that the human will exert during the next \(1\,s\). This prediction can be processed to obtain an estimate of the route the human wishes to follow in the short term, and with this, condition the robot’s planner to match their intention.
In other words, while the robot has a global planner that tells it the optimal route to the place where the object should be delivered, the human may have another route in mind. This predictor provides an estimate of that route desired by the human, causing the robot’s planner to adapt to this desire.
Used in this way, the force predictor has a fundamental shortcoming: when predicting the route, it only takes into account the contribution of the human through their force and not the contribution made by the robot, e.g., avoiding getting too close to an obstacle or damping rapid changes in the force exerted by the human. This is why in our previous work [11] we suggested that the predictor used had room for improvement.
To solve this, we develop a second predictor that, in addition to the force to be exerted by the human, predicts the velocity of the human-robot pair during the next \(1\,s\). This second prediction can be directly integrated to obtain an estimate of the short-term desired route with which to also condition the robot’s planner. The technical details of this second predictor, regarding its architecture based on transformers [49] and visual transformers [50] as well as the format of the dataset used for its training, can be consulted in [51]. The most important fact about this predictor is that it reduces the L2 error made in estimating the trajectory at \(1\,s\) from \(0.199\,m\) with the force predictor to \(0.138\,m\). Once this estimate is obtained, it is used in the same way as the one obtained from the force predictor.
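To illustrate how a predicted velocity profile yields a short-term route estimate and the reported L2 error at the \(1\,s\) horizon, here is a minimal sketch; the integration scheme and array shapes are our assumptions, not the implementation from [51]:

```python
import numpy as np

def trajectory_from_velocity(v_pred, p0, dt):
    """Integrate a predicted planar velocity profile (T x 2 array,
    one row per time step) into positions starting from p0."""
    return p0 + np.cumsum(v_pred * dt, axis=0)

def l2_error_at_horizon(traj_pred, traj_true):
    """L2 error between predicted and real position at the horizon
    (last row), the metric quoted in the text (0.199 m vs 0.138 m)."""
    return float(np.linalg.norm(traj_pred[-1] - traj_true[-1]))
```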
3.1.2 Direct Communication of Explicit Intention
To obtain the explicit intention of the human we also use two systems. First, the same system with three buttons on the object’s handle used in [11]. By pressing each of them, the human can tell the robot whether to go straight ahead, turn left or turn right, information that the robot uses to condition its planner at the next intersection. More specifically, the robot’s planner is forced to follow the received command at the next point where a multiplicity of routes occurs, allowing the planner to continue planning as usual once this point has been passed.
The second system implemented is a voice command recognition model. With it, the human can verbally tell the robot their intention using the ‘Go’, ‘Left’ and ‘Right’ commands. These generate the same conditioning as the buttons (see Fig. 1 - Bottom).
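As an illustration, the conditioning of the planner at the next multi-route point could look like the following sketch; the route representation and helper function are hypothetical, not the actual planner interface:

```python
from enum import Enum

class Command(Enum):
    GO = "go"        # straight ahead
    LEFT = "left"
    RIGHT = "right"

def condition_planner(routes, command):
    """At a junction, keep only the routes consistent with the human's
    explicit command; fall back to all routes if none matches.
    `routes` is a list of (initial_heading, route) pairs, where the
    heading is 'straight', 'left' or 'right' relative to the robot."""
    wanted = {"go": "straight", "left": "left", "right": "right"}[command.value]
    filtered = [r for h, r in routes if h == wanted]
    return filtered if filtered else [r for _, r in routes]
```

After the junction is passed, the filter would simply be cleared so the planner resumes normal operation, matching the behaviour described above.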
The only possible error using the system with buttons for explicit communication is that the human presses the wrong button. Additionally, its processing delay is negligible. Meanwhile, the model used for voice command recognition can make mistakes recognizing the command that the human has articulated (hit rate: 94.75%). Moreover, if a Bluetooth microphone is used as in our case, the delay between the human saying a command and it taking effect can go from 0.2 to \(1\,s\) if the radio-frequency (RF) cell is saturated.
3.2 Evaluation
We conducted three rounds of experiments to test whether a technical improvement in inferring the human’s desired route has any effect, whether the human prefers a more natural way of expressing intent over technical efficiency, and what effect the combination of both types of systems may have.
3.2.1 Experiments Setup
Both the starting point and the goal to take the object to are preset and are the same in all the executions of the three rounds of experiments. All of them are carried out indoors on a stage with OptiTrack on the ceiling to track both agents and thus know the covered distance and the duration of each execution. The robot used is IVO [52], which has a force sensor on each wrist and can perceive obstacles by means of front LiDAR and rear LaserScan. All the executions performed last a minimum of \(36\,s\) and a maximum of \(110\,s\).
In the first round of experiments, the usefulness of the two predictors mentioned above in achieving an effective HRI is tested. In this case, we understand by usefulness the efficiency with which each method manages to improve multiple desired aspects such as the degree of trust in the robot or human comfort. For that, each volunteer performs three executions: one without any predictor plus one execution with each predictor.Footnote 2 The second round of experiments allows us to compare both explicit communication systems. For this, the same procedure is followed: one execution without any communication system plus one execution with each.Footnote 3
Finally, for the third round, the predictor/communication system best rated by the volunteers in each previous round is selected and both strategies are compared. To make this comparison, each volunteer performs four executions: a baseline without predictor or explicit communication system plus one execution with each strategy plus one execution with both strategies available.Footnote 4 In the three rounds of experiments, the baseline execution is run first followed by the remaining two (three) executions in random order to avoid statistical distortions.
3.2.2 Questionnaires and Methodology
At the end of each execution, each volunteer fills out a questionnaire specifically designed for these experiments, whose questions are shown in Appendix A. This questionnaire is inspired by the use of Likert items as in the Godspeed [53] and RoSAS [54] questionnaires, but it draws on the experience of previous experiments to ask specific questions about the experience of performing a collaborative task with the robot rather than assessing aspects of the robot itself. Through this questionnaire, different aspects associated with an effective HRI are assessed both numerically from 1 to 7 and by choosing between the different systems tested.
More specifically, the developed questionnaire asks questions that can be grouped into six blocks: Robot contribution to fluency, Robot contribution to performance, Robot contribution as Human, Human responsibility, Trust in Robot and Comfort. The first two categories refer to the subjective contribution that the human considers the robot is making to the smooth and seamless running of the task, as well as to trying to fulfil it in the most optimal way possible. The third category tests whether or not the human considers the robot’s contribution to be comparable to their own, while the fourth refers to whether or not the human considers that they should maintain control of the task or, conversely, that the team’s performance was not dependent on them. The fifth category assesses the degree of trust the human had in the robot to perform correctly in order to complete the task. Finally, the sixth category tests the degree of comfort or discomfort the human felt when interacting with the robot. The Cronbach’s alpha [55] of each block of questions is also shown in Appendix A. Overall, this alpha ranges between 0.780 and 0.887, showing that the questions in each block are consistent with each other (using the criterion of alpha > 0.7).
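Cronbach's alpha for a block of Likert items is a standard formula; a minimal sketch of how such per-block consistency values can be computed is:

```python
import numpy as np

def cronbach_alpha(items):
    """Cronbach's alpha for `items`, an (n_respondents x k_items)
    matrix of Likert ratings: alpha = k/(k-1) * (1 - sum of item
    variances / variance of the total score)."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1).sum()   # per-question variances
    total_var = items.sum(axis=1).var(ddof=1)     # variance of row sums
    return (k / (k - 1)) * (1.0 - item_vars / total_var)
```

Values above the 0.7 criterion used in the text indicate that the items of a block measure the same underlying construct.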
Although this questionnaire has been designed to try to be generic and applicable to different tasks involving a collaboration between at least one human and one robot, we recognize that the specific task being worked on was taken into account when designing the questions. An example of this is the question I felt comfortable accompanying the robot from the block Comfort. This is a limitation of this questionnaire that should be taken into account by other researchers in case they wish to use it in their research.
Regarding the statistical analysis, the numerical ratings are analysed by means of different tests. First, the Shapiro-Wilk and Levene tests are applied. If the variable analysed meets the normality condition, an ANOVA test with Bonferroni correction is applied to check whether there is a statistically significant variation (p < 0.05), in which case Tukey’s HSD (Honest Significant Difference) test is applied. If the normality condition is not met, a non-parametric Kruskal-Wallis test is applied, followed by Nemenyi’s test if a statistically significant variation is detected between the systems analysed in each case. After filling in all the questionnaires, a short interview with three open questions is conducted, giving the volunteers the opportunity to express themselves freely: What did you think of the whole experiment? What were your feelings during each attempt? What would you improve?
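The test-selection logic above can be sketched with SciPy; the function names come from `scipy.stats`, while the wrapper itself and its thresholds are illustrative, with the post-hoc Tukey/Nemenyi steps left as comments:

```python
from scipy import stats

def analyse(groups, alpha=0.05):
    """Pick the omnibus test as in the text: parametric ANOVA when
    normality (Shapiro-Wilk) and equal variances (Levene) hold,
    otherwise non-parametric Kruskal-Wallis."""
    normal = all(stats.shapiro(g).pvalue > alpha for g in groups)
    equal_var = stats.levene(*groups).pvalue > alpha
    if normal and equal_var:
        stat, p = stats.f_oneway(*groups)   # ANOVA; on significance,
        test = "anova"                      # follow with Tukey's HSD
    else:
        stat, p = stats.kruskal(*groups)    # Kruskal-Wallis; on significance,
        test = "kruskal"                    # follow with Nemenyi's test
    return test, p
```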
3.2.3 Hypotheses
Relative to the first round of experiments about intention predictors:
H1 - Adding a predictor to the robot’s decision making system reduces the human’s effort.
H2 - Once a sufficient hit rate is reached, the human ceases to positively value technical improvements.
Relative to the second round of experiments about intention communication systems:
H3 - Adding a way to explicitly indicate the human’s intention improves multiple aspects of an effective HRI.
H4 - The human prefers a natural communication system with a non-negligible error rate over a more robust one.
Relative to the third round of experiments about the comparison of both systems:
H5 - A system that allows the human to directly express their intention improves multiple aspects of an effective HRI just as much as a system that attempts to infer it.
H6 - A combination of an inference system with an explicit communication one is the option best valued by the human and their preferred one.
3.2.4 Participants
A total of 75 volunteers were recruited from our research institute as well as from different schools of the partner university. 22 volunteers (age: µ = 26.45, σ = 4.02; 23% female) participated in the first round of experiments, performing 66 experiments (3 each). 23 volunteers (age: µ = 27.36, σ = 4.87; 26% female) participated in the second, performing 69 runs (also 3 each). Finally, 30 volunteers (age: µ = 28.32, σ = 5.12; 27% female) participated in the third round, performing 120 experiments (4 each). As can be seen, the sample size was increased for the third round of experiments as it is the most important of the three. This is in order to avoid one of the major shortcomings of the previous work that gave rise to this article [11], namely that its sample size was too small to draw definitive conclusions. The other two rounds also saw their sample sizes increased, although to a lesser extent, due to the difficulty of recruiting volunteers for HRI experiments.
All the experiments reported in this article have been performed after the approval of the ethics committee of the Universitat Politècnica de Catalunya (UPC) in accordance with all the relevant guidelines and regulations (ID: 2023.05) and all the volunteers have signed an informed consent form. No volunteers were paid for participating in this study, ensuring that there is no conflict of interest.
4 Results
Before analysing each round of experiments, we perform a post-hoc statistical power analysis to determine the smallest effect sizes we can reliably detect. Thus, using the criterion of p < 0.05, for the first round (22 volunteers) we can detect effect sizes as low as \(\eta^{2}=0.138\) with a statistical power of 80%. For the second (23 volunteers), as low as \(\eta^{2}=0.133\); and for the third round (30 volunteers), as low as \(\eta^{2}=0.089\). All variables analysed by variance tests are normally distributed according to the Shapiro-Wilk’s test unless otherwise indicated.
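A computation of this kind can be sketched numerically with the noncentral F distribution. The snippet below is our own illustration, not the authors' original script, and it assumes the common convention \(\lambda = f^{2}\,N\) with \(f^{2}=\eta^{2}/(1-\eta^{2})\); the exact minimum detectable \(\eta^{2}\) depends on this convention, so the output only approximates the values reported above.

```python
# Post-hoc power sketch for a one-way ANOVA (illustrative only).
# Assumed convention: noncentrality lambda = f^2 * N, f^2 = eta^2 / (1 - eta^2).
from scipy import stats
from scipy.optimize import brentq

def anova_power(eta_sq, n_total, k, alpha=0.05):
    """Power of a one-way ANOVA with k groups and n_total observations."""
    df1, df2 = k - 1, n_total - k
    f_sq = eta_sq / (1.0 - eta_sq)              # Cohen's f^2
    lam = f_sq * n_total                        # noncentrality parameter
    f_crit = stats.f.ppf(1.0 - alpha, df1, df2) # rejection threshold
    return stats.ncf.sf(f_crit, df1, df2, lam)  # P(F > f_crit | H1)

def min_detectable_eta_sq(n_total, k, power=0.80, alpha=0.05):
    """Smallest eta^2 detectable at the requested power (root finding)."""
    return brentq(lambda e: anova_power(e, n_total, k, alpha) - power,
                  1e-6, 0.9)

# Rounds 1-3: (total runs, number of conditions compared)
for n_total, k in [(66, 3), (69, 3), (120, 4)]:
    print(f"N={n_total}, k={k}: eta^2 >= {min_detectable_eta_sq(n_total, k):.3f}")
```

As expected, the minimum detectable effect size shrinks as the total number of runs grows, which is why the third round tolerates the smallest \(\eta^{2}\).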
Regarding the figures shown in this section, the colour palette was chosen so that the same colour always corresponds to the same type of execution. All assessments corresponding to a baseline execution (no predictor or direct communication system is used) use the same grey colour. Executions corresponding to the force predictor use the same shade of dark red and those corresponding to the velocity predictor use the same shade of light red. Similarly, dark blue is used for runs where the buttons are used and light blue for runs where the voice command recognition system is used. This gives the reader more context in the subsection devoted to the analysis of the third round of experiments. Likewise, the same colour (orange, cyan and yellow) is used for those variables that mean the same thing in all rounds of experiments, such as the mean and maximum force exerted or the average duration of each experiment. The authors recommend reading the article in colour.
4.1 Force Predictor VS. Velocity Predictor
To test hypothesis H1, we recorded three objective measures (execution duration, mean force, and maximum force exerted by the human during the execution) in the three executions comprising this first round. Fig. 2 shows the result. No statistically significant variation is observed in any of the measures (p = 0.48, p = 0.15 and p = 0.18, respectively), so it cannot be claimed that adding a predictor reduces the human’s effort in any way. H1 is rejected.
Fig. 2
Assessment of objective measurements (first round of experiments). Mean force exerted in orange, maximum force exerted in light blue and duration in yellow for the three experiments. Left axis in newtons (both forces) and right axis in seconds (duration). Bars represent std. dev
To test H2, several tests are performed. First, the volunteers are asked to rate from 1 to 7 multiple aspects corresponding to an effective HRI. Fig. 3 shows the result. As can be seen, statistically significant increases occur for all variables, with these increases being similar for both predictors: “Robot contribution to performance” (\(F(2,63)=15.89\), p < 0.001, \(\eta^{2}=0.335\)), “Human responsibility” (\(F(2,63)=6.88\), p = 0.002, \(\eta^{2}=0.179\)), “Trust in Robot” (\(F(2,63)=5.46\), p = 0.006, \(\eta^{2}=0.147\)), “Comfort” (\(F(2,63)=12.91\), p < 0.001, \(\eta^{2}=0.291\)). The exceptions are “Robot contribution to fluency”, where the velocity predictor generates a greater increase (\(F(2,63)=8.44\), p < 0.001, \(\eta^{2}=0.211\); with force predictor: p = 0.011; with velocity predictor: p < 0.001), and “Robot contribution as Human”, where the force predictor is the one with the biggest increase (\(F(2,63)=10.74\), p < 0.001, \(\eta^{2}=0.254\); with force predictor: p < 0.001; with velocity predictor: p = 0.0012). No notably more positive valuations are therefore detected for the second predictor, despite it reducing the mean error in trajectory estimation by 30.6% (\(0.138\,m\) versus \(0.199\,m\)). The only nuance appears if the subjective aggressiveness of the robot’s movements (one of the questions associated with the “Robot contribution to fluency” block in our questionnaire, see the Appendix) is analysed independently: it increases when the first predictor is used, although not statistically significantly (p = 0.051).
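Each F test above is paired with an \(\eta^{2}\) effect size, which for a one-way design is simply the between-group share of the total sum of squares. A minimal sketch of this computation (the Likert rating vectors below are made up for illustration, not the study's data):

```python
# One-way ANOVA with eta^2 effect size (illustrative sketch;
# the rating data below are synthetic, not the study's).
import numpy as np
from scipy import stats

def one_way_anova(groups):
    """Return (F, p, eta^2) for a list of 1-D samples."""
    f_stat, p_val = stats.f_oneway(*groups)
    pooled = np.concatenate(groups)
    grand_mean = pooled.mean()
    # eta^2 = SS_between / SS_total
    ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
    ss_total = ((pooled - grand_mean) ** 2).sum()
    return f_stat, p_val, ss_between / ss_total

rng = np.random.default_rng(0)
baseline = rng.integers(2, 6, size=22).astype(float)   # 1-7 Likert ratings
force    = rng.integers(3, 7, size=22).astype(float)
velocity = rng.integers(3, 7, size=22).astype(float)
f_stat, p_val, eta_sq = one_way_anova([baseline, force, velocity])
print(f"F(2,63) = {f_stat:.2f}, p = {p_val:.3f}, eta^2 = {eta_sq:.3f}")
```

Note that \(\eta^{2}\) can also be recovered directly from the F statistic via \(\eta^{2}=\mathrm{df}_1 F/(\mathrm{df}_1 F+\mathrm{df}_2)\), which is a useful sanity check on reported values.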
Fig. 3
Assessment of the main aspects involved in the interaction (first round of experiments). Comparison among the baseline experiment (without any predictor) in grey, experiment with force predictor in dark red and with velocity predictor in light red. Valuation from 1 (very low) to 7 (very high). Statistical significance marked with *: p < 0.05, **: p < 0.01, ***: p < 0.001. Bars represent std. dev
Additionally, volunteers are asked to explicitly choose between the two predictors, accepting the draw also as a valid option. Fig. 4 shows the result. In general, the velocity predictor is considered to be safer and more fluid, and the force predictor is considered to execute the task faster. The draw is the predominant choice as to which one is easier to use or which one makes the robot behave more naturally.
Fig. 4
Election among predictors made by volunteers (first round of experiments). Election made by the 22 volunteers instead of rating aspects numerically. Force predictor in dark red, velocity predictor in light red and draw in yellow
Finally, volunteers are asked to choose which predictor they find most appropriate for performing the task at hand, with the draw not given as an option. Fig. 5 shows a complete draw on this question. Therefore, H2 is confirmed, as the volunteers have not indicated a preference for the velocity predictor over the force predictor despite its technical superiority. Some of the volunteers’ comments in the post-experiment interview confirm this result. Volunteer 1.6 commented ”If you have changed anything between the second (velocity) and the third (force) execution, I haven’t noticed it”. Volunteer 1.13 commented ”The last one (velocity) seemed smoother to me but both work correctly”.
Fig. 5
Direct comparison of options (first round of experiments). Election made by the 22 volunteers with respect to which system they prefer for the task at hand. Force predictor in dark red, velocity predictor in light red
At the end of the questionnaire that volunteers fill out after each run, we added a task-specific control question to check that they understood that the various methods they were testing were designed to allow them to indicate their intention to the robot. Fig. 6 shows the volunteers’ subjective ratings of how easy they found it to indicate their intention to the robot in this first round of experiments. There is a statistically significant increase when using either predictor, but this increase is no greater when using the objectively more accurate one (performing a Kruskal-Wallis test, since it does not pass the Shapiro-Wilk’s test, followed by a Nemenyi test: H = 17.06, p < 0.001; with force predictor: p < 0.001; with velocity predictor: p = 0.0021), reaffirming hypothesis H2.
Fig. 6
Assessment of the subjective easiness perceived by the user to indicate their intention (first round of experiments). Comparison among the three executions performed: baseline, force predictor and velocity predictor. Valuation from 1 (very low) to 7 (very high). Statistical significance marked with *: p < 0.05, **: p < 0.01, ***: p < 0.001. Bars represent std. dev
4.2 Buttons VS. Voice Commands Recognition
If we perform the same objective measurements as in the first round of experiments, we observe a reduction in the mean and maximum force exerted by the human when buttons are used, although it is not statistically significant (see Fig. 7). This generates an increase in the duration of the execution that we cannot consider statistically significant either, as it lacks sufficient statistical power (\(F(2,66)=3.15\), p = 0.049, \(\eta^{2}=0.087\)). It cannot, therefore, be claimed that there is a reduction in human effort.
Fig. 7
Assessment of objective measurements (second round of experiments). Mean force exerted in orange, maximum force exerted in light blue and duration in yellow for the three executions. Left axis in newtons (both forces) and right axis in seconds (duration). Statistical significance marked with *: p < 0.05. Bars represent std. dev
To test H3, we use the numerical assessments made by the volunteers with the same questionnaire as before (see Fig. 8). A statistically significant improvement is observed in all the aspects analysed, except in “Human responsibility”, for which we lack sufficient statistical power (\(F(2,66)=3.99\), p = 0.023, \(\eta^{2}=0.108\)), with the improvement always being larger when voice commands are used. We highlight “Robot contribution to fluency” (\(F(2,66)=7.66\), p = 0.001, \(\eta^{2}=0.188\); with buttons: p = 0.032; with voice commands: p < 0.001) and “Comfort” (\(F(2,66)=8.67\), p < 0.001, \(\eta^{2}=0.208\); with buttons: p = 0.014; with voice commands: p < 0.001). There is also a statistically significant increase in “Robot contribution to performance” (\(F(2,66)=19.63\), p < 0.001, \(\eta^{2}=0.373\)) using voice commands relative to using buttons (p = 0.042) and a statistically significant reduction in “Robot aggressiveness” (performing a Kruskal-Wallis test, since it does not pass the Shapiro-Wilk’s test, followed by a Nemenyi test: H = 9.59, p = 0.005; with voice commands: p = 0.003). H3 is therefore confirmed. For the sake of completeness, the rest of the results are as follows: “Robot contribution as Human” (\(F(2,66)=10.05\), p < 0.001, \(\eta^{2}=0.233\)), “Trust in Robot” (\(F(2,66)=7.66\), p = 0.001, \(\eta^{2}=0.188\)).
Fig. 8
Assessment of the main aspects involved in the interaction (second round of experiments). Comparison among the baseline experiment (without any communication system) in grey, execution with buttons in dark blue and with voice commands in light blue. Valuation from 1 (very low) to 7 (very high). Statistical significance marked with *: p < 0.05, **: p < 0.01, ***: p < 0.001. Bars represent std. dev
To test H4, we ask the volunteers to choose between the two explicit communication systems (see Fig. 9). The button system is considered faster when communicating, while the command recognition system comes out on top in all other aspects analysed. Finally, the volunteers are asked to choose which system seems more appropriate for the task, with the voice command system chosen by 86.9% of them (see Fig. 10). H4 is therefore confirmed.
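The 86.9% preference corresponds to 20 of the 23 volunteers. The paper reports only the proportion, but as a supplementary check (ours, not an analysis from the study) one can verify with an exact binomial test that such a split is very unlikely if volunteers were indifferent between the two systems:

```python
# Exact binomial test of the 20/23 preference split against the 50%
# indifference level (a supplementary illustration, not an analysis
# reported in the study). Requires scipy >= 1.7 for binomtest.
from scipy.stats import binomtest

result = binomtest(k=20, n=23, p=0.5, alternative="two-sided")
print(f"p = {result.pvalue:.2e}")   # well below 0.001
```

So even with this modest sample, the preference for voice commands is far stronger than chance would explain.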
Fig. 9
Election among communication systems made by volunteers (second round of experiments). Election made by the 23 volunteers instead of rating aspects numerically. Buttons in dark blue, voice commands in light blue and draw in yellow
Fig. 10
Direct comparison of options (second round of experiments). Election made by the 23 volunteers with respect to which system they prefer for the task at hand. Buttons in dark blue, voice commands in light blue
Some of the volunteers’ comments shed light on these results, showing that a communication system that is more natural or human-like is preferred to one that may be technically more robust. Volunteer 2.8 commented ”Even though I sometimes have to repeat the command, I prefer to be able to talk to the robot”. Volunteer 2.17 said ”There is a delay until my command takes effect that with the first one (buttons) doesn’t happen but the second one (voice commands) allows me to focus more on exerting force”.
Analysing the results of the control question in which volunteers rate the ease with which they can communicate their intention to the robot, Fig. 11 shows that volunteers consider that command recognition allows them to indicate their intention more easily (with a Kruskal-Wallis test, since it does not pass the Shapiro-Wilk’s test, followed by a Nemenyi test: H = 19.46, p < 0.001; with buttons: p = 0.0048; with voice commands: p < 0.001), reaffirming hypothesis H4.
Fig. 11
Assessment of the subjective easiness perceived by the user to indicate their intention (second round of experiments). Comparison among the three executions performed in the second round: baseline, with buttons and with voice command recognition. Valuation from 1 (very low) to 7 (very high). Statistical significance marked with *: p < 0.05, **: p < 0.01, ***: p < 0.001. Bars represent std. dev
4.3 Complete System
We take the velocity predictor (simply because, given that there is no noticeable preference between the two, it seems to generate less aggressive movements) and the voice command recognition system as the systems preferred by the human for inference and direct communication respectively. With these, we perform a last round of experiments to compare both of them and their combination.
Looking at the same objective measures used above, a statistically significant variation is observed only in the mean force exerted by the human (\(F(3,116)=5.71\), p = 0.0012, \(\eta^{2}=0.129\)) (see Fig. 12). Specifically, there is a statistically significant increase when using the velocity predictor relative to the baseline (p = 0.009) and statistically significant decreases when using the voice commands or the complete system relative to using the predictor (p = 0.002 and p = 0.014), but not relative to the baseline (p = 0.963 and p = 0.998).
Fig. 12
Assessment of objective measurements (third round of experiments). Mean force exerted in orange, maximum force exerted in light blue and duration in yellow for the four executions. Left axis in newtons (both forces) and right axis in seconds (duration). Statistical significance marked with *: p < 0.05, **: p < 0.01. Bars represent std. dev
To check H5, the numerical results obtained in the previous rounds cannot be directly compared, as they come from different volunteers; testing this hypothesis requires the same people to use both systems. Using the same questionnaire as in previous rounds (see Fig. 13), it can be observed that both systems produce statistically significant increases in all analysed parameters except for “Human responsibility”, where again we do not have enough statistical power (\(F(3,116)=2.95\), p = 0.036, \(\eta^{2}=0.071\)). The system with the velocity predictor achieves larger increases in “Robot contribution to fluency” (\(F(3,116)=21.42\), p < 0.001, \(\eta^{2}=0.356\); with predictor: p < 0.001; with voice commands: p = 0.0011) and “Robot contribution to performance” (\(F(3,116)=22.89\), p < 0.001, \(\eta^{2}=0.372\); with predictor: p < 0.001; with voice commands: p = 0.0022), while the system with voice commands outperforms it in “Trust in Robot” (\(F(3,116)=18.88\), p < 0.001, \(\eta^{2}=0.328\); with predictor: p = 0.014; with voice commands: p < 0.001) and “Comfort” (\(F(3,116)=22.00\), p < 0.001, \(\eta^{2}=0.362\); with predictor: p = 0.0061; with voice commands: p < 0.001). H5 is therefore confirmed. For the sake of completeness: “Robot contribution as Human” (\(F(3,116)=28.13\), p < 0.001, \(\eta^{2}=0.421\)).
Fig. 13
Assessment of the main aspects involved in the interaction (third round of experiments). Comparison among the baseline execution (without predictor nor voice commands) in grey, execution with velocity predictor in light red, with voice commands in light blue and with both in green. Valuation from 1 (very low) to 7 (very high). Statistical significance marked with *: p < 0.05, **: p < 0.01, ***: p < 0.001. Bars represent std. dev
To verify H6, Fig. 13 indeed shows that the system that makes use of both options is the one that scores best in all the aspects analysed, being the only one that manages to reduce the perceived aggressiveness of the robot’s movements in a statistically significant way (performing a Kruskal-Wallis test, since it does not pass the Shapiro-Wilk’s test, followed by a Nemenyi test: H = 9.16, p = 0.011; complete system: p = 0.014). If we ask the volunteers to choose between the three systems (see Fig. 14), they choose the system with voice commands as the easiest to use and the complete system, which makes use of both options, in all other aspects analysed. Finally, when the volunteers were asked to choose which system they considered most appropriate for the task, 86.7% of them opted for the complete system (see Fig. 15). Therefore, H6 is confirmed.
Comments from some volunteers confirm these results. Volunteer 3.19 said, ”The predictor makes it more fluid, but being able to talk to the robot gives me extra security and peace of mind”. Volunteer 3.24 commented, ”They are very different approaches that I think can serve different purposes […] give me both and I choose when and how to use each”.
Finally, in terms of how easily volunteers feel they can communicate their intention to the robot, Fig. 16 shows once again the volunteers’ preference for the system that combines both methods, as they can use whichever one they feel more comfortable with at any given time (with a Kruskal-Wallis test, since it does not pass the Shapiro-Wilk’s test, followed by a Nemenyi test: H = 46.39, p < 0.001; with predictor: p = 0.0013; with voice commands: p < 0.001; with both: p < 0.001), reaffirming hypothesis H6.
Fig. 14
Election among communication systems made by volunteers (third round of experiments). Election made by the 30 volunteers instead of rating aspects numerically. Velocity predictor in light red, voice commands in light blue and both systems in green
Fig. 15
Direct comparison of options (third round of experiments). Election made by the 30 volunteers with respect to which system they prefer for the task at hand. Velocity predictor in light red, voice commands in light blue and both in green
Fig. 16
Assessment of the subjective easiness perceived by the user to indicate their intention (third round of experiments). Comparison among the four executions performed in the third round: baseline, velocity predictor, voice command recognition and both systems. Valuation from 1 (very low) to 7 (very high). Statistical significance marked with *: p < 0.05, **: p < 0.01, ***: p < 0.001. Bars represent std. dev
5 General Discussion
The first notable result of this study is that none of the systems tested seems to produce a reduction in human effort, understood as the mean and maximum force exerted during the task. In the case of using any of the predictors, there even seems to be an increase, which only became statistically significant in the third round of experiments. It is worth mentioning that the volunteers were aware that a predictor was being used in the experiment and that it received the previously exerted force as input; however, they were not informed about which specific predictor was in use. This is because in the second and third rounds of experiments, whether using buttons or command recognition, the volunteer needed to be aware of these systems to use them effectively. Thus, to ensure comparability across experiments, we informed the volunteers of the presence of a predictor whenever it was in use. It is possible that this encouraged the human to exert greater force to make it easier for the predictor to infer their intention, understood as the desired path. At the same time, we do not believe that the marginal reduction in the force exerted when using the buttons is primarily due to the buttons incentivizing reduced effort. Observing these executions, humans tend to pay attention to the markings on the handle before pressing any button to ensure accuracy, which means that the force exerted during that time naturally tends to be lower.
As for the comparison between the two predictors, the fact that a considerable technical improvement goes unnoticed in the human’s subjective assessment, confirming hypothesis H2, challenges the justification for further refining the predictor. It also seems to indicate that a perfect predictor is not necessary, but simply a sufficiently effective one. This result is not entirely unexpected, as it fits with the Pareto rule [56,57,58] (which states that roughly 80% of effects come from 20% of causes, implying that further improvements beyond the “vital few” often yield negligible gains) or the Law of Diminishing Returns [59, 60] (which describes how, beyond a certain point, each additional unit of investment produces progressively smaller returns) in economics. As for the two explicit communication systems, the confirmation of hypothesis H4, as well as comments such as those of volunteers 2.8 and 2.17, seems to indicate that participants place greater value on naturalness than on technical attributes such as reduced delay or lower failure rates. These two results combined support the idea that we should pivot from the current trend of attempting to infer the human’s intention in the best possible way towards methods that seek to improve human-robot communication, making it as human-like as possible.
It is worth mentioning that this work should not be understood as being against the use of predictors. In one of the executions conducted during the third round of experiments, the Wi-Fi network used for the exchange of information between the robot and the computer running the control algorithm was saturated, causing delays in the generation of the robot’s speed commands based on sensor data. This made the interaction with the robot complex and counter-intuitive. The inclusion of the \(1\,s\) prediction enabled the system to compensate for these delays and allow the human to perform the task satisfactorily. This is an illustrative example of the usefulness of using predictors. Another example could involve scenarios where direct communication is infeasible. This is why we do not advocate discarding the use of predictors, but rather their correct combination with explicit communication systems that are as natural as possible, thus taking advantage of the benefits of both types of systems. This is what explains the title of this article.
The analysis of human preferences carried out in this work leads us to consider multimodality, understood as the use of multiple communication channels of a different and even redundant nature, as the option preferred by humans when interacting with a robot. This mitigates the necessity of developing a flawless predictor. Ultimately, this aligns with human behaviour: we use our prior knowledge and experience to try to predict the behaviour of our peers but, when the uncertainty is too high or we simply do not know the other person well, we choose to communicate directly to avoid misunderstandings that could negatively affect the outcome of the interaction. It is therefore natural for the human to expect the same behaviour from the robot if the goal is for robots to be perceived as partners rather than mere tools.
6 Conclusions
In this work we have conducted three rounds of experiments using a collaborative transportation task to test the following: 1) Predictor-based inference systems can enhance multiple aspects of effective HRI. However, there is a sufficient performance beyond which technical improvements go unnoticed by the human user. 2) The human prefers explicit communication methods with the robot that prioritize naturalness, even at the expense of a higher failure rate. 3) While each system individually achieves comparable subjective ratings from humans across various aspects, their proper combination results in superior HRI outcomes, including enhanced fluency, trust in the robot, and user comfort.
This study should be replicated in other tasks to confirm these findings. In any case, these findings can serve as a stepping stone to encourage and justify the use and development of human-robot communication systems that seek greater naturalness, such as natural language or gestures, even if these systems may exhibit errors, simply because humans are willing to accept them.
References
- Goldhoorn A, Garrell A, Alquézar R, Sanfeliu A (2017) Searching and tracking people in urban environments with static and dynamic obstacles. Robot Auton Syst 98:147–157
Article Google Scholar - Saito N, Ogata T, Funabashi S, Mori H, Sugano S (2021) How to select and use tools?: active perception of target objects using multimodal deep learning. IEEE Robot Autom Lett 6(2):2517–2524
Article Google Scholar - Dragan AD (2017). Robot planning with mathematical models of human state and action. arXiv preprint arXiv:1705.04226
- Choudhury R, Swamy G, Hadfield-Menell D, Dragan AD (2019) On the utility of model learning in hri. In 2019 14th ACM/IEEE International Conference on Human-Robot Interaction (HRI), pp 317–325). IEEE
- Tabrez A, Luebbers MB, Hayes B (2020) A survey of mental modeling techniques in human–robot teaming. Curr Robot Rep 1:259–267
Article Google Scholar - Ordóñez FJ, Roggen D (2016) Deep convolutional and LSTM recurrent neural networks for multimodal wearable activity recognition. Sensors 16(1):115
Article Google Scholar - Schydlo P, Rakovic M, Jamone L, Santos-Victor J (2018) Anticipation in human-robot cooperation: a recurrent neural network approach for multiple action sequences prediction. In 2018 IEEE International Conference on Robotics and Automation, pp 5909–5914). IEEE
- Gildert N, Millard AG, Pomfret A, Timmis J (2018) The need for combining implicit and explicit communication in cooperative robotic systems. Front Robot AI 5:65
Article Google Scholar - Lee B-J et al. (2018) Perception-action-learning system for Mobile social-service robots using deep learning. In Proceedings of the AAAI Conference on Artificial Intelligence, vol 32
- Dar S, Bernardet U (2020) When agents become partners: a review of the role the implicit plays in the interaction with artificial social agents. Multimodal Technol Interact 4(4):81
Article Google Scholar - Domínguez-Vidal JE, Sanfeliu A (2023) Inference VS. Explicitness. Do we really need the perfect predictor? The human-robot collaborative object transportation case. In 32nd IEEE International Conference on Robot and Human Interactive Communication (RO-MAN), pp 1866–1871). IEEE
- Desender K, Boldt A, Yeung N (2018) Subjective confidence predicts information seeking in decision making. Psychological Sci 29(5):761–778
Article Google Scholar - Bussy A, Gergondet P, Kheddar A, Keith F, Crosnier A (2012) Proactive behavior of a humanoid robot in a haptic transportation task with a human partner. In 2012 IEEE RO-MAN: The 21st IEEE International Symposium on Robot and Human Interactive Communication, pp 962–967). IEEE
- Rozo L, Bruno D, Calinon S, Caldwell DG (2015) Learning optimal controllers in human-robot cooperative transportation tasks with position and force constraints. In 2015 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp 1024–1030). IEEE
- Lanini J, Razavi H, Urain J, Ijspeert A (2018) Human intention detection as a multiclass classification problem: application in physical human–robot interaction while walking. IEEE Robot Autom Lett 3(4):4171–4178
Article Google Scholar - Mörtl A, Lawitzky M, Kucukyilmaz A, Sezgin M, Basdogan C, Hirche S (2012) The role of roles: physical cooperation between humans and robots. Int J Rob Res 31(13):1656–1674. https://doi.org/10.1177/0278364912455366
Article Google Scholar - Losey DP, McDonald CG, Battaglia E, O (2018), 010804 M.K.: A review of intent detection, arbitration, and communication aspects of shared control for physical human–robot interaction. Appl Mech Rev 70(1). https://asmedigitalcollection.asme.org/appliedmechanicsreviews/article-pdf/70/1/010804/5964415/amr_070_01_010804.pdf
- Selvaggio M, Cacace J, Pacchierotti C, Ruggiero F, Giordano (2021) P.R.: a shared-control teleoperation architecture for nonprehensile object transportation. IEEE Trans Robot 38(1):569–583
Article Google Scholar - Vinciarelli A, Pantic M, Bourlard H (2009) Social signal processing: survey of an emerging domain. Image Vision Comput 27(12):1743–1759
Article Google Scholar - Vinciarelli A, Pantic M, Heylen D, Pelachaud C, Poggi I, D’Errico F, Schroeder M (2011) Bridging the gap between social animal and unsocial machine: a survey of social signal processing. IEEE Trans Affect Comput 3(1):69–87
Article Google Scholar - Yan H, Ang MH, Poo (2014) A.N.: a survey on perception methods for human–robot interaction in social robots. Int J Soc Robot 6:85–119
Article Google Scholar - Salem M, Dautenhahn K (2017) 23 Social signal processing in social robotics. Soc Signal Process 317
- Tapus A, Bandera A, Vazquez-Martin R, Calderita LV (2019) Perceiving the person and their interactions with the others for social robotics–a review. Pattern Recognit Lett 118:3–13
Article Google Scholar - Luo RC, Mai L (2019) Human intention inference and on-line human hand motion prediction for human-robot collaboration. In 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp 5958–5964). IEEE
- Jain S, Argall B (2018) Recursive bayesian human intent recognition in shared-control robotics. In 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp 3905–3912). IEEE
- Maroger I, Ramuzat N, Stasse O, Watier B (2021) Human trajectory prediction model and its coupling with a walking pattern generator of a humanoid robot. IEEE Robot Autom Lett 6(4):6361–6369
Article Google Scholar - Thobbi A, Gu Y, Sheng W (2011) Using human motion estimation for human-robot cooperative manipulation. In 2011 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp 2873–2878). IEEE
- Agravante DJ, Cherubini A, Bussy A, Gergondet P, Kheddar A (2014) Collaborative human-humanoid carrying using vision and haptic sensing. In 2014 IEEE International Conference on Robotics and Automation (ICRA), pp 607–612). IEEE
- Tarbouriech S, Navarro B, Fraisse P, Crosnier A, Cherubini A, Sallé D (2019) Admittance control for collaborative dual-arm manipulation. In 2019 19th International Conference on Advanced Robotics (ICAR), IEEE, pp. 198–204
- Yu X, Li B, He W, Feng Y, Cheng L, Silvestre C (2021) Adaptive-constrained impedance control for human–robot co-transportation. IEEE Trans Cybern 52(12):13237–13249
Article Google Scholar - Li Z, Liu, Huang Z, Peng Y, Pu H, Ding L (2017) Adaptive impedance control of human–robot cooperation using reinforcement learning. IEEE Trans Ind Electron 64(10):8013–8022
Article Google Scholar - Alevizos KI, Bechlioulis CP, Kyriakopoulos KJ (2020) Physical human–robot cooperation based on robust motion intention estimation. Robotica 38(10):1842–1866
Article Google Scholar - Al-Yacoub A, Zhao Y, Eaton W, Goh YM, Lohse N (2021) Improving human robot collaboration through force/torque based learning for object manipulation. Robot Comput-Integr Manuf 69, 102111
- Domínguez-Vidal JE, Sanfeliu A (2023) Improving human-robot interaction effectiveness in human-robot collaborative object transportation using force prediction. In 2023 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp 7839–7845). IEEE
- Domínguez-Vidal JE, Torres-Rodríguez IJ, Garrell A, Sanfeliu A (2021) User-friendly smartphone interface to share knowledge in human-robot collaborative search tasks. In 30th IEEE International Conference on Robot and Human Interactive Communication (RO-MAN), pp 913–918). https://doi.org/10.1109/RO-MAN50785.2021.9515379
- Dalmasso M, Domínguez-Vidal JE, Torres-Rodríguez IJ, Garrell A, Sanfeliu A (2023) Shared task representation for human-robot collaborative navigation: the collaborative search case. Int J Soc Robot. https://doi.org/10.1007/s12369-023-01067-0
- Mullen JF, Mosier J, Chakrabarti S, Chen A, White T, Losey DP (2021) Communicating inferred goals with passive augmented reality and active haptic feedback. IEEE Robot Autom Lett 6(4):8522–8529
- Gildert N (2022) Combining implicit and explicit communication in object manipulation tasks between two robots. PhD thesis, University of York
- Lorentz V, Weiss M, Hildebrand K, Boblan I (2023) Pointing gestures for human-robot interaction with the humanoid robot digit. In 2023 32nd IEEE International Conference on Robot and Human Interactive Communication (RO-MAN), pp 1886–1892. IEEE
- Rogalla O, Ehrenmann M, Zollner R, Becher R, Dillmann R (2002) Using gesture and speech control for commanding a robot assistant. In Proceedings. 11th IEEE International Workshop on Robot and Human Interactive Communication, pp 454–459. IEEE
- Lv X, Zhang M, Li H (2008) Robot control based on voice command. In 2008 IEEE International Conference on Automation and Logistics, pp 2490–2494. IEEE
- Warden P (2018) Speech commands: a dataset for limited-vocabulary speech recognition. arXiv preprint arXiv:1804.03209
- Majumdar S, Ginsburg B (2020) MatchboxNet: 1D time-channel separable convolutional neural network architecture for speech commands recognition. arXiv preprint arXiv:2004.08531
- Kim B, Chang S, Lee J, Sung D (2021) Broadcasted residual learning for efficient keyword spotting. arXiv preprint arXiv:2106.04140
- Yin H, Varava A, Kragic D (2021) Modeling, learning, perception, and control methods for deformable object manipulation. Sci Robot 6(54):8803
- Hou YC, Sahari KSM, How DNT (2019) A review on modeling of flexible deformable object for dexterous robotic manipulation. Int J Adv Rob Syst 16(3):1729881419848894
- Domínguez-Vidal JE, Rodríguez N, Sanfeliu A (2024) Perception-intention-action cycle in human-robot collaborative tasks: the collaborative lightweight object transportation use-case. Int J Soc Robot
- Domínguez-Vidal JE, Rodríguez N, Sanfeliu A (2023) Perception-intention-action cycle as a human acceptable way for improving human-robot collaborative tasks. In Companion of the 2023 ACM/IEEE International Conference on Human-Robot Interaction, pp 567–571. https://doi.org/10.1145/3568294.3580149
- Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, Kaiser Ł, Polosukhin I (2017) Attention is all you need. Adv Neural Inf Process Syst 30
- Dosovitskiy A, Beyer L, Kolesnikov A, Weissenborn D, Zhai X, Unterthiner T, Dehghani M, Minderer M, Heigold G, Gelly S, et al. (2020) An image is worth 16x16 words: transformers for image recognition at scale. arXiv preprint arXiv:2010.11929
- Domínguez-Vidal JE, Sanfeliu A (2024) Exploring transformers and visual transformers for force prediction in human-robot collaborative transportation tasks. In 2024 IEEE International Conference on Robotics and Automation (ICRA), pp 3191–3197. IEEE
- Laplaza J, Rodríguez N, Domínguez-Vidal JE, Herrero F, Hernández S, López A, Sanfeliu A, Garrell A (2022) IVO robot: a new social robot for human-robot collaboration. In Proceedings of the 2022 ACM/IEEE International Conference on Human-Robot Interaction, pp 860–864. IEEE
- Bartneck C, Kulić D, Croft E, Zoghbi S (2009) Measurement instruments for the anthropomorphism, animacy, likeability, perceived intelligence, and perceived safety of robots. Int J Soc Robot 1:71–81
- Carpinella CM, Wyman AB, Perez MA, Stroessner SJ (2017) The robotic social attributes scale (RoSAS) development and validation. In Proceedings of the 2017 ACM/IEEE International Conference on Human-Robot Interaction, pp 254–262
- Cronbach LJ (1951) Coefficient alpha and the internal structure of tests. Psychometrika 16(3):297–334
- Pareto V (1964) Cours D'économie Politique. Librairie Droz
- Juran JM (1975) The non-Pareto principle; mea culpa. Qual Prog 8(5):8–9
- Dunford R, Su Q, Tamang E, Wintour A (2014) The Pareto principle. The Plymouth Student Scientist 7(2):140–148
- Brue SL (1993) Retrospectives: the law of diminishing returns. J Econ Perspect 7(3):185–192
- Cannan E (1892) The origin of the law of diminishing returns, 1813-15. Econ J (London) 2(5):53–69