HANDS Workshop

HANDS

Observing and Understanding Hands in Action
in conjunction with ICCV 2025

Join Us: Oct. 20, 13:00 - 17:00, 305 B, Hawai'i Convention Center

Poster Sessions: Oct. 20, 14:00 - 16:30, Boards 112 - 131, Exhibit Hall I

Overview

Welcome to HANDS@ICCV25.

We are very happy to organize the HANDS workshop. This year's workshop will be held at ICCV25. See you in Honolulu.

The ninth edition of this workshop will emphasize the use of multimodal LLMs for hand-related tasks. Multimodal LLMs have revolutionized the perceptions of AI and demonstrated groundbreaking contributions to multimodal understanding, zero-shot learning, and transfer learning. These models can process and integrate information from different types of hand data (modalities), which allows them to better understand complex hand-object and hand-hand interactions by capturing richer, more diverse representations.

During the workshop, we will explore multimodal LLMs for hand-related tasks through the talks of invited speakers, the presentation of accepted papers, and workshop challenges.

Invited Speakers

Schedule

Time: 13:00 - 17:00 Oct. 20 (Hawai'i time)

Location: 305 B, ICCV25 Hawai'i Convention Center

The detailed schedule is below.

13:00 - 13:10 Opening Remarks
13:10 - 13:40 Invited Talk: Srinath Sridhar
Bio: Srinath Sridhar (srinathsridhar.com) is the John E. Savage Assistant Professor of Computer Science at Brown University, where he leads the Interactive 3D Vision & Learning Lab (ivl.cs.brown.edu). He received his PhD at the Max Planck Institute for Informatics and was subsequently a postdoctoral researcher at Stanford. His research interests are in 3D computer vision and artificial intelligence. Specifically, his group builds foundational methods for 3D spatiotemporal (4D) visual understanding of the world, including objects in it, humans in motion, and human-object interactions, with applications ranging from robotics to mixed reality. He is the recipient of an NSF CAREER award and a Google Research Scholar award, and his work received a Best Student Paper award at WACV and a Best Paper Honorable Mention at Eurographics. He spends part of his time as an Amazon Scholar and as visiting faculty at the Indian Institute of Science (IISc).
Title: Vision and Touch in Robot Learning and Interaction
Abstract: Touch, together with vision, is a fundamental sensing modality for robots. However, sensing and combining touch with vision has been hard due to hardware and algorithmic challenges. In this talk, I will discuss my group's work on visuo-tactile sensing and fusion. Specifically, I will introduce (1) GigaHands, a new large-scale 3D human hand activity dataset that provides visual and contact information for robot manipulation learning, and (2) UniTac, a new method for touch sensing that operates without any tactile sensors. We show that touch sensing does not always need cumbersome hardware, and can add significant information for better robot learning.
13:40 - 14:10 Invited Talk: Jihyun Lee
Title: Towards a Universal Generative Prior for Hands and Their Interactions
Abstract: We are witnessing remarkable progress in generative modeling, with recent diffusion- and flow-based models demonstrating powerful generative capabilities. In this talk, I will discuss our recent efforts to harness these advances to build a deep generative prior for hands and their interactions. Such priors capture the distribution of plausible hand shapes, poses, and interactions, serving as a universal regularizer for long-standing hand-related vision problems, such as monocular 3D reconstruction. By constraining the solution space to what is physically and semantically plausible, generative priors reduce the ill-posedness of these problems and are particularly effective for in-the-wild generalization, where training supervision is often noisy or insufficient, helping advance progress toward more robust and reliable real-world vision systems.
14:10 - 14:40 Invited Talk: Seungryul Baek
Title: Two Hands and an Object: From Perception to Generation
Abstract: In this talk, I will present our lab's recent efforts to advance the understanding of hand–object interactions, which play a crucial role in human activities and everyday manipulation. In particular, we are addressing one of the most challenging scenarios: the complex interaction between two hands and an object, where coordination, occlusion, and fine-grained motion understanding become highly demanding. I will begin by describing a Transformer-based framework that we have developed for modeling and interpreting the dynamics of two hands interacting with an object. Next, I will introduce BiGS, a method that goes a step further by extending this setting to previously unseen or unknown objects. Finally, I will present Text2HOI, a generative model that takes natural language text prompts as input and synthesizes plausible two-hand–object interaction motions.
14:40 - 15:30 Coffee Break & Posters
15:30 - 16:00 Invited Talk: Jingya Wang
Title: Open-World Hand-Object Interaction Synthesis: Towards Generalizable and Dexterous Embodied Manipulation
Abstract: Dexterous hand-object interaction constitutes a fundamental component of human physical intelligence, enabling the execution of complex manipulation tasks in unstructured environments. The synthesis of such interactions from open-ended instructions presents significant challenges, particularly in cross-object generalization, long-horizon task reasoning, and physical plausibility assurance. In this talk, we will introduce OpenHOI, a framework that employs a 3D Multimodal Large Language Model to translate open-vocabulary instructions into executable interaction sequences through semantic task decomposition and affordance reasoning. Subsequently, we will discuss UniHM, which establishes a unified representation space for heterogeneous hand morphologies to facilitate cross-dexterous-hand manipulation. Furthermore, we will examine how the integration of human gaze as a biological prior in our GHO-Diffusion model enhances intentionality and human-likeness in synthesized interactions.
16:00 - 16:30 Invited Talk: Rolandos Potamias
Title: Building the Tools of Embodied AI: From Human Hands to Dexterous Agents
Abstract: Hands are essential tools for humans to act, interact, and communicate in nearly all daily activities, highlighting the critical need for precise modeling to achieve highly realistic digital agents. However, the complexity of hands, characterized by their scale, degrees of freedom, and versatility, poses significant challenges for current human-centered AI frameworks. These limitations are evident across various domains, including human motion modeling, generative image and video synthesis, and 3D human reconstruction. In this talk, I will discuss key challenges in 3D hand shape and appearance modeling, introducing our large-scale Handy model and WiLoR, our approach for real-time hand detection and reconstruction of hand-object interactions from in-the-wild images. I will also present HaWoR, designed to reconstruct 4D hand motion in world space, particularly from egocentric wearable camera settings where both the hands and camera are in motion. Finally, I will introduce our recent work, CEDex, a large-scale dataset for cross-embodied dexterous grasping derived from human-like contact representations.
16:30 - 16:50 Challenge Winner Talks
16:50 - 17:00 Closing Remarks

Accepted Papers & Extended Abstracts

We are delighted to announce that the following accepted papers and extended abstracts will appear in the workshop! Authors of all full-length papers, extended abstracts, and invited posters should prepare posters for presentation during the workshop. Poster size is 84” x 42”.

Full-length Papers

Extended Abstracts

Technical Report

Organizers

Qi Ye
Zhejiang University

hands2025@googlegroups.com