Second Joint Egocentric Vision Workshop
Overview
Wearable cameras, smart glasses, and AR/VR headsets are gaining importance for research and commercial use. They feature various sensors like cameras, depth sensors, microphones, IMUs, and GPS. Advances in machine perception enable precise user localization (SLAM), eye tracking, and hand tracking. This data allows understanding user behavior, unlocking new interaction possibilities with augmented reality. Egocentric devices may soon automatically recognize user actions, surroundings, gestures, and social relationships. These devices have broad applications in assistive technology, education, fitness, entertainment, gaming, eldercare, robotics, and augmented reality, positively impacting society.
Previously, research in this field was held back by the limited availability of datasets for what is an inherently data-intensive problem. The community's recent efforts have addressed this issue by releasing numerous large-scale datasets covering various aspects of egocentric perception, including HoloAssist, Ego4D, Ego-Exo4D, EPIC-KITCHENS, and HD-EPIC.
The goal of this workshop is to offer an exciting discussion forum for researchers working in this challenging and fast-growing area, and to unlock the potential of data-driven research with our datasets to further the state of the art.
Challenges
We welcome submissions to the challenges from February to May (see important dates) through the leaderboards linked below. Participants in the challenges must submit a technical report on their method; this is a requirement for the competition. Reports should be 2-6 pages including references, use the CVPR format, and be submitted through the CMT website.
HoloAssist Challenges
HoloAssist is a large-scale egocentric human interaction dataset, where two people collaboratively complete physical manipulation tasks.
Action Recognition
**Lead:** Taein Kwon, ETH Zurich, Switzerland
**Summary:** Action recognition on the HoloAssist dataset. Input can be RGB images alone or multiple modalities.
Mistake Detection
**Lead:** Mahdi Rad, Microsoft, Switzerland
**Summary:** Mistake detection follows the convention of Assembly101 but is applied to the fine-grained actions in our benchmark. We take the features of the fine-grained action clips from the beginning of the coarse-grained action until the end of the current action clip, and the model predicts a label from {correct, mistake}.
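For illustration, a minimal sketch of this setup is shown below: per-clip features are mean-pooled over the window described above and passed to a binary head. The feature dimension, pooling strategy, and architecture are assumptions made for the example, not the official HoloAssist baseline.

```python
import torch
import torch.nn as nn

class MistakeDetector(nn.Module):
    """Toy binary classifier over pooled fine-grained clip features.
    Illustrative only; the feature dimension and mean pooling are
    assumptions, not the official HoloAssist baseline."""

    def __init__(self, feat_dim: int = 1024):
        super().__init__()
        self.head = nn.Sequential(
            nn.Linear(feat_dim, 256), nn.ReLU(), nn.Linear(256, 2)
        )

    def forward(self, clip_feats: torch.Tensor) -> torch.Tensor:
        # clip_feats: (num_clips, feat_dim) -- features of the fine-grained
        # clips from the start of the coarse-grained action up to the current
        # clip. Pool over clips, then predict logits over {correct, mistake}.
        pooled = clip_feats.mean(dim=0)
        return self.head(pooled)
```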
Ego4D Challenges
Ego4D is a massive-scale egocentric dataset and benchmark suite collected across 74 worldwide locations in 9 countries, with over 3,670 hours of daily-life activity video. Details of our challenges are given below:
Ego4D Episodic Memory
Track: Visual Queries
Lead: Suyog Jain, Meta, US
**Summary:** Given an egocentric video, the goal is to answer queries of the form "Where did I last see object X?", where the query object X is specified as a static image.
Ego4D Episodic Memory
Track: Natural Language Queries
Lead: Suyog Jain, Meta, US
**Summary:** Given an egocentric video V and a natural language query Q, the goal is to identify a response track r, such that the answer to Q can be deduced from r.
Ego4D Episodic Memory
Track: Moment Queries
**Lead:** Chen Zhao & Merey Ramazanova, KAUST, SA
Summary: Given an input video and a query action category, the goal is to retrieve all the instances of this action category in the video.
Current SOTA: Paper
Previous Winner: 34.99
Ego4D Episodic Memory
Track: Goal Step
Lead: Yale Song, Meta, US
Summary: Given an untrimmed egocentric video, identify the temporal action segment corresponding to a natural language description of the step. Specifically, predict the (start_time, end_time) for a given keystep description.
Current SOTA: Paper
Previous Winner: 35.18 r@1, IoU=0.3
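For reference, the headline metric (recall@1 at a temporal IoU threshold of 0.3) can be sketched as follows. The segment format (start, end) in seconds and the one-prediction-per-query setup are assumptions made for the example; consult the official evaluation code for the exact protocol.

```python
def temporal_iou(pred, gt):
    """IoU between two temporal segments given as (start, end) in seconds."""
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = max(pred[1], gt[1]) - min(pred[0], gt[0])
    return inter / union if union > 0 else 0.0

def recall_at_1(predictions, ground_truths, iou_thresh=0.3):
    """Fraction of queries whose top-1 predicted segment overlaps the ground
    truth with IoU >= iou_thresh. Illustrative, not the official script."""
    hits = sum(
        temporal_iou(pred, gt) >= iou_thresh
        for pred, gt in zip(predictions, ground_truths)
    )
    return hits / len(ground_truths)
```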
Ego4D Episodic Memory
Track: EgoSchema
Lead: Karttikeya Mangalam & Raiymbek Akshulakov, UC Berkeley, US
Summary: EgoSchema is a very long-form video question-answering dataset and benchmark to evaluate long video understanding capabilities of modern vision and language systems.
Current SOTA: 0.75 (report unavailable)
Previous Winner: N/A
Ego4D Social Interaction
Track: Looking at me
Lead: Xizi Wang, Indiana University, US
**Summary:** The task focuses on identifying communicative acts that are directed towards the camera wearer, as distinguished from those directed at other social partners.
Ego4D Social Interaction
Track: Talking to me
**Lead:** Xizi Wang, Indiana University, US
**Summary:** Given a video and audio segment with the same tracked faces and an additional label that identifies speaker status, classify whether each visible face is talking to the camera wearer.
Ego4D Forecasting
Track: Short-term object interaction anticipation
**Lead:** Francesco Ragusa, University of Catania, IT
**Summary:** This task aims to predict the next human-object interaction happening after a given timestamp. Given an input video, the goal is to anticipate 1) the spatial positions of the active objects, 2) the category of each detected next active object, 3) how each active object will be used (verb), and 4) when the interaction will begin.
Current SOTA: Paper 1; Paper 2
Previous Winner: Top-5 Overall mAP: 7.21
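The structure of a single prediction for this task can be pictured as in the sketch below. The field names are illustrative, not the official submission schema.

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass
class NextActiveObjectPrediction:
    """One hypothesis for short-term object interaction anticipation.
    Field names are illustrative, not the official submission schema."""
    box: Tuple[float, float, float, float]  # (x1, y1, x2, y2) of the next active object
    noun: str                               # predicted object category
    verb: str                               # how the object will be used
    time_to_contact: float                  # seconds until the interaction begins
    score: float                            # confidence, used for top-5 ranking
```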
Ego4D Forecasting
Track: Long-term action anticipation
Lead: Tushar Nagarajan, FAIR, US
Summary: This task aims to predict the next Z future actions after a given action. Given an input video up to a particular timestep (corresponding to the last visible action), the goal is to predict a list of action classes [(verb1, noun1), (verb2, noun2) ... (verbZ, nounZ)] that follow it.
Current SOTA: Paper
Previous Winner: N/A
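Long-term anticipation is commonly scored with an edit-distance-style metric over the predicted (verb, noun) sequence. The sketch below shows a plain Levenshtein distance between one predicted sequence and the ground truth; the official Ego4D evaluation differs in how it aggregates over multiple candidate sequences, so treat this only as an illustration.

```python
def edit_distance(pred_actions, gt_actions):
    """Levenshtein distance between two sequences of (verb, noun) pairs.
    A simplified illustration of sequence-level scoring, not the official
    Ego4D LTA evaluation."""
    m, n = len(pred_actions), len(gt_actions)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][0] = i
    for j in range(n + 1):
        dp[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if pred_actions[i - 1] == gt_actions[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1,        # insertion
                           dp[i - 1][j - 1] + cost) # substitution
    return dp[m][n]
```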
Ego-Exo4D Challenges
Ego-Exo4D is a diverse, large-scale, multi-modal, multi-view video dataset and benchmark challenge. Ego-Exo4D centers around simultaneously captured egocentric and exocentric video of skilled human activities (e.g., sports, music, dance, bike repair).
EgoExo4D Pose Challenge
Track: Ego-Pose Body
Lead: Juanita Puentes Mozo & Gabriel Perez Santamaria, Universidad de los Andes
Summary: The EgoExo4D Body Pose Challenge aims to accurately estimate body pose using only first-person raw video and/or egocentric camera pose.
Current SOTA: EgoCast (MPJPE: 14.36)
Previous Winner: MPJPE: 15.32
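The leaderboard metric, MPJPE (mean per-joint position error), is the average Euclidean distance between predicted and ground-truth 3D joints. A minimal sketch is below, assuming arrays of shape (num_frames, num_joints, 3); the official evaluation may additionally handle missing or invisible joints.

```python
import numpy as np

def mpjpe(pred_joints: np.ndarray, gt_joints: np.ndarray) -> float:
    """Mean per-joint position error: average Euclidean distance between
    predicted and ground-truth 3D joint positions. Arrays are assumed to
    have shape (num_frames, num_joints, 3); illustrative, not the official
    evaluation script."""
    assert pred_joints.shape == gt_joints.shape
    per_joint_err = np.linalg.norm(pred_joints - gt_joints, axis=-1)
    return float(per_joint_err.mean())
```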
EgoExo4D Pose Challenge
Track: Ego-Pose Hands
**Lead:** Shan Shu, University of Pennsylvania, US
Summary:
Current SOTA:
Previous Winner:
EgoExo4D Proficiency Estimation
Track: Demonstrator Proficiency
Lead: Arjun Somayazulu, UT Austin, US
Summary: Given synchronized egocentric and exocentric video of a demonstrator performing a task, classify the proficiency skill level of the demonstrator.
Current SOTA: EgoExo4D benchmark baseline
Previous Winner: N/A
EgoExo4D Keysteps
Track: Fine-grained Keystep Recognition
Lead: Sherry Xue & Tushar Nagarajan, UT Austin, US
Summary:
Current SOTA:
Previous Winner:
EgoExo4D Relations
Track: Correspondence
Lead: Sanjay Haresh, Simon Fraser University, Canada
Summary: The challenge is aimed at methods for object correspondences across egocentric and exocentric views. Given a pair of time-synchronized egocentric and exocentric videos, as well as a query object track in one of the views, the goal is to output the corresponding mask for the same object instance in the other view for all frames where the object is visible in both views.
Current SOTA: Paper
Previous Winner: N/A
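Correspondence predictions are naturally compared to the ground truth with per-frame mask IoU; a minimal sketch is below, assuming boolean masks of equal size. This illustrates the comparison only and is not the official challenge scoring script.

```python
import numpy as np

def mask_iou(pred_mask: np.ndarray, gt_mask: np.ndarray) -> float:
    """IoU between two boolean segmentation masks of the same spatial size,
    e.g. a predicted exo-view mask vs. the ground-truth mask for one frame.
    Illustrative, not the official challenge evaluation."""
    pred = pred_mask.astype(bool)
    gt = gt_mask.astype(bool)
    union = np.logical_or(pred, gt).sum()
    if union == 0:          # object absent in both masks
        return 1.0
    inter = np.logical_and(pred, gt).sum()
    return float(inter / union)
```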
EgoExo4D Keysteps
Track: Procedure Understanding
**Lead:** Antonino Furnari, University of Catania, IT
**Summary:** The objective of this task is to infer a procedure's underlying structure from observing natural videos of subjects performing the procedure.
Current SOTA:
Previous Winner:
EPIC-Kitchens Challenges
Please check the EPIC-KITCHENS website for more information on the EPIC-KITCHENS challenges. Links to the individual challenges are also provided below.
Action Recognition
Lead: Prajwal Gatti and Siddhant Bansal, University of Bristol, UK
Summary: Classify the verb and noun of the action depicted in a trimmed video clip.
Current SOTA: Paper
Previous Winner: 48.1% - top 1 / 77.4% - top 5
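The top-1 / top-5 figures above can be reproduced from per-clip class scores as in the sketch below, which assumes a (num_clips, num_classes) score matrix and integer labels; the official evaluation reports verb, noun, and action accuracy separately.

```python
import numpy as np

def topk_accuracy(scores: np.ndarray, labels: np.ndarray, k: int = 5) -> float:
    """Fraction of clips whose ground-truth class is among the k highest-scoring
    classes. scores: (num_clips, num_classes); labels: (num_clips,).
    Illustrative sketch, not the official evaluation script."""
    topk = np.argsort(-scores, axis=1)[:, :k]
    correct = (topk == labels[:, None]).any(axis=1)
    return float(correct.mean())
```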
Action Detection
Lead: Francesco Ragusa and Antonino Furnari, University of Catania, IT
Summary: The challenge requires detecting and recognising all action instances within an untrimmed video. The challenge will be carried out on the EPIC-KITCHENS-100 dataset.
Current SOTA: Results
Previous Winner: Action Avg. mAP 31.97
Domain Adaptation Challenge for Action Recognition
Lead: Saptarshi Sinha and Prajwal Gatti, University of Bristol, UK
Summary: Given labelled videos from the source domain and unlabelled videos from the target domain, the goal is to classify actions in the target domain. An action is defined as a verb and noun depicted in a trimmed video clip.
Current SOTA: Paper
Previous Winner: 43.17 for action accuracy
Multi-Instance Retrieval
Lead: Prajwal Gatti and Michael Wray, University of Bristol, UK
Summary: Perform cross-modal retrieval by searching between vision and text modalities.
Current SOTA: Paper
Previous Winner: Normalised Discounted Cumulative Gain (%) Avg. - 74.25
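The retrieval metric, normalised Discounted Cumulative Gain (nDCG), rewards relevant items retrieved at high ranks. A minimal single-query sketch is below, assuming a list of graded relevance values in the order the system ranked the items; the official evaluation averages over queries and both retrieval directions.

```python
import numpy as np

def ndcg(relevances_in_ranked_order) -> float:
    """Normalised DCG for one query. The input holds the graded relevance of
    each retrieved item, in the order the system ranked them. Illustrative
    sketch, not the official evaluation script."""
    rel = np.asarray(relevances_in_ranked_order, dtype=float)
    discounts = 1.0 / np.log2(np.arange(2, rel.size + 2))  # rank 1 -> 1/log2(2)
    dcg = float((rel * discounts).sum())
    ideal = float((np.sort(rel)[::-1] * discounts).sum())
    return dcg / ideal if ideal > 0 else 0.0
```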
Semi-Supervised Video-Object Segmentation
Lead: Rhodri Guerrier and Ahmad Darkhalil, University of Bristol, UK
**Summary:** Given a sub-sequence of frames with M object masks in the first frame, the goal of this challenge is to segment these objects through the remaining frames. Objects not present in the first frame of the sub-sequence are excluded from this benchmark.
Current SOTA: Webpage
EPIC-SOUNDS Audio-Based Interaction Recognition
Lead: Omar Emara and Jacob Chalk, University of Bristol, UK
Summary: Recognise interactions from the audio data of EPIC-SOUNDS (i.e., classify the audio segment).
Current SOTA: User: JMCarrot
Previous Winner: N/A
EPIC-SOUNDS Audio-Based Interaction Detection
Lead: Omar Emara and Jacob Chalk, University of Bristol, UK
Summary: Detect all audio-based interactions in the audio data of EPIC-SOUNDS for a given video, predicting the start and end time of each interaction and classifying it (recognition).
Current SOTA: User: shuming
Previous Winner: N/A
HD-EPIC Challenge
Please check the HD-EPIC website for more information on the HD-EPIC challenges. Links to the individual challenges are also provided below.
HD-EPIC Challenges - VQA
Lead: Prajwal Gatti (University of Bristol, UK) and Kaiting Liu (Leiden University, Netherlands)
Summary: Given a question belonging to any one of the seven types defined in the HD-EPIC VQA benchmark, the goal is to predict the correct answer among the five listed choices.
Current SOTA: Gemini Pro
Previous Winner: N/A
Call for Abstracts
You are invited to submit extended abstracts to the second edition of the Joint Egocentric Vision Workshop, which will be held alongside CVPR 2025 in Nashville.
These abstracts represent existing or ongoing work and will not be published as part of any proceedings. We welcome all work within the egocentric domain; it is not necessary to use the Ego4D dataset in your work. We expect a submission to cover one or more of the following topics (this is a non-exhaustive list):
- Egocentric vision for human activity analysis and understanding, including action recognition, action detection, audio-visual action perception and object state change detection
- Egocentric vision for anticipating human behaviour, actions and objects
- Egocentric vision for 3D perception and interaction, including dynamic scene reconstruction, hand-object reconstruction, long-term object tracking, NLQ and visual queries, long-term video understanding
- Head-mounted eye tracking and gaze estimation including attention modelling and next fixation prediction
- Egocentric vision for object/event recognition and retrieval
- Egocentric vision for summarization
- Daily life and activity monitoring
- Egocentric vision for human skill learning, assistance, and robotics
- Egocentric vision for social interaction and human behaviour understanding
- Privacy and ethical concerns with wearable sensors and egocentric vision
- Egocentric vision for health and social good
- Symbiotic human-machine vision systems, human-wearable devices interaction
- Interactive AR/VR and Egocentric online/real-time perception
Format
The length of the extended abstracts is 2-4 pages, including figures, tables, and references. We invite submissions of ongoing or already published work, as well as reports on demonstrations and prototypes. The joint egocentric vision workshop gives authors the opportunity to present their work to the egocentric community to provoke discussion and feedback. Accepted work will be presented either as an oral presentation (virtual or in-person) or as a poster. The review will be single-blind, so there is no need to anonymize your work; submissions should otherwise follow the CVPR format (information can be found here). Accepted abstracts will not be published as part of any proceedings, so they can be uploaded to ArXiv etc., and the links will be provided on the workshop's webpage. Submissions will be managed through the CMT website.
Important Dates
Event | Date |
---|---|
Challenges Leaderboards Open | Feb 2025 |
Challenges Leaderboards Close | 19 May 2025 (some challenges have extended their deadline; please check the respective challenge's webpage) |
Challenges Technical Reports Deadline (on CMT) | 23 May 2025 (11:55 PM AoE) (some challenges have extended their deadline; please check the respective challenge's webpage) |
Notification to Challenge Winners | 30 May 2025 |
Challenge Reports ArXiv Deadline | 6 June 2025 |
Extended Abstract Deadline (on CMT) | |
Extended Abstract Notification to Authors | 23 May 2025 |
Extended Abstracts ArXiv Deadline | 2 June 2025 |
Workshop Date | 12 June 2025 |
Program
All dates are local to Nashville's time, CST.
Workshop Location: Room Grand B1
Time | Event |
---|---|
08:50-09:00 | Welcome and Introductions |
09:00-09:30 | Invited Keynote 1: Siyu Tang, ETH Zürich, CH. Talk Title: Towards an egocentric multimodal foundation model |
09:30-10:00 | HoloAssist Challenges |
10:00-11:00 | Coffee Break and Poster Session |
11:00-11:30 | Invited Keynote 2: Kris Kitani, CMU, USA |
11:30-12:00 | EPIC-KITCHENS & HD-EPIC Challenges |
12:00-12:30 | Oral Presentations (Group 1): EgoLife: Towards Egocentric Life Assistant; HaWoR: World-Space Hand Motion Reconstruction from Egocentric Videos; VidBot: Learning Generalizable 3D Actions from In-the-Wild 2D Human Videos for Zero-Shot Robotic Manipulation; Discovering Hidden Visual Concepts Beyond Linguistic Input in Infant Learning |
12:30-13:30 | Lunch Break |
13:30-14:00 | EgoVis Distinguished Papers Award |
14:00-14:30 | Invited Keynote 3: Xiaolong Wang, UCSD, USA |
14:30-15:30 | Ego4D & Ego-Exo4D Challenges |
15:30-16:00 | Coffee Break |
16:00-16:30 | Invited Keynote 4: Arsha Nagrani, Google DeepMind |
16:30-17:05 | Aria Gen2 |
17:05-17:35 | Oral Presentations (Group 2): FIction: 4D Future Interaction Prediction from Video; EgoPressure: A Dataset for Hand Pressure and Pose Estimation in Egocentric Vision; Estimating Body and Hand Motion in an Ego‑sensed World; Online Episodic Memory Visual Query Localization with Egocentric Streaming Object Memory |
17:35-17:45 | Conclusion |
Papers
Note to authors: Please hang your poster in Hall D against the boards bearing the workshop's name. Posters can be put up ONLY during the poster session time (10:00 - 11:00).
Follow CVPR poster guidelines for poster dimensions.
All workshop posters are in Exhibition Hall D.
Extended Abstracts
PosterBoard # | Title | Authors | arXiv Link |
---|---|---|---|
77 | Leadership Assessment in Pediatric Intensive Care Unit Team Training | Liangyang Ouyang (The University of Tokyo); Yuki Sakai (The University of Tokyo); Ryosuke Furuta (The University of Tokyo); Hisataka Nozawa (The University of Tokyo); Hikoro Matsui (The University of Tokyo); Yoichi Sato (The University of Tokyo) | link |
78 | What Changed and What Could Have Changed? State-Change Counterfactuals for Procedure-Aware Video Representation Learning | Chi-Hsi Kung (Indiana University); Frangil Ramirez (Indiana University); Juhyung Ha (Indiana University); Yi-Ting Chen (National Yang-Ming Chiao-Tung University); David Crandall (Indiana University); Yi-Hsuan Tsai (Atmanity Inc.) | link |
79 | Online Episodic Memory Visual Query Localization with Egocentric Streaming Object Memory | Zaira Manigrasso (University of Udine); Matteo Dunnhofer (University of Udine); Antonino Furnari (University of Catania); Moritz Nottebaum (University of Udine); Antonio Finocchiaro (University of Catania); Davide Marana (University of Udine); Rosario Forte (University of Catania); Giovanni Maria Farinella (University of Catania); Christian Micheloni (University of Udine) | link |
80 | Discovering Hidden Visual Concepts Beyond Linguistic Input in Infant Learning | Xueyi Ke (Nanyang Technological University); Satoshi Tsutsui (Nanyang Technological University, Singapore); Yayun Zhang (Max Planck Institute for Psycholinguistics); Bihan Wen (Nanyang Technological University) | link |
81 | PAIR-Net: Enhancing Egocentric Speaker Detection via Pretrained Audio–Visual Fusion and Alignment Loss | Yu Wang (Indiana University); Juhyung Ha (Indiana University Bloomington); David Crandall (Indiana University) | link |
82 | From My View to Yours: Ego-Augmented Learning in Large Vision Language Models for Understanding Exocentric Daily Living Activities | Dominick Reilly (University of North Carolina at Charlotte); Manish Kumar Govind (University of North Carolina at Charlotte); Le Xue (Salesforce AI Research); Srijan Das (University of North Carolina at Charlotte) | link |
83 | EASG-Bench : Video Q&A Benchmark with Egocentric Action Scene Graphs | Ivan Rodin (University of Catania); Tz-Yin Wu (Intel Labs); Kyle Min (Intel Labs); Sharath Sridhar (Intel Labs); Antonino Furnari (University of Catania); Subarna Tripathi (Intel Labs); Giovanni Maria Farinella (University of Catania) | link |
84 | ProbRes: Probabilistic Jump Diffusion for Open-World Egocentric Activity Recognition | Sanjoy Kundu (Auburn University); Shanmukha Vellamcheti (Auburn University); Sathyanarayan N. Aakur (Auburn University) | link |
85 | Reasoning on hierarchical representation of human behavior from Ego-videos | Simone Alberto Peirone (Politecnico di Torino); Francesca Pistilli (Politecnico di Torino); Giuseppe Averta (Politecnico di Torino) | link |
86 | Learning reusable concepts across different egocentric video understanding tasks | Simone Alberto Peirone (Politecnico di Torino); Francesca Pistilli (Politecnico di Torino); Antonio Alliegro (Politecnico di Torino); Tatiana Tommasi (Politecnico di Torino); Giuseppe Averta (Politecnico di Torino) | link |
87 | Efficient Egocentric Action Recognition with Multimodal Data | Marco Calzavara (ETH Zurich); Ard Kastrati (ETH Zurich); Matteo Macchini (Magic Leap); Dushan Vasilevski (Magic Leap); Roger Wattenhofer (ETH Zurich) | link |
88 | Sequence-Based Identification of First-Person Camera Wearers in Third-Person Views | Ziwei Zhao (Indiana University); Xizi Wang (Indiana University); Yuchen Wang (Indiana University); Feng Cheng (ByteDance); David Crandall (Indiana University) | link |
89 | Vid2Coach: Transforming How-To Videos into Task Assistants | Mina Huh (University of Texas at Austin); Zihui Xue (University of Texas at Austin); Ujjaini Das (University of Texas at Austin); Kumar Ashutosh (University of Texas at Austin); Kristen Grauman (University of Texas at Austin); Amy Pavel (University of Texas at Austin) | link |
90 | Dyn-HaMR: Recovering 4D Interacting Hand Motion from a Dynamic Camera | Zhengdi Yu (Imperial College London); Stefanos Zafeiriou (Imperial College London); Tolga Birdal (Imperial College London) | link |
91 | Keystep Recognition using Graph Neural Networks | Julia Romero (University of Colorado Boulder); Kyle Min (Intel Labs); Subarna Tripathi (Intel Labs); Morteza Karimzadeh (University of Colorado Boulder) | link |
92 | Improving Keystep Recognition in Ego-Video via Dexterous Focus | Zachary Chavis (University of Minnesota); Stephen Guy (University of Minnesota); Hyun Soo Park (University of Minnesota) | link |
Invited CVPR Papers
PosterBoard # | Title | Authors | Paper Link |
---|---|---|---|
93 | HOT3D: Hand and Object Tracking in 3D from Egocentric Multi-View Videos | Prithviraj Banerjee, Sindi Shkodrani, Pierre Moulon, Shreyas Hampali, Shangchen Han, Fan Zhang, Linguang Zhang, Jade Fountain, Edward Miller, Selen Basol, Richard Newcombe, Robert Wang, Jakob Engel, Tomas Hodan | link |
94 | Generating 6DoF Object Manipulation Trajectories from Action Description in Egocentric Vision | Tomoya Yoshida, Shuhei Kurita, Taichi Nishimura, Shinsuke Mori | link |
95 | FRAME: Floor-aligned Representation for Avatar Motion from Egocentric Video | Andrea Boscolo Camiletto, Jian Wang, Eduardo Alvarado, Rishabh Dabral, Thabo Beeler, Marc Habermann, Christian Theobalt | link |
96 | Omnia de EgoTempo: Benchmarking Temporal Understanding of Multi-Modal LLMs in Egocentric Videos | Chiara Plizzari, Alessio Tonioni, Yongqin Xian, Achin Kulshrestha, Federico Tombari | link |
97 | HaWoR: World-Space Hand Motion Reconstruction from Egocentric Videos | Jinglei Zhang, Jiankang Deng, Chao Ma, Rolandos Alexandros Potamias | link |
98 | Layered motion fusion: Lifting motion segmentation to 3D in egocentric videos | Vadim Tschernezki, Diane Larlus, Andrea Vedaldi, Iro Laina | link |
99 | REWIND: Real-Time Egocentric Whole-Body Motion Diffusion with Exemplar-Based Identity Conditioning | Jihyun Lee, Weipeng Xu, Alexander Richard, Shih-En Wei, Shunsuke Saito, Shaojie Bai, Te-Li Wang, Minhyuk Sung, Tae-Kyun Kim, Jason Saragih | link |
100 | EgoTextVQA: Towards Egocentric Scene-Text Aware Video Question Answering | Sheng Zhou, Junbin Xiao, Qingyun Li, Yicong Li, Xun Yang, Dan Guo, Meng Wang, Tat-seng Chua, Angela Yao | link |
101 | EgoLife: Towards Egocentric Life Assistant | Jingkang Yang, Shuai Liu, Hongming Guo, Yuhao Dong, Xiamengwei Zhang, Sicheng Zhang, Pengyun Wang, Zitang Zhou, Binzhu Xie, Ziyue Wang, Bei Ouyang, Zhengyu Lin, Marco Cominelli, Zhongang Cai, Bo Li, Yuanhan Zhang, Peiyuan Zhang, Fangzhou Hong, Joerg Widmer, Francesco Gringoli, Lei Yang, Ziwei Liu | link |
102 | DIV-FF: Dynamic Image-Video Feature Fields For Environment Understanding in Egocentric Video | Lorenzo Mur-Labadia, Jose J. Guerrero, Ruben Martinez-Cantin | link |
103 | Gazing Into Missteps: Leveraging Eye-Gaze for Unsupervised Mistake Detection in Egocentric Videos of Skilled Human Activities | Michele Mazzamuto, Antonino Furnari, Yoichi Sato, Giovanni Maria Farinella | link |
104 | EgoLM: Multi-Modal Language Model of Egocentric Motions | Fangzhou Hong, Vladimir Guzov, Hyo Jin Kim, Yuting Ye, Richard Newcombe, Ziwei Liu, Lingni Ma | link |
105 | EgoPressure: A Dataset for Hand Pressure and Pose Estimation in Egocentric Vision | Yiming Zhao, Taein Kwon, Paul Streli, Marc Pollefeys, Christian Holz | link |
106 | HD-EPIC: A Highly-Detailed Egocentric Video Dataset | Toby Perrett, Ahmad Darkhalil, Saptarshi Sinha, Omar Emara, Sam Pollard, Kranti Kumar Parida, Kaiting Liu, Prajwal Gatti, Siddhant Bansal, Kevin Flanagan, Jacob Chalk, Zhifan Zhu, Rhodri Guerrier, Fahd Abdelazim, Bin Zhu, Davide Moltisanti, Michael Wray, Hazel Doughty, Dima Damen | link |
107 | Estimating Body and Hand Motion in an Ego‑sensed World | Brent Yi, Vickie Ye, Maya Zheng, Yunqi Li, Lea Müller, Georgios Pavlakos, Yi Ma, Jitendra Malik, Angjoo Kanazawa | link |
108 | Bootstrap Your Own Views: Masked Ego-Exo Modeling for Fine-grained View-invariant Video Representations | Jungin Park, Jiyoung Lee, Kwanghoon Sohn | link |
109 | GEM: A Generalizable Ego-vision Multimodal World Model for Fine-Grained Ego-Motion, Object Dynamics, and Scene Composition Control | Mariam Hassan, Sebastian Stapf, Ahmad Rahimi, Pedro M B Rezende, Yasaman Haghighi, David Brüggemann, Isinsu Katircioglu, Lin Zhang, Xiaoran Chen, Suman Saha, Marco Cannici, Elie Aljalbout, Botao Ye, Xi Wang, Aram Davtyan, Mathieu Salzmann, Davide Scaramuzza, Marc Pollefeys, Paolo Favaro, Alex Alahi | link |
110 | Viewpoint Rosetta Stone: Unlocking Unpaired Ego-Exo Videos for View-invariant Representation Learning | Mi Luo, Zihui Xue, Alex Dimakis, Kristen Grauman | link |
111 | Which Viewpoint Shows it Best? Language for Weakly Supervising View Selection in Multi-view Instructional Videos | Sagnik Majumder, Tushar Nagarajan, Ziad Al-Halah, Reina Pradhan, Kristen Grauman | link |
112 | ExpertAF: Expert Actionable Feedback from Video | Kumar Ashutosh, Tushar Nagarajan, Georgios Pavlakos, Kris Kitani, Kristen Grauman | link |
113 | FIction: 4D Future Interaction Prediction from Video | Kumar Ashutosh, Georgios Pavlakos, Kristen Grauman | link |
114 | Progress-Aware Video Frame Captioning | Zihui Xue, Joungbin An, Xitong Yang, Kristen Grauman | link |
115 | BIMBA: Selective-Scan Compression for Long-Range Video Question Answering | Md Mohaiminul Islam, Tushar Nagarajan, Huiyu Wang, Gedas Bertasius, Lorenzo Torresani | link |
116 | VideoTree: Adaptive Tree-based Video Representation for LLM Reasoning on Long Videos | Ziyang Wang, Shoubin Yu, Elias Stengel-Eskin, Jaehong Yoon, Feng Cheng, Gedas Bertasius, Mohit Bansal | link |
117 | VidBot: Learning Generalizable 3D Actions from In-the-Wild 2D Human Videos for Zero-Shot Robotic Manipulation | Hanzhi Chen, Boyang Sun, Anran Zhang, Marc Pollefeys, Stefan Leutenegger | link |
Invited Speakers
Workshop Organisers
Co-organizing Advisors
Related Past Events
This workshop follows in the footsteps of the following previous events:
- EgoVis@CVPR2024: First Joint Egocentric Vision (EgoVis) Workshop in conjunction with CVPR 2024;
EPIC-Kitchens and Ego4D Past Workshops:
- Ego4D&EPIC@CVPR2023: Joint International 3rd Ego4D and 11th EPIC Workshop in conjunction with CVPR 2023;
- 2nd International Ego4D Workshop in conjunction with ECCV 2022;
- Ego4D&EPIC@CVPR2022: Joint International 1st Ego4D and 10th EPIC Workshop in conjunction with CVPR 2022;
- EPIC@ICCV21: The Ninth International Workshop on Egocentric Perception, Interaction and Computing in conjunction with ICCV 2021;
- EPIC@CVPR21: The Eighth International Workshop on Egocentric Perception, Interaction and Computing in conjunction with CVPR 2021;
- EPIC@ECCV20: The Seventh International Workshop on Egocentric Perception, Interaction and Computing in conjunction with ECCV 2020;
- EPIC@CVPR20: The Sixth International Workshop on Egocentric Perception, Interaction and Computing in conjunction with CVPR 2020;
- EPIC@ICCV19: The Fifth International Workshop on Egocentric Perception, Interaction and Computing in conjunction with ICCV 2019;
- EPIC@CVPR19: The Fourth International Workshop on Egocentric Perception, Interaction and Computing in conjunction with CVPR 2019;
- EPIC@ECCV18: The Third International Workshop on Egocentric Perception, Interaction and Computing in conjunction with ECCV 2018;
- EPIC@ICCV17: The Second International Workshop on Egocentric Perception, Interaction and Computing in conjunction with ICCV 2017;
- EPIC@ECCV16: The First International Workshop on Egocentric Perception, Interaction and Computing in conjunction with ECCV 2016;
Human Body, Hands, and Activities from Egocentric and Multi-view Cameras Past Workshops:
- Human Body, Hands, and Activities from Egocentric and Multi-view Cameras (HBHA) run alongside ECCV 2022;
Project Aria Past Tutorials:
- Aria tutorial run alongside CVPR 2023;
- Aria tutorial run alongside CVPR 2022;