4D Vision Workshop

Overview

We live in a dynamic 3D world. Enabling machines to see and understand the world in 4D (3D + time) unlocks numerous real-world applications across various domains. These include designing autonomous robots capable of navigating and interacting with complex real-world environments, creating immersive and interactive virtual worlds that seamlessly blend with physical reality, and unveiling the mysteries of life and the universe, for instance, in biology and astrophysics. Computer vision lies at the core of these innovations, offering the tools to capture and model the 4D world from partially-observed image data.

In recent years, we have seen remarkable progress in 3D computer vision, with increasingly robust and efficient models for reconstructing and generating 3D objects and scenes. 4D computer vision, as a natural extension of these efforts, is rapidly gaining traction. This workshop aims to establish a dedicated venue for discussions on this topic, bringing together researchers across various domains to exchange perspectives, identify challenges, and collectively accelerate progress in this space.

Schedule

09:20 - 09:30 Opening Remarks
09:30 - 10:00 Adam Harley: "4D Vision Tomorrow: Structured, Slow, and Data-Driven"
10:00 - 10:30 Tali Dekel (remote): "Generative AI Beyond What It Is Meant To Do"
10:30 - 12:30 Poster Session (ExHall D, poster board ID #106-141)
12:30 - 13:30 Lunch Break
13:30 - 14:00 Andrea Vedaldi: "Feed-forward 4D: from scene to categories"
Abstract: New models like VGGT achieve excellent 3D reconstruction results using only neural network components in a feed-forward manner, entirely eschewing optimization. As such, they are credible building blocks for the future foundations of computer vision. However, most 3D reconstruction methods remain limited to static data, even though reconstructing dynamic scenes in 3D is, generally speaking, much more useful. In this talk, I will present Dynamic Point Maps, a principled dynamic extension of the point maps popularized by DUSt3R, which encode both 3D geometry and motion, restoring certain invariants. I will also address the problem of data scarcity in 4D vision and demonstrate, through Geo4D, how one can build high-quality 4D reconstruction networks starting from video generators pre-trained on millions of videos. Finally, I will explore the challenge of modeling 4D object categories rather than entire scenes. To this end, I will introduce Dual Point Maps as an alternative to traditional deformable models for capturing the 3D shape and motion of dynamic objects.
14:00 - 14:30 Angjoo Kanazawa
14:30 - 15:00 Daniel Cremers
15:00 - 15:30 Coffee Break
15:30 - 16:00 Deva Ramanan
16:00 - 16:30 David Fouhey: "Measuring Scientific Data with 3D/4D Vision"
Abstract: As vision has changed from a discipline with potential to one with impact, one great new opportunity is empowering researchers in other areas of inquiry with better data. Over the past five years, I've been doing so with solar physics and evolutionary ecology, transferring what I've learned from 3D vision. Although solar physics and ecology study objects of radically different scales, they are unified by a need for data that is higher volume, higher quality, and easier to obtain. In this talk, I'll show efforts to this end, carried out in collaboration with domain experts. In solar physics, our work includes systems to estimate the Sun's 3D vector magnetic field from multiple signals, which inform our understanding of a key driver of space weather. In evolutionary ecology, our collaborations have built one of the largest repositories of bird morphology, which has helped researchers understand drivers of evolution. Throughout, I'll talk about some of our applications and lessons I've learned.
16:30 - 17:00 Kristen Grauman: "4D Human Activity Understanding"
17:00 - 17:05 Closing Remarks

Invited Speakers

Accepted Papers

Call for Papers

We welcome submissions of technical papers on dynamic 3D (4D) modeling and its wide-ranging applications across various fields including computer vision, embodied AI, robotics, sciences, and beyond. We encourage both short papers (approximately 4 pages) presenting early-stage ideas and full-length technical papers (up to 8 pages excluding references). Additional content can be submitted as supplementary material. Submissions should follow the CVPR 2025 format. The review process is double-blind and does not involve a rebuttal phase.

Accepted papers will not be published in the proceedings. Thus, papers that have already been published at major conferences are also welcome. Authors may also submit their work to future conferences or journals after acceptance to this workshop.

If you have any questions, please contact us at 4dvisionworkshop@googlegroups.com.

Topics of Interest

Reviewer Nomination

We are looking for reviewers with expertise in 4D vision and related fields. If you are interested in reviewing for the workshop, please fill out this form.

Reviewers

Aviral Chharia, Bardienus Pieter Duisterhof, Brian Nlong Zhao, Chen Geng,
Chuhao Chen, Guangzhao He, Hirokatsu Kataoka, Hong-Xing Yu,
Ishan Khatri, Jeff Tan, Jenny Seidenschwarz, Jiaman Li,
Jianyuan Wang, Khiem Vuong, Koshi Makihara, Linzhan Mou,
Minghao Chen, Paul Engstler, Qi Sun, Rongqi Fan,
Shenhan Qian, Shizun Wang, Sholder Lyko, Sizhe Wei,
Suyash Damle, Ting-Hsuan Liao, Tushar Shinde, Varun Kumar Reddy Bankiti,
Wanyue Zhang, Weijia Zeng, Yash Sanjay Bhalgat, Yufu Wang,
Yunzhou Song, Yunzhi Zhang, Zeren Jiang, Zhengfei Kuang,
Zhening Huang, Zhiyang Dou, Zirui Wang

Organizers