4D Vision Workshop

Overview

We live in a dynamic 3D world. Enabling machines to see and understand the world in 4D (3D + time) unlocks numerous real-world applications across various domains. These include designing autonomous robots capable of navigating and interacting with complex real-world environments, creating immersive and interactive virtual worlds that seamlessly blend with physical reality, and unveiling the mysteries of life and the universe, for instance, in biology and astrophysics. Computer vision lies at the core of these innovations, offering the tools to capture and model the 4D world from partially observed image data.

In recent years, we have seen remarkable progress in 3D computer vision, with increasingly robust and efficient models for reconstructing and generating 3D objects and scenes. 4D computer vision, as a natural extension of these efforts, is rapidly gaining traction. This workshop aims to establish a dedicated venue for discussions on this topic, bringing together researchers across various domains to exchange perspectives, identify challenges, and collectively accelerate progress in this space.

Schedule

09:20 - 09:30	Opening Remarks
09:30 - 10:00	Adam Harley: "4D Vision Tomorrow: Structured, Slow, and Data-Driven"
10:00 - 10:30	Tali Dekel (remote): "Generative AI Beyond What It Is Meant To Do"
10:30 - 12:30	Poster Session (ExHall D, poster board ID #106-141)
12:30 - 13:30	Lunch Break
13:30 - 14:00	Andrea Vedaldi: "Feed-forward 4D: from scene to categories"
	Abstract: New models like VGGT achieve excellent 3D reconstruction results using only neural network components in a feed-forward manner, entirely eschewing optimization. As such, they are credible building blocks for the future foundations of computer vision. However, most 3D reconstruction methods remain limited to static data, even though reconstructing dynamic scenes in 3D is, generally speaking, much more useful. In this talk, I will present Dynamic Point Maps, a principled dynamic extension of the point maps popularized by DUSt3R, which encode both 3D geometry and motion, restoring certain invariants. I will also address the problem of data scarcity in 4D vision and demonstrate, through Geo4D, how one can build high-quality 4D reconstruction networks starting from video generators pre-trained on millions of videos. Finally, I will explore the challenge of modeling 4D object categories rather than entire scenes. To this end, I will introduce Dual Point Maps as an alternative to traditional deformable models for capturing the 3D shape and motion of dynamic objects.
14:00 - 14:30	Angjoo Kanazawa
14:30 - 15:00	Daniel Cremers
15:00 - 15:30	Coffee Break
15:30 - 16:00	Deva Ramanan
16:00 - 16:30	David Fouhey: "Measuring Scientific Data with 3D/4D Vision"
	Abstract: As vision has changed from a discipline with potential to one with impact, one great new opportunity is empowering researchers in other areas of inquiry with better data. Over the past five years, I've been doing so with solar physics and evolutionary ecology, transferring what I've learned from 3D vision. Although solar physics and ecology study objects of radically different scales, they are unified by a need for data that is higher volume, higher quality, and easier to obtain. In this talk, I'll show efforts to this end done in collaboration with domain experts. In solar physics, our work includes systems to estimate the Sun's 3D vector magnetic field from multiple signals, which inform our understanding of a key driver of space weather. In evolutionary ecology, our collaborations have built one of the largest repositories of bird morphology, which has helped understand drivers of evolution. Throughout, I'll talk about some of our applications and lessons I've learned.
16:30 - 17:00	Kristen Grauman: "4D Human Activity Understanding"
17:00 - 17:05	Closing Remarks

Invited Speakers

Accepted Papers

Call for Papers

We welcome submissions of technical papers on dynamic 3D (4D) modeling and its wide-ranging applications across various fields including computer vision, embodied AI, robotics, sciences, and beyond. We encourage both short papers (approximately 4 pages) presenting early-stage ideas and full-length technical papers (up to 8 pages excluding references). Additional content can be submitted as supplementary material. Submissions should follow the CVPR 2025 format. The review process is double-blind and does not involve a rebuttal phase.

Accepted papers will not be published in the proceedings. Thus, papers that have already been published at major conferences are also welcome. Authors may also submit their work to future conferences or journals after acceptance to this workshop.

Submission Portal: OpenReview
Paper Submission Deadline: 11:59 pm, ~~March 28~~ April 4, 2025 (Pacific Time)
Notification to Authors: ~~April 25~~ April 28, 2025
Camera-ready Submission Deadline: 11:59 pm, May 21, 2025 (Pacific Time)

If you have any questions, please contact us at 4dvisionworkshop@googlegroups.com.

Topics of Interest

4D reconstruction and generation
4D predictive modeling and forecasting
Tracking
SLAM in dynamic scenes
Multi-modal 4D understanding and language models
Egocentric 4D perception
Dynamic humans and animals
4D synthetic data and simulation
4D vision for embodied agents and robots
4D datasets and new sensors
Benchmarking and evaluation for 4D tasks
......

Reviewer Nomination

We are looking for reviewers with expertise in 4D vision and related fields. If you are interested in reviewing for the workshop, please fill out this form.

Reviewers

Aviral Chharia	Bardienus Pieter Duisterhof	Brian Nlong Zhao	Chen Geng
Chuhao Chen	Guangzhao He	Hirokatsu Kataoka	Hong-Xing Yu
Ishan Khatri	Jeff Tan	Jenny Seidenschwarz	Jiaman Li
Jianyuan Wang	Khiem Vuong	Koshi Makihara	Linzhan Mou
Minghao Chen	Paul Engstler	Qi Sun	Rongqi Fan
Shenhan Qian	Shizun Wang	Sholder Lyko	Sizhe Wei
Suyash Damle	Ting-Hsuan Liao	Tushar Shinde	Varun Kumar Reddy Bankiti
Wanyue Zhang	Weijia Zeng	Yash Sanjay Bhalgat	Yufu Wang
Yunzhou Song	Yunzhi Zhang	Zeren Jiang	Zhengfei Kuang
Zhening Huang	Zhiyang Dou	Zirui Wang