3D Scene Understanding at CVPR 2020

3D Scene Understanding for Vision, Graphics, and Robotics

CVPR 2020 Workshop, Virtual, June 15th, 2020

Watch the recorded workshop videos on YouTube.

News

Due to the pandemic, our workshop will be virtual this year. We will host an online chat room for communicating with the speakers and for Q&A. We look forward to meeting you online!

Invited talks and oral presentations will be given live or as recorded videos in the same Zoom room. Every talk will be followed by a live Q&A session. Please refer to the Talks section for recorded videos and more details.

All events are hosted on Zoom. Click the "Raise Hand" button if you have questions during a talk; the speaker will either pause to answer them or defer them to the Q&A session.

Invited Speakers


Kristen Grauman (UT Austin)	Sergey Levine (UC Berkeley)	Andreas Geiger (University of Tübingen)

Yasutaka Furukawa (Simon Fraser University)	Daniel Ritchie (Brown University)	Jeannette Bohg (Stanford University)

Shuran Song (Columbia University)	Andrea Tagliasacchi (Google Brain)	Katerina Fragkiadaki (Carnegie Mellon University)

Opening Remark

Schedule

9:00am - 9:15am: Opening Remark: David Forsyth - 3D Vision: Subjunctives
9:15am - 9:45am: Invited talk: Andreas Geiger - Learning 3D Reconstruction in Function Space
9:45am - 10:15am: Invited talk: Jeannette Bohg - The Importance of Depth Data for Decision-making in Robot Manipulation
10:15am - 10:45am: Oral Presentation 1
10:45am - 11:15am: Invited talk: Shuran Song - Learning Visual Representations for Generalizable Manipulation
11:15am - 11:45am: Invited talk: Yasutaka Furukawa - CVPR is a Contemporary Art Exhibition
11:45am - 12:15pm: Oral Presentation 2
12:15pm - 2:00pm: Lunch Break
2:00pm - 2:30pm: Invited talk: Andrea Tagliasacchi - Structured Representations for 3D Computer Vision
2:30pm - 3:00pm: Invited talk: Katerina Fragkiadaki - 3-Dimensional Neural Scene Representations for Perception and Control
3:00pm - 3:30pm: Oral Presentation 3
3:30pm - 4:00pm: Invited talk: Sergey Levine - Embodied Implicit Scene Understanding
4:00pm - 4:30pm: Invited talk: Kristen Grauman - People Watching for Agent Learning
4:30pm - 5:00pm: Invited talk: Daniel Ritchie - Toward Synthesizing Training Data for 3D Scene Understanding

Oral Presentations

SAPIEN: A SimulAted Part-based Interactive ENvironment
SuperGlue: Learning Feature Matching with Graph Neural Networks
Novel View Synthesis of Dynamic Scenes with Globally Coherent Depths from a Monocular Camera
Video Inference for Human Body Pose and Shape Estimation
Deep Visual Reasoning: Learning to Predict Action Sequences for Task and Motion Planning from an Initial Scene Image
3D Dynamic Scene Graphs: Actionable Spatial Perception with Places, Objects, and Humans
Interactive Gibson Environment: A Simulator for Embodied Visual Agents
Neural Topological SLAM for Visual Navigation
Local Deep Implicit Functions for 3D Shapes

Overview

The goal of this workshop is to foster interdisciplinary communication among researchers working on 3D scene understanding across computer vision, computer graphics, and robotics, and to draw the broader community's attention to this field. Through this workshop, we will discuss current progress and future directions, and we expect new ideas and discoveries in related fields to emerge.

Specifically, we are interested in the following problems:

Datasets: What breadth is desirable yet manageable for a dataset that serves multiple tasks at once and offers ample opportunities to combine problems?
Representations: Which representations are most suitable for a particular task, such as reconstruction or physical reasoning? Can a single representation serve all purposes of 3D scene understanding?
Reconstruction: How can we build efficient models that parse and reconstruct observations from different data modalities (RGB, RGB-D, physical sensors)?
Reasoning: How can we formulate reasoning about affordances and physical properties? How can we encode, represent, and learn common sense?
Interaction: How can we model and learn physical interactions with objects in a scene?
Bridging the three fields: How can we facilitate research that connects vision, graphics, and robotics through 3D scene understanding?

Organizers


Siyuan Huang* (UCLA)	Chuhang Zou* (UIUC)	Hao Su (UCSD)	Alexander Schwing (UIUC)

Shuran Song (Columbia)	Jiajun Wu (Stanford)	Siyuan Qi (UCLA)	Yixin Zhu (UCLA)