☀️ OpenSUN 3D 🌍
1st Workshop on Open-Vocabulary 3D Scene Understanding
in conjunction with ICCV 2023, Paris, France.
Room E06, Tuesday, October 3rd (afternoon)
Motivation 💡
The ability to perceive, understand and interact with arbitrary 3D environments is a long-standing goal in both academia and industry, with applications in AR/VR as well as robotics. Current 3D scene understanding models are largely limited to recognizing a closed set of pre-defined object classes. Recently, large visual-language models, such as CLIP, have demonstrated impressive capabilities after being trained solely on internet-scale image-language pairs. Some initial works have shown that these models have the potential not only to extend 3D scene understanding to open-set recognition, but also to enable additional applications such as reasoning about the affordances, materials, activities, and properties of unseen environments. The goal of this workshop is to bring together these initial siloed efforts and to discuss and establish clear task definitions, evaluation metrics, and benchmark datasets.
Schedule ⏰
| Time | Session |
|---|---|
| 13:20 - 13:30 | Welcome & Introduction |
| 13:30 - 14:00 | Keynote: Jen Jen Chung |
| 14:00 - 14:30 | Keynote: Vishal Patel |
| 14:30 - 14:45 | Oral Sessions / Challenge Winners |
| 14:45 - 15:15 | Keynote: Thomas Funkhouser |
| 15:15 - 16:00 | Poster Session & Coffee Break |
| 16:00 - 16:30 | Keynote: Angela Dai |
| 16:30 - 17:00 | Keynote: Manolis Savva |
| 17:00 - 17:30 | Panel Discussion |
Invited Speakers 🧑‍🏫
Professor Vishal Patel
Johns Hopkins University
Vishal M. Patel is an associate professor of electrical and computer engineering and a member of the Vision and Image Understanding Lab. His research interests are focused on computer vision, machine learning, image processing, medical image analysis, and biometrics. Patel is an associate editor of the IEEE Transactions on Pattern Analysis and Machine Intelligence journal and chairs the conference subcommittee of IAPR Technical Committee on Biometrics (TC4). He has received a number of awards including the 2021 IEEE Signal Processing Society (SPS) Pierre-Simon Laplace Early Career Technical Achievement Award, the 2021 NSF CAREER Award, the 2021 IAPR Young Biometrics Investigator Award (YBIA), the 2016 ONR Young Investigator Award, and the 2016 Jimmy Lin Award for Invention.
Professor Angela Dai
Technical University of Munich
Angela Dai is an assistant professor at the Technical University of Munich (TUM) where she leads the 3D AI Lab. Her research focuses on understanding how the 3D world around us can be modeled and semantically understood. Prof. Dai is the creator of the seminal ScanNet benchmark that sparked the development of numerous 3D scene understanding works.
Professor Manolis Savva
Simon Fraser University
Manolis Savva is an assistant professor in the School of Computing Science at Simon Fraser University, and a Canada Research Chair in Computer Graphics. His research focuses on analysis, organization and generation of 3D content. The methods that he works on are stepping stones towards holistic 3D scene understanding revolving around people, with applications in computer graphics, computer vision, and robotics. Prof. Savva contributed highly influential works towards embodied AI including Matterport and Habitat.
Professor Thomas Funkhouser
Google / Princeton University
Thomas Funkhouser is a full professor at Princeton University and a senior research scientist at Google. His research focuses on computer graphics, computer vision, and in particular 3D machine perception. In recent years, Professor Funkhouser has greatly impacted the field of 3D scene understanding.
Professor Jen Jen Chung
University of Queensland
Jen Jen Chung is an associate professor in Mechatronics within the School of Information Technology and Electrical Engineering at the University of Queensland. Her current research interests include perception, planning and learning for robotic mobile manipulation, algorithms for robot navigation through human crowds, informative path planning and adaptive sampling.
Important Dates 🗓️
- Paper Track: We accept novel full 8-page papers for publication in the proceedings, and shorter 4-page extended abstracts of either novel or previously published work that will not be included in the proceedings. All submissions shall follow the ICCV 2023 author guidelines.
- Submission Portal: CMT
- Paper Submission Deadline: July 31, 2023 (23:59 Pacific Time)
- Notification to Authors: August 9, 2023
- Camera-ready submission: August 21, 2023
- Challenge Track:
Accepted Papers 📄
CLIP goes 3D: Leveraging Prompt Tuning for Language Grounded 3D Recognition
Deepti B. Hegde, Jeya Maria Jose Valanarasu, Vishal Patel
CLIP-FO3D: Learning Free Open-world 3D Scene Representations from 2D Dense CLIP
Junbo Zhang, Runpei Dong, Kaisheng Ma
The Change You Want to See (Now in 3D)
Ragav Sachdeva, Andrew Zisserman
Learning to Prompt CLIP for Monocular Depth Estimation: Exploring the Limits of Human Language
Dylan Auty, Krystian Mikolajczyk
SAM3D: Segment Anything in 3D Scenes
Yunhan Yang, Xiaoyang Wu, Tong He, Hengshuang Zhao, Xihui Liu
POP-3D: Open-Vocabulary 3D Occupancy Prediction from Images
Antonin Vobecky, Oriane Siméoni, David Hurych, Spyros Gidaris, Andrei Bursuc, Patrick Pérez, Josef Sivic
OVIR-3D: Open-Vocabulary 3D Instance Retrieval Without Training on 3D Data
Shiyang Lu, Haonan Chang, Eric P. Jing, Yu Wu, Abdeslam Boularias, Kostas Bekris
Challenge Results
We have published a technical report providing an overview of our workshop challenge, results, and the methods of the winning teams!
The top three teams from our workshop challenge are listed below:
| Rank | Team | Method | mAP (↑) | AP_50 (↑) | AP_25 (↑) |
|---|---|---|---|---|---|
| 1 | PICO-MR: Hongbo Tian (1, 2), Chunjie Wang (1), Xiaosheng Yan (1), Bingwen Wang (1), Xuanyang Zhang (1), Xiao Liu (1). Affiliations: (1) PICO, ByteDance, Beijing; (2) Beijing University of Posts and Telecommunications | - | 6.08 | 14.08 | 17.67 |
| 2 | VinAI-3DIS: Phuc Nguyen (1), Khoi Nguyen (1), Anh Tran (1), Cuong Pham (1). Affiliations: (1) VinAI Research | GitHub | 4.13 | 12.14 | 39.41 |
| 3 | CRP: Zhening Huang (1), Xiaoyang Wu (2), Xi Chen (2), Hengshuang Zhao (2), Lei Zhu (3), Joan Lasenby (1). Affiliations: (1) University of Cambridge; (2) HKU; (3) HKUST (Guangzhou) | - | 2.67 | 5.06 | 13.98 |
Organizers
Program Committee
- Alexander Hermans (RWTH)
- Alexey Nekrasov (RWTH)
- Ayush Jain (CMU)
- Dávid Rozenberszki (TUM)
- Francis Engelmann (ETH)
- Ji Hou (Meta)
- Jonas Schult (RWTH)
- Or Litany (NVIDIA)
- Songyou Peng (ETH)
- Yujin Chen (TUM)