Songyou Peng - Homepage

News
06/2026 Real-3DQA received the Best Paper Runner-up Award at the 3D-LLM/VLA Workshop at CVPR 2026!
04/2026 Vision Banana is out! Check out our tech report and website. You can also find the slides for my talk.
03/2026 My PhD thesis received the 3DV Outstanding Dissertation Award Honorable Mention! You can check out my award talk, My 10-Year Journey in 3D Vision (recording).
02/2026 Selfi (Oral) and Sensor2Sensor are accepted to CVPR 2026!
01/2026 Real-3DQA and UFO-4D are accepted to ICLR 2026!
09/2025 LODGE is accepted to NeurIPS 2025 as a spotlight!
08/2025 I will serve as an Area Chair at ICLR 2026 and CVPR 2026.
07/2025 Three papers (Visual Chronicles, CL-Splats, and SplatTalk) are accepted to ICCV 2025!
05/2025 Invited to give talks on A "Splatacular" Year of 3D Reconstruction at Stanford University and KAIST (as a guest lecture).
Research
Vision Banana: Image Generators are Generalist Vision Learners Project Co-Lead Google DeepMind tech report | website
Gemini 3 & Gemini 2.5 Core Contributor Google DeepMind tech report | website
Selfi: Self Improving Reconstruction Engine via 3D Geometric Feature Alignment Youming Deng, Songyou Peng, Junyi Zhang, Kathryn Heal, Tiancheng Sun, John Flynn, Steve Marschner, Lucy Chai Conference on Computer Vision and Pattern Recognition (CVPR), 2026 (Oral) paper | project page Teach a 3D foundation model to improve itself. No ground truth needed.
Sensor2Sensor: Cross-Embodiment Sensor Conversion for Autonomous Driving Jiahao Wang, Bo Sun, Yijing Bai, Vincent Casser, Songyou Peng, Zehao Zhu, Meng-Li Shih, Xander Masotto, Shih-Yang Su, Kanaad V Parvate, Tiancheng Ge, Linn Bieske, Dragomir Anguelov, Mingxing Tan, Chiyu "Max" Jiang Conference on Computer Vision and Pattern Recognition (CVPR), 2026 paper A prototype for the Waymo World Model, translating in-the-wild monocular videos into high-fidelity multi-modal sensor logs.
Do 3D Large Language Models Really Understand 3D Spatial Relationships? Xianzheng Ma*, Tao Sun*, Shuai Chen, Yash Bhalgat, Jindong Gu, Angel X Chang, Iro Armeni, Iro Laina, Songyou Peng†, Victor Adrian Prisacariu International Conference on Learning Representations (ICLR), 2026 Best Paper Runner-up Award at the 3D-LLM/VLA Workshop at CVPR 2026 (* equal contribution, † equal supervision) paper | project page | code | data Your 3D-LLM may not really understand 3D. It might just be guessing without seeing.
UFO-4D: Unposed Feedforward 4D Reconstruction from Two Images Junhwa Hur, Charles Herrmann, Songyou Peng, Philipp Henzler, Zeyu Ma, Todd Zickler, Deqing Sun International Conference on Learning Representations (ICLR), 2026 paper | project page | code Feedforward 4D reconstruction from just two unposed images.
LODGE: Level-of-Detail Large-Scale Gaussian Splatting with Efficient Rendering Jonas Kulhanek, Marie-Julie Rakotosaona, Fabian Manhardt, Christina Tsalicoglou, Michael Niemeyer, Torsten Sattler, Songyou Peng, Federico Tombari Conference on Neural Information Processing Systems (NeurIPS), 2025 (Spotlight, top 3%) paper | project page City-scale 3DGS in real time with an iPhone.
Visual Chronicles: Using Multimodal LLMs to Analyze Massive Collections of Images Boyang Deng, Songyou Peng*, Kyle Genova*, Gordon Wetzstein, Noah Snavely, Leonidas Guibas, Thomas Funkhouser International Conference on Computer Vision (ICCV), 2025 (Highlight, top 2.3%) paper | project page We help you find "unusual" things and trends in NYC and SF, such as 200+ abstract sculptures.
CL-Splats: Continual Learning of Gaussian Splatting with Local Optimization Jan Ackermann, Jonas Kulhanek, Shengqu Cai, Haofei Xu, Marc Pollefeys, Gordon Wetzstein, Leonidas Guibas, Songyou Peng International Conference on Computer Vision (ICCV), 2025 paper | project page | code We give you great 3DGS even after you add, delete, or change stuff in your room.
SplatTalk: 3D VQA with Gaussian Splatting Anh Thai, Songyou Peng, Kyle Genova, Leonidas Guibas, Thomas Funkhouser International Conference on Computer Vision (ICCV), 2025 paper | project page A 3D language Gaussian field benefits 3D VQA tasks.
DepthSplat: Connecting Gaussian Splatting and Depth Haofei Xu, Songyou Peng, Fangjinhua Wang, Hermann Blum, Daniel Barath, Andreas Geiger, Marc Pollefeys Conference on Computer Vision and Pattern Recognition (CVPR), 2025 paper | project page | code Depth helps 3DGS, and 3DGS helps depth prediction.
Prompting Depth Anything for 4K Resolution Accurate Metric Depth Estimation Haotong Lin, Sida Peng, Jingxiao Chen, Songyou Peng, Jiaming Sun, Minghuan Liu, Hujun Bao, Jiashi Feng, Xiaowei Zhou, Bingyi Kang Conference on Computer Vision and Pattern Recognition (CVPR), 2025 paper | project page | code 4K accurate metric depth estimation from low-res LiDAR.
Free360: Layered Gaussian Splatting for Unbounded 360-Degree View Synthesis from Extremely Sparse and Unposed Views Chong Bao, Zehao Yu, Jiale Shi, Guofeng Zhang, Songyou Peng, Zhaopeng Cui Conference on Computer Vision and Pattern Recognition (CVPR), 2025 paper | project page | video | code Video models enable unbounded 360° scene reconstruction from 3-4 unposed views.
WildGS-SLAM: Monocular Gaussian Splatting SLAM in Dynamic Environments Jianhao Zheng*, Zihan Zhu, Valentin Bieri, Marc Pollefeys, Songyou Peng, Iro Armeni Conference on Computer Vision and Pattern Recognition (CVPR), 2025 paper | project page | code Robust SLAM for dynamic scenes in the wild.
No Pose, No Problem: Surprisingly Simple 3D Gaussian Splats from Sparse Unposed Images Botao Ye, Sifei Liu, Haofei Xu, Xueting Li, Marc Pollefeys, Ming-Hsuan Yang, Songyou Peng International Conference on Learning Representations (ICLR), 2025 (Oral, top 1.8%) paper | project page | code Unposed 3DGS made easy; it also enables SoTA relative pose estimation performance!
WildGaussians: 3D Gaussian Splatting in the Wild Jonas Kulhanek, Songyou Peng, Zuzana Kukelova, Marc Pollefeys, Torsten Sattler Conference on Neural Information Processing Systems (NeurIPS), 2024 paper | project page | code Boost 3DGS for in-the-wild scenes with appearance and dynamic changes.
Renovating Names in Open-Vocabulary Segmentation Benchmarks Haiwen Huang, Songyou Peng, Dan Zhang, Andreas Geiger Conference on Neural Information Processing Systems (NeurIPS), 2024 paper | project page | code Wanna enhance your segmentation model or benchmark? Renovate names now!
Segment3D: Learning Fine-Grained Class-Agnostic 3D Segmentation without Manual Labels Rui Huang, Songyou Peng, Ayça Takmaz, Federico Tombari, Marc Pollefeys, Shiji Song, Gao Huang, Francis Engelmann European Conference on Computer Vision (ECCV), 2024 paper | project page | code | demo A self-supervised segmentation approach that outperforms fully-supervised methods.
Parameter-Efficient Orthogonal Finetuning via Butterfly Factorization Weiyang Liu*, Zeju Qiu*, Yao Feng**, Yuliang Xiu**, Yuxuan Xue**, Longhui Yu**, Haiwen Feng, Zhen Liu, Juyeon Heo, Songyou Peng, Yandong Wen, Michael J. Black, Adrian Weller, Bernhard Schölkopf (*/** equal contribution) International Conference on Learning Representations (ICLR), 2024 paper | project page | code BOFT (Orthogonal Butterfly) is a general finetuning technique that adapts foundation models to different tasks such as Vision, NLP, Math QA, and Controllable Generation.
FastHuman: Reconstructing High-Quality Clothed Human in Minutes Lixiang Lin, Songyou Peng, Qijun Gan, Jianke Zhu International Conference on 3D Vision (3DV), 2024 (Spotlight, top 8.2%) paper | project page | code Shape As Points (SAP) for fast human body reconstruction.
Neural Scene Representations for 3D Reconstruction and Scene Understanding Songyou Peng PhD Thesis, 2023. ECVA PhD Award, 2024; 3DV Outstanding Dissertation Award Honorable Mention, 2026. thesis | slides PhD supervisors: Prof. Marc Pollefeys (ETH Zurich), Prof. Andreas Geiger (MPI-IS). External committee: Prof. Leonidas J. Guibas (Stanford), Prof. Vincent Sitzmann (MIT)
OpenScene: 3D Scene Understanding with Open Vocabularies Songyou Peng, Kyle Genova, Chiyu "Max" Jiang, Andrea Tagliasacchi, Marc Pollefeys, Thomas Funkhouser Conference on Computer Vision and Pattern Recognition (CVPR), 2023 paper | project page | video | code Zero-shot approach for novel 3D scene understanding tasks with open-vocabulary queries.
: A Unified Framework for Surface Reconstruction Zehao Yu, Anpei Chen, Bozidar Antic, Songyou Peng, Apratim Bhattacharyya, Michael Niemeyer, Siyu Tang, Torsten Sattler, Andreas Geiger Open Source Project, 2023 project page | code We provide a unified framework and benchmark for neural implicit surface reconstruction.
PersEmoN: A Deep Network for Joint Analysis of Apparent Personality, Emotion and Their Relationship Le Zhang, Songyou Peng, Stefan Winkler IEEE Transactions on Affective Computing (TAFFC), 2019. In press. paper | code A journal extension of our ACM MM 2018 paper.
Give Me One Portrait Image, I Will Tell You Your Emotion and Personality Songyou Peng, Le Zhang, Stefan Winkler, Marianne Winslett ACM International Conference on Multimedia (ACM MM), 2018 paper | slides | code Technical Demo. A deep Siamese-like network predicts one's Big-Five personality and arousal-valence emotion from a single portrait photo.
Depth Super-Resolution Meets Uncalibrated Photometric Stereo Songyou Peng, Bjoern Haefner, Yvain Queau, Daniel Cremers International Conference on Computer Vision (ICCV) Workshops, 2017 paper | slides | code & data A novel depth super-resolution approach for RGB-D sensors is presented. This paper is part of my master's thesis and is subsumed by our TPAMI paper.
High Quality Shape from a RGB-D Camera using Photometric Stereo Songyou Peng M.Sc. Thesis, Technical University of Munich Supervisors: Yvain Queau and Daniel Cremers thesis | bibtex | poster

Mentored Students and Interns

I am fortunate to have (co-)mentored some talented and highly motivated students and interns, and I have learned from and been inspired by them:

Invited Talks

Vision Banana: Image Generators are Generalist Vision Learners Invited talk at the 3DSUN Workshop at CVPR, 2026 slides
My 10-Year Journey in 3D Vision and Finding Our Poses in the Current World International Conference on 3D Vision (3DV), 2026 PhD Outstanding Dissertation Award Talk recording | slides
Building Visual Intelligence Meta, 2025; Amazon Frontier AI & Robotics (FAR), 2025 slides
A "Splatacular" Year of 3D Reconstruction Stanford University, hosted by Iro Armeni, 2025; KAIST, hosted by Minhyuk Sung, 2025 (Guest Lecture) recording | slides
2D Magic in a 3D World Imperial College London, hosted by Andrew Davison, 2024; Czech Technical University (CTU), hosted by Torsten Sattler, 2024; The University of Hong Kong (HKU), hosted by Kai Han, 2024 slides
Dive into Neural Explicit-Implicit 3D Representations and Their Applications Symposium of Geometry Processing (SGP) Graduate School, 2023 (Invited Lecture) slides
Learning to Reconstruct and Understand the 3D World Microsoft Mixed Reality & AI Labs - Zurich, 2023 slides
Learning Neural Scene Representations for 3D Reconstruction and Understanding Shanghai AI Lab, 2023 slides
OpenScene: 3D Scene Understanding with Open Vocabularies Peking University, hosted by Baoquan Chen, 2023; Apple, 2023; Stability.ai, 2023 slides
How do NeRF and CLIP advance 3D Scene Reconstruction and Understanding Chinese University of Hong Kong (CUHK) Shenzhen, 2023; Bosch Center for Artificial Intelligence (BCAI), 2023 slides
Large-Scale 3D Scene Reconstruction with NeRF Stanford University, hosted by Gordon Wetzstein, 2022 slides
Towards Practical Applications of NeRF Adobe Research, hosted by Zexiang Xu, 2022 slides
Neural Scene Representations for 3D Reconstruction University of Basel, 2022 slides
Shape As Points: A Differentiable Poisson Solver Talking Papers Podcast, 2022 video | podcast
Shape As Points: A Differentiable Poisson Solver Graphics And Mixed Environment Seminar (GAMES), 2021 slides | talk (in Chinese)
Towards Practical Applications of NeRF Graphics And Mixed Environment Seminar (GAMES), 2021 slides | talk (in Chinese)

Teaching

Teaching Assistant (Lead), 3D Vision, Spring 2023
Teaching Assistant, Computer Vision, Fall 2022
Teaching Assistant (Lead), 3D Vision, Spring 2022
Teaching Assistant, Deep Learning for Computer Vision: Seminal Work, Spring 2022
Teaching Assistant, 3D Vision, Spring 2020
Teaching Assistant, Deep Learning for Computer Vision: Seminal Work, Spring 2020

Academic Services