Yi Wei

I am a research engineer at Huawei, working on 3D vision and computer graphics. I obtained my Ph.D. degree from the Intelligent Vision Group (IVG), Department of Automation, Tsinghua University, advised by Prof. Jiwen Lu. My research interests lie in 3D vision, with a focus on 3D scene understanding and 3D reconstruction. I hope my research can benefit industry applications. Prior to that, I received my Bachelor's degree from the Department of Electronic Engineering, Tsinghua University in 2019 (ranking 6/245). I have also spent time at DeePhi Tech (Xilinx), Sensetime, Microsoft Research Asia, XPeng, ByteDance, PhiGent Robotics, Gaussian Robotics and Apple. We are currently recruiting doctoral and master's degree students who will graduate in 2025. If you are interested in 3D vision or computer graphics, please feel free to contact me. Email / Google Scholar / Github / Twitter / Curriculum Vitae

News

2024-07: One paper on 3D AIGC is accepted to NeurIPS 2024.
2024-07: I graduated from Tsinghua University and will join Huawei.
2024-02: One paper on 3D AIGC is accepted to CVPR 2024.
2023-07: Two papers on occupancy prediction are accepted to ICCV 2023.
2023-07: The journal version of PV-RAFT is accepted to T-PAMI.
2023-04: The journal version of NerfingMVS is accepted to T-PAMI.
2023-03: I am a recipient of the 2023 Apple Scholars in AI/ML PhD fellowship.
2022-09: One paper on self-supervised multi-camera depth estimation is accepted to CoRL 2022.
2022-07: One paper on LiDAR-based 3D object detection is accepted to ECCV 2022.
2022-06: One paper on robotic exploration is accepted to IROS 2022.
2021-07: Three papers (including 1 oral) on NeRF, depth estimation and 3D pretraining are accepted to ICCV 2021.
2021-03: One paper on 3D scene flow estimation is accepted to CVPR 2021.
2021-03: One paper on weakly supervised 3D detection is accepted to ICRA 2021.

Selected Publications

* indicates equal contribution

	GeoLRM: Geometry-Aware Large Reconstruction Model for High-Quality 3D Gaussian Generation Chubin Zhang, Hongliang Song, Yi Wei, Yu Chen, Jiwen Lu, Yansong Tang Conference on Neural Information Processing Systems (NeurIPS), 2024 [Project page] [arXiv] [Code] We introduce the Geometry-Aware Large Reconstruction Model (GeoLRM), an approach that can predict high-quality assets with 512k Gaussians and 21 input images in only 11 GB of GPU memory.
	OccNeRF: Advancing 3D Occupancy Prediction in LiDAR-Free Environments Chubin Zhang, Juncheng Yan, Yi Wei, Jiaxin Li, Li Liu, Yansong Tang, Yueqi Duan, Jiwen Lu arXiv, 2023 [Project page] [arXiv] [Code] We propose OccNeRF, a method for self-supervised multi-camera occupancy prediction that adopts parameterized occupancy fields, a multi-frame photometric loss, and open-vocabulary 2D segmentation.
	Sherpa3D: Boosting High-Fidelity Text-to-3D Generation via Coarse 3D Prior Fangfu Liu, Diankun Wu, Yi Wei, Yongming Rao, Yueqi Duan IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024 [Project page] [arXiv] [Code] We propose Sherpa3D, a new text-to-3D framework that achieves high fidelity, generalizability, and geometric consistency simultaneously.
	SurroundOcc: Multi-Camera 3D Occupancy Prediction for Autonomous Driving Yi Wei, Linqing Zhao, Wenzhao Zheng, Zheng Zhu, Jie Zhou, Jiwen Lu IEEE International Conference on Computer Vision (ICCV), 2023 [Project page] [arXiv] [Code] We propose SurroundOcc, a method that predicts volumetric occupancy from multi-camera images and generates dense occupancy ground truth from sparse LiDAR points.
	OpenOccupancy: A Large Scale Benchmark for Surrounding Semantic Occupancy Perception Xiaofeng Wang, Zheng Zhu, Wenbo Xu, Yunpeng Zhang, Yi Wei, Xu Chi, Yun Ye, Dalong Du, Jiwen Lu, Xingang Wang IEEE International Conference on Computer Vision (ICCV), 2023 [arXiv] [Code] Towards comprehensive benchmarking of surrounding perception algorithms, we propose OpenOccupancy, the first surrounding semantic occupancy perception benchmark.
	3D Point-Voxel Correlation Fields for Scene Flow Estimation Ziyi Wang, Yi Wei, Yongming Rao, Jie Zhou, Jiwen Lu IEEE Transactions on Pattern Analysis and Machine Intelligence (T-PAMI, IF: 24.31), 2023 [Paper] [Code] We propose Deformable PV-RAFT, where the Spatial Deformation deforms the voxelized neighborhood, and the Temporal Deformation controls the iterative update process.
	Depth-Guided Optimization of Neural Radiance Fields for Indoor Multi-View Stereo Yi Wei, Shaohui Liu, Jie Zhou, Jiwen Lu IEEE Transactions on Pattern Analysis and Machine Intelligence (T-PAMI, IF: 24.31), 2023 [Paper] [Code] Beyond NerfingMVS, we further present NerfingMVS++, which proposes a coarse-to-fine depth-prior training strategy to directly utilize sparse SfM points and replaces uniform sampling with Gaussian sampling to boost performance.
	LiDAR Distillation: Bridging the Beam-Induced Domain Gap for 3D Object Detection Yi Wei, Zibu Wei, Yongming Rao, Jiaxin Li, Jiwen Lu, Jie Zhou European Conference on Computer Vision (ECCV), 2022 [arXiv] [Code] [Explanation in Chinese] We propose LiDAR Distillation to bridge the domain gap induced by different LiDAR beams for 3D object detection.
	SurroundDepth: Entangling Surrounding Views for Self-Supervised Multi-Camera Depth Estimation Yi Wei, Linqing Zhao, Wenzhao Zheng, Zheng Zhu, Yongming Rao, Guan Huang, Jiwen Lu, Jie Zhou Conference on Robot Learning (CoRL), 2022 [Project page] [arXiv] [Code] [Explanation in Chinese] We propose SurroundDepth, a method that incorporates information from multiple surrounding views to predict scale-aware depth maps across cameras.
	NerfingMVS: Guided Optimization of Neural Radiance Fields for Indoor Multi-view Stereo Yi Wei, Shaohui Liu, Yongming Rao, Wang Zhao, Jiwen Lu, Jie Zhou IEEE International Conference on Computer Vision (ICCV), 2021, Oral Presentation [Project page] [arXiv] [Code] [Video] [Explanation in Chinese] We present a new multi-view depth estimation method that utilizes both conventional SfM reconstruction and learning-based priors over the recently proposed neural radiance fields (NeRF).
	A Confidence-based Iterative Solver of Depths and Surface Normals for Deep Multi-view Stereo Wang Zhao, Shaohui Liu, Yi Wei, Hengkai Guo, Yong-jin Liu IEEE International Conference on Computer Vision (ICCV), 2021 [Project page] [arXiv] [Code] We propose a novel solver that iteratively solves for per-view depth and normal maps by optimizing an energy potential based on the locally planar assumption.
	PV-RAFT: Point-Voxel Correlation Fields for Scene Flow Estimation of Point Clouds Yi Wei, Ziyi Wang, Yongming Rao, Jiwen Lu, Jie Zhou IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2021 [arXiv] [Code] [Video] We present point-voxel correlation fields for 3D scene flow estimation, which carry over the high performance of RAFT and provide a way to build structured all-pairs correlation fields for unstructured point clouds.
	FGR: Frustum-Aware Geometric Reasoning for Weakly Supervised 3D Vehicle Detection Yi Wei, Shang Su, Jiwen Lu, Jie Zhou IEEE International Conference on Robotics and Automation (ICRA), 2021 [arXiv] [Code] [Video] We propose a weakly supervised 3D detection method that uses no 3D labels and consists of two stages: coarse 3D segmentation and 3D bounding box estimation.
	Conditional Single-view Shape Generation for Multi-view Stereo Reconstruction Yi Wei, Shaohui Liu, Wang Zhao, Jiwen Lu, Jie Zhou IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2019 [Project] [arXiv] [Code] We present a new perspective on image-based shape generation. A single view is sometimes insufficient to determine a single ground-truth shape because the back part is occluded; unlike most single-view methods, our method leverages multi-view consistency for 3D reconstruction.
	Quantization Mimic: Towards Very Tiny CNN for Object Detection Yi Wei, Xinyu Pan, Hongwei Qin, Junjie Yan European Conference on Computer Vision (ECCV), 2018 [arXiv] We propose a simple and general framework for training very tiny CNNs for object detection. Our method leverages the fact that mimic and quantization can facilitate each other.
	Two-Stream Binocular Network: Accurate Near Field Finger Detection Based on Binocular Images Yi Wei, Guijin Wang, Cairong Zhang, Hengkai Guo, Xinghao Chen, Huazhong Yang IEEE Visual Communications and Image Processing (VCIP), 2017 (Best Student Paper Award) [arXiv] We propose the Two-Stream Binocular Network (TSBnet) to detect fingertips from binocular images. Unlike previous depth-based methods, we directly regress the 3D positions of fingertips from left and right images.

	Apple, AI/ML Group, Research Intern. Topic: 3D AIGC
	Gaussian Robotics, Gaussian-Tsinghua Joint Laboratory, Project Leader. Topic: Sensor calibration, Drivable space detection, LiDAR-based 3D object detection, Depth estimation, 3D reconstruction
	ByteDance, SLAM & 3D Vision Group, Engineer & Research Intern. Topic: Sky AR, Advertisement AR, Self-supervised depth estimation, Plane-assisted multi-view stereo, Multiple plane detection
	XPeng, LiDAR Group, Engineer Intern. Topic: LiDAR-based 3D object detection, LiDAR-based model quantization
	MSRA, Intelligent Multimedia Group, Research Intern. Topic: Multi-view hand pose estimation
	Sensetime, Video Intelligence Group, Engineer & Research Intern. Topic: Model compression
	DeePhi, Engineer Intern. Topic: Real-time object detection

Honors and Awards

2024 Beijing Outstanding Graduate / 北京市优秀毕业生
2023 Huawei TopMinds / 华为天才少年称号
2023 Apple Scholar / 苹果学者奖学金 (22 people in the world, 2 people in China)
2023 Ubiquant Scholar / 九坤奖学金
2021 National Scholarship / 国家奖学金
2019 Beijing Outstanding Graduate / 北京市优秀毕业生
2018 Caixiong Scholarship / 清华科创类专项奖 (10 people in Tsinghua)
2018 Baogang Outstanding Scholarship / 宝钢优秀学生特等奖 (1 person in Tsinghua)
2017 National Scholarship / 国家奖学金
2017 Qualcomm Scholarship / 高通奖学金 (30 people in Tsinghua)
2017 Sensetime Scholarship / 商汤奖学金 (30 people in China)

Academic Services

Conference Reviewer / Program Committee Member: CVPR 2024, ICCV 2023, ICRA 2023, ECCV 2022, CVPR 2022, ICCV 2021, CVPR 2021, ICIP 2021, WACV 2021, ACCV 2020, CVPR 2020, ICIP 2019
Journal Reviewer: T-PAMI, T-IP, T-MM, T-CSVT