Ziyi Wang

I am a fifth-year PhD student in the Department of Automation at Tsinghua University, advised by Prof. Jiwen Lu. In 2020, I obtained my B.Eng. from the Department of Electronic Engineering, Tsinghua University. I also obtained a B.Admin. as a dual degree from the School of Economics and Management, Tsinghua University. I am broadly interested in computer vision and deep learning. My current research focuses on 3D vision, 3D generation, and 4D world models.

Email / Google Scholar / GitHub

News

Publications

* indicates equal contribution

Vision Generalist Model: A Survey. Ziyi Wang, Yongming Rao, Shuofeng Sun, Xinrun Liu, Yi Wei, Xumin Yu, Zuyan Liu, Yanbo Wang, Hongmin Liu, Jie Zhou, Jiwen Lu. International Journal of Computer Vision (IJCV), 2025. [arXiv] We conduct a comprehensive survey on vision generalist models that support multimodal inputs and can handle various downstream tasks.
OGGSplat: Open Gaussian Growing for Generalizable Reconstruction with Expanded Field-of-View. Yanbo Wang*, Ziyi Wang*, Wenzhao Zheng, Jie Zhou, Jiwen Lu. Preprint. [arXiv] [Code] [Project Page] OGGSplat is designed to expand the field-of-view of Gaussian-based 3D scenes reconstructed from sparse views by feed-forward, generalizable models.
UniPre3D: Unified Pre-training of 3D Point Cloud Models with Cross-Modal Gaussian Splatting. Ziyi Wang*, Yanran Zhang*, Jie Zhou, Jiwen Lu. IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2025. [arXiv] [Code] UniPre3D is a unified pre-training method that can be applied to both object-level and scene-level point clouds. It is supported by a cross-modal Gaussian splatting technique.
XMask3D: Cross-modal Mask Reasoning for Open Vocabulary 3D Semantic Segmentation. Ziyi Wang*, Yanbo Wang*, Xumin Yu, Jie Zhou, Jiwen Lu. Conference on Neural Information Processing Systems (NeurIPS), 2024. [arXiv] [Code] XMask3D is a framework that proposes mask-level reasoning techniques to give 3D segmentation models open-vocabulary capacity with the assistance of a pre-trained 2D mask generator.
Point-to-Pixel Prompting for Point Cloud Analysis With Pre-Trained Image Models. Ziyi Wang, Yongming Rao, Xumin Yu, Jie Zhou, Jiwen Lu. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2024. [IEEE] [Code] [Project Page] P2P++ is the extended journal version of P2P. We further propose Pixel-to-Point Distillation to make P2P applicable to scene-level perception tasks.
3D Point-Voxel Correlation Fields for Scene Flow Estimation. Ziyi Wang*, Yi Wei*, Yongming Rao, Jie Zhou, Jiwen Lu. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2023. [IEEE] [Code] [Project Page] DPV-RAFT is the extended journal version of PV-RAFT. We further propose Spatial Deformation and Temporal Deformation to enhance PV-RAFT.
Take-A-Photo: 3D-to-2D Generative Pre-training of Point Cloud Models. Ziyi Wang*, Xumin Yu*, Yongming Rao, Jie Zhou, Jiwen Lu. IEEE International Conference on Computer Vision (ICCV), 2023. [arXiv] [Code] [Project Page] TAP is a 3D-to-2D generative pre-training method that generates projected images of point clouds from instructed perspectives.
P2P: Tuning Pre-trained Image Models for Point Cloud Analysis with Point-to-Pixel Prompting. Ziyi Wang*, Xumin Yu*, Yongming Rao*, Jie Zhou, Jiwen Lu. Conference on Neural Information Processing Systems (NeurIPS), 2022, Spotlight. [arXiv] [Code] [Project Page] [中文解读] P2P is a framework that leverages large-scale pre-trained image models for 3D point cloud analysis.
SemAffiNet: Semantic-Affine Transformation for Point Cloud Segmentation. Ziyi Wang, Yongming Rao, Xumin Yu, Jie Zhou, Jiwen Lu. IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022. [arXiv] [Code] We present Semantic-Affine Transformation, which transforms the mid-level decoder features of an encoder-decoder segmentation network with class-specific affine parameters.
PoinTr: Diverse Point Cloud Completion with Geometry-Aware Transformers. Xumin Yu*, Yongming Rao*, Ziyi Wang, Zuyan Liu, Jiwen Lu, Jie Zhou. IEEE International Conference on Computer Vision (ICCV), 2021, Oral Presentation. [arXiv] [Code] [中文解读] PoinTr is a transformer-based framework that reformulates point cloud completion as a set-to-set translation problem.
Towards Interpretable Deep Metric Learning with Structural Matching. Wenliang Zhao*, Yongming Rao*, Ziyi Wang, Jiwen Lu, Jie Zhou. IEEE International Conference on Computer Vision (ICCV), 2021. [arXiv] [Code] We present a deep interpretable metric learning (DIML) method that adopts a structural matching strategy to explicitly align the spatial embeddings by computing an optimal matching flow between the feature maps of the two images.
PV-RAFT: Point-Voxel Correlation Fields for Scene Flow Estimation of Point Clouds. Yi Wei*, Ziyi Wang*, Yongming Rao*, Jiwen Lu, Jie Zhou. IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2021. [arXiv] [Code] We present point-voxel correlation fields for 3D scene flow estimation, which carry over the high performance of RAFT and provide a way to build structured all-pairs correlation fields for unstructured point clouds.

Teaching

Honors and Awards