Xinlong Wang

About Me

I am a technical lead at the Beijing Academy of Artificial Intelligence (BAAI), where I founded and lead the vision and multimodal research center (a.k.a. BAAI Vision). I received my PhD from The University of Adelaide, supervised by Prof. Chunhua Shen. Before that, I obtained my Bachelor's degree from Tongji University.

My research interests lie in computer vision and foundation models. I work on visual perception (SOLO, SOLOv2), visual representation (DenseCL, EVA), visual generalists (Painter, SegGPT), multimodal representation (EVA-CLIP, Uni3D) and multimodal generalists (Emu, Emu2, Emu3).


Contact

We are always looking for full-time researchers, engineers and interns at BAAI. Feel free to send an email if you are interested!

Email: xinlong.wang96@gmail.com


News

[Apr.2025] See3D for scalable 3D generation is accepted by CVPR 2025 as Highlight. JudgeLM for scalable LLM Judges is accepted by ICLR 2025 as Spotlight.
[Sept.2024] We have released Emu3, new state-of-the-art multimodal models trained solely with next-token prediction.
[Sept.2024] EVE for encoder-free VLMs is accepted by NeurIPS 2024 as Spotlight.
[Feb.2024] Emu2 and CapsFusion are accepted by CVPR 2024.
[Feb.2024] We have released EVA-CLIP-18B, the largest and most powerful open-source CLIP model to date, with 18 billion parameters.
[Jan.2024] Emu and Uni3D are accepted by ICLR 2024. Uni3D is selected for Spotlight presentation.
[Dec.2023] We have released Emu2, the largest open generative multimodal models, which achieve new state-of-the-art results on multimodal understanding and generation tasks.
[Sept.2023] EVA is one of the most influential papers (7th/2359) in CVPR 2023.
[Jul.2023] SegGPT is accepted by ICCV 2023.
[Jul.2023] We have released Emu, a multimodal generalist that can seamlessly generate images and text in a multimodal context.
[Feb.2023] Painter and EVA are accepted by CVPR 2023.
[Dec.2022] We have released Painter, a generalist model using "image" as the general-purpose interface.
[Nov.2022] We have released EVA, the best 1B Vision Foundation Model to date. All the code and models are available.


Selected Publications


Professional Activities