Linli Yao

I'm a PhD student at the Language Computing and Machine Learning Group (Lanco), MOE Key Laboratory of Computational Linguistics, School of Computer Science, Peking University. I am supervised by Prof. Xu Sun.

I received my Master's and Bachelor's degrees from Renmin University of China (RUC) in 2023 and 2020, respectively, advised by Prof. Qin Jin, who directs the AI·M3 Lab.

Research Interests

Multi-modal Understanding and Generation
Vision and Language
Large Multi-modal Models

Publications (Full List)

TimeChat-Online: 80% Visual Tokens are Naturally Redundant in Streaming Videos

(* indicates equal contribution)

Preprint, arXiv:2504.17343, 2025.

Generative Frame Sampler for Long Video Understanding

Linli Yao, Haoning Wu, Kun Ouyang, Yuanxing Zhang, Caiming Xiong, Bei Chen, Xu Sun, Junnan Li

Preprint, arXiv:2503.09146, 2025.

DeCo: Decoupling Token Compression from Semantic Abstraction in Multimodal Large Language Models

Linli Yao, Lei Li, Shuhuai Ren, Lean Wang, Yuanxin Liu, Xu Sun, Lu Hou

Preprint, arXiv:2405.20985, 2024.

Temporal Reasoning Transfer from Text to Video

Lei Li*, Yuanxin Liu*, Linli Yao, Peiyuan Zhang, Chenxin An, Lean Wang, Xu Sun, Lingpeng Kong, Qi Liu

ICLR 2025.

Edit As You Wish: Video Caption Editing with Multi-grained User Control

Linli Yao, Yuanmeng Zhang, Ziheng Wang, Xinglin Hou, Tiezheng Ge, Yuning Jiang, Xu Sun, Qin Jin

ACM MM 2024.

TimeChat: A Time-sensitive Multimodal Large Language Model for Long Video Understanding

Shuhuai Ren*, Linli Yao*, Shicheng Li, Xu Sun, Lu Hou

(* indicates equal contribution)

CVPR 2024.

UBiSS: A Unified Framework for Bimodal Semantic Summarization of Videos

Yuting Mei, Linli Yao, Qin Jin

ICMR 2024.


LaDiC: Are Diffusion Models Really Inferior to Autoregressive Counterparts for Image-to-text Generation?

Yuchi Wang, Shuhuai Ren, Rundong Gao, Linli Yao, Qingyan Guo, Kaikai An, Jianhong Bai, Xu Sun

NAACL 2024.


CapEnrich: Enriching Caption Semantics for Web Images via Cross-modal Pre-trained Knowledge

Linli Yao, Weijing Chen, Qin Jin

The Web Conference (WWW) 2023.

Rethinking Benchmarks for Cross-modal Image-text Retrieval

Weijing Chen, Linli Yao, Qin Jin

SIGIR 2023, long paper.

Image Difference Captioning with Pre-training and Contrastive Learning

Linli Yao, Weiying Wang, Qin Jin

AAAI 2022.

Education

2023.09 - Present	PhD Student	School of Computer Science, Peking University
2020.09 - 2023.06	Master's Student	School of Information, Renmin University of China
2016.09 - 2020.06	Bachelor's Student	School of Information, Renmin University of China

Experience

2022.10 - 2023.07

Research Intern

Alimama CV&NLP Group @ Alibaba, advised by Tiezheng Ge.

2022.04 - 2022.10

Organizer / Workshop Chair

Person in Context (PIC) Workshop @ ACM MM 2022

The MTVG and MDVC tasks attracted 40 teams worldwide, including academic teams from institutions such as Tsinghua University, Peking University, and the University of Hong Kong, as well as industry teams from Tencent, JD.com, Xiaomi, and Bilibili.

Awards

2022	National Scholarship	Ministry of Education of China
2023 & 2020	Outstanding Graduate	Renmin University of China
2022 & 2021	1st Class Grade Scholarship	Renmin University of China
2021 & 2018	Merit Student	Renmin University of China
2019	1st Prize of China Undergraduate Mathematical Contest in Modeling (Beijing)	Beijing
2018	Meritorious Winner of American Mathematical Contest in Modeling	U.S.

Academic Service

Reviewer: AAAI 2023/2024, CVPR 2024/2025, NeurIPS 2024, ACM MM 2024/2025, IEEE Transactions on Image Processing.
Teaching assistant: Spoken Language Processing (RUC, 2020), Multimedia Application Technology (RUC, 2020), Academic Criterion and Writing (RUC, 2022), Human Language and Artificial Intelligence (PKU, 2024).