InternVL Family

Sequential Diffusion Language Models

2025/08/26: InternVL3.5: Advancing Open-Source Multimodal Models in Versatility, Reasoning, and Efficiency

2025/05/26: Visual Embodied Brain: Let Multimodal Large Language Models See, Think, and Control in Spaces

2025/04/11: InternVL3: Advancing Open-Source Multimodal Models with Native Multimodal Pretraining

2025/03/13: VisualPRM: An Effective Process Reward Model for Multimodal Reasoning

2024/12/20: InternVL2.5-MPO: Enhancing the Reasoning Ability of Multimodal Large Language Models via Mixed Preference Optimization

2024/12/05: InternVL2.5: Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling

2024/10/25: Mini-InternVL 2.0: A Flexible-Transfer Pocket Multimodal Model with 5% Parameters and 90% Performance

2024/10/10: Mono-InternVL: Pushing the Boundaries of Monolithic Multimodal Large Language Models with Endogenous Visual Pre-training

2024/07/31: InternOmni: Extending InternVL with Audio Modality

2024/07/04: InternVL2: Better than the Best—Expanding Performance Boundaries of Open-Source Multimodal Models with the Progressive Scaling Strategy

2024/05/31: ShareGPT-4o: Comprehensive Multimodal Annotations With GPT-4o

2024/05/25: Mini-InternVL 1.5: A Powerful Pocket Multimodal Model with 8% Parameters for 80% Performance

2024/04/30: InternVL 1.5: How Far Are We to GPT-4V? Closing the Gap to Commercial Multimodal Models with Open-Source Suites

2024/02/21: InternVL 1.2: Scaling up LLM to 34B

2024/01/24: InternVL 1.1: Enhance Chinese and OCR Capabilities

2023/12/12: InternVL 1.0: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks