InternScenes: A Large-scale Interactive Indoor Scene Dataset with Realistic Layouts
Peizhou Cao1✶, Yichen Jin1, Luo Li1, Wenzhe Cai1,3, Jingli Lin1,4, Hanqing Wang1,2, Zhaoyang Lyu1, Tai Wang2, Bo Dai1, Xudong Xu1†, Jiangmiao Pang1†
✶ Equal Contribution, † Corresponding Author
NeurIPS 2025
📋 Abstract
The advancement of Embodied AI heavily relies on large-scale, simulatable 3D scene datasets characterized by scene diversity and realistic layouts. However, existing datasets typically suffer from limitations in data scale or diversity, sanitized layouts lacking small items, and severe object collisions. To address these shortcomings, we introduce InternScenes, a novel large-scale simulatable indoor scene dataset comprising approximately 40,000 diverse scenes by integrating three disparate scene sources, i.e., real-world scans, procedurally generated scenes, and designer-created scenes, including 1.96M objects and covering 15 common scene types and 288 object classes. We particularly preserve massive small items in the scenes, resulting in realistic and complex layouts with an average of 41.5 objects per region. Our comprehensive data processing pipeline ensures simulatability by creating real-to-sim replicas for real-world scans, enhances interactivity by incorporating interactive objects into these scenes, and resolves object collisions by physical simulations. We demonstrate the value of InternScenes with two benchmark applications: scene layout generation and point-goal navigation. Both show the new challenges posed by the complex and realistic layouts. More importantly, InternScenes paves the way for scaling up the model training for both tasks, making the generation and navigation in such complex scenes possible. We commit to open-sourcing the data, models, and benchmarks to benefit the whole community.
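To make "object collisions" in a layout concrete, here is a minimal sketch that flags interpenetrating objects using axis-aligned bounding boxes. It is an illustration only and not the InternScenes pipeline, which the abstract says resolves collisions through physical simulation. The `Box` record, the `overlaps` and `colliding_pairs` helpers, and the sample scene are all hypothetical.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


@dataclass(frozen=True)
class Box:
    """Axis-aligned bounding box of one object (hypothetical layout record)."""
    name: str
    min_xyz: Vec3
    max_xyz: Vec3


def overlaps(a: Box, b: Box, tol: float = 0.0) -> bool:
    """True if the boxes interpenetrate by more than `tol` on every axis."""
    return all(
        min(a.max_xyz[i], b.max_xyz[i]) - max(a.min_xyz[i], b.min_xyz[i]) > tol
        for i in range(3)
    )


def colliding_pairs(boxes: List[Box], tol: float = 0.0) -> List[Tuple[str, str]]:
    """Names of every pair of boxes that interpenetrate (brute force, O(n^2))."""
    return [(a.name, b.name) for a, b in combinations(boxes, 2) if overlaps(a, b, tol)]


# Illustrative values only: three objects in one region, in meters.
scene = [
    Box("desk", (0.0, 0.0, 0.0), (1.2, 0.6, 0.75)),
    Box("chair", (1.0, 0.1, 0.0), (1.5, 0.6, 0.9)),   # penetrates the desk along x
    Box("cup", (0.3, 0.2, 0.75), (0.4, 0.3, 0.85)),   # rests on the desk top, touching only
]

print(colliding_pairs(scene))
```

Objects that merely touch, such as a cup resting on a desk, are not reported because the overlap must be strictly positive on all three axes. A check like this can only detect candidate collisions; resolving them is a separate step.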
🎬 Demo Video
🏘️ InternScenes-Real2Sim

Pipeline for retrieving synthetic scenes from real scan scenes.
🎮 InternScenes-Synthetic

Pipeline for annotating and processing raw scenes to extract precise layout information.
🌠 Samples
Comprehensive 3D Assets (*.usd and *.glb) with Canonical Poses and Semantic Labels
🔎 All 3D CAD models have been carefully annotated manually
Bulky Items: Bed, Bookshelf, Chair, Couch, Desk, Refrigerator
Medium Items: Electric Cooker, Microwave, Oven, Pan, Pot, Lamp
Small Items: Clock, Clothes, Cup, Fan, Pillow, Phone, Keyboard, Mouse, Laptop, Tray, Shoe, Toy
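The sample labels above pair each asset class with a size tier. A small sketch of how that pairing could be held as a lookup table follows. It covers only the 24 sample classes shown on this page, not the full set of 288 object classes, and the names `SIZE_TIERS`, `TIER_OF`, and `tier_of` are hypothetical rather than part of any released InternScenes tooling.

```python
from typing import Dict, List, Optional

# Size tiers exactly as labeled in the sample gallery above.
SIZE_TIERS: Dict[str, List[str]] = {
    "Bulky Item": ["Bed", "Bookshelf", "Chair", "Couch", "Desk", "Refrigerator"],
    "Medium Item": ["Electric Cooker", "Microwave", "Oven", "Pan", "Pot", "Lamp"],
    "Small Item": [
        "Clock", "Clothes", "Cup", "Fan", "Pillow", "Phone",
        "Keyboard", "Mouse", "Laptop", "Tray", "Shoe", "Toy",
    ],
}

# Inverted index: class label -> size tier.
TIER_OF: Dict[str, str] = {
    label: tier for tier, labels in SIZE_TIERS.items() for label in labels
}


def tier_of(label: str) -> Optional[str]:
    """Return the size tier for a sample class label, or None if it is not listed."""
    return TIER_OF.get(label)


print(tier_of("Cup"))
print(tier_of("Refrigerator"))
```

Classes outside the sample gallery return `None`, since this page does not state their tier.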
Retrieving from Real Scans to Synthetic Scenes
🌗 Real2Sim Comparison
BibTeX
@inproceedings{InternScenes,
title={InternScenes: A Large-scale Interactive Indoor Scene Dataset with Realistic Layouts},
author={Zhong, Weipeng and Cao, Peizhou and Jin, Yichen and Li, Luo and Cai, Wenzhe and Lin, Jingli and Lyu, Zhaoyang and Wang, Tai and Dai, Bo and Xu, Xudong and Pang, Jiangmiao},
year={2025},
booktitle={arXiv},
}