xianzhengma/Real-3DQA: [ICLR 2026] Official repository for "Real-3DQA"
Real-3DQA: Do 3D Large Language Models Really Understand 3D Spatial Relationships?
ICLR 2026
Xianzheng Ma¹,²*, Tao Sun³*, Shuai Chen¹,², Yash Bhalgat¹, Jindong Gu⁵†,
Angel X Chang⁴, Iro Armeni³, Iro Laina¹, Songyou Peng⁵‡, Victor Adrian Prisacariu²‡
¹VGG, University of Oxford; ²AVL, University of Oxford; ³Stanford University;
⁴Simon Fraser University; ⁵Google DeepMind
* equal contribution; † corresponding author; ‡ equal supervision
This repository contains:
- 📁 Access to the Real-3DQA dataset (hosted on Hugging Face).
- 📊 Two standalone evaluation scripts (Exact Match & Viewpoint Rotation Score).
- 🔧 The 3D-RFT (3D Reweighted Finetuning) pseudocode.
📁 Dataset (Hugging Face)
Our dataset is hosted on Hugging Face: Oliver-Ma/Real-3DQA
You can download the JSON files directly from the repository or load them with the Hugging Face datasets library. The dataset contains a debiased test set (de-biased_testset) along with four rotation splits (rotation_0, rotation_90, rotation_180, rotation_270).
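If you download the files rather than using the datasets library, a small helper like the one below can index them by split. It is a sketch that assumes each split is stored as its own JSON file whose name matches the split (for example rotation_90.json); the function name load_split_files is hypothetical, and any other JSON file under the folder would be picked up as well.

```python
import json
from pathlib import Path


def load_split_files(root):
    """Map each JSON file's stem under `root` to its parsed content.

    Assumes (hypothetically) one JSON file per split, named after the split,
    e.g. rotation_0.json, rotation_90.json, ...
    """
    return {
        path.stem: json.loads(path.read_text(encoding="utf-8"))
        for path in sorted(Path(root).rglob("*.json"))
    }
```

To fetch a local copy of the whole dataset repository first, huggingface_hub.snapshot_download(repo_id="Oliver-Ma/Real-3DQA", repo_type="dataset") returns the path of the downloaded folder, which can be passed as root.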
📊 Evaluation
We provide two standalone scripts to compute metrics. Your model's prediction file should be a JSON file containing a list of dictionaries, each with at least the keys response_gt and response_pred (plus question_id for VRS).
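The prediction-file format can be produced and checked with a few lines of standard-library Python. This is a sketch, not part of the repository: the helper names missing_keys and to_prediction_json are hypothetical, and the only rule enforced is the one stated above (response_gt and response_pred always, question_id when the file is meant for VRS).

```python
import json

BASE_KEYS = ("response_gt", "response_pred")


def missing_keys(record, for_vrs=False):
    """Return the required keys that are absent from one prediction record."""
    required = BASE_KEYS + (("question_id",) if for_vrs else ())
    return [key for key in required if key not in record]


def to_prediction_json(records, for_vrs=False):
    """Validate every record, then serialize the list as a JSON array of objects."""
    for i, record in enumerate(records):
        missing = missing_keys(record, for_vrs)
        if missing:
            raise KeyError(f"record {i} is missing {missing}")
    return json.dumps(records, ensure_ascii=False, indent=2)
```

Writing the returned string to, say, predictions.json gives a file that can be passed directly to the evaluation commands below.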
1. Debiased EM (Exact Match)
To evaluate standard Exact Match (EM) and Relaxed Match on the debiased test set:
python evaluate_debiased_exact_match.py predictions.json --name "MyModel"
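For intuition, here is a minimal sketch of how Exact Match and a relaxed variant could be computed from such a prediction file. The normalization (lowercasing, stripping punctuation and English articles) and the Relaxed Match rule used here (one normalized answer appears as a whole-word span inside the other) are assumptions for illustration; evaluate_debiased_exact_match.py is the reference and may define both differently. It also assumes response_gt is a single string.

```python
import re
import string

_PUNCT = str.maketrans("", "", string.punctuation)


def normalize(text):
    """Assumed normalization: lowercase, drop punctuation and articles, collapse spaces."""
    text = str(text).lower().translate(_PUNCT)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(pred, gt):
    return normalize(pred) == normalize(gt)


def relaxed_match(pred, gt):
    """Hypothetical relaxed rule: one normalized answer is a whole-word span of the other."""
    p, g = normalize(pred), normalize(gt)
    if not p or not g:
        return False
    return f" {g} " in f" {p} " or f" {p} " in f" {g} "


def score(records):
    """Return EM and relaxed-match accuracy (in %) over a list of prediction records."""
    if not records:
        raise ValueError("no predictions to score")
    n = len(records)
    em = sum(exact_match(r["response_pred"], r["response_gt"]) for r in records)
    relaxed = sum(relaxed_match(r["response_pred"], r["response_gt"]) for r in records)
    return {"EM": 100.0 * em / n, "Relaxed": 100.0 * relaxed / n}
```

Under these assumptions, "The chair." exactly matches "chair", while "left of the table" only counts as a relaxed match for "left".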
2. VRS (Viewpoint Rotation Score)
To evaluate your model's rotation consistency across all 4 viewpoints:
python evaluate_rotation_robustness.py preds0.json preds90.json preds180.json preds270.json --name "MyModel"
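A rotation-consistency score can be sketched as follows, assuming (for illustration only) that a question counts as consistent when it is answered correctly in every rotation split, and that the same question_id identifies a question across splits. The function name viewpoint_rotation_score is hypothetical, the lowercase-and-whitespace comparison stands in for whatever matching the script uses, and evaluate_rotation_robustness.py defines the actual VRS.

```python
def _norm(text):
    # Simple normalization for this sketch: lowercase and collapse whitespace.
    return " ".join(str(text).lower().split())


def viewpoint_rotation_score(*rotation_preds):
    """Percentage of shared questions answered correctly under every rotation.

    Each argument is one rotation split's prediction list (e.g. 0, 90, 180, 270),
    whose records carry question_id, response_gt and response_pred.
    """
    # Per split: question_id -> whether the prediction matches the ground truth.
    per_rotation = [
        {r["question_id"]: _norm(r["response_pred"]) == _norm(r["response_gt"]) for r in preds}
        for preds in rotation_preds
    ]
    # Only score questions that appear in every rotation split.
    shared = set.intersection(*(set(d) for d in per_rotation))
    if not shared:
        raise ValueError("no question_id appears in every rotation split")
    consistent = sum(all(d[q] for d in per_rotation) for q in shared)
    return 100.0 * consistent / len(shared)
```

Each argument would typically be the result of json.load on one of preds0.json, preds90.json, preds180.json and preds270.json. Requiring correctness in all four views makes this stricter than averaging per-rotation accuracy, which is the point of a consistency measure.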
🔧 3D-RFT (3D Reweighted Finetuning) Pseudocode
We provide pseudocode for our proposed 3D-RFT method in 3D-RFT.py.
The key idea is to dynamically down-weight training samples on which a blind model (one that answers without the 3D scene input) performs well, forcing the model to rely on genuine 3D spatial perception rather than linguistic shortcuts:
Core formula
loss_3drft = loss_full / loss_blind
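The core formula can be sketched in plain Python as follows. This is an illustration under assumptions, not the authors' 3D-RFT.py: it takes per-sample losses that have already been computed (loss_full with the 3D scene, loss_blind for the same question without it), forms the ratio per sample, and averages over the batch. The function name rft_loss and the eps floor on loss_blind are hypothetical.

```python
def rft_loss(full_losses, blind_losses, eps=1e-6):
    """Batch mean of per-sample loss_full / loss_blind.

    full_losses:  per-sample losses of the model given the 3D scene and question.
    blind_losses: per-sample losses for the same questions without the 3D input.
    eps:          hypothetical floor on loss_blind to avoid division by zero.
    """
    if not full_losses or len(full_losses) != len(blind_losses):
        raise ValueError("expected two non-empty lists of equal length")
    ratios = [f / max(b, eps) for f, b in zip(full_losses, blind_losses)]
    return sum(ratios) / len(ratios)
```

In an autograd framework the same arithmetic applies to loss tensors, but whether loss_blind is detached (acting as a fixed per-sample scale) or backpropagated through changes the resulting gradients; this README does not specify it, so check 3D-RFT.py before reusing the sketch.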
📖 Citation
@inproceedings{ma2026real3dqa,
  title={Do 3D Large Language Models Really Understand 3D Spatial Relationships?},
  author={Xianzheng Ma and Tao Sun and Shuai Chen and Yash Bhalgat and Jindong Gu and Angel X Chang and Iro Armeni and Iro Laina and Songyou Peng and Victor Adrian Prisacariu},
  booktitle={The Fourteenth International Conference on Learning Representations},
  year={2026},
  url={https://openreview.net/forum?id=3vlMiJwo8b}
}
📜 License
This project is licensed under CC BY-SA 4.0.
