GitHub - xianzhengma/Real-3DQA: [ICLR 2026] Official repository for "Real-3DQA"

Real-3DQA: Do 3D Large Language Models Really Understand 3D Spatial Relationships?

ICLR 2026

Xianzheng Ma1,2*, Tao Sun3*, Shuai Chen1,2, Yash Bhalgat1, Jindong Gu5†,
Angel X Chang4, Iro Armeni3, Iro Laina1, Songyou Peng5‡, Victor Adrian Prisacariu2‡

1VGG, University of Oxford 2AVL, University of Oxford 3Stanford University
4Simon Fraser University 5Google DeepMind

* equal contribution † corresponding author ‡ equal supervision

This repository contains:

📁 Access to the Real-3DQA dataset (Hosted on Hugging Face).
📊 Two standalone evaluation scripts (Exact Match & Viewpoint Rotation Score).
🔧 The 3D-RFT (3D Reweighted Finetuning) pseudocode.

📁 Dataset (Hugging Face)

Our dataset is hosted on Hugging Face: Oliver-Ma/Real-3DQA

You can download the JSON files directly from the repository or load them with the datasets library. The dataset contains a de-biased_testset along with four rotation splits (rotation_0, rotation_90, rotation_180, rotation_270).
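The two loading routes mentioned above could look roughly like the sketch below. The helper names `load_local_split` and `load_from_hub` are hypothetical, the local file layout is an assumption, and it is not confirmed here that the Hub repo loads with default arguments (it may need `data_files=...`).

```python
import json
from pathlib import Path

# Split names as listed in this README; on-disk file names are an assumption.
ROTATION_SPLITS = ("rotation_0", "rotation_90", "rotation_180", "rotation_270")


def load_local_split(json_path):
    """Read one downloaded split file, assuming it stores a JSON list of records."""
    with Path(json_path).open(encoding="utf-8") as f:
        records = json.load(f)
    if not isinstance(records, list):
        raise ValueError(
            f"{json_path}: expected a JSON list, got {type(records).__name__}"
        )
    return records


def load_from_hub(repo_id="Oliver-Ma/Real-3DQA"):
    """Load through the `datasets` library (needs `pip install datasets` and network).

    Assumption: the repo resolves with default arguments.
    """
    # Imported lazily so the local-file route works without the package.
    from datasets import load_dataset

    return load_dataset(repo_id)
```

The local route only needs the standard library, which is convenient when the JSON files have already been downloaded by hand.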

📊 Evaluation

We provide two standalone scripts to compute metrics. Your model's prediction file should be a JSON file containing a list of dictionaries, each with at least the keys response_gt and response_pred (plus question_id for VRS).
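Before running either script, it can help to check that a prediction file has the shape described above. The following is a minimal sketch; `validate_predictions` is a hypothetical helper that is not part of this repository, and the example records are made up.

```python
REQUIRED_KEYS = ("response_gt", "response_pred")


def validate_predictions(records, require_question_id=False):
    """Return a list of problems; an empty list means the structure looks usable."""
    if not isinstance(records, list):
        return ["top-level JSON value must be a list"]
    required = REQUIRED_KEYS + (("question_id",) if require_question_id else ())
    problems = []
    for i, rec in enumerate(records):
        if not isinstance(rec, dict):
            problems.append(f"entry {i}: expected a dictionary")
            continue
        missing = [k for k in required if k not in rec]
        if missing:
            problems.append(f"entry {i}: missing {', '.join(missing)}")
    return problems


# Illustrative records only; the answers are invented for this example.
example = [
    {"question_id": "q1", "response_gt": "left", "response_pred": "left"},
    {"response_gt": "chair", "response_pred": "table"},
]
print(validate_predictions(example))                            # []
print(validate_predictions(example, require_question_id=True))  # ['entry 1: missing question_id']
```

Pass `require_question_id=True` for files intended for the VRS script, since that metric needs question_id.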

1. Debiased EM (Exact Match)

To evaluate standard Exact Match (EM) and Relaxed Match on the debiased test set:

python evaluate_debiased_exact_match.py predictions.json --name "MyModel"
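For intuition about what the two metrics measure, here is a rough sketch. The normalization (lowercasing and whitespace collapsing) and the relaxed-match rule (the ground truth appears as whole words inside the prediction) are assumptions made for illustration; evaluate_debiased_exact_match.py remains the reference for the actual definitions.

```python
import re


def normalize(text):
    """Lowercase, trim, and collapse whitespace (an assumed normalization)."""
    return re.sub(r"\s+", " ", str(text).strip().lower())


def exact_match(pred, gt):
    """True when the normalized prediction equals the normalized ground truth."""
    return normalize(pred) == normalize(gt)


def relaxed_match(pred, gt):
    """Assumed rule: exact match, or the ground truth occurs as whole words in the prediction."""
    p, g = normalize(pred), normalize(gt)
    if p == g:
        return True
    return bool(g) and f" {g} " in f" {p} "


def score(records):
    """Return EM and relaxed-match accuracy over a list of prediction records."""
    n = len(records)
    if n == 0:
        return {"em": 0.0, "relaxed": 0.0}
    em = sum(exact_match(r["response_pred"], r["response_gt"]) for r in records)
    relaxed = sum(relaxed_match(r["response_pred"], r["response_gt"]) for r in records)
    return {"em": em / n, "relaxed": relaxed / n}


# Made-up records for illustration.
records = [
    {"response_gt": "left", "response_pred": "Left"},
    {"response_gt": "left", "response_pred": "on the left"},
    {"response_gt": "left", "response_pred": "right"},
    {"response_gt": "chair", "response_pred": "table"},
]
print(score(records))  # {'em': 0.25, 'relaxed': 0.5}
```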

2. VRS (Viewpoint Rotation Score)

To evaluate your model's rotation consistency across all four viewpoints, pass one prediction file per rotation split:

python evaluate_rotation_robustness.py preds0.json preds90.json preds180.json preds270.json --name "MyModel"
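The sketch below shows one plausible reading of a rotation-consistency score. It assumes that a question counts only if it is answered correctly under every rotation and that the same question_id identifies a question across the four files. Both points are assumptions, and evaluate_rotation_robustness.py defines the actual VRS.

```python
from collections import defaultdict


def is_correct(rec):
    """Case-insensitive exact match (an assumed correctness rule)."""
    pred = str(rec["response_pred"]).strip().lower()
    gt = str(rec["response_gt"]).strip().lower()
    return pred == gt


def rotation_consistency(prediction_sets):
    """prediction_sets: one list of records per rotation, e.g. 0, 90, 180, 270."""
    per_question = defaultdict(list)
    for records in prediction_sets:
        for rec in records:
            per_question[rec["question_id"]].append(is_correct(rec))
    n_views = len(prediction_sets)
    # Score only questions that have exactly one record in every rotation.
    complete = [flags for flags in per_question.values() if len(flags) == n_views]
    if not complete:
        return 0.0
    return sum(all(flags) for flags in complete) / len(complete)


def make_view(pred_q1, pred_q2):
    """Build one rotation's records for two made-up questions."""
    return [
        {"question_id": "q1", "response_gt": "left", "response_pred": pred_q1},
        {"question_id": "q2", "response_gt": "behind", "response_pred": pred_q2},
    ]


views = [
    make_view("left", "behind"),
    make_view("left", "behind"),
    make_view("left", "front"),   # q2 fails under this rotation
    make_view("left", "behind"),
]
print(rotation_consistency(views))  # 0.5
```

Under this reading, a model that guesses from language priors tends to score lower than its per-view accuracy, because one wrong view invalidates the whole question.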

🔧 3D-RFT (3D Reweighted Finetuning) Pseudocode

We provide the implementation pseudocode for our proposed 3D-RFT solution in 3D-RFT.py.

The key idea is to dynamically down-weight training samples on which a blind model (one that answers without the 3D scene input) already performs well, forcing the model to rely on genuine 3D spatial perception rather than linguistic shortcuts:

Core formula

loss_3drft = loss_full / loss_blind
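A small numeric sketch of the core formula, applied per sample and averaged over the batch. The per-sample application, the mean reduction, and the `eps` guard against division by zero are assumptions; `rft_loss` is a hypothetical name, and 3D-RFT.py is the reference for how the blind term is computed and whether gradients flow through it.

```python
def rft_loss(loss_full, loss_blind, eps=1e-8):
    """Return (mean ratio, per-sample ratios) for loss_full / loss_blind.

    loss_full:  per-sample losses with the 3D input.
    loss_blind: per-sample losses of the blind pass on the same samples.
    """
    if len(loss_full) != len(loss_blind):
        raise ValueError("loss_full and loss_blind must have the same length")
    if not loss_full:
        raise ValueError("at least one sample is required")
    # eps keeps the division finite when a blind loss is (near) zero.
    ratios = [lf / max(lb, eps) for lf, lb in zip(loss_full, loss_blind)]
    return sum(ratios) / len(ratios), ratios


# Made-up loss values for illustration.
mean_loss, ratios = rft_loss([1.0, 1.0], [0.5, 2.0])
print(ratios)     # [2.0, 0.5]
print(mean_loss)  # 1.25
```

With the formula taken literally, the sample whose blind loss is 0.5 is scaled by 2 and the one whose blind loss is 2.0 is scaled by 0.5. In a real training loop the inputs would be tensors from the framework in use rather than Python floats.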

📖 Citation

@inproceedings{ma2026real3dqa,
  title={Do 3D Large Language Models Really Understand 3D Spatial Relationships?},
  author={Xianzheng Ma and Tao Sun and Shuai Chen and Yash Bhalgat and Jindong Gu and Angel X Chang and Iro Armeni and Iro Laina and Songyou Peng and Victor Adrian Prisacariu},
  booktitle={The Fourteenth International Conference on Learning Representations},
  year={2026},
  url={https://openreview.net/forum?id=3vlMiJwo8b}
}

📜 License

This project is licensed under CC BY-SA 4.0.