Current methods that use Large Language Models (LLMs) to interpret 3D visual information and generate segmentation prompts have not fully explored the models' spatial reasoning capabilities. A major reason is the lack of high-quality datasets that test reasoning and spatial understanding together.
Overview
Dataset
- 1,000+ 3D scenes
- 10,000+ text-object pairs
Key Features
- Text inputs posed as spatial reasoning questions
- 3D object masks as answers
- Built on the ScanNet++ scene collection
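Because each answer is a set of 3D objects, one way to turn an answer into a mask is to select every point whose instance label matches one of the answer IDs. This is a minimal sketch that assumes the scene provides a per-point array of instance IDs (an assumed format, not one documented here); `answer_mask` and the toy scene are hypothetical.

```python
import numpy as np

def answer_mask(instance_ids: np.ndarray, object_ids: list[int]) -> np.ndarray:
    """Return a boolean per-point mask that selects every answer object.

    instance_ids: hypothetical per-point instance labels for one scene,
        shape (num_points,).
    object_ids: the answer object IDs for one question.
    """
    return np.isin(instance_ids, np.asarray(object_ids))

# Toy scene: six points belonging to instances 0, 3 and 7
instance_ids = np.array([0, 3, 3, 7, 0, 7])
mask = answer_mask(instance_ids, [3, 7])
print(mask.tolist())  # [False, True, True, True, False, True]
```

A point-level boolean mask keeps multi-object answers simple: the union of all answer objects is a single array that can be compared directly against a model's prediction.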
Examples Browser
Annotation
Benchmark
Evaluation Metrics
Models are evaluated by comparing their predicted 3D object masks with the ground-truth answer masks, using Accuracy, IoU, and F1 Score.
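As a minimal sketch, assuming that predictions and ground truth are boolean masks over the same scene points, per-sample IoU and F1 can be computed as follows. The function names are hypothetical, F1 is computed here at the point level (which may differ from how the benchmark defines its F1 Score), and treating two empty masks as a perfect match is a convention chosen for this sketch.

```python
import numpy as np

def mask_iou(pred: np.ndarray, gt: np.ndarray) -> float:
    """Intersection over union of two boolean point masks of equal length."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    union = np.logical_or(pred, gt).sum()
    if union == 0:
        return 1.0  # both masks empty: counted as a match (assumed convention)
    return float(np.logical_and(pred, gt).sum() / union)

def mask_f1(pred: np.ndarray, gt: np.ndarray) -> float:
    """Point-level F1 (equivalently the Dice coefficient) of two boolean masks."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    tp = np.logical_and(pred, gt).sum()    # points in both masks
    fp = np.logical_and(pred, ~gt).sum()   # predicted but not in the answer
    fn = np.logical_and(~pred, gt).sum()   # in the answer but missed
    denom = 2 * tp + fp + fn
    if denom == 0:
        return 1.0  # same empty-mask convention as mask_iou
    return float(2 * tp / denom)

pred = np.array([True, True, False, True, False])
gt = np.array([True, False, False, True, True])
print(mask_iou(pred, gt))  # 0.5
print(mask_f1(pred, gt))   # 4 / 6, about 0.667
```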
Baseline Results
| Method | Accuracy | IoU | F1 Score |
|---|---|---|---|
| Method 1 | 85.2% | 0.72 | 0.80 |
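The table reports one number per metric, so per-sample scores have to be aggregated across the benchmark. The sketch below assumes that IoU is averaged over samples and that Accuracy is the fraction of samples whose IoU reaches a threshold (0.5 here). Both definitions are assumptions for illustration, not the ones confirmed to produce the numbers above, and `summarize` is a hypothetical helper.

```python
def summarize(ious: list[float], threshold: float = 0.5) -> dict[str, float]:
    """Roll per-sample IoUs up into dataset-level scores.

    'acc' is the fraction of samples with IoU >= threshold, an assumed
    definition that may not match the benchmark's Accuracy column.
    """
    if not ious:
        raise ValueError("need at least one sample")
    n = len(ious)
    return {
        "mean_iou": sum(ious) / n,
        "acc": sum(iou >= threshold for iou in ious) / n,
    }

print(summarize([0.9, 0.4, 0.75, 0.5]))
```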
Challenge Cases
Download & Use
Usage Guide
```python
# Example code for loading and using the dataset
import json

# Load the dataset (a JSON array of question/answer samples)
with open('spatial_reasoning_dataset.json', 'r', encoding='utf-8') as f:
    dataset = json.load(f)

# Access a sample
sample = dataset[0]
question = sample['question']
object_ids = sample['object_ids']

print(f"Question: {question}")
print(f"Answer object IDs: {object_ids}")
```
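Since an answer can contain more than one object, a quick sanity check on the data is to count how many answer objects each question has. This sketch uses only the `question` and `object_ids` keys shown above; the helper name and the two toy samples are hypothetical stand-ins for the real file.

```python
import json
from collections import Counter

def answer_count_histogram(dataset: list[dict]) -> Counter:
    """Count how many questions have 1, 2, 3, ... answer objects."""
    return Counter(len(sample['object_ids']) for sample in dataset)

# Hypothetical toy samples with the same two keys as the real dataset
toy = json.loads(
    '[{"question": "Which chair is closest to the door?", "object_ids": [4]},'
    ' {"question": "Which lamps are on the desk?", "object_ids": [2, 9]}]'
)
print(answer_count_histogram(toy))  # Counter({1: 1, 2: 1})
```

With the real file, pass the list returned by `json.load`, for example `answer_count_histogram(dataset)`.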