ScanNet++ Spatial Reasoning

3D Vision-Language Spatial Understanding Benchmark

Overview

Current methods that use Large Language Models (LLMs) to interpret 3D visual information and generate segmentation prompts have not fully explored spatial reasoning, largely because high-quality datasets that test both reasoning and spatial understanding are scarce. ScanNet++ Spatial Reasoning addresses this gap: each sample pairs a spatial reasoning question with the 3D object masks that answer it, built on the ScanNet++ scene collection.

Dataset

  • 1,000+ 3D scenes
  • 10,000+ text-object pairs

Key Features

  • Text inputs as spatial reasoning questions
  • 3D object masks as answers
  • Built on ScanNet++ scene collection
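To make the question/answer format above concrete, a single annotation record might look like the sketch below. The 'question' and 'object_ids' fields match those used in the Usage Guide; 'scene_id' and the example values are assumptions for illustration, not the released schema.

# Hypothetical annotation record (illustrative only).
sample_record = {
    "scene_id": "scene0001_00",   # assumed ScanNet++ scene identifier
    "question": "Which chair is closest to the window?",
    "object_ids": [17],           # IDs of the 3D object masks that answer the question
}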
Benchmark

Evaluation Metrics

Models are evaluated by comparing their predicted 3D object masks against the ground-truth masks for each question; results are reported as accuracy, mask intersection-over-union (IoU), and F1 score.
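As a rough illustration of how a mask-based metric such as IoU could be computed, the sketch below assumes the predicted and ground-truth masks are boolean arrays over the same scene point cloud; it is not the benchmark's official evaluation script.

import numpy as np

def mask_iou(pred_mask, gt_mask):
    """Intersection-over-union between two boolean per-point masks.

    Illustrative sketch only; assumes both masks index the same scene
    point cloud. Not the official evaluation code.
    """
    intersection = np.logical_and(pred_mask, gt_mask).sum()
    union = np.logical_or(pred_mask, gt_mask).sum()
    return float(intersection) / union if union > 0 else 0.0

# For accuracy, a prediction might be counted as correct when its IoU
# exceeds a threshold (e.g. 0.5); the threshold here is an assumption.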

Baseline Results

Method      Accuracy   IoU    F1 Score
--------    --------   ----   --------
Method 1    85.2%      0.72   0.80

Challenge Cases

Download & Use

Complete Dataset

Full ScanNet++ Spatial Reasoning dataset with all annotations.

Size: 4.2 GB

Sample Version

Lightweight sample with 50 scenes for quick exploration.

Size: 500 MB

Code & Models

Evaluation code and baseline models are available in the GitHub repository.

Usage Guide


# Example code for loading and using the dataset
import json

# Load dataset
with open('spatial_reasoning_dataset.json', 'r') as f:
    dataset = json.load(f)

# Access a sample
sample = dataset[0]
question = sample['question']
object_ids = sample['object_ids']

print(f"Question: {question}")
print(f"Answer object IDs: {object_ids}")