Aug3D: Augmenting large scale outdoor datasets for Generalizable Novel View Synthesis

Abstract

Recent advances in photorealistic Novel View Synthesis (NVS) have gained increasing attention. However, these approaches remain constrained to small indoor scenes. While optimization-based NVS models have attempted to address this, generalizable feed-forward methods, which offer significant advantages, remain underexplored. In this work, we train PixelNeRF, a feed-forward NVS model, on the large-scale UrbanScene3D dataset. We propose four strategies to cluster and train on this dataset, and show that performance is hindered by limited view overlap. To address this, we introduce Aug3D, an augmentation technique that leverages scenes reconstructed with traditional Structure-from-Motion (SfM). Aug3D generates well-conditioned novel views through grid and semantic sampling to enhance feed-forward NVS model learning. Our experiments reveal that reducing the number of views per cluster from 20 to 10 improves PSNR by 10%, but performance remains suboptimal. Aug3D addresses this further by combining the newly generated novel views with the original dataset, demonstrating its effectiveness in improving the model's ability to predict novel views.

Different Methods of Data Curation for GNVS

Sequence ID: Uses a sliding window approach.

Camera Pose: Uses the camera pose of each image in (x, y, z) coordinates.

Unprojected Points: Unprojects a point from camera space onto the ground plane in world space.

Shared Features: Uses an image similarity matrix built from the feature points shared between two images.
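
Two of these strategies can be illustrated with a minimal sketch. The page does not give implementation details, so everything below is an assumption: a distortion-free pinhole camera with intrinsics `K`, a 4x4 camera-to-world pose, a flat ground plane at `z = ground_z`, and per-image sets of SfM track ids as the "shared feature points". The function names are hypothetical and are not taken from the Aug3D code.

```python
import numpy as np


def unproject_to_ground(pixel, K, cam_to_world, ground_z=0.0):
    """Intersect the viewing ray through `pixel` with the plane z = ground_z.

    pixel: (u, v) image coordinates.
    K: 3x3 pinhole intrinsics (assumption: no lens distortion).
    cam_to_world: 4x4 camera-to-world pose matrix.
    Returns the 3D world point, or None if the ray never hits the plane.
    """
    u, v = pixel
    ray_cam = np.linalg.inv(K) @ np.array([u, v, 1.0])  # ray direction in camera space
    R, t = cam_to_world[:3, :3], cam_to_world[:3, 3]
    ray_world = R @ ray_cam                              # same ray in world space
    if abs(ray_world[2]) < 1e-9:                         # ray parallel to the ground
        return None
    s = (ground_z - t[2]) / ray_world[2]                 # distance along the ray
    if s <= 0:                                           # plane is behind the camera
        return None
    return t + s * ray_world


def shared_feature_similarity(track_ids_per_image):
    """Build an N x N matrix counting the feature tracks seen by both images.

    track_ids_per_image: list of sets of track (3D point) ids, one set per image.
    """
    n = len(track_ids_per_image)
    sim = np.zeros((n, n), dtype=int)
    for i in range(n):
        for j in range(i + 1, n):
            shared = len(track_ids_per_image[i] & track_ids_per_image[j])
            sim[i, j] = sim[j, i] = shared
    return sim


def top_k_neighbors(sim, ref, k):
    """Indices of the k images sharing the most tracks with image `ref`."""
    order = np.argsort(-sim[ref], kind="stable")
    return [int(i) for i in order if i != ref][:k]
```

A cluster for a reference image could then be formed from `top_k_neighbors(sim, ref, k)`, which favors views with high overlap. Whether the original pipeline selects clusters this way is not stated on this page.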

Image Clustering Method	Best PSNR	Lowest PSNR	Average PSNR
Sequence ID	9.7	0.0	3.5
Camera Pose	12.2	0.0	4.6
Unprojected Points	13.6	0.0	9.9
Shared Features	20.03	10.9	14.6
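
The table reports PSNR without stating how it is computed. As a minimal sketch, assuming the standard definition in decibels and images scaled to the range [0, 1] (the function name and the `max_val` parameter are illustrative choices, not taken from the source):

```python
import numpy as np


def psnr(pred, target, max_val=1.0):
    """Peak signal-to-noise ratio in dB between a rendered and a reference image.

    Assumes both images share the same shape and value range [0, max_val].
    """
    pred = np.asarray(pred, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)
    mse = np.mean((pred - target) ** 2)      # mean squared error over all pixels
    if mse == 0:
        return float("inf")                  # identical images
    return 10.0 * np.log10(max_val ** 2 / mse)
```

Under this definition, a score of 0 dB corresponds to a mean squared error equal to the squared peak value. The page does not say whether the 0.0 entries in the table arise this way or from failed runs.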

Dataset	Configuration	Best PSNR
Real Dataset (Baseline)	Input views 3	20.03
Real Dataset (Baseline)	Input views 6	19.95
Real Dataset (Baseline)	Input views 9	19.59
Synthetic Dataset (Ours)	Grid Sampling	29.12
Synthetic Dataset (Ours)	Semantic Plane Fitting	28.79
Aug3D (Ours + Baseline)	Grid	21.67
Aug3D (Ours + Baseline)	Semantic	21.80
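
To put the Aug3D rows in context, the short calculation below compares them with the best baseline entry (20.03, input views 3). It uses only numbers from the table. The percentage is taken directly on the PSNR values, in the same way the abstract quotes a 10% PSNR improvement. The helper name `gain` is illustrative.

```python
baseline_best = 20.03                        # Real Dataset (Baseline), input views 3
aug3d = {"Grid": 21.67, "Semantic": 21.80}   # Aug3D (Ours + Baseline)


def gain(new, base):
    """Return the absolute gain and the relative gain in percent."""
    return new - base, 100.0 * (new - base) / base


gains = {name: gain(value, baseline_best) for name, value in aug3d.items()}
for name, (absolute, relative) in gains.items():
    print(f"{name}: +{absolute:.2f} ({relative:.1f}% relative)")
# Grid: +1.64 (8.2% relative)
# Semantic: +1.77 (8.8% relative)
```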

BibTeX

@misc{rauniyar2025aug3daugmentinglargescale,
  title={Aug3D: Augmenting large scale outdoor datasets for Generalizable Novel View Synthesis},
  author={Aditya Rauniyar and Omar Alama and Silong Yong and Katia Sycara and Sebastian Scherer},
  year={2025},
  eprint={2501.06431},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2501.06431},
}

IROS Workshop 2024

Solving dataset challenges: Aug3D generates new views to tackle low-overlap clusters, boosting model performance and enabling better view synthesis.
