AugGen: Synthetic Augmentation using Diffusion Models Can Improve Recognition

Abstract: The increasing reliance on large-scale datasets in machine learning poses significant privacy and ethical challenges, particularly in sensitive domains such as face recognition (FR). Synthetic data generation offers a promising alternative; however, most existing methods depend heavily on external datasets or pre-trained models, increasing complexity and resource demands. In this paper, we introduce AugGen, a self-contained synthetic augmentation technique. AugGen strategically samples from a class-conditional generative model trained exclusively on the target FR dataset, eliminating the need for external resources. Evaluated across 8 FR benchmarks, including IJB-C and IJB-B, our method achieves 1-12% performance improvements, outperforming models trained solely on real data and surpassing state-of-the-art synthetic data generation approaches, while using less real data. Notably, these gains often exceed those from architectural enhancements, underscoring the value of synthetic augmentation in data-limited scenarios. Our findings demonstrate that carefully integrated synthetic data can both mitigate privacy constraints and substantially enhance recognition performance. Paper website: this https URL.

Submission history

From: Parsa Rahimi Noshanagh
[v1] Fri, 14 Mar 2025 16:10:21 UTC (29,066 KB)
[v2] Wed, 11 Jun 2025 12:32:30 UTC (29,064 KB)
[v3] Fri, 24 Oct 2025 13:47:21 UTC (29,062 KB)