DETONATE – A Benchmark for Text-to-Image Alignment and Kernelized Direct Preference Optimization

DOI: https://doi.org/10.1609/aaai.v40i44.41106

Abstract

Alignment is crucial for text-to-image (T2I) models to ensure that the generated images faithfully capture user intent while maintaining safety and fairness. Direct Preference Optimization (DPO) has emerged as a key alignment technique for large language models (LLMs), and its influence is now extending to T2I systems. This paper introduces DPO-Kernels for T2I models, a novel extension of DPO that enhances alignment across three key dimensions: (i) Hybrid Loss, which integrates embedding-based objectives with the traditional probability-based loss to improve optimization; (ii) Kernelized Representations, leveraging Radial Basis Function (RBF), Polynomial, and Wavelet kernels to enable richer feature transformations, ensuring better separation between safe and unsafe inputs; and (iii) Divergence Selection, expanding beyond DPO's default Kullback–Leibler (KL) regularizer by incorporating alternative divergence measures such as Wasserstein and Rényi divergences to enhance stability and robustness in alignment training. We introduce DETONATE, the first large-scale benchmark of its kind, comprising approximately 100K curated image pairs labeled as chosen and rejected. This benchmark encapsulates three critical axes of social bias and discrimination: Race, Gender, and Disability. The prompts are sourced from hate speech datasets, while the images are generated using state-of-the-art T2I models, including Stable Diffusion 3.5 Large (SD-3.5), Stable Diffusion XL (SD-XL), and Midjourney. Furthermore, to evaluate alignment beyond surface metrics, we introduce the Alignment Quality Index (AQI) for T2I systems: a novel geometric measure that quantifies latent-space separability of safe/unsafe image activations, revealing hidden model vulnerabilities. While alignment techniques often risk overfitting, we empirically demonstrate that DPO-Kernels preserve strong generalization bounds using the theory of Heavy-Tailed Self-Regularization (HT-SR).
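The kernelized representations described in (ii) can be illustrated with a minimal sketch. The kernel definitions below are standard (RBF and polynomial); the `kernelized_preference_score` function and its margin form are illustrative assumptions for intuition only, not the paper's actual hybrid objective.

```python
import numpy as np

def rbf_kernel(x, y, gamma=1.0):
    # k(x, y) = exp(-gamma * ||x - y||^2): similarity decays with distance
    return np.exp(-gamma * np.sum((x - y) ** 2))

def polynomial_kernel(x, y, degree=3, c=1.0):
    # k(x, y) = (<x, y> + c)^degree: captures feature interactions up to `degree`
    return (np.dot(x, y) + c) ** degree

def kernelized_preference_score(anchor, chosen, rejected, kernel=rbf_kernel):
    """Hypothetical preference margin in kernel space: how much closer the
    chosen image embedding is to the prompt (anchor) embedding than the
    rejected one. A positive score favors the chosen image."""
    return kernel(anchor, chosen) - kernel(anchor, rejected)

# Toy embeddings: chosen is near the anchor, rejected is far away.
anchor = np.array([1.0, 0.0])
chosen = np.array([0.9, 0.1])
rejected = np.array([-1.0, 0.0])
margin = kernelized_preference_score(anchor, chosen, rejected)
```

In a DPO-style hybrid loss, a margin like this could be combined with the usual log-probability term; swapping `kernel=` between RBF and polynomial changes how separation between safe and unsafe inputs is measured in the transformed feature space.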

How to Cite

Mana, R. P. K., Borah, A., Abdullah, H. M., Shyalika, C., Singh, G., Garimella, R., Roy, R., Surana, H. R., Imanpour, N., Trivedy, S., Sheth, A., & Das, A. (2026). DETONATE – A Benchmark for Text-to-Image Alignment and Kernelized Direct Preference Optimization. Proceedings of the AAAI Conference on Artificial Intelligence, 40(44), 37709-37718. https://doi.org/10.1609/aaai.v40i44.41106

Issue

Vol. 40 No. 44 (2026)

Section

AAAI Special Track on AI Alignment