DETONATE – A Benchmark for Text-to-Image Alignment and Kernelized Direct Preference Optimization
Authors
- Renjith Prasad Kaippilly Mana University of South Carolina
- Abhilekh Borah Manipal University Jaipur
- Hasnat Md Abdullah Texas A&M University
- Chathurangi Shyalika University of South Carolina
- Gurpreet Singh University of South Carolina
- Ritvik Garimella University of South Carolina
- Rajarshi Roy Kalyani Government Engineering College
- Harshul Raj Surana University of South Carolina
- Nasrin Imanpour University of South Carolina
- Suranjana Trivedy University of South Carolina
- Amit Sheth University of South Carolina
- Amitava Das BITS Pilani, Goa
DOI:
https://doi.org/10.1609/aaai.v40i44.41106
Abstract
Alignment is crucial for text-to-image (T2I) models to ensure that the generated images faithfully capture user intent while maintaining safety and fairness. Direct Preference Optimization (DPO) has emerged as a key alignment technique for large language models (LLMs), and its influence is now extending to T2I systems. This paper introduces DPO-Kernels for T2I models, a novel extension of DPO that enhances alignment across three key dimensions: (i) Hybrid Loss, which integrates embedding-based objectives with the traditional probability-based loss to improve optimization; (ii) Kernelized Representations, which leverage Radial Basis Function (RBF), Polynomial, and Wavelet kernels to enable richer feature transformations and better separation between safe and unsafe inputs; and (iii) Divergence Selection, which expands beyond DPO’s default Kullback–Leibler (KL) regularizer by incorporating alternative divergence measures such as Wasserstein and Rényi divergences to enhance stability and robustness in alignment training. We introduce DETONATE, the first large-scale benchmark of its kind, comprising approximately 100K curated image pairs, categorized as chosen and rejected. This benchmark covers three critical axes of social bias and discrimination: Race, Gender, and Disability. The prompts are sourced from hate speech datasets, while the images are generated using state-of-the-art T2I models, including Stable Diffusion 3.5 Large (SD-3.5), Stable Diffusion XL (SD-XL), and Midjourney. Furthermore, to evaluate alignment beyond surface metrics, we introduce the Alignment Quality Index (AQI) for T2I systems: a novel geometric measure that quantifies latent space separability of safe/unsafe image activations, revealing hidden model vulnerabilities. While alignment techniques often risk overfitting, we empirically demonstrate that DPO-Kernels preserve strong generalization bounds using the theory of Heavy-Tailed Self-Regularization (HT-SR).
How to Cite
Mana, R. P. K., Borah, A., Abdullah, H. M., Shyalika, C., Singh, G., Garimella, R., Roy, R., Surana, H. R., Imanpour, N., Trivedy, S., Sheth, A., & Das, A. (2026). DETONATE – A Benchmark for Text-to-Image Alignment and Kernelized Direct Preference Optimization. Proceedings of the AAAI Conference on Artificial Intelligence, 40(44), 37709-37718. https://doi.org/10.1609/aaai.v40i44.41106
Issue
Section
AAAI Special Track on AI Alignment