DETONATE – A Benchmark for Text-to-Image Alignment and Kernelized Direct Preference Optimization

Authors

Renjith Prasad Kaippilly Mana University of South Carolina
Abhilekh Borah Manipal University Jaipur
Hasnat Md Abdullah Texas A&M University
Chathurangi Shyalika University of South Carolina
Gurpreet Singh University of South Carolina
Ritvik Garimella University of South Carolina
Rajarshi Roy Kalyani Government Engineering College
Harshul Raj Surana University of South Carolina
Nasrin Imanpour University of South Carolina
Suranjana Trivedy University of South Carolina
Amit Sheth University of South Carolina
Amitava Das BITS Pilani, Goa

DOI:

https://doi.org/10.1609/aaai.v40i44.41106

Abstract

Alignment is crucial for text-to-image (T2I) models to ensure that the generated images faithfully capture user intent while maintaining safety and fairness. Direct Preference Optimization (DPO) has emerged as a key alignment technique for large language models (LLMs), and its influence is now extending to T2I systems. This paper introduces DPO-Kernels for T2I models, a novel extension of DPO that enhances alignment across three key dimensions: (i) Hybrid Loss, which integrates embedding-based objectives with the traditional probability-based loss to improve optimization; (ii) Kernelized Representations, leveraging Radial Basis Function (RBF), Polynomial, and Wavelet kernels to enable richer feature transformations, ensuring better separation between safe and unsafe inputs; and (iii) Divergence Selection, expanding beyond DPO’s default Kullback–Leibler (KL) regularizer by incorporating alternative divergence measures such as Wasserstein and Rényi divergences to enhance stability and robustness in alignment training. We introduce DETONATE, the first large-scale benchmark of its kind, comprising approximately 100K curated image pairs, categorized as chosen and rejected. This benchmark encapsulates three critical axes of social bias and discrimination: Race, Gender, and Disability. The prompts are sourced from hate speech datasets, while the images are generated using state-of-the-art T2I models, including Stable Diffusion 3.5 Large (SD-3.5), Stable Diffusion XL (SD-XL), and Midjourney. Furthermore, to evaluate alignment beyond surface metrics, we introduce the Alignment Quality Index (AQI) for T2I systems: a novel geometric measure that quantifies latent space separability of safe/unsafe image activations, revealing hidden model vulnerabilities. While alignment techniques often risk overfitting, we empirically demonstrate that DPO-Kernels preserve strong generalization bounds using the theory of Heavy-Tailed Self-Regularization (HT-SR).
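
For readers unfamiliar with the building blocks named in the abstract, the following is a minimal sketch of the standard pairwise DPO loss together with textbook RBF and polynomial kernels. It is not the paper's DPO-Kernels objective: the abstract does not state how the embedding-based and probability-based terms are combined, so no hybrid loss is attempted here. The function names, the parameters `beta`, `gamma`, `degree`, and `coef0`, and the use of plain floats for log-likelihoods are all assumptions made for illustration.

```python
import math


def dpo_loss(logp_chosen, logp_rejected, ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Standard DPO loss for a single (chosen, rejected) preference pair.

    Inputs are log-likelihoods under the trained policy and a frozen
    reference model; here they are placeholder floats (an assumption).
    """
    # Implicit reward margin between the chosen and rejected samples.
    margin = beta * ((logp_chosen - ref_logp_chosen) - (logp_rejected - ref_logp_rejected))
    # -log(sigmoid(margin)), written in a numerically stable form.
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


def rbf_kernel(x, y, gamma=1.0):
    """Textbook RBF kernel: exp(-gamma * ||x - y||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def polynomial_kernel(x, y, degree=2, coef0=1.0):
    """Textbook polynomial kernel: (<x, y> + coef0) ** degree."""
    dot = sum(a * b for a, b in zip(x, y))
    return (dot + coef0) ** degree


if __name__ == "__main__":
    # Hypothetical values: the policy prefers the chosen image more than the reference does.
    print(dpo_loss(-1.0, -3.0, -2.0, -2.0, beta=1.0))
    # Hypothetical two-dimensional embeddings.
    print(rbf_kernel([1.0, 2.0], [1.5, 2.5], gamma=0.5))
    print(polynomial_kernel([1.0, 2.0], [3.0, 4.0]))
```

The loss falls as the policy assigns relatively more likelihood to the chosen sample than the reference model does, and the two kernels return similarity scores that a kernelized variant could apply to image or text embeddings in place of a raw dot product.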

How to Cite

Mana, R. P. K., Borah, A., Abdullah, H. M., Shyalika, C., Singh, G., Garimella, R., Roy, R., Surana, H. R., Imanpour, N., Trivedy, S., Sheth, A., & Das, A. (2026). DETONATE – A Benchmark for Text-to-Image Alignment and Kernelized Direct Preference Optimization. Proceedings of the AAAI Conference on Artificial Intelligence, 40(44), 37709-37718. https://doi.org/10.1609/aaai.v40i44.41106

Section

AAAI Special Track on AI Alignment