Trustworthy AI

Our trust in technology relies on understanding how it works. It’s important to understand why AI makes the decisions it does. We’re developing tools to make AI more explainable, fair, robust, private, and transparent.

Overview

Artificial intelligence systems have become increasingly prevalent in everyday life and enterprise settings, and they’re now often being used to support human decision-making. These systems have grown increasingly complex and efficient, and AI holds the promise of uncovering valuable insights across a wide range of applications. But broad adoption of AI systems will require humans to trust their output.

When we understand how a technology works and can assess that it’s safe and reliable, we’re far more inclined to trust it. Many AI systems to date have been black boxes, where data is fed in and results come out. To trust a decision made by an algorithm, we need to know that it is fair, that it’s reliable and can be accounted for, and that it will cause no harm. We need assurances that AI cannot be tampered with and that the system itself is secure. We need to be able to look inside AI systems, to understand the rationale behind an algorithmic outcome, and even to ask a system how it came to its decision.

At IBM Research, we’re working on a range of approaches to ensure that AI systems built in the future are fair, robust, explainable, accountable, and aligned with the values of the society they’re designed for. We’re working to ensure that future AI applications are as fair as they are efficient across their entire lifecycle.

Our work

Topics

AI Testing

We’re designing tools to help ensure that AI systems are trustworthy and reliable, and that they can optimize business processes.

Adversarial Robustness and Privacy

We’re making tools to protect AI and certify its robustness, and helping AI systems adhere to privacy requirements.

Explainable AI

We’re creating tools to help AI systems explain why they made the decisions they did.

Fairness, Accountability, Transparency

We’re developing technologies to increase the end-to-end transparency and fairness of AI systems.

Trustworthy Generation

We’re developing theoretical and algorithmic frameworks for generative AI to accelerate future scientific discoveries.

Uncertainty Quantification

We’re developing ways for AI to communicate, across the AI application development lifecycle, when it’s unsure of a decision.

Publications

Unsupervised Cycle Detection in Agentic Applications
- Felix George, Divya Pathak, et al.
- 2026
- ICPE 2026, Short paper

Adaptive Decoding via Test-Time Policy Learning for Self-Improving Generation
- Asmita Bhardwaj, Yuya Ong, et al.
- 2026
- ICLR 2026, Workshop paper

Evaluating Ill-Defined Tasks in Large Language Models
- Yi Zhou, Basel Shbita
- 2026
- ICLR 2026, Workshop paper

PRIGUARDAGENT: CONTEXT-AWARE PRIVACY GUARDRAILS FOR AGENTIC SYSTEMS
- Chulin Xie, Amit Dhurandhar, et al.
- 2026
- ICLR 2026, Workshop paper

LogitScope: A Framework for Analyzing LLM Uncertainty Through Information Metrics
- Farhan Ahmed, Yuya Ong, et al.
- 2026
- ICLR 2026, Workshop paper

Unifying Concept Representation Learning
- Amit Dhurandhar, Amir-hossein Karimi, et al.
- 2026
- ICLR 2026, Workshop

Building trustworthy AI with Watson

Our research is regularly integrated into Watson solutions to make IBM’s AI for business more transparent, explainable, robust, private, and fair.

Our work

Introducing the IBM Granite 4.1 family of models

Toward a transparent supply chain for AI

How IBM Granite became a leader in responsible AI

LLMs have model cards. Now, benchmarks do, too

IBM Granite tops Stanford’s list as the world’s most transparent model

An artist’s tribute to modern AI
