
As AI systems evolve from tools to autonomous agents capable of planning, tool use, coordination, and long-horizon decision-making, they are increasingly deployed in environments where errors, misalignment, or adversarial manipulation carry systemic consequences. Existing AI research has largely focused on capability scaling; far less work has addressed how to ensure these systems are reliable under stress, responsible in their behavior, and resilient to failure.
The BNY–CMU AI Lab will develop the foundations of Reliable, Responsible, and Resilient (RRR) Agentic AI, establishing a world-class research program that bridges frontier AI development with rigorous evaluation science, formal guarantees, governance mechanisms, and system-level robustness.
AI capability is advancing rapidly. Ensuring that autonomous systems are reliable, responsible, and resilient is now one of the most important scientific challenges of our time. The BNY–CMU AI Lab will lead this effort—developing the theory, infrastructure, and engineering principles required to make mission-critical AI trustworthy by design.
By combining frontier AI research with rigorous system-level thinking, the Lab will help shape a future in which autonomous AI systems can be deployed safely in environments where reliability is non-negotiable.
Projects are strongly encouraged to engage with the current state of the art in deployed agentic AI systems and frontier large-model capabilities.
Frontier models increasingly act autonomously. We will develop:
This pillar integrates machine learning with formal methods, programming languages, control theory, and robotics-style autonomy.
Objective: Move from probabilistic competence to constrained, verifiable autonomy.
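As one hedged illustration of this objective, the sketch below shows a runtime action guard: an agent's proposed actions are checked against explicitly stated constraints before execution, so autonomy is bounded by verifiable rules rather than model confidence alone. All names here (`Action`, `Constraint`, `guard`, the example predicates) are hypothetical, not part of any existing framework.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical types for illustration only.
@dataclass(frozen=True)
class Action:
    name: str
    params: dict

# A constraint is a predicate over proposed actions; in a full system these
# could be compiled from a formal specification rather than hand-written.
Constraint = Callable[[Action], bool]

def guard(action: Action, constraints: list[Constraint]) -> bool:
    """Admit an action only if every constraint allows it."""
    return all(c(action) for c in constraints)

# Example constraints: cap transfer amounts and forbid record deletion.
no_large_transfer = lambda a: not (a.name == "transfer" and a.params.get("amount", 0) > 10_000)
no_delete = lambda a: a.name != "delete_records"

safe = Action("transfer", {"amount": 500})
unsafe = Action("transfer", {"amount": 50_000})
print(guard(safe, [no_large_transfer, no_delete]))    # True
print(guard(unsafe, [no_large_transfer, no_delete]))  # False
```

In a deployed system the constraint set would come from formal methods (e.g. temporal-logic specifications checked at runtime), which is what distinguishes constrained autonomy from purely probabilistic competence.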
Static benchmarks are insufficient for mission-critical AI. We will create next-generation evaluation frameworks including:
A flagship outcome will be the development of a shared agent evaluation platform capable of rigorously testing reliability, responsibility, and resilience under controlled but realistic conditions.
Objective: Establish evaluation science as a first-class discipline for autonomous AI.
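A minimal sketch of what moving beyond static benchmarks could look like: instead of scoring one run per task, an agent is evaluated over repeated, perturbed trials of each scenario, so the reported number reflects reliability under variation rather than single-shot competence. The `Agent` and scenario interfaces below are assumptions for illustration, not an existing platform API.

```python
import random
from typing import Callable

# Hypothetical interface: an agent maps an observation string to an output.
Agent = Callable[[str], str]

def evaluate(agent: Agent,
             scenarios: list[tuple[str, Callable[[str], bool]]],
             trials: int = 20, seed: int = 0) -> float:
    """Repeated-trial evaluation: success rate across perturbed runs."""
    rng = random.Random(seed)
    successes, total = 0, 0
    for obs, check in scenarios:
        for _ in range(trials):
            # Perturb the observation to probe robustness, not just accuracy.
            noisy = obs if rng.random() < 0.5 else obs.upper()
            successes += check(agent(noisy))
            total += 1
    return successes / total

# Toy agent and scenario for demonstration.
echo_agent = lambda obs: obs.lower()
scenarios = [("hello", lambda out: out == "hello")]
print(evaluate(echo_agent, scenarios))  # 1.0 for this trivial agent
```

The design point is that the evaluation harness, not the agent, owns the randomness (seeded for reproducibility), which is a prerequisite for evaluation as a science rather than a leaderboard.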
As autonomous agents interact, coordination and incentive alignment become central challenges. We will study:
This research draws on economics, game theory, security, and public policy to address the emerging reality of AI systems interacting as semi-autonomous actors.
Objective: Design AI ecosystems that remain aligned and stable at scale.
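To make the incentive-alignment challenge concrete, here is a textbook two-agent example (a standard prisoner's-dilemma payoff matrix, included only as illustration): each agent's individually rational best response leads to an outcome worse for both, which is exactly the failure mode mechanism design for AI ecosystems must prevent.

```python
# Payoff matrix: (row action, column action) -> (row payoff, column payoff).
PAYOFFS = {
    ("cooperate", "cooperate"): (3, 3),
    ("cooperate", "defect"):    (0, 5),
    ("defect",    "cooperate"): (5, 0),
    ("defect",    "defect"):    (1, 1),
}

def best_response(opponent_action: str) -> str:
    """Action maximizing own payoff against a fixed opponent action."""
    return max(("cooperate", "defect"),
               key=lambda a: PAYOFFS[(a, opponent_action)][0])

# Defection is the best response to either action, so both agents defect
# in equilibrium even though mutual cooperation pays more (3,3) vs (1,1).
print(best_response("cooperate"), best_response("defect"))  # defect defect
```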
Mission-critical AI must remain dependable under stress. Research topics include:
We will treat AI systems as socio-technical systems whose reliability depends on model behavior, infrastructure, and governance.
Objective: Prevent local failures from becoming systemic failures.
Rather than treating responsibility as a post-hoc layer, we will pursue:
This work aims to embed responsibility directly into system architecture.
Objective: Make transparency and accountability structural properties of AI systems.
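As a sketch of accountability as a structural property (all names are illustrative assumptions), consider a hash-chained audit log: every agent action is appended to a log in which each entry commits to its predecessor, so tampering with history is detectable by construction rather than by policy.

```python
import hashlib
import json

class AuditLog:
    """Append-only, hash-chained record of agent actions."""

    GENESIS = "0" * 64

    def __init__(self):
        self.entries = []
        self._prev = self.GENESIS

    def record(self, action: dict) -> str:
        """Append an action; its hash commits to the previous entry."""
        payload = json.dumps({"action": action, "prev": self._prev}, sort_keys=True)
        digest = hashlib.sha256(payload.encode()).hexdigest()
        self.entries.append({"action": action, "prev": self._prev, "hash": digest})
        self._prev = digest
        return digest

    def verify(self) -> bool:
        """Recompute the chain; any edit to past entries breaks it."""
        prev = self.GENESIS
        for e in self.entries:
            payload = json.dumps({"action": e["action"], "prev": prev}, sort_keys=True)
            if e["prev"] != prev or hashlib.sha256(payload.encode()).hexdigest() != e["hash"]:
                return False
            prev = e["hash"]
        return True
```

The architectural choice, embedding the integrity check in the data structure itself, is what makes transparency structural rather than a post-hoc reporting layer.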