BNY–CMU AI Lab

Reliable, Responsible, and Resilient AI for Mission-Critical Systems


Located at Carnegie Mellon University, the BNY–CMU AI Lab will advance the scientific and engineering foundations for trustworthy autonomous AI in mission-critical systems. 

As AI systems evolve from tools to autonomous agents capable of planning, tool use, coordination, and long-horizon decision-making, they are increasingly deployed in environments where errors, misalignment, or adversarial manipulation carry systemic consequences. Existing AI research has largely focused on capability scaling; far less work has addressed how to ensure these systems are reliable under stress, responsible in their behavior, and resilient to failure.

The BNY–CMU AI Lab will develop the foundations of Reliable, Responsible, and Resilient (RRR) Agentic AI, with the goal of establishing a world-class research program that bridges frontier AI development with rigorous evaluation science, formal guarantees, governance mechanisms, and system-level robustness.


Expected Impact

AI capability is advancing rapidly. Ensuring that autonomous systems are reliable, responsible, and resilient is now one of the most important scientific challenges of our time. The BNY–CMU AI Lab will lead this effort, developing the theory, infrastructure, and engineering principles required to make mission-critical AI trustworthy by design. The Lab will:

  1. Advance the scientific foundations of trustworthy autonomous AI
  2. Establish new standards for evaluating and stress-testing agentic systems
  3. Produce high-impact publications at leading AI, systems, and security venues
  4. Train the next generation of researchers in mission-critical AI engineering
  5. Position the Lab as a global leader in reliable and resilient AI

By combining frontier AI research with rigorous system-level thinking, the Lab will help shape a future in which autonomous AI systems can be deployed safely in environments where reliability is non-negotiable.


Research Structure & Collaboration Model

The Lab will:

  • Support one-year PhD and postdoctoral research projects aligned with the RRR pillars
  • Encourage interdisciplinary collaboration across machine learning, systems, robotics, security, economics, and policy
  • Provide access to real-world deployment scenarios and controlled sandbox environments
  • Develop shared infrastructure for benchmarking and evaluation

Projects are strongly encouraged to engage with the current state of the art in deployed agentic AI systems and frontier large-model capabilities.


Core Research Pillars

Verifiable Autonomy

Frontier models increasingly act autonomously. We will develop:

  • Formal verification methods for AI agents
  • Runtime constraint enforcement systems
  • Tool-use correctness guarantees
  • Safe planning under long-horizon uncertainty
  • Alignment mechanisms for decomposed multi-step tasks

This pillar integrates machine learning with formal methods, programming languages, control theory, and robotics-style autonomy.

Objective: Move from probabilistic competence to constrained, verifiable autonomy.
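As a toy illustration of runtime constraint enforcement (all names here are hypothetical and do not describe the Lab's actual design), the sketch below wraps an agent's tool calls in a guard that checks every proposed call against machine-checkable constraints before execution:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Constraint:
    """A machine-checkable predicate over a proposed tool call."""
    name: str
    check: Callable[[str, dict], bool]  # (tool_name, args) -> allowed?

class ConstraintViolation(Exception):
    pass

class GuardedExecutor:
    """Wraps an agent's tool calls: every call must satisfy all
    declared constraints before it is actually executed."""
    def __init__(self, tools: dict, constraints: list):
        self.tools = tools
        self.constraints = constraints

    def call(self, tool_name: str, args: dict):
        for c in self.constraints:
            if not c.check(tool_name, args):
                raise ConstraintViolation(f"{c.name} rejected {tool_name}({args})")
        return self.tools[tool_name](**args)

# Example: cap transfer amounts regardless of what the agent plans.
tools = {"transfer": lambda amount, dest: f"sent {amount} to {dest}"}
guard = GuardedExecutor(tools, [
    Constraint("transfer_limit",
               lambda name, a: name != "transfer" or a.get("amount", 0) <= 1000),
])

print(guard.call("transfer", {"amount": 500, "dest": "acct-42"}))  # sent 500 to acct-42
```

The key design choice is that the constraint check sits outside the model: the agent can propose any action, but only actions the guard can verify are ever executed.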

Evaluation Science

Static benchmarks are insufficient for mission-critical AI. We will create next-generation evaluation frameworks including:

  • Long-horizon task evaluation
  • Multi-agent coordination stress tests
  • Adversarial red-teaming at system level
  • Synthetic simulation environments for stress testing
  • Evaluation of self-modifying or adaptive agents

A flagship outcome will be a shared agent evaluation platform that rigorously tests reliability, responsibility, and resilience under controlled but realistic conditions.

Objective: Establish evaluation science as a first-class discipline for autonomous AI.
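In skeletal form, such an evaluation harness repeatedly rolls an agent out in seeded environments and measures how often it completes a long-horizon task without a safety violation. The sketch below is purely illustrative (`ToyEnv` and `evaluate_agent` are hypothetical stand-ins, not the Lab's platform):

```python
import random

class ToyEnv:
    """Trivial stand-in for a sandbox: the agent must act 'inc' five
    times; any other action counts as a safety violation."""
    def __init__(self, seed):
        self.rng = random.Random(seed)
        self.count = 0
    def reset(self):
        self.count = 0
        return self.count
    def step(self, action):
        if action != "inc":
            return self.count, False, True       # (obs, done, violated)
        self.count += 1
        return self.count, self.count >= 5, False

def evaluate_agent(agent, env_factory, episodes=100, horizon=50, seed=0):
    """Fraction of episodes the agent finishes within `horizon` steps
    without triggering a violation."""
    rng = random.Random(seed)
    successes = 0
    for _ in range(episodes):
        env = env_factory(rng.randrange(2**32))
        obs, ok = env.reset(), True
        for _ in range(horizon):
            obs, done, violated = env.step(agent(obs))
            if violated:
                ok = False
                break
            if done:
                break
        successes += ok
    return successes / episodes

safe_agent = lambda obs: "inc"
print(evaluate_agent(safe_agent, ToyEnv))  # 1.0
```

A real platform would replace the toy environment with instrumented multi-agent simulations and adversarial perturbations, but the episodic, seeded, violation-counting structure is the same.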

Multi-Agent Coordination and Governance

As autonomous agents interact, coordination and incentive alignment become central challenges. We will study:

  • Incentive-compatible agent coordination
  • Mechanism design for AI systems
  • Containment and escalation frameworks
  • Distributed oversight models
  • Transparency in multi-agent interactions

This research draws from economics, game theory, security, and public policy to address the emerging reality of AI systems interacting as semi-autonomous actors.

Objective: Design AI ecosystems that remain aligned and stable at scale.

Resilience and Systemic Robustness

Mission-critical AI must remain dependable under stress. Research topics include:

  • Robustness under distribution shift
  • Failure cascade modeling in multi-agent systems
  • Adversarial robustness and data poisoning defenses
  • Continual learning without catastrophic degradation
  • Secure model update and deployment pipelines

We will treat AI systems as socio-technical systems whose reliability depends on model behavior, infrastructure, and governance.

Objective: Prevent local failures from becoming systemic failures.

Responsibility by Design

Rather than treating responsibility as a post-hoc layer, we will pursue:

  • Machine-readable policy constraints
  • Auditable reasoning traces
  • Self-documenting agent decisions
  • Calibrated uncertainty estimation
  • Human-AI escalation and override frameworks

This work aims to embed responsibility directly into system architecture.

Objective: Make transparency and accountability structural properties of AI systems.
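As a minimal sketch of what "structural" accountability might look like (the function names and the 0.8 escalation threshold are illustrative assumptions, not a specification), every decision below is logged as a machine-readable audit entry, and low-confidence decisions are escalated to a human rather than executed:

```python
import json
import time

ESCALATION_THRESHOLD = 0.8  # assumed policy value, for illustration only

def decide(agent_fn, query, audit_log):
    """Record each decision as a machine-readable audit entry; escalate
    to a human reviewer when the agent's stated confidence is low."""
    action, confidence, rationale = agent_fn(query)
    entry = {
        "ts": time.time(),
        "query": query,
        "action": action,
        "confidence": confidence,
        "rationale": rationale,
        "escalated": confidence < ESCALATION_THRESHOLD,
    }
    audit_log.append(json.dumps(entry))
    return "ESCALATE_TO_HUMAN" if entry["escalated"] else action

log = []
toy_agent = lambda q: ("approve", 0.55, "weak evidence only")
print(decide(toy_agent, "release payment?", log))  # ESCALATE_TO_HUMAN
```

Because the audit entry is written before any action is taken, the trace exists even for decisions that are blocked or escalated, which is what makes the record auditable rather than self-reported.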

Call for Faculty Research Proposals

The BNY–CMU AI Lab invites Carnegie Mellon University faculty to submit proposals for one-year research projects supporting PhD students or postdoctoral researchers, aligned with the Lab’s mission of advancing Reliable, Responsible, and Resilient (RRR) Agentic AI for mission-critical systems.


Leadership

Christopher Martin, Senior Director
Responsible AI
BNY

Zico Kolter, Professor and Department Head
Machine Learning Department
Carnegie Mellon University

Contact

Sara Werner
Machine Learning Department
Carnegie Mellon University
swerner@andrew.cmu.edu