
As AI systems evolve from tools to autonomous agents capable of planning, tool use, coordination, and long-horizon decision-making, they are increasingly deployed in environments where errors, misalignment, or adversarial manipulation carry systemic consequences. Existing AI research has largely focused on capability scaling; far less work has addressed how to ensure these systems are reliable under stress, responsible in their behavior, and resilient to failure.
The BNY–CMU AI Lab will develop the foundations of Reliable, Responsible, and Resilient (RRR) Agentic AI, establishing a world-class research program that bridges frontier AI development with rigorous evaluation science, formal guarantees, governance mechanisms, and system-level robustness.
AI capability is advancing rapidly. Ensuring that autonomous systems are reliable, responsible, and resilient is now one of the most important scientific challenges of our time. The BNY–CMU AI Lab will lead this effort—developing the theory, infrastructure, and engineering principles required to make mission-critical AI trustworthy by design.
By combining frontier AI research with rigorous system-level thinking, the Lab will help shape a future in which autonomous AI systems can be deployed safely in environments where reliability is non-negotiable.
Projects are strongly encouraged to engage with the current state of the art in deployed agentic AI systems and frontier large-model capabilities.
Frontier models increasingly act autonomously. We will develop:
This pillar integrates machine learning with formal methods, programming languages, control theory, and robotics-style autonomy.
Objective: Move from probabilistic competence to constrained, verifiable autonomy.
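As one hedged illustration of this objective, the sketch below shows a runtime action guard: an agent's proposed actions are checked against explicitly stated constraints before execution, so autonomy is bounded by verifiable rules rather than model confidence alone. All names here (`Action`, `Constraint`, `guard`, the example predicates) are hypothetical, not part of any existing framework.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical types for illustration only.
@dataclass(frozen=True)
class Action:
    name: str
    params: dict

# A constraint is a predicate over proposed actions; in a full system these
# could be compiled from a formal specification rather than hand-written.
Constraint = Callable[[Action], bool]

def guard(action: Action, constraints: list[Constraint]) -> bool:
    """Admit an action only if every constraint allows it."""
    return all(c(action) for c in constraints)

# Example constraints: cap transfer amounts and forbid record deletion.
no_large_transfer = lambda a: not (a.name == "transfer" and a.params.get("amount", 0) > 10_000)
no_delete = lambda a: a.name != "delete_records"

safe = Action("transfer", {"amount": 500})
unsafe = Action("transfer", {"amount": 50_000})
print(guard(safe, [no_large_transfer, no_delete]))    # True
print(guard(unsafe, [no_large_transfer, no_delete]))  # False
```

In a deployed system the constraint set would come from formal methods (e.g. temporal-logic specifications checked at runtime), which is what distinguishes constrained autonomy from purely probabilistic competence.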
Static benchmarks are insufficient for mission-critical AI. We will create next-generation evaluation frameworks including:
A flagship outcome will be the development of a shared agent evaluation platform capable of rigorously testing reliability, responsibility, and resilience under controlled but realistic conditions.
Objective: Establish evaluation science as a first-class discipline for autonomous AI.
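A minimal sketch of what moving beyond static benchmarks could look like: instead of scoring one run per task, an agent is evaluated over repeated, perturbed trials of each scenario, so the reported number reflects reliability under variation rather than single-shot competence. The `Agent` and scenario interfaces below are assumptions for illustration, not an existing platform API.

```python
import random
from typing import Callable

# Hypothetical interface: an agent maps an observation string to an output.
Agent = Callable[[str], str]

def evaluate(agent: Agent,
             scenarios: list[tuple[str, Callable[[str], bool]]],
             trials: int = 20, seed: int = 0) -> float:
    """Repeated-trial evaluation: success rate across perturbed runs."""
    rng = random.Random(seed)
    successes, total = 0, 0
    for obs, check in scenarios:
        for _ in range(trials):
            # Perturb the observation to probe robustness, not just accuracy.
            noisy = obs if rng.random() < 0.5 else obs.upper()
            successes += check(agent(noisy))
            total += 1
    return successes / total

# Toy agent and scenario for demonstration.
echo_agent = lambda obs: obs.lower()
scenarios = [("hello", lambda out: out == "hello")]
print(evaluate(echo_agent, scenarios))  # 1.0 for this trivial agent
```

The design point is that the evaluation harness, not the agent, owns the randomness (seeded for reproducibility), which is a prerequisite for evaluation as a science rather than a leaderboard.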
As autonomous agents interact, coordination and incentive alignment become central challenges. We will study:
This research draws on economics, game theory, security, and public policy to address the emerging reality of AI systems interacting as semi-autonomous actors.
Objective: Design AI ecosystems that remain aligned and stable at scale.
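To make the incentive-alignment challenge concrete, here is a textbook two-agent example (a standard prisoner's-dilemma payoff matrix, included only as illustration): each agent's individually rational best response leads to an outcome worse for both, which is exactly the failure mode mechanism design for AI ecosystems must prevent.

```python
# Payoff matrix: (row action, column action) -> (row payoff, column payoff).
PAYOFFS = {
    ("cooperate", "cooperate"): (3, 3),
    ("cooperate", "defect"):    (0, 5),
    ("defect",    "cooperate"): (5, 0),
    ("defect",    "defect"):    (1, 1),
}

def best_response(opponent_action: str) -> str:
    """Action maximizing own payoff against a fixed opponent action."""
    return max(("cooperate", "defect"),
               key=lambda a: PAYOFFS[(a, opponent_action)][0])

# Defection is the best response to either action, so both agents defect
# in equilibrium even though mutual cooperation pays more (3,3) vs (1,1).
print(best_response("cooperate"), best_response("defect"))  # defect defect
```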
Mission-critical AI must remain dependable under stress. Research topics include:
We will treat AI systems as socio-technical systems whose reliability depends on model behavior, infrastructure, and governance.
Objective: Prevent local failures from becoming systemic failures.
Rather than treating responsibility as a post-hoc layer, we will pursue:
This work aims to embed responsibility directly into system architecture.
Objective: Make transparency and accountability structural properties of AI systems.
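As a sketch of accountability as a structural property (all names are illustrative assumptions), consider a hash-chained audit log: every agent action is appended to a log in which each entry commits to its predecessor, so tampering with history is detectable by construction rather than by policy.

```python
import hashlib
import json

class AuditLog:
    """Append-only, hash-chained record of agent actions."""

    GENESIS = "0" * 64

    def __init__(self):
        self.entries = []
        self._prev = self.GENESIS

    def record(self, action: dict) -> str:
        """Append an action; its hash commits to the previous entry."""
        payload = json.dumps({"action": action, "prev": self._prev}, sort_keys=True)
        digest = hashlib.sha256(payload.encode()).hexdigest()
        self.entries.append({"action": action, "prev": self._prev, "hash": digest})
        self._prev = digest
        return digest

    def verify(self) -> bool:
        """Recompute the chain; any edit to past entries breaks it."""
        prev = self.GENESIS
        for e in self.entries:
            payload = json.dumps({"action": e["action"], "prev": prev}, sort_keys=True)
            if e["prev"] != prev or hashlib.sha256(payload.encode()).hexdigest() != e["hash"]:
                return False
            prev = e["hash"]
        return True
```

The architectural choice, embedding the integrity check in the data structure itself, is what makes transparency structural rather than a post-hoc reporting layer.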