Model Risk Management for Agentic AI
Scalable Runtime Governance
for Agentic AI in Financial Services
An interactive companion to the paper.
What's Inside
Layers
From human policy through capability specs, governed execution, to runtime telemetry.
Tiers
Assistive, bounded workflow, high-impact governed, and critical autonomous - each with proportionate controls.
Steps
Capability catalogue, evidence packs, onboarding, conformance, deployment, monitoring, and change control.
Agentic AI challenges classical Model Risk Management
An agentic system is a composition of an LLM, system prompts, a tool layer, memory, and guardrails, coupled by an orchestration architecture. The primary object of concern is the execution trajectory: a sequence of intermediate reasoning steps, memory accesses, tool invocations, state transitions, and human approvals. Outcomes arise from these trajectories, not from a stable input-output mapping.
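To make the shift in the object of concern concrete, a minimal sketch of a trajectory as a typed event sequence rather than an input-output pair (all class, field, and event names here are hypothetical, not from the paper):

```python
from dataclasses import dataclass, field
from enum import Enum, auto

class EventKind(Enum):
    REASONING = auto()
    MEMORY_ACCESS = auto()
    TOOL_CALL = auto()
    STATE_TRANSITION = auto()
    HUMAN_APPROVAL = auto()

@dataclass
class TrajectoryEvent:
    kind: EventKind
    payload: dict

@dataclass
class Trajectory:
    """An execution trajectory: the ordered events, not a single output."""
    events: list[TrajectoryEvent] = field(default_factory=list)

    def tool_calls(self) -> list[TrajectoryEvent]:
        return [e for e in self.events if e.kind is EventKind.TOOL_CALL]

# The outcome is a property of the whole sequence; governance must see it all.
t = Trajectory()
t.events.append(TrajectoryEvent(EventKind.REASONING, {"step": "plan refund"}))
t.events.append(TrajectoryEvent(EventKind.TOOL_CALL, {"tool": "payments.create", "amount": 120.0}))
assert len(t.tool_calls()) == 1
```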
Prompt-based guardrails present an illusion of control
Prompt-level guardrails are advisory: they depend on the model's probabilistic compliance rather than on rigorous, enforceable constraints within the execution environment. Even when implemented through additional models or rule-based filters, guardrails are themselves components whose reliability must be tested, calibrated, and monitored.
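The contrast between advisory and enforceable constraints can be illustrated with a small sketch: a hard limit enforced by code in the tool layer, which holds regardless of whether the model follows its instructions (the limit, tool, and function names are illustrative assumptions):

```python
# A prompt can say "never transfer more than 10,000" and still be ignored.
# A wrapper in the execution environment cannot be.

MAX_TRANSFER = 10_000.0

class PolicyViolation(Exception):
    pass

def governed_transfer(execute, amount: float):
    """Enforce the limit in code, before the tool ever runs."""
    if amount > MAX_TRANSFER:
        raise PolicyViolation(f"transfer {amount} exceeds limit {MAX_TRANSFER}")
    return execute(amount)

# The oversized request is blocked deterministically, not probabilistically.
blocked = False
try:
    governed_transfer(lambda a: f"sent {a}", 50_000.0)
except PolicyViolation:
    blocked = True
assert blocked
assert governed_transfer(lambda a: f"sent {a}", 500.0) == "sent 500.0"
```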
LLM-based verification conflates plausibility with correctness
Many guardrail implementations rely on LLM-based verifiers that operate in the same semantic space as the systems they control. They assess whether an output appears reasonable, not whether it satisfies precise structural, numerical, or policy constraints. Fluently written but incorrect outputs systematically pass such checks.
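A deterministic validator of the kind an LLM judge cannot substitute for might look like the following sketch: it checks schema, types, and a numeric invariant, so fluent but unbalanced output fails regardless of how plausible it reads (the field names and invariant are illustrative):

```python
import json

def validate_settlement(output: str) -> list[str]:
    """Structural and numerical checks: valid JSON, numeric fields,
    and a hard invariant that debits equal credits."""
    try:
        record = json.loads(output)
    except json.JSONDecodeError:
        return ["output is not valid JSON"]
    errors = []
    for key in ("debits", "credits"):
        if not isinstance(record.get(key), (int, float)):
            errors.append(f"missing or non-numeric field: {key}")
    if not errors and abs(record["debits"] - record["credits"]) > 1e-9:
        errors.append("debits and credits do not balance")
    return errors

# A fluent narrative around these numbers would pass a plausibility check;
# the invariant fails it anyway.
assert validate_settlement('{"debits": 100.0, "credits": 90.0}') == ["debits and credits do not balance"]
assert validate_settlement('{"debits": 100.0, "credits": 100.0}') == []
```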
Static periodic reviews cannot govern dynamic execution trajectories
Traditional model risk management relies on periodic assessment: models are validated prior to deployment and monitored through scheduled reviews using aggregate indicators. But the space of possible execution paths is combinatorially large. Minor variations in phrasing or context can lead to materially different trajectories. Even when aggregate metrics appear acceptable, severe failures may arise on specific trajectories.
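A toy calculation shows how an aggregate indicator can mask a severe failure on a specific trajectory class (the data is synthetic and the path names are invented for illustration):

```python
from collections import Counter

# 1,000 synthetic runs, each labelled by the trajectory class it followed.
outcomes = [("standard_payment", True)] * 990 + [("rare_refund_path", False)] * 10

# The aggregate metric a periodic review would see: 99% success.
aggregate = sum(ok for _, ok in outcomes) / len(outcomes)
assert aggregate == 0.99

# Per-trajectory breakdown: one path fails every single time.
totals, successes = Counter(), Counter()
for path, ok in outcomes:
    totals[path] += 1
    successes[path] += ok

failure_rate = {p: 1 - successes[p] / totals[p] for p in totals}
assert failure_rate["rare_refund_path"] == 1.0   # invisible in the aggregate
assert failure_rate["standard_payment"] == 0.0
```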
Human-in-the-loop review cannot keep pace
Agents operate at machine speed; human review proceeds at human speed. Reviewers see final outputs or compressed summaries, not the full trajectory with intermediate reasoning and tool calls. Multi-step workflows with branching decisions generate more context than a human can reliably verify. These limitations do not eliminate human oversight - they change its function toward policy design, threshold setting, and review of monitoring evidence.
The shift: from periodic review to runtime governance
Runtime governance is real-time monitoring and enforcement of policy over an agentic system's execution trajectory. It encompasses governance-semantic telemetry, continuous authorisation, temporal and path conformance checking, monitoring of trajectory-level drift, and tiered containment when behaviour moves outside approved bounds. The system operates autonomously within governed boundaries but remains continuously subject to intervention - perpetual oversight.
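The control loop described above can be sketched as a continuous-authorisation check applied to every event, with containment decisions scaled by tier. This is a minimal illustration under assumed thresholds, tool names, and tier semantics, not the paper's specification:

```python
from enum import Enum

class Action(Enum):
    ALLOW = "allow"
    ESCALATE = "escalate"   # pause the trajectory, route to a human reviewer
    CONTAIN = "contain"     # halt the trajectory entirely

def authorise(event: dict, tier: int) -> Action:
    """Check a single trajectory event against policy as it occurs.
    Higher-impact tiers and larger amounts trigger stronger intervention."""
    if event.get("tool") == "payments.create":
        amount = event.get("amount", 0.0)
        if amount > 100_000:
            return Action.CONTAIN
        if amount > 10_000 or tier >= 3:
            return Action.ESCALATE
    return Action.ALLOW

# Every event passes through the check: autonomy within governed boundaries,
# but continuously subject to intervention.
assert authorise({"tool": "payments.create", "amount": 500.0}, tier=1) is Action.ALLOW
assert authorise({"tool": "payments.create", "amount": 50_000.0}, tier=1) is Action.ESCALATE
assert authorise({"tool": "payments.create", "amount": 500_000.0}, tier=4) is Action.CONTAIN
```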