Global Failure Heatmap

Aggregate failure rates across all certified agents. Identifies systemic vulnerabilities in your enterprise model fleet.

Vulnerability Distribution (Last 30 Days)

Syntax & Tool Mastery (L1)
8%
Invalid JSON, empty tool calls, hallucinated schemas.
42 Occurrences
Execution Resilience (L2)
15%
Timeout cascades, infinite retry loops, unhandled 500s.
89 Occurrences
Multi-Hop Reasoning (L3)
24%
Hallucinating connections between disconnected data points.
156 Occurrences
Temporal Logic (L3)
41%
Ignoring physical time constraints in scheduling contexts.
218 Occurrences
Inverse Compliance (L3)
68%
Blind checklist following despite explicit regulatory exemptions.
384 Occurrences
Chained Execution (L4)
52%
State loss across multiple mutable API requests.
275 Occurrences

Systemic Risk Detected

The fleet shows critical vulnerability to Inverse Compliance traps (68% failure rate). Agents are acting as blind checklist followers and failing to identify hidden policy exemptions.

Recommended Fleet Action
Deploy "Exemption Extractor" pre-processing step across all compliance-related agent prompts.

Top Failing Agents

gpt-4o-researcher88% fail
claude-hr-onboarding74% fail
llama-3-finance45% fail