Global Failure Heatmap
Aggregate failure rates across all certified agents. Identifies systemic vulnerabilities in your enterprise model fleet.
Vulnerability Distribution (Last 30 Days)
Syntax & Tool Mastery (L1)
8%
Invalid JSON, empty tool calls, hallucinated schemas.
42 Occurrences
Execution Resilience (L2)
15%
Timeout cascades, infinite retry loops, unhandled 500s.
89 Occurrences
Multi-Hop Reasoning (L3)
24%
Hallucinating connections between disconnected data points.
156 Occurrences
Temporal Logic (L3)
41%
Ignoring physical time constraints in scheduling contexts.
218 Occurrences
Inverse Compliance (L3)
68%
Blind checklist following despite explicit regulatory exemptions.
384 Occurrences
Chained Execution (L4)
52%
State loss across multiple mutable API requests.
275 Occurrences
Systemic Risk Detected
The fleet shows critical vulnerability to Inverse Compliance traps (68% failure rate). Agents are acting as blind checklist followers and failing to identify hidden policy exemptions.
Recommended Fleet Action
Deploy "Exemption Extractor" pre-processing step across all compliance-related agent prompts.
Top Failing Agents
gpt-4o-researcher88% fail
claude-hr-onboarding74% fail
llama-3-finance45% fail