Preview
Safety & risk measurement
Capability evaluation and risk evaluation use the same underlying verification primitives, in the same archive, under the same scoring infrastructure. The novel content for risk evaluation is in what to measure, not in how to measure.
Four risk axes
Each measured independently. They do not move together.
How resistant is the model to being manipulated into doing things it shouldn't.
Prompt injection resistance
Jailbreak resistance
System-prompt extraction attempts
Tool-misuse resistance
Risk quadrant
Defense vs. offense — the combination tells the story. Hover to explore.
LOW OFFENSEHIGH OFFENSE
STRONGWEAK
Strong defense
Low risk
Strong defense
Monitor
Weak defense
Medium
Weak defense
Catastrophic