
Safety & risk measurement

Capability evaluation and risk evaluation use the same underlying verification primitives, in the same archive, under the same scoring infrastructure. What is new in risk evaluation is what to measure, not how to measure it.

Four risk axes

Each axis is measured independently, because the four do not move together.

How resistant is the model to being manipulated into doing things it shouldn't?

Prompt injection resistance
Jailbreak resistance
System-prompt extraction attempts
Tool-misuse resistance
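As a rough illustration of scoring these defense measurements separately, the sketch computes a resistance rate for each attack category. It assumes every attack attempt is recorded as a binary resisted/not-resisted outcome. `AttackResult`, `resistance_by_category`, and the category strings are hypothetical names, not part of the original.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass
class AttackResult:
    # Hypothetical record of one attack attempt against the model.
    category: str   # e.g. "prompt_injection", "jailbreak" (illustrative labels)
    resisted: bool  # True if the model did not do what the attack asked


def resistance_by_category(results: Iterable[AttackResult]) -> Dict[str, float]:
    """Return the fraction of attempts resisted, computed per category."""
    totals: Dict[str, int] = defaultdict(int)
    resisted: Dict[str, int] = defaultdict(int)
    for r in results:
        totals[r.category] += 1
        if r.resisted:
            resisted[r.category] += 1
    # One rate per category; categories are never blended into a single number.
    return {c: resisted[c] / totals[c] for c in totals}


results = [
    AttackResult("prompt_injection", True),
    AttackResult("prompt_injection", False),
    AttackResult("jailbreak", True),
]
print(resistance_by_category(results))  # → {'prompt_injection': 0.5, 'jailbreak': 1.0}
```

Reporting one rate per category, instead of a single averaged score, keeps a weakness in one category (say, tool misuse) from being hidden by strength in another.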

Risk quadrant

Defense vs. offense: neither score alone tells the story; the combination determines the risk level.

Strong defense, low offense: Low risk
Strong defense, high offense: Monitor
Weak defense, low offense: Medium
Weak defense, high offense: Catastrophic
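The quadrant amounts to a two-input lookup. A minimal sketch, assuming defense and offense are each summarized as a score in [0, 1] and split at a single illustrative threshold; the 0.5 default and the names `risk_quadrant` and `classify` are assumptions, not from the source.

```python
def risk_quadrant(defense_strong: bool, offense_high: bool) -> str:
    """Map the two binary judgments onto the four quadrant labels."""
    if defense_strong:
        return "Monitor" if offense_high else "Low risk"
    return "Catastrophic" if offense_high else "Medium"


def classify(defense_score: float, offense_score: float, threshold: float = 0.5) -> str:
    # Hypothetical scoring: higher defense_score means stronger defense,
    # higher offense_score means more offensive capability. The threshold
    # is an illustrative cutoff, not a value given in the source.
    return risk_quadrant(defense_score >= threshold, offense_score >= threshold)


print(classify(0.9, 0.8))  # → Monitor
print(classify(0.2, 0.8))  # → Catastrophic
```

Keeping the two inputs separate until the final lookup mirrors the point of the quadrant: high offense is only "Monitor" when defense is strong, and becomes "Catastrophic" when it is weak.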