Model rollouts, methodology updates, and evaluation milestones.
Tier 1 model rollout
Daily evaluation now runs across seven Tier 1 models. Tier 2 and Tier 3 rollouts are next.
447 evaluation instances
The instance pool has grown to 447 instances spanning 9 families, 16 domains, and 32 patterns.