Third-party AI measurement

Continuous benchmarking. Permanent record.

Every day, every tracked model. Versioned evaluations, append-only storage. Independent measurement you can cite, compare, and build on.

Today's leaderboard

GCI · difficulty-weighted
Live

Announcements

All announcements →

Why this exists

01

Benchmarks are being gamed.

Major model providers shape releases to optimize known evaluations. Static leaderboards are Goodharted within months of publication.

02

Continuity matters.

Point-in-time snapshots go stale. Continuous evaluation tracks how models actually change across releases, not just how they launch.

03

Third-party measurement matters.

Providers cannot grade their own work. A measurement institution sits outside the labs and outside the platforms.