Third-party AI measurement
Continuous benchmarking. Permanent record.
Every day, every tracked model. Versioned evaluations, append-only storage. Independent measurement you can cite, compare, and build on.
Today's leaderboard
GCI · difficulty-weighted
Announcements
All announcements →
Why this exists
01
Benchmarks are being gamed.
Major model providers shape releases to optimize for known evaluations. Static leaderboards are Goodharted within months of publication.
02
Continuity matters.
Point-in-time snapshots go stale. Continuous evaluation tracks how models actually change across releases, not just how they launch.
03
Third-party measurement matters.
Providers cannot grade their own work. A measurement institution sits outside the labs and outside the platforms.