Third-party AI measurement

Continuous benchmarking. Permanent record.

Every day, every tracked model. Versioned evaluations, append-only storage. Independent measurement you can cite, compare, and build on.

Today's leaderboard

GCI · difficulty-weighted

Live

Major model providers shape releases to optimize known evaluations. Static leaderboards are Goodharted within months of publication.

Point-in-time snapshots go stale. Continuous evaluation tracks how models actually change across releases, not just how they launch.

Providers cannot grade their own work. A measurement institution sits outside the labs and outside the platforms.