About Dooryard
Independent, continuous measurement of AI systems. Open methodology, closed instances, public record.
Mission
Finance has credit rating agencies. Climate has the IPCC. Pharmaceuticals have the FDA. Nuclear has the IAEA. AI has no equivalent — no independent, methodologically rigorous institution producing longitudinal measurement that regulators, buyers, and the public can rely on. Dooryard exists to fill that gap.
We continuously run a standardized battery of evaluations against commercial and open-source language models, publish the results, and maintain the permanent record. The output is structured, comparable, and decomposed by use case: not a single score, but a capability profile that different audiences can interpret for their own needs.
Independent measurement
No model provider influences the evaluation methodology or has editorial input into scoring, instance design, or reporting. Models are accessed through the same public APIs available to any developer. That independence is what makes the measurement credible.
The name
"Dooryard" is an Atlantic Canadian term for the front yard of a house: the space between the door and the road. We're working to help bring AI home safely, securely, and with the measurement needed to make informed decisions.