Governance Gate - Benchmark Review

Pair-first benchmark surface derived from per-run archives and machine indexes. The default scope covers the post-2.0 baseline onward, starting 2026-04-16 UTC.

Machine indexes: runs/index.json · runs/benchmark_index.v1.json (row evidence) · runs/benchmark_index.v2.json (paired benchmark view)

Filters: Model · Pack · Proof class · Target kind · Gated run result · Scope
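
As a rough illustration of how these filters and the default scope might be applied to row entries, here is a minimal Python sketch. The field names (`timestamp`, `model`, `pack`) and the ISO 8601 timestamp format are assumptions made for the example; the actual schema of the machine indexes is not described on this page.

```python
from datetime import datetime, timezone

# Default scope start stated on this page: 2026-04-16 UTC.
DEFAULT_SCOPE_START = datetime(2026, 4, 16, tzinfo=timezone.utc)


def parse_ts(value):
    """Parse an ISO 8601 timestamp (assumed format) into an aware UTC datetime."""
    # Older Python versions do not accept a trailing "Z", so normalize it.
    ts = datetime.fromisoformat(value.replace("Z", "+00:00"))
    if ts.tzinfo is None:
        ts = ts.replace(tzinfo=timezone.utc)  # assumption: naive values are UTC
    return ts.astimezone(timezone.utc)


def filter_rows(rows, scope_start=DEFAULT_SCOPE_START, **criteria):
    """Keep rows at or after scope_start whose fields equal every criterion.

    A criterion set to None means "no filter" for that field.
    """
    kept = []
    for row in rows:
        if parse_ts(row["timestamp"]) < scope_start:
            continue
        if all(v is None or row.get(k) == v for k, v in criteria.items()):
            kept.append(row)
    return kept
```

For example, `filter_rows(rows, model="some-model", pack=None)` would keep in-scope rows for one hypothetical model across all packs.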

Primary reading path

Paired benchmark evidence

Pairs are the primary benchmark interpretation path. This is not a model leaderboard.

Baseline definition: --

Summary counts: Rows in view · Non-pass rows · Models · Packs · Proof classes

Pair view summary

Evidence handling

Each pair row keeps baseline and gated evidence grouped. Primary links stay visible and archive/proof files stay one click away.
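
The grouping described above can be sketched as follows. This is only an illustration: the `pair_id` and `role` fields (with the values `baseline` and `gated`) are hypothetical names, not the documented schema of runs/benchmark_index.v2.json.

```python
from collections import defaultdict


def build_pairs(rows):
    """Group rows into pairs keyed by a hypothetical `pair_id` field.

    Each pair keeps its baseline and gated evidence side by side. A pair
    with a missing side is kept, with that side left as None. If a role
    appears more than once for the same pair, the last row seen wins.
    """
    pairs = defaultdict(lambda: {"baseline": None, "gated": None})
    for row in rows:
        role = row["role"]  # assumed to be "baseline" or "gated"
        if role not in ("baseline", "gated"):
            raise ValueError(f"unexpected role: {role!r}")
        pairs[row["pair_id"]][role] = row
    return dict(pairs)
```

Keeping incomplete pairs visible, rather than dropping them, is a design choice in this sketch so that a missing baseline or gated run shows up as a gap instead of disappearing.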

Secondary evidence surface: recent row evidence

Timestamp	Pack	Proof class	Model	Result	Leaks	Run ID	Evidence	CI

Secondary grouped view by row_identity

Grouped output is derived from the same row entries with no added semantics.

row_identity	Runs	Latest	Sequence new to old	Latest run
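
A minimal sketch of how the grouped view could be derived from the same row entries, without adding any new fields beyond counts and ordering. The row field names (`row_identity`, `timestamp`, `result`, `run_id`) are assumptions, as is the reading of "Sequence new to old" as the list of results ordered from newest to oldest.

```python
from collections import defaultdict


def group_by_row_identity(rows):
    """Summarize rows per row_identity, newest first within each group."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["row_identity"]].append(row)

    summary = []
    for identity, members in groups.items():
        # Assumption: timestamps are ISO 8601 strings in one uniform format
        # and zone, so lexicographic order matches chronological order.
        members.sort(key=lambda r: r["timestamp"], reverse=True)
        latest = members[0]
        summary.append({
            "row_identity": identity,
            "runs": len(members),
            "latest": latest["timestamp"],
            "sequence_new_to_old": [r["result"] for r in members],
            "latest_run": latest["run_id"],
        })
    return summary
```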