
Structural Design Labs

Ownership and publication surface

Governance Gate - Benchmark Review

Pair-first benchmark surface derived from per-run archives and machine indexes. The default scope covers the post-2.0 baseline onward, starting 2026-04-16 UTC.
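A minimal sketch of the default scope filter, assuming each row in the machine indexes carries a hypothetical `timestamp` field in ISO 8601 form. The field name and timestamp format are assumptions; only the 2026-04-16 UTC cutoff comes from this page.

```python
from datetime import datetime, timezone

# Cutoff stated on this page: default scope starts 2026-04-16 UTC.
DEFAULT_SCOPE_START = datetime(2026, 4, 16, tzinfo=timezone.utc)


def parse_utc(ts: str) -> datetime:
    """Parse an ISO 8601 timestamp such as '2026-04-17T09:30:00Z' as UTC."""
    # fromisoformat only accepts a trailing 'Z' from Python 3.11, so normalise it.
    dt = datetime.fromisoformat(ts.replace("Z", "+00:00"))
    # Assumption: naive timestamps in the index are already UTC.
    return dt if dt.tzinfo else dt.replace(tzinfo=timezone.utc)


def in_default_scope(rows, start=DEFAULT_SCOPE_START):
    """Keep rows whose hypothetical 'timestamp' field is on or after start."""
    return [r for r in rows if parse_utc(r["timestamp"]) >= start]


rows = [
    {"run_id": "r1", "timestamp": "2026-04-15T23:59:59Z"},
    {"run_id": "r2", "timestamp": "2026-04-16T00:00:00Z"},
    {"run_id": "r3", "timestamp": "2026-05-01T12:00:00+00:00"},
]
print([r["run_id"] for r in in_default_scope(rows)])  # ['r2', 'r3']
```

The comparison is inclusive, so a run stamped exactly at midnight on 2026-04-16 UTC stays in view.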

Machine indexes:
- runs/index.json
- runs/benchmark_index.v1.json (row evidence)
- runs/benchmark_index.v2.json (paired benchmark view)

Paired benchmark evidence

Pairs are the primary benchmark interpretation path. This is not a model leaderboard.

Baseline definition: --

Rows in view: --
Non-pass rows: --
Models: --
Packs: --
Proof classes: --
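The summary fields above could be derived from the rows in view roughly as follows. This is a sketch under assumed field names (`result`, `model`, `pack`, `proof_class`) and an assumed convention that a passing row has `result == "pass"`; the actual index schema may differ.

```python
def summarize(rows):
    """Compute the summary fields for the rows in view.

    Hypothetical schema: each row has 'result', 'model', 'pack' and
    'proof_class' keys; any result other than 'pass' counts as non-pass.
    """
    return {
        "rows_in_view": len(rows),
        "non_pass_rows": sum(1 for r in rows if r["result"] != "pass"),
        "models": len({r["model"] for r in rows}),
        "packs": len({r["pack"] for r in rows}),
        "proof_classes": len({r["proof_class"] for r in rows}),
    }


rows = [
    {"result": "pass", "model": "m1", "pack": "p", "proof_class": "A"},
    {"result": "fail", "model": "m2", "pack": "p", "proof_class": "A"},
    {"result": "pass", "model": "m1", "pack": "p", "proof_class": "B"},
]
print(summarize(rows))
```

Models, packs and proof classes are counted as distinct values, not as rows.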
Pair view summary
Evidence handling
Each pair row groups its baseline and gated evidence together. Primary links remain visible, and archive and proof files are one click away.
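One way to build such pair rows is to key each row by the comparison it belongs to and slot it into a baseline or gated arm. The `pair_key` and `arm` fields below are hypothetical stand-ins for whatever runs/benchmark_index.v2.json actually uses; this is an illustration, not the page's implementation.

```python
from collections import defaultdict


def pair_rows(rows):
    """Group rows into baseline/gated pairs.

    Hypothetical schema: 'pair_key' identifies the comparison and 'arm'
    is either 'baseline' or 'gated'. If an arm appears twice for the same
    key, the later row overwrites the earlier one.
    """
    pairs = defaultdict(lambda: {"baseline": None, "gated": None})
    for row in rows:
        arm = row["arm"]
        if arm not in ("baseline", "gated"):
            raise ValueError(f"unexpected arm: {arm!r}")
        pairs[row["pair_key"]][arm] = row
    # A pair is incomplete if either arm is still missing.
    incomplete = [key for key, pair in pairs.items() if None in pair.values()]
    return dict(pairs), incomplete


rows = [
    {"pair_key": "p1", "arm": "baseline", "run_id": "b1"},
    {"pair_key": "p1", "arm": "gated", "run_id": "g1"},
    {"pair_key": "p2", "arm": "baseline", "run_id": "b2"},
]
pairs, incomplete = pair_rows(rows)
print(incomplete)  # ['p2']
```

Reporting incomplete pairs explicitly keeps a missing gated (or baseline) run visible instead of silently dropping the comparison.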
Secondary evidence surface: recent row evidence

Recent row evidence

Raw row evidence remains authoritative. The default view shows the latest 5 rows.

Timestamp | Pack | Proof class | Model | Result | Leaks | Run ID | Evidence | CI
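The default "latest 5 rows" view amounts to sorting by timestamp, newest first, and taking the first five. A minimal sketch, again assuming a hypothetical ISO 8601 `timestamp` field on each row:

```python
from datetime import datetime, timezone


def parse_utc(ts: str) -> datetime:
    """Parse an ISO 8601 timestamp as UTC (assumed index format)."""
    dt = datetime.fromisoformat(ts.replace("Z", "+00:00"))
    return dt if dt.tzinfo else dt.replace(tzinfo=timezone.utc)


def latest_rows(rows, n=5):
    """Return the n most recent rows, newest first.

    sorted() is stable, so rows with identical timestamps keep their
    original relative order.
    """
    return sorted(rows, key=lambda r: parse_utc(r["timestamp"]), reverse=True)[:n]


rows = [
    {"run_id": f"r{d}", "timestamp": f"2026-04-{d:02d}T00:00:00Z"}
    for d in range(16, 23)
]
print([r["run_id"] for r in latest_rows(rows)])  # ['r22', 'r21', 'r20', 'r19', 'r18']
```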
Secondary grouped view by row_identity

Grouped output is derived from the same row entries with no added semantics.

row_identity | Runs | Latest | Sequence (new to old) | Latest run
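Because the grouped view only reorganizes existing row entries, it can be sketched as a plain group-by on `row_identity`. The sketch assumes rows carry hypothetical `timestamp`, `result` and `run_id` fields, and interprets "Latest" as the newest row's result and "Sequence" as the results ordered new to old; both readings are assumptions about the column semantics.

```python
from collections import defaultdict
from datetime import datetime, timezone


def parse_utc(ts: str) -> datetime:
    """Parse an ISO 8601 timestamp as UTC (assumed index format)."""
    dt = datetime.fromisoformat(ts.replace("Z", "+00:00"))
    return dt if dt.tzinfo else dt.replace(tzinfo=timezone.utc)


def group_by_row_identity(rows):
    """Derive the grouped view; every value comes from an existing row."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["row_identity"]].append(row)
    view = []
    for identity, members in groups.items():
        # Order each group new to old so index 0 is the latest run.
        members.sort(key=lambda r: parse_utc(r["timestamp"]), reverse=True)
        view.append({
            "row_identity": identity,
            "runs": len(members),
            "latest": members[0]["result"],
            "sequence": [m["result"] for m in members],
            "latest_run": members[0]["run_id"],
        })
    return view


rows = [
    {"row_identity": "packA/model1", "timestamp": "2026-04-16T00:00:00Z", "result": "pass", "run_id": "a1"},
    {"row_identity": "packA/model1", "timestamp": "2026-04-18T00:00:00Z", "result": "fail", "run_id": "a3"},
    {"row_identity": "packA/model1", "timestamp": "2026-04-17T00:00:00Z", "result": "pass", "run_id": "a2"},
    {"row_identity": "packB/model2", "timestamp": "2026-04-17T00:00:00Z", "result": "pass", "run_id": "b1"},
]
for entry in group_by_row_identity(rows):
    print(entry)
```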