The Proof of Intelligence. A decentralized adversarial evaluation protocol on Bittensor powered by LiveBench.
# OpenArena: The Truth Machine for AI
## The Problem: Benchmark Saturation
Static benchmarks (GSM8K, MMLU) are saturated. Frontier models score 90%+
on them, in part because test sets leak into training data, yet the same
models can fail on novel problems. A static score cannot distinguish a
model that remembers from a model that reasons.
## The Solution: Dynamic Adversarial Evaluation
OpenArena is a decentralized Bittensor subnet where:
1. Validators pull fresh, contamination-free tasks from LiveBench
(a continuously updated benchmark whose newest questions are withheld
from public release, so they cannot be memorized from public data).
2. Miners solve tasks under a cryptographic Commit-Reveal scheme
(prevents front-running and answer copying).
3. Scoring uses the Generalization Score:
S = (Accuracy × Calibration) − Latency
The Brier score penalizes confident wrong answers (hallucinations) and rewards well-calibrated confidence.
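
As a rough illustration, the Generalization Score could be computed along the following lines. This is a sketch under stated assumptions, not the OpenArena implementation: it assumes each miner reports a confidence in [0, 1] with every answer, takes Calibration to be 1 minus the mean Brier score, and treats Latency as a linear penalty in seconds. The names `Answer`, `brier_score`, `generalization_score`, and `latency_weight` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Answer:
    correct: bool      # did the revealed answer match the reference?
    confidence: float  # miner's stated probability of being correct, in [0, 1]


def brier_score(answers):
    """Mean squared error between stated confidence and the 0/1 outcome."""
    return sum((a.confidence - float(a.correct)) ** 2 for a in answers) / len(answers)


def generalization_score(answers, latency_s, latency_weight=0.01):
    """S = (Accuracy x Calibration) - Latency, under the assumptions above."""
    accuracy = sum(a.correct for a in answers) / len(answers)
    calibration = 1.0 - brier_score(answers)      # assumed: 1.0 means perfectly calibrated
    latency_penalty = latency_weight * latency_s  # assumed: linear penalty in seconds
    return accuracy * calibration - latency_penalty


# Two miners with identical accuracy (3 of 4 correct) and identical latency.
calibrated = [Answer(True, 0.9), Answer(True, 0.8), Answer(False, 0.2), Answer(True, 0.9)]
overconfident = [Answer(True, 1.0), Answer(True, 1.0), Answer(False, 1.0), Answer(True, 1.0)]

score_calibrated = generalization_score(calibrated, latency_s=2.0)
score_overconfident = generalization_score(overconfident, latency_s=2.0)
```

With these inputs the calibrated miner scores higher than the overconfident one, even though both answered the same number of tasks correctly. That is the behavior the formula is meant to reward.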
## The Unfair Advantage: KaggleIngest
Most subnets fail at cold start: they launch with no skilled miners. We solve this via
KaggleIngest, bridging 15M+ Kaggle data scientists directly into Bittensor.
- `!pip install openarena-kaggle`: one-line onboarding
- Web2-clean leaderboard UI: no wallet required to compete
- Cold start solved: instant liquidity of intelligence
## Architecture
- Consensus: Bittensor (Yuma Consensus + Commit-Reveal)
- Entropy Source: LiveBench-2026-01-08 (private delayed questions)
- Scoring: Brier Score decomposition (accuracy + calibration)
- Frontend: Next.js with live generalization leaderboard
- Security: SHA-256 commit hashes prevent plagiarism
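
The commit-reveal step can be sketched with the Python standard library. This is a minimal illustration, not the contents of `openarena/utils/crypto.py`: the function names, the salt length, and the choice to bind the miner's hotkey into the hash are all assumptions. A production scheme would also want an unambiguous encoding of the hashed fields rather than a simple `:` separator.

```python
import hashlib
import hmac
import secrets


def commit(answer: str, hotkey: str):
    """Return (commitment, salt). Only the commitment is published during the commit phase."""
    salt = secrets.token_hex(16)  # random salt stops brute-forcing of short answers
    digest = hashlib.sha256(f"{hotkey}:{salt}:{answer}".encode()).hexdigest()
    return digest, salt


def verify_reveal(commitment: str, answer: str, salt: str, hotkey: str) -> bool:
    """Recompute the hash from the revealed values and compare in constant time."""
    expected = hashlib.sha256(f"{hotkey}:{salt}:{answer}".encode()).hexdigest()
    return hmac.compare_digest(expected, commitment)


# Commit phase: the miner publishes only the hash.
commitment, salt = commit("42", hotkey="miner-A")

# Reveal phase: the validator checks the revealed answer and salt against the hash.
honest_ok = verify_reveal(commitment, "42", salt, hotkey="miner-A")
copycat_ok = verify_reveal(commitment, "42", salt, hotkey="miner-B")
```

Because the hotkey is part of the hashed message in this sketch, a copycat who replays another miner's commitment under a different identity fails verification at reveal time.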
## Progress
- Whitepaper: Formalized "Proof of Intelligence" game theory and Generalization Score formula (S = Accuracy × Calibration − Latency).
- Commit-Reveal: Implemented cryptographic anti-plagiarism scheme in `openarena/utils/crypto.py`.
- Validator Loop: Built LiveBench task dispatcher with epoch-based cadence.
- Miner Loop: Built LLM inference agent with commit → reveal flow.
- Simulation: `demo.py` proves honest miners win; copycat miners are slashed.
- Frontend: Next.js brutalist dashboard with live mock leaderboard and Mermaid architecture diagram at [openarena.kaggleingest.com](http://openarena.kaggleingest.com).
- `PROPOSAL.md`: Full Ridges-template subnet design proposal in repo root.
## Funding Status
Not funded. Bootstrapped for the ideathon. Seeking seed funding to audit the consensus logic and launch an incentivized testnet in Q3 2026.