OpenArena

The Proof of Intelligence. A decentralized adversarial evaluation protocol on Bittensor powered by LiveBench.

Description

# OpenArena: The Truth Machine for AI

## The Problem: Benchmark Saturation

Static benchmarks such as GSM8K and MMLU are saturated. Frontier models score 90%+ by memorizing test sets but fail on novel problems. The industry cannot distinguish a model that remembers from a model that reasons.

## The Solution: Dynamic Adversarial Evaluation

OpenArena is a decentralized Bittensor subnet where:

1. Validators pull fresh, contamination-free tasks from LiveBench (a continuously updated benchmark whose newest questions stay private for a delay period, so they cannot be memorized from public training data).

2. Miners solve tasks under a cryptographic Commit-Reveal scheme (prevents front-running and answer copying).

3. Scoring uses the Generalization Score:

   S = (Accuracy × Calibration) − Latency

   Brier scoring penalizes confidently wrong answers (hallucination) and rewards calibrated confidence.
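The Generalization Score can be sketched in a few lines of Python. The formula above does not say how Calibration is derived from the Brier score or how Latency is scaled, so this sketch assumes Calibration = 1 − Brier score and a capped, weighted latency penalty. The names `brier_score`, `generalization_score`, `max_latency_s`, and `latency_weight` are hypothetical and are not taken from the OpenArena code.

```python
from typing import Sequence


def brier_score(confidences: Sequence[float], correct: Sequence[bool]) -> float:
    """Mean squared error between stated confidence and the 0/1 outcome."""
    if len(confidences) != len(correct) or len(confidences) == 0:
        raise ValueError("need equal-length, non-empty inputs")
    total = sum((p - float(c)) ** 2 for p, c in zip(confidences, correct))
    return total / len(confidences)


def generalization_score(
    confidences: Sequence[float],
    correct: Sequence[bool],
    latency_s: float,
    max_latency_s: float = 60.0,   # assumption: latency is capped at this value
    latency_weight: float = 0.1,   # assumption: weight of the latency penalty
) -> float:
    """S = (Accuracy x Calibration) - Latency, under the assumptions above."""
    # Assumption: Calibration = 1 - Brier score, so 1.0 is perfectly calibrated.
    calibration = 1.0 - brier_score(confidences, correct)
    accuracy = sum(correct) / len(correct)
    # Normalize latency into [0, 1] before weighting so the units match.
    latency_penalty = latency_weight * min(latency_s / max_latency_s, 1.0)
    return accuracy * calibration - latency_penalty


# Two miners with identical answers (3 of 4 correct) and identical latency.
outcomes = [True, True, True, False]
calibrated = generalization_score([0.9, 0.9, 0.6, 0.6], outcomes, latency_s=6.0)
overconfident = generalization_score([1.0, 1.0, 1.0, 1.0], outcomes, latency_s=6.0)
print(calibrated > overconfident)
```

Under these assumptions, the miner that reports lower confidence on the answer it got wrong scores higher than the miner that claims full confidence everywhere, even though both have the same accuracy.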

## The Unfair Advantage: KaggleIngest

Most subnets fail at cold start: they launch with no skilled miners. We solve this with KaggleIngest, which bridges 15M+ Kaggle data scientists directly into Bittensor.

- `!pip install openarena-kaggle`: one-line onboarding

- Web2-clean leaderboard UI — no wallet required to compete

- Cold start solved: instant liquidity of intelligence

## Architecture

- Consensus: Bittensor (Yuma Consensus + Commit-Reveal)

- Entropy Source: LiveBench-2026-01-08 (private delayed questions)

- Scoring: Brier Score decomposition (accuracy + calibration)

- Frontend: Next.js with live generalization leaderboard

- Security: SHA-256 commit hashes prevent plagiarism
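The commit-reveal step listed above can be illustrated with a minimal sketch. It assumes a commitment is the SHA-256 hash of the miner's identity, a random nonce, and the answer. The function names, the field layout, and the length-prefix encoding are hypothetical and may differ from what openarena/utils/crypto.py does.

```python
import hashlib
import hmac
import secrets


def make_commitment(answer: str, miner_id: str, nonce: bytes) -> str:
    """Return a hex SHA-256 commitment binding the answer to one miner."""
    h = hashlib.sha256()
    for part in (miner_id.encode("utf-8"), nonce, answer.encode("utf-8")):
        # Length-prefix each field so different field splits cannot collide.
        h.update(len(part).to_bytes(4, "big"))
        h.update(part)
    return h.hexdigest()


def verify_reveal(commitment: str, answer: str, miner_id: str, nonce: bytes) -> bool:
    """Check a revealed (answer, nonce) pair against the earlier commitment."""
    expected = make_commitment(answer, miner_id, nonce)
    return hmac.compare_digest(commitment, expected)


# Commit phase: the miner publishes only the hash.
nonce = secrets.token_bytes(16)
commitment = make_commitment("42", "miner-a", nonce)

# Reveal phase: the miner discloses the answer and nonce for verification.
print(verify_reveal(commitment, "42", "miner-a", nonce))   # honest reveal
print(verify_reveal(commitment, "42", "miner-b", nonce))   # copied commitment
```

Including the miner identity in the hash means a copycat cannot reuse another miner's commitment, and the random nonce keeps short answers from being recovered by brute-forcing the hash before the reveal phase.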

Progress During Hackathon

<p>- Whitepaper: Formalized "Proof of Intelligence" game theory and Generalization Score formula (S = Accuracy × Calibration − Latency).</p>
<p>- Commit-Reveal: Implemented cryptographic anti-plagiarism scheme in openarena/utils/crypto.py.</p>
<p>- Validator Loop: Built LiveBench task dispatcher with epoch-based cadence.</p>
<p>- Miner Loop: Built LLM inference agent with commit → reveal flow.</p>
<p>- Simulation: demo.py proves honest miners win; copycat miners are slashed.</p>
<p>- Frontend: Next.js brutalist dashboard with live mock leaderboard and Mermaid architecture diagram at <a href="http://openarena.kaggleingest.com">openarena.kaggleingest.com</a>.</p>
<p>- PROPOSAL.md: Full Ridges-template subnet design proposal in repo root.</p>

Tech Stack

Python
AI
Web3
Next

Fundraising Status

<p>Not funded. Bootstrapped for the ideathon. Seeking seed funding to audit the consensus logic and launch an incentivized testnet in Q3 2026.</p>

Sector
AIInfra
