OpenArena

Description

# OpenArena: The Truth Machine for AI

## The Problem: Benchmark Saturation

Static benchmarks (GSM8K, MMLU) are saturated. Frontier models score 90%+ on them, in part by memorizing leaked test sets, yet fail on novel problems. The industry cannot distinguish a model that remembers from a model that reasons.

## The Solution: Dynamic Adversarial Evaluation

OpenArena is a decentralized Bittensor subnet where:

1. Validators pull fresh, contamination-free tasks from LiveBench (a continuously updated benchmark whose newest questions are held private for a delay period, so models cannot have memorized them during training).

2. Miners solve tasks under a cryptographic Commit-Reveal scheme (which prevents front-running and answer copying).

3. Scoring uses the Generalization Score:

   S = (Accuracy × Calibration) − Latency

   Brier scoring penalizes hallucination and rewards calibrated confidence.
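
The Generalization Score can be made concrete under a few assumptions the description does not pin down: Accuracy is the fraction of correct answers, Calibration is 1 minus the Brier score of each miner's stated confidence, and Latency is mean response time normalized by a timeout and scaled by a weight. The names `Answer`, `generalization_score`, `timeout_s`, and `latency_weight` are hypothetical, and this is a sketch rather than OpenArena's scoring code. It shows why Brier scoring penalizes hallucination: two miners with the same accuracy diverge once one of them answers wrongly with full confidence.

```python
from dataclasses import dataclass


@dataclass
class Answer:
    correct: bool      # did the revealed answer match the reference?
    confidence: float  # miner's stated probability of being correct, in [0, 1]
    latency_s: float   # seconds from task dispatch to commit (hypothetical field)


def brier_score(answers: list[Answer]) -> float:
    """Mean squared gap between stated confidence and the 0/1 outcome (lower is better)."""
    return sum((a.confidence - float(a.correct)) ** 2 for a in answers) / len(answers)


def generalization_score(answers: list[Answer], timeout_s: float = 60.0,
                         latency_weight: float = 0.1) -> float:
    """S = (Accuracy x Calibration) - Latency, with assumed definitions for each term."""
    accuracy = sum(a.correct for a in answers) / len(answers)
    # Assumption: calibration = 1 - Brier, so 1.0 means perfectly calibrated.
    calibration = 1.0 - brier_score(answers)
    # Assumption: latency is mean time normalized by the timeout, capped at 1, then weighted.
    mean_latency = sum(a.latency_s for a in answers) / len(answers)
    latency_penalty = latency_weight * min(mean_latency / timeout_s, 1.0)
    return accuracy * calibration - latency_penalty


# Two miners with identical accuracy (3 of 4 correct) and identical latency.
calibrated = [Answer(True, 0.9, 6.0), Answer(True, 0.8, 6.0),
              Answer(False, 0.2, 6.0), Answer(True, 0.9, 6.0)]
overconfident = [Answer(True, 1.0, 6.0), Answer(True, 1.0, 6.0),
                 Answer(False, 1.0, 6.0), Answer(True, 1.0, 6.0)]

print(round(generalization_score(calibrated), 5))     # 0.72125
print(round(generalization_score(overconfident), 5))  # 0.5525
```

The overconfident miner's single wrong answer at confidence 1.0 costs a full point of squared error, which drags its Calibration term from 0.975 down to 0.75 even though its Accuracy is unchanged.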

## The Unfair Advantage: KaggleIngest

Most subnets fail at cold start because they have no skilled miners. We solve this via KaggleIngest, which bridges 15M+ Kaggle data scientists directly into Bittensor.

- `!pip install openarena-kaggle`: one-line onboarding

- Web2-style leaderboard UI: no wallet required to compete

- Cold start solved: a ready pool of skilled competitors from day one

## Architecture

- Consensus: Bittensor (Yuma Consensus + Commit-Reveal)

- Entropy Source: LiveBench-2026-01-08 (private delayed questions)

- Scoring: Brier Score decomposition (accuracy + calibration)

- Frontend: Next.js with live generalization leaderboard

- Security: SHA-256 commit hashes prevent plagiarism
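
Here is a minimal sketch of a SHA-256 commit-reveal step using only the Python standard library. It is not the code in `openarena/utils/crypto.py`. The function names, the `hotkey`/`task_id` fields, and the JSON encoding of the hash preimage are assumptions for illustration. Binding the miner's hotkey into the hash means a copycat cannot resubmit another miner's commitment under its own identity. The random salt keeps short answers (such as a single integer) from being recovered by hashing every candidate.

```python
import hashlib
import hmac
import json
import secrets


def _digest(hotkey: str, task_id: str, answer: str, salt: str) -> str:
    # JSON-encode the fields so the preimage is unambiguous even if an answer contains separators.
    payload = json.dumps([hotkey, task_id, answer, salt]).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def make_commitment(hotkey: str, task_id: str, answer: str) -> tuple[str, str]:
    """Commit phase: publish only the hash; keep the salt private until reveal."""
    salt = secrets.token_hex(16)  # 128-bit random salt so the answer cannot be brute-forced from the hash
    return _digest(hotkey, task_id, answer, salt), salt


def verify_reveal(commit_hash: str, hotkey: str, task_id: str, answer: str, salt: str) -> bool:
    """Reveal phase: recompute the hash and compare in constant time."""
    return hmac.compare_digest(_digest(hotkey, task_id, answer, salt), commit_hash)


commit, salt = make_commitment("miner-A", "task-42", "17")
print(verify_reveal(commit, "miner-A", "task-42", "17", salt))  # True
print(verify_reveal(commit, "miner-A", "task-42", "18", salt))  # False: answer changed after commit
print(verify_reveal(commit, "miner-B", "task-42", "17", salt))  # False: hash is bound to miner-A's hotkey
```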

Progress During Hackathon

- Whitepaper: Formalized "Proof of Intelligence" game theory and the Generalization Score formula (S = (Accuracy × Calibration) − Latency).
- Commit-Reveal: Implemented a cryptographic anti-plagiarism scheme in `openarena/utils/crypto.py`.
- Validator Loop: Built a LiveBench task dispatcher with an epoch-based cadence.
- Miner Loop: Built an LLM inference agent with a commit → reveal flow.
- Simulation: `demo.py` shows that honest miners win and copycat miners are slashed.
- Frontend: Next.js brutalist dashboard with a live mock leaderboard and a Mermaid architecture diagram at openarena.kaggleingest.com.
- `PROPOSAL.md`: Full Ridges-template subnet design proposal in the repo root.
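
The `demo.py` simulation is not reproduced here. What follows is a hypothetical, self-contained simulation of the same idea, assuming each validator epoch splits into a commit window followed by a reveal window. A copycat that waits for the honest reveal has nothing to commit on time, so its copied answer earns nothing. The `Epoch` class, its methods, and modeling "slashed" as a zero score for the epoch are all assumptions made for illustration.

```python
import hashlib
import json
import secrets
from dataclasses import dataclass, field


def digest(hotkey: str, task_id: str, answer: str, salt: str) -> str:
    return hashlib.sha256(json.dumps([hotkey, task_id, answer, salt]).encode("utf-8")).hexdigest()


@dataclass
class Epoch:
    """One task's lifecycle (hypothetical): a commit window, then a reveal window, then scoring."""
    task_id: str
    reference: str
    commit_open: bool = True
    commits: dict = field(default_factory=dict)  # hotkey -> commit hash (hashes are public)
    reveals: dict = field(default_factory=dict)  # hotkey -> (answer, salt), published after commits close

    def commit(self, hotkey: str, commit_hash: str) -> bool:
        if not self.commit_open:
            return False  # late commits are rejected
        self.commits[hotkey] = commit_hash
        return True

    def close_commits(self) -> None:
        self.commit_open = False

    def reveal(self, hotkey: str, answer: str, salt: str) -> None:
        self.reveals[hotkey] = (answer, salt)

    def scores(self) -> dict:
        # A reveal earns credit only if it matches an on-time commit AND the reference answer.
        out = {}
        for hotkey, (answer, salt) in self.reveals.items():
            on_time = self.commits.get(hotkey) == digest(hotkey, self.task_id, answer, salt)
            out[hotkey] = 1.0 if on_time and answer == self.reference else 0.0
        return out


epoch = Epoch(task_id="task-42", reference="17")

# Commit window: the honest miner solves the task; only its hash is visible to others.
honest_salt = secrets.token_hex(16)
epoch.commit("honest", digest("honest", "task-42", "17", honest_salt))
epoch.close_commits()

# Reveal window: the honest answer becomes public, and the copycat tries to reuse it.
epoch.reveal("honest", "17", honest_salt)
copy_salt = secrets.token_hex(16)
late = epoch.commit("copycat", digest("copycat", "task-42", "17", copy_salt))
epoch.reveal("copycat", "17", copy_salt)

print(late, epoch.scores())  # False {'honest': 1.0, 'copycat': 0.0}
```

The ordering is what matters here: because answers only become public after the commit window closes, copying requires a commitment the copycat could not have made in time.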

Tech Stack

Python

AI

Web3

Fundraising Status

Not funded; bootstrapped for the ideathon. Seeking seed funding to audit the consensus logic and launch an incentivized testnet in Q3 2026.
