Consultaion

Methodology

Consultaion combines per-judge pairwise ballots, Elo-style updates, and Wilson confidence intervals to keep the leaderboard statistically honest.

Executive Summary

Why Multi-Agent Debates Deliver Better Decisions

360° Risk Analysis

Multiple AI perspectives identify blind spots that a single model misses.

Unbiased Decisions

A structured debate format reduces confirmation bias.

Defensible Outcomes

A full audit trail with transparent scoring supports compliance.

Pairwise votes

Each judge compares every pair of personas head-to-head. Ties are dropped. For each decided comparison, we log the winner, loser, judge, optional category, and timestamp in pairwise_vote.
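As a rough illustration of the logging step, the sketch below expands one judge's ballot into pairwise_vote-style rows. The field names follow the list above. The PairwiseVote class, the votes_from_scores helper, and the idea that a judge assigns each persona a numeric score are assumptions made for the example, not the product's actual schema.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from itertools import combinations
from typing import Optional


@dataclass(frozen=True)
class PairwiseVote:
    # Hypothetical row shape mirroring the fields listed above.
    winner: str
    loser: str
    judge: str
    category: Optional[str]
    timestamp: datetime


def votes_from_scores(judge, scores, category=None):
    """Expand one judge's per-persona scores into head-to-head votes.

    Assumption for illustration: a judge gives each persona a numeric
    score, and equal scores count as a tie.
    """
    now = datetime.now(timezone.utc)
    votes = []
    for a, b in combinations(sorted(scores), 2):
        if scores[a] == scores[b]:
            continue  # ties are dropped, so no row is logged
        winner, loser = (a, b) if scores[a] > scores[b] else (b, a)
        votes.append(PairwiseVote(winner, loser, judge, category, now))
    return votes


# Made-up personas and scores: "optimist" and "analyst" tie, so that pair is skipped.
ballot = votes_from_scores("judge-1", {"optimist": 8, "skeptic": 6, "analyst": 8})
```

With three personas there are three possible pairs; the tie removes one, so this ballot yields two rows.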

Elo + Bradley–Terry

Ratings start at 1500. We apply Elo updates with K=32 for a persona's first 15 matches, then K=24. The Elo expected score has the same logistic form as the Bradley–Terry win probability, and the incremental update stays stable for live use.
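A minimal sketch of one such update follows. The starting rating and K values come from the text above. The 400-point logistic scale is the conventional Elo choice and is assumed here, as are the function names and the decision to let each persona use its own K based on its own match count. With strength 10^(R/400), the Bradley–Terry probability that A beats B is exactly the expected_score expression below.

```python
def k_factor(matches_played: int) -> int:
    """K=32 while a persona has played fewer than 15 matches, then K=24."""
    return 32 if matches_played < 15 else 24


def expected_score(rating_a: float, rating_b: float) -> float:
    """Logistic win probability of A over B (assumed 400-point Elo scale)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def apply_vote(ratings: dict, matches: dict, winner: str, loser: str) -> None:
    """Update ratings and match counts in place for one decided vote."""
    r_w = ratings.setdefault(winner, 1500.0)  # unseen personas start at 1500
    r_l = ratings.setdefault(loser, 1500.0)
    e_w = expected_score(r_w, r_l)            # winner's expected score
    k_w = k_factor(matches.get(winner, 0))
    k_l = k_factor(matches.get(loser, 0))
    ratings[winner] = r_w + k_w * (1.0 - e_w)          # actual score 1
    ratings[loser] = r_l + k_l * (0.0 - (1.0 - e_w))   # actual score 0
    matches[winner] = matches.get(winner, 0) + 1
    matches[loser] = matches.get(loser, 0) + 1


ratings, matches = {}, {}
apply_vote(ratings, matches, "analyst", "skeptic")
```

For two personas that are both at 1500, the expected score is 0.5, so the first vote moves each rating by 16 points in opposite directions.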

Wilson confidence interval

Win rate is shown with a 95% Wilson interval. New personas get a NEW badge until they reach 15 matches.
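The interval and badge rule can be sketched as below. The formula is the standard Wilson score interval with z = 1.96 for roughly 95% coverage. The function names, the return value for a persona with no matches, and the badge helper are illustrative assumptions.

```python
import math


def wilson_interval(wins: int, matches: int, z: float = 1.96):
    """Wilson score interval (low, high) for a win rate."""
    if matches == 0:
        return (0.0, 1.0)  # assumption: with no data, report the full range
    p = wins / matches
    denom = 1.0 + z * z / matches
    centre = (p + z * z / (2 * matches)) / denom
    half = z * math.sqrt(p * (1 - p) / matches + z * z / (4 * matches * matches)) / denom
    return (max(0.0, centre - half), min(1.0, centre + half))


def badge(matches: int):
    """NEW until the persona reaches 15 matches."""
    return "NEW" if matches < 15 else None


# Made-up record: 12 wins in 15 matches.
low, high = wilson_interval(12, 15)
```

Unlike the naive normal approximation, the Wilson interval stays inside [0, 1] and remains usable at small match counts, which is where new personas sit.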

Update cadence

Ratings update immediately after each debate. Admins can force recomputes via POST /ratings/update/<debate_id>.
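A forced recompute might be requested along the lines below. Only the method and path come from the text above. The base URL, the bearer-token header, and the helper name are placeholders, since the authentication scheme is not described here.

```python
import urllib.parse
import urllib.request

BASE_URL = "https://api.example.com"  # hypothetical host, replace with your deployment


def build_recompute_request(debate_id: str, token: str) -> urllib.request.Request:
    """Build (but do not send) POST /ratings/update/<debate_id>."""
    path_id = urllib.parse.quote(debate_id, safe="")
    return urllib.request.Request(
        f"{BASE_URL}/ratings/update/{path_id}",
        data=b"",                                       # empty body
        method="POST",
        headers={"Authorization": f"Bearer {token}"},  # assumed auth scheme
    )


req = build_recompute_request("debate-123", "ADMIN_TOKEN")
# To send it: urllib.request.urlopen(req)
```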

Anti-gaming

Runs inherit the creator’s scope. Public leaderboard rows ignore private debates unless they are shared. Abusive streaks can be zeroed by clearing the pairwise_vote rows for the offending debate.
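Clearing one debate's rows could look like the sketch below, which uses an in-memory SQLite table as a stand-in. The storage engine, the debate_id and ts column names, and the sample rows are all assumptions for illustration; only the pairwise_vote name and the logged fields come from this page.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE pairwise_vote ("
    "debate_id TEXT, winner TEXT, loser TEXT, judge TEXT, category TEXT, ts TEXT)"
)
# Made-up sample rows: one vote from debate d1, two from debate d2.
conn.executemany(
    "INSERT INTO pairwise_vote VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("d1", "analyst", "skeptic", "judge-1", None, "2024-01-01T00:00:00Z"),
        ("d2", "skeptic", "analyst", "judge-1", None, "2024-01-02T00:00:00Z"),
        ("d2", "skeptic", "optimist", "judge-2", None, "2024-01-02T00:00:00Z"),
    ],
)


def clear_debate_votes(conn: sqlite3.Connection, debate_id: str) -> int:
    """Delete every pairwise row for one debate and return how many were removed."""
    cur = conn.execute("DELETE FROM pairwise_vote WHERE debate_id = ?", (debate_id,))
    conn.commit()
    return cur.rowcount


removed = clear_debate_votes(conn, "d2")
remaining = conn.execute("SELECT COUNT(*) FROM pairwise_vote").fetchone()[0]
```

Because ratings are derived from the logged votes, removing the rows only takes effect on the leaderboard once ratings are recomputed from what remains.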

Enterprise Features

Consultaion provides enterprise-grade multi-agent deliberation with transparent scoring, full audit trails, and API access.

API Access

SSO Integration

Custom Deployment