Pairwise votes
Each judge compares every persona head-to-head. Ties are dropped. We log the winner, loser, judge, optional category, and timestamp in pairwise_vote.
Consultaion
Consultaion combines per-judge pairwise ballots, Elo-style updates, and Wilson confidence intervals to keep the leaderboard statistically honest.
Each judge compares every persona head-to-head. Ties are dropped. We log the winner, loser, judge, optional category, and timestamp in pairwise_vote.
Ratings start at 1500. We apply Elo updates with K=32 for the first 15 matches, then K=24. This mirrors the Bradley–Terry logistic model and stays stable for live updates.
Win rate is shown with a 95% Wilson interval. New personas get a NEW badge until they reach 15 matches.
Ratings update immediately after each debate. Admins can force recomputes via POST /ratings/update/<debate_id>.
Runs inherit the creator’s scope. Public leaderboard rows ignore private debates unless shared. Abusive streaks can be zeroed by clearing pairwise rows for that debate.
Consultaion provides enterprise-grade multi-agent deliberation with transparent scoring, full audit trails, and API access.
API Access
SSO Integration
Custom Deployment