scorio.rank
Ranking methods for comparing multiple models.
This module implements ranking estimators for binary response tensors
used in scorio evaluation workflows.
Notation
All methods use the same core notation:
\(R \in \{0,1\}^{L \times M \times N}\) is the response tensor.
\(L\) is the number of models.
\(M\) is the number of questions.
\(N\) is the number of trials per question.
\(R_{lmn}=1\) means model \(l\) solves question \(m\) on trial \(n\).
For single-trial input, (L, M) is promoted to (L, M, 1).
Longitudinal variants of scorio.rank.dynamic_irt()
(variant="growth" and variant="state_space") interpret axis
\(N\) as ordered measurement time rather than i.i.d. trials.
Every ranking method computes raw scores \(s_l\) and then maps those scores to ranks:
By default, methods return r as a one-dimensional array of shape (L,).
With return_scores=True, methods return (r, s).
Available Methods
Evaluation metric-based: avg, bayes, pass_at_k, pass_hat_k, g_pass_at_k_tau, mg_pass_at_k.
Paired-comparison probabilistic models: bradley_terry, bradley_terry_map, bradley_terry_davidson, bradley_terry_davidson_map, rao_kupper, rao_kupper_map.
Accuracy-based: inverse_difficulty.
Pairwise rating systems: elo, glicko, trueskill.
Bayesian methods: thompson, bayesian_mcmc.
Voting methods: borda, copeland, win_rate, minimax, schulze, ranked_pairs, kemeny_young, nanson, baldwin, majority_judgment.
Item Response Theory: rasch, rasch_map, rasch_2pl, rasch_2pl_map, rasch_3pl, rasch_3pl_map, rasch_mml, rasch_mml_credible, dynamic_irt.
Graph-based: pagerank, spectral, rank_centrality, alpharank, nash.
Seriation-based: serial_rank.
Hodge-theoretic: hodge_rank.
Listwise and setwise choice models (Luce family): plackett_luce, plackett_luce_map, davidson_luce, davidson_luce_map, bradley_terry_luce, bradley_terry_luce_map.
Examples
>>> import numpy as np
>>> from scorio import rank
>>> # Generate sample data: 3 models, 5 questions, 4 trials
>>> R = np.random.randint(0, 2, size=(3, 5, 4))
>>> # Get rankings using different methods
>>> ranks_bayes = rank.bayes(R, w=np.array([0.0, 1.0]))
>>> ranks_pass_at_k = rank.pass_at_k(R, k=2)
>>> # Get raw scores when needed
>>> ranking, scores = rank.avg(R, return_scores=True)
Notation
All ranking methods use response tensors \(R \in \{0,1\}^{L \times M \times N}\),
with \(L\) models, \(M\) questions, and \(N\) trials per question.
Methods compute raw scores \(s_l\) and then convert scores to ranks with
scorio.utils.rank_scores.
Prior Classes
Prior penalties for MAP ranking methods.
- Method context:
Several ranking methods in
scorio.rankestimate latent log-strengthsthetavia a MAP objective of the form\[\mathcal{L}_{\text{MAP}}(\theta) = \mathcal{L}_{\text{NLL}}(\theta) + P(\theta),\]where
P(theta)is the prior penalty (negative log-prior up to constants). This module defines reusable prior classes implementingP(theta)through a commonPriorinterface.
- class Prior[source]
Bases:
ABCAbstract interface for prior penalties on log-strength parameters.
- Method context:
Ranking MAP solvers optimize a negative objective and add a prior penalty term. Subclasses define that term via
penalty().
Notes
thetashould be interpreted as centered log-strengths in most Scorio MAP estimators because strengths are identifiable only up to an additive constant in log-space.- abstractmethod penalty(theta)[source]
Evaluate the prior penalty
P(theta).- Parameters:
theta (
ndarray) – Log-strength vector of shape(L,).- Return type:
- Returns:
Scalar penalty to be added to a negative log-likelihood objective.
- Notation:
theta_idenotes the latent log-strength for modeli.- Formula:
Subclasses implement concrete forms of
\[P(\theta) = -\log p(\theta) + \text{constant}.\]
- class EmpiricalPrior(R0, var=1.0, eps=1e-6)[source]
Bases:
PriorEmpirical Gaussian prior from prior outcome tensor
R0.- Method context:
Builds a model-specific prior mean from empirical accuracies in
R0and applies a shared Gaussian variance around those means. This is useful when a deterministic baseline (for example greedy decoding) should inform stochastic ranking.
References
Hariri, M., Samandar, A., Hinczewski, M., & Chaudhary, V. (2026). Don’t Pass@k: A Bayesian Framework for Large Language Model Evaluation. https://arxiv.org/abs/2510.04265
- Parameters:
Examples
>>> import numpy as np >>> R0 = np.array([ ... [1, 1, 1, 0, 1], ... [0, 1, 0, 0, 1], ... ]) >>> prior = EmpiricalPrior(R0, var=1.0) >>> prior.prior_mean.shape (2,)
- Notation:
acc_i: mean empirical accuracy for modeliinR0.mu_i: empirical logit mean.- Formula:
- \[\mu_i = \operatorname{logit}(\operatorname{clip}(acc_i, \epsilon, 1-\epsilon))\]\[P(\theta) = \frac{1}{2\,\mathrm{var}} \sum_i (\theta_i - \mu_i)^2\]
Notes
Prior means are centered to match BT-style identifiability constraints.
- penalty(theta)[source]
Evaluate empirical-Gaussian penalty around learned prior means.
- Parameters:
theta (
ndarray) – Log-strength vector of shape(L,).- Return type:
- Returns:
Scalar penalty value.
- Raises:
ValueError – If
thetalength differs from prior model count.
- class GaussianPrior(mean=0.0, var=1.0)[source]
Bases:
PriorIsotropic Gaussian prior on log-strengths.
- Method context:
Standard L2-style regularization used in many MAP ranking objectives.
- Parameters:
- Returns:
penalty(theta)returns a scalar quadratic penalty.
- Formula:
- \[P(\theta) = \frac{1}{2\,\mathrm{var}}\sum_i (\theta_i-\mathrm{mean})^2\]
Examples
>>> prior = GaussianPrior(mean=0.0, var=1.0) >>> prior.penalty(np.array([0.5, -0.5])) 0.25
- class LaplacePrior(loc=0.0, scale=1.0)[source]
Bases:
PriorLaplace prior on log-strengths.
- Method context:
L1-style shrinkage prior that can be more robust than Gaussian around outliers and may encourage sparse deviations from
loc.
Examples
>>> prior = LaplacePrior(loc=0.0, scale=1.0) >>> prior.penalty(np.array([0.5, -0.5])) 1.0
- Formula:
- \[P(\theta) = \frac{1}{\mathrm{scale}} \sum_i |\theta_i - \mathrm{loc}|\]
- class CauchyPrior(loc=0.0, scale=1.0)[source]
Bases:
PriorCauchy prior on log-strengths.
- Method context:
Heavy-tailed prior that penalizes extreme values less aggressively than Gaussian/Laplace priors.
Examples
>>> prior = CauchyPrior(loc=0.0, scale=1.0) >>> prior.penalty(np.array([2.0, -2.0])) 3.218...
- Formula:
- \[P(\theta) = \sum_i \log\left(1 + \left(\frac{\theta_i-\mathrm{loc}}{\mathrm{scale}}\right)^2\right)\]
- class UniformPrior[source]
Bases:
PriorImproper uniform prior on log-strengths.
- Method context:
Disables regularization in MAP routines (equivalent to ML objective).
Examples
>>> prior = UniformPrior() >>> prior.penalty(np.array([100.0, -100.0])) 0.0
- Formula:
- \[P(\theta) = 0\]
- class CustomPrior(penalty_fn)[source]
Bases:
PriorUser-defined prior penalty wrapper.
- Method context:
Allows custom regularization while preserving the
Priorinterface expected by MAP estimators.
Examples
>>> def horseshoe_penalty(theta): ... return np.log1p(theta**2).sum() >>> prior = CustomPrior(horseshoe_penalty) >>> float(prior.penalty(np.array([0.0, 1.0]))) > 0.0 True
Evaluation-based Ranking Methods
Evaluation-metric ranking methods.
These methods map each model’s responses to a scalar score and then convert
scores to ranks with scorio.utils.rank_scores().
Notation
Let \(R \in \{0,1,\ldots,C\}^{L \times M \times N}\) denote model outcomes, and define per-question correct-count summaries \(k_{lm}=\sum_{n=1}^{N} R_{lmn}\) when outcomes are binary.
The module follows the score template
where \(g_m\) depends on the selected evaluation metric.
- avg(R, method='competition', return_scores=False)[source]
Rank models by mean accuracy over all questions and trials.
- Method context:
This is the simplest pointwise ranking baseline: each model receives one score equal to its empirical success rate across all
M * Noutcomes.
- Parameters:
R (
ndarray) – Binary outcome tensor with shape(L, M, N)or matrix(L, M)(treated asN=1).method (
Literal['competition','competition_max','dense','avg']) – Tie-handling rule passed torank_scores. One of"competition","dense","avg","competition_max".return_scores (
bool) – IfTrue, return(ranking, scores).
- Return type:
- Returns:
Ranking array of shape
(L,). Ifreturn_scores=True, also returnsscoresof shape(L,).
- Notation:
R[l, m, n]is the binary outcome for modell, questionm, trialn.- Formula:
- \[s_l^{\mathrm{avg}} = \frac{1}{MN} \sum_{m=1}^{M}\sum_{n=1}^{N} R_{lmn}\]
Examples
>>> import numpy as np >>> from scorio import rank >>> R = np.array([ ... [[1, 1], [0, 1]], ... [[1, 0], [0, 0]], ... ]) >>> ranks, scores = rank.avg(R, return_scores=True) >>> scores.round(3).tolist() [0.75, 0.25] >>> ranks.tolist() [1, 2]
- bayes(R, w=None, R0=None, quantile=None, method='competition', return_scores=False)[source]
Rank models with Bayes@N posterior statistics.
- Method context:
For each model, this method computes Bayes@N posterior summary statistics
(mu_l, sigma_l)from categorical outcomes. Ranking can be based on posterior mean (default) or a Normal-quantile conservative score.
References
Hariri, M., Samandar, A., Hinczewski, M., & Chaudhary, V. (2026). Don’t Pass@k: A Bayesian Framework for Large Language Model Evaluation. ICLR 2026, arXiv:2510.04265. https://arxiv.org/abs/2510.04265
- Parameters:
R (
ndarray) – Categorical outcome tensor with shape(L, M, N)or matrix(L, M)(treated asN=1). Entries must be integers in{0, ..., C}.w (
ndarray|None) – Weight vector of shape(C+1,)mapping categories to scores. If not provided and R is binary (contains only 0 and 1), defaults to[1, 0]. For non-binary R, w is required.R0 (
ndarray|None) – Optional prior outcomes. Supported shapes: -(M, D): one shared prior matrix reused for all models. -(L, M, D): model-specific prior outcomes.quantile (
float|None) – Optional quantileqin[0, 1]. IfNone, rank by posterior mean. Otherwise rank bymu_l + Phi^{-1}(q) sigma_l.method (
Literal['competition','competition_max','dense','avg']) – Tie-handling rule for score-to-rank conversion.return_scores (
bool) – IfTrue, return(ranking, scores).
- Return type:
- Returns:
Ranking array of shape
(L,). Ifreturn_scores=True, also returns per-model scores used for ranking (posterior means or quantile scores), shape(L,).
- Notation:
mu_l, sigma_lare Bayes@N posterior mean and uncertainty for modellcomputed byscorio.eval.bayes().- Formula:
- \[\begin{split}s_l = \begin{cases} \mu_l, & \text{if } q\text{ is None} \\ \mu_l + \Phi^{-1}(q)\,\sigma_l, & \text{otherwise} \end{cases}\end{split}\]
Examples
>>> import numpy as np >>> from scorio import rank >>> R = np.array([ ... [[1, 0], [1, 1], [0, 0]], ... [[0, 0], [1, 0], [1, 1]], ... ]) >>> w = np.array([0.0, 1.0]) >>> R0 = np.array([[1, 1], [0, 1], [0, 0]]) # shared prior >>> ranks, scores = rank.bayes(R, w=w, R0=R0, return_scores=True) >>> ranks.shape, scores.shape ((2,), (2,))
Notes
Lower quantiles (for example
q=0.05) implement conservative ranking by penalizing posterior uncertainty.
- pass_at_k(R, k, method='competition', return_scores=False)[source]
Rank models by the Pass@k metric.
- Method context:
Pass@k measures the probability that at least one of
kdraws without replacement is correct for a question. Scores are averaged across questions per model.
References
Chen, M., Tworek, J., Jun, H., et al. (2021). Evaluating Large Language Models Trained on Code. arXiv:2107.03374. https://arxiv.org/abs/2107.03374
- Parameters:
- Return type:
- Returns:
Ranking array of shape
(L,). Ifreturn_scores=True, also returns per-model Pass@k scores.
- Notation:
nu_lm = sum_{n=1}^N R_lmnis the number of successes for modellon questionm.- Formula:
- \[s_l^{\mathrm{Pass@}k} = \frac{1}{M} \sum_{m=1}^{M} \left(1 - \frac{{N-\nu_{lm} \choose k}}{{N \choose k}}\right)\]
Examples
>>> import numpy as np >>> from scorio import rank >>> R = np.array([ ... [[1, 1, 0], [0, 1, 0]], ... [[1, 0, 0], [0, 0, 0]], ... ]) >>> ranks, scores = rank.pass_at_k(R, k=2, return_scores=True) >>> ranks.tolist() [1, 2]
- pass_hat_k(R, k, method='competition', return_scores=False)[source]
Rank models by Pass-hat@k (G-Pass@k).
- Method context:
Pass-hat@k is the probability that all
kselected samples are correct for a question, then averaged across questions.
References
Yao, S., Shinn, N., Razavi, P., & Narasimhan, K. (2024). tau-bench: A Benchmark for Tool-Agent-User Interaction in Real-World Domains. arXiv:2406.12045. https://arxiv.org/abs/2406.12045
- Parameters:
- Return type:
- Returns:
Ranking array of shape
(L,). Ifreturn_scores=True, also returns per-model Pass-hat@k scores.
- Notation:
nu_lm = sum_{n=1}^N R_lmn.- Formula:
- \[s_l^{\widehat{\mathrm{Pass@}k}} = \frac{1}{M} \sum_{m=1}^{M} \frac{{\nu_{lm} \choose k}}{{N \choose k}}\]
Examples
>>> import numpy as np >>> from scorio import rank >>> R = np.array([ ... [[1, 1, 0], [0, 1, 0]], ... [[1, 0, 0], [0, 0, 0]], ... ]) >>> rank.pass_hat_k(R, k=1).tolist() [1, 2]
- g_pass_at_k_tau(R, k, tau, method='competition', return_scores=False)[source]
Rank models by generalized G-Pass@k_tau.
- Method context:
G-Pass@k_tau measures the probability of obtaining at least
ceil(tau * k)successes inkdraws without replacement. It interpolates between Pass@k (small tau) and Pass-hat@k (tau=1).
References
Liu, J., Liu, H., Xiao, L., et al. (2025). Are Your LLMs Capable of Stable Reasoning? arXiv:2412.13147. https://arxiv.org/abs/2412.13147
- Parameters:
R (
ndarray) – Binary outcome tensor of shape(L, M, N)or matrix(L, M).k (
int) – Number of selected samples, with1 <= k <= N.tau (
float) – Threshold parameter in[0, 1].method (
Literal['competition','competition_max','dense','avg']) – Tie-handling rule forrank_scores.return_scores (
bool) – IfTrue, return(ranking, scores).
- Return type:
- Returns:
Ranking array of shape
(L,). Ifreturn_scores=True, also returns per-model G-Pass@k_tau scores.
- Notation:
X_lm ~ Hypergeom(N, nu_lm, k)wherenu_lmis the success count for modelland questionm.- Formula:
- \[s_l^{\mathrm{G\text{-}Pass@}k_\tau} = \frac{1}{M} \sum_{m=1}^{M} \Pr\left(X_{lm} \ge \lceil \tau k \rceil\right)\]\[\Pr\left(X_{lm} \ge \lceil \tau k \rceil\right) = \sum_{j=\lceil \tau k \rceil}^{k} \frac{{\nu_{lm} \choose j}{N-\nu_{lm} \choose k-j}} {{N \choose k}}\]
Examples
>>> import numpy as np >>> from scorio import rank >>> R = np.array([ ... [[1, 1, 0], [0, 1, 0]], ... [[1, 0, 0], [0, 0, 0]], ... ]) >>> rank.g_pass_at_k_tau(R, k=2, tau=1.0).tolist() == rank.pass_hat_k(R, 2).tolist() True
- mg_pass_at_k(R, k, method='competition', return_scores=False)[source]
Rank models by mG-Pass@k (mean generalized pass metric).
- Method context:
mG-Pass@k aggregates G-Pass@k_tau for
tau in [0.5, 1]via the discrete summation proposed in the G-Pass literature, producing a stability-focused score.
References
Liu, J., Liu, H., Xiao, L., et al. (2025). Are Your LLMs Capable of Stable Reasoning? arXiv:2412.13147. https://arxiv.org/abs/2412.13147
- Parameters:
- Return type:
- Returns:
Ranking array of shape
(L,). Ifreturn_scores=True, also returns per-model mG-Pass@k scores.
- Notation:
X_lm ~ Hypergeom(N, nu_lm, k), andm0 = ceil(k/2).- Formula:
- \[s_l^{\mathrm{mG\text{-}Pass@}k} = \frac{1}{M} \sum_{m=1}^{M} \frac{2}{k} \sum_{i=m_0+1}^{k} \Pr(X_{lm} \ge i)\]\[\frac{2}{k} \sum_{i=m_0+1}^{k} \Pr(X_{lm} \ge i) = \frac{2}{k} \, \mathbb{E}\left[(X_{lm}-m_0)_+\right]\]
Examples
>>> import numpy as np >>> from scorio import rank >>> R = np.array([ ... [[1, 1, 0], [0, 1, 0]], ... [[1, 0, 0], [0, 0, 0]], ... ]) >>> ranks, scores = rank.mg_pass_at_k(R, k=2, return_scores=True) >>> ranks.tolist() [1, 2]
Pointwise Methods
Pointwise ranking methods.
Pointwise methods aggregate each model’s item-level performance into one scalar without explicit head-to-head modeling.
Notation
Let \(R \in \{0,1\}^{L \times M \times N}\) and define \(k_{lm}=\sum_{n=1}^{N} R_{lmn}\), \(\widehat{p}_{lm}=k_{lm}/N\).
The pointwise score template is
where \(\phi_m\) is an item statistic, \(g\) is a fixed transform, and \(w_m\) are nonnegative weights.
- inverse_difficulty(R, method='competition', return_scores=False, clip_range=(0.01, 0.99))[source]
Rank models by inverse-difficulty-weighted per-question accuracy.
- Method context:
This pointwise method upweights hard questions by assigning each question an inverse weight proportional to the reciprocal global solve rate. It emphasizes rare successes while still aggregating all questions into one scalar score per model.
References
Inverse probability weighting (Wikipedia): https://en.wikipedia.org/wiki/Inverse_probability_weighting
- Parameters:
R (
ndarray) – Binary outcome tensor with shape(L, M, N)or matrix(L, M)(treated asN=1).method (
Literal['competition','competition_max','dense','avg']) – Tie-handling rule passed toscorio.utils.rank_scores(). One of"competition","competition_max","dense", or"avg".return_scores (
bool) – IfTrue, return(ranking, scores).clip_range (
tuple) – Two-sided clipping interval(a, b)applied to global solve rates before inversion. Must satisfy0 < a < b <= 1.
- Return type:
- Returns:
Ranking array of shape
(L,). Ifreturn_scores=True, also returns inverse-difficulty scores of shape(L,).
- Notation:
k_lm = sum_{n=1}^N R_lmnandp_hat_lm = k_lm / N. The global per-question solve rate isp_bar_m = (1/L) * sum_l p_hat_lm.- Formula:
- \[w_m \propto \frac{1}{\operatorname{clip}(\bar p_m, a, b)}, \quad \sum_{m=1}^{M} w_m = 1\]\[s_l^{\mathrm{inv\text{-}diff}} = \sum_{m=1}^{M} w_m \, \widehat{p}_{lm}\]
Examples
>>> import numpy as np >>> from scorio import rank >>> R = np.array([ ... [[1, 1], [0, 0], [0, 0]], ... [[0, 0], [1, 1], [0, 0]], ... ]) >>> ranks, scores = rank.inverse_difficulty(R, return_scores=True) >>> ranks.tolist() [1, 1] >>> scores.shape (2,)
>>> # tighter clipping is allowed >>> rank.inverse_difficulty(R, clip_range=(0.05, 0.95)).shape (2,)
Notes
Very small clip lower bounds can make the weighting highly sensitive to a few rare solves. This implementation is a simple inverse-weighted pointwise scorer.
Rank Centrality Methods
Rank Centrality: ranking from pairwise comparisons.
This module implements the Markov-chain estimator of Negahban, Oh, and Shah (2017).
Notation
Let \(R \in \{0,1\}^{L \times M \times N}\). For each pair \((i,j)\), let \(W_{ij}\) be decisive wins of \(i\) over \(j\). Define the pairwise empirical probabilities \(\widehat{P}_{i\succ j}\) from decisive outcomes (optionally tie adjusted).
Rank Centrality builds a row-stochastic transition matrix \(P\) with off-diagonal mass proportional to \(\widehat{P}_{j\succ i}\) and ranks by the stationary distribution \(\pi\):
- rank_centrality(R, method='competition', return_scores=False, tie_handling='half', smoothing=0.0, teleport=0.0, max_iter=10_000, tol=1e-12)[source]
Rank models with Rank Centrality.
- Method context:
Build a row-stochastic random-walk matrix over models where transition probabilities prefer moving from a model to models that beat it. The stationary distribution of this chain is the score vector.
- Parameters:
R (
ndarray) – Binary tensor of shape (L, M, N) (or (L, M) which is treated as N=1).method (
Literal['competition','competition_max','dense','avg']) – Ranking method passed to scorio.utils.rank_scores.return_scores (
bool) – If True, return (ranking, scores) where scores are the stationary distribution.tie_handling (
str) –“ignore”: only decisive comparisons (i correct, j incorrect)
”half”: treat ties (both same) as 0.5 win for each side
smoothing (
float) – Nonnegative pseudocount added to every directed win count. Use this to avoid disconnected graphs when tie_handling=”ignore”.teleport (
float) – Teleportation probability in [0, 1). When > 0, makes the Markov chain ergodic even if the comparison graph is disconnected.max_iter (
int) – Max iterations for the power method.tol (
float) – Convergence tolerance (L1 difference) for the power method.
- Return type:
- Returns:
Ranking array of shape
(L,). Ifreturn_scores=True, also returns stationary-distribution scores (shape(L,)).
- Formula:
Let
d_maxbe the maximum degree of the undirected comparison graph. Fori != j,\[P_{ij} = \frac{1}{d_{\max}}\,\widehat{P}_{j\succ i}, \quad P_{ii} = 1 - \sum_{j\neq i} P_{ij}\]where
P_hatis computed from pairwise tied-split outcomes.
References
Negahban, S., Oh, S., & Shah, D. (2017). Rank Centrality: Ranking from Pairwise Comparisons. Operations Research.
HodgeRank
HodgeRank: statistical ranking via combinatorial Hodge theory.
This module implements the weighted \(\ell_2\) HodgeRank estimator from Jiang, Lim, Yao, and Ye (2009).
Notation
Let \(R \in \{0,1\}^{L \times M \times N}\) and define decisive wins \(W_{ij}\) and ties \(T_{ij}\) from pairwise model comparisons. From these counts, construct a skew-symmetric edge flow \(Y\) and nonnegative symmetric edge weights \(w_{ij}\).
HodgeRank estimates a potential vector \(s \in \mathbb{R}^L\) by
This yields the weighted Laplacian normal equations
where \(\Delta_0^{\dagger}\) is the Moore-Penrose pseudoinverse. Only score differences are identifiable; adding a constant to \(s\) does not change the ranking.
- hodge_rank(R, pairwise_stat='binary', weight_method='total', epsilon=0.5, method='competition', return_scores=False, return_diagnostics=False)[source]
Rank models with l2 HodgeRank on pairwise-comparison graphs.
- Method context:
HodgeRank treats aggregated pairwise outcomes as an edge flow and finds global potentials whose gradient best matches that flow in weighted least squares. The minimum-norm solution is obtained from a graph Laplacian pseudoinverse.
References
Jiang, X., Lim, L.-H., Yao, Y., & Ye, Y. (2009). Statistical Ranking and Combinatorial Hodge Theory. https://arxiv.org/abs/0811.1067
- Parameters:
R (
ndarray) – Binary tensor of shape (L, M, N) (or (L, M) treated as N=1).pairwise_stat (
str) –“binary”: Ȳ_ij = P(j>i) - P(i>j)
”log_odds”: Ȳ_ij = log(P(j>=i)/P(j<=i)) with additive smoothing
weight_method (
str) –“total”: w_ij = #comparisons (including ties)
”decisive”: w_ij = #non-tie comparisons (wins+losses)
”uniform”: w_ij = 1 if comparable else 0
epsilon (
float) – Additive smoothing (counts) used only for pairwise_stat=”log_odds”.method (
Literal['competition','competition_max','dense','avg']) – Ranking method passed to scorio.utils.rank_scores.return_scores (
bool) – If True, return (ranking, scores) where scores are the HodgeRank potentials s (higher ⇒ better).return_diagnostics (
bool) – If True, also returns a small diagnostics dict with least-squares residual norms.
- Return type:
ndarray|tuple[ndarray,ndarray] |tuple[ndarray,ndarray,dict[str,float]]- Returns:
Ranking array of shape
(L,)by default. Ifreturn_scores=True, returns(ranking, scores). Ifreturn_diagnostics=True, returns(ranking, scores, diagnostics).
- Notation:
Y: skew-symmetric observed edge flow from pairwise outcomes.w_ij: symmetric nonnegative edge weights.(grad s)_ij = s_j - s_i.(div Y)_i = \sum_j w_{ij} Y_{ij}.\Delta_0: weighted graph Laplacian.- Formula:
- \[s^{\star} \in \arg\min_{s\in\mathbb{R}^{L}} \sum_{i<j} w_{ij}\left((s_j-s_i)-Y_{ij}\right)^2\]\[\Delta_0 s^{\star} = -\operatorname{div}(Y), \qquad s^{\star} = -\Delta_0^{\dagger}\operatorname{div}(Y)\]
Examples
>>> import numpy as np >>> from scorio import rank >>> R = np.array([[[1, 1], [1, 1]], [[0, 0], [0, 0]]]) >>> rank.hodge_rank(R).tolist() [1, 2]
Notes
Scores are identifiable only up to an additive constant; rankings depend on score differences. Diagnostics report weighted residual norms
||Y - grad(s)||_{2,w}.
Serial Rank
SerialRank: spectral ranking by seriation.
This module implements the SerialRank construction of Fogel, d’Aspremont, and Vojnovic.
Notation
Let \(R \in \{0,1\}^{L \times M \times N}\) and build a skew-symmetric pairwise comparison matrix \(C\), with \(C_{ij}>0\) indicating evidence that model \(i\) is stronger than model \(j\).
SerialRank forms
and ranks models by sorting a Fiedler vector of \(L_S\) (the eigenvector associated with the second-smallest eigenvalue).
- serial_rank(R, comparison='prob_diff', method='competition', return_scores=False)[source]
Rank models with SerialRank spectral seriation.
- Method context:
SerialRank builds a similarity matrix from pairwise comparisons, computes its graph Laplacian, and orders models by a Fiedler vector (second-smallest Laplacian eigenvector). This is a seriation-based ranking method designed to be robust to noisy pairwise outcomes.
References
Fogel, F., d’Aspremont, A., & Vojnovic, M. (2016). Spectral Ranking Using Seriation. Journal of Machine Learning Research. https://jmlr.org/papers/v17/16-035.html
- Parameters:
R (
ndarray) – Binary tensor of shape (L, M, N) (or (L, M) treated as N=1).comparison (
str) – How to aggregate multiple comparisons between a pair: - “prob_diff”: C_ij = (wins_ij - wins_ji) / total_ij in [-1, 1] - “sign”: C_ij = sign(wins_ij - wins_ji) in {-1, 0, 1}method (
Literal['competition','competition_max','dense','avg']) – Ranking method passed to scorio.utils.rank_scores.return_scores (
bool) – If True, return (ranking, scores) where scores are the oriented Fiedler vector (higher ⇒ better).
- Return type:
- Returns:
Ranking array of shape
(L,). Ifreturn_scores=True, returns(ranking, scores)with scores of shape(L,).
- Notation:
L: number of models.W_ij: number of decisive outcomes where modelibeatsj.T_ij: number of ties between modelsiandj.C: skew-symmetric comparison matrix.S: SerialRank similarity matrix.L_S: graph Laplacian ofS.- Formula:
- \[C_{ij} = \frac{W_{ij} - W_{ji}}{W_{ij} + W_{ji} + T_{ij}}, \quad C_{ii} = 0\]\[S = \frac{1}{2}\left(L\,\mathbf{1}\mathbf{1}^{\top} + C C^{\top}\right), \quad L_S = \operatorname{diag}(S\mathbf{1}) - S\]
Rank by sorting a Fiedler vector of
L_S. The sign is chosen to best align with observed pairwise comparisons.
Examples
>>> import numpy as np >>> from scorio import rank >>> R = np.array([[[1, 1], [1, 0]], [[0, 0], [0, 1]]]) >>> rank.serial_rank(R).tolist() [1, 2]
Notes
The original SerialRank paper sets
C_ii = 1for binary tournaments. In this implementation we keepCskew-symmetric withC_ii = 0; this changesSonly by a constant diagonal shift and leaves the Laplacian (hence the ranking) unchanged.If the Fiedler eigenvalue is not simple, the ordering is not identifiable. In that degenerate case, the method falls back to mean accuracy scores.
Graph-based Methods
Graph-based ranking methods.
These methods convert pairwise outcomes into a graph or Markov chain and rank models with stationary-distribution, spectral, or equilibrium concepts.
Notation
Let \(R \in \{0,1\}^{L \times M \times N}\) and define decisive wins \(W_{ij}\) and ties \(T_{ij}\). A shared empirical pairwise relation is
Graph methods construct an operator \(\mathcal{G}(\widehat{P})\) and rank from a derived score vector \(s\), such as a stationary distribution, principal eigenvector, or game-theoretic equilibrium score.
- pagerank(R, damping=0.85, max_iter=100, tol=1e-6, method='competition', return_scores=False, teleport=None)[source]
Rank models with PageRank on the pairwise win-probability graph.
- Method context:
Build a directed graph where edge
j -> ihas weightP_hat[i, j](losers link to winners). Column-normalize to obtain a transition matrix and run the damped PageRank fixed point with a teleportation vectore.
- Parameters:
R (
ndarray) – Binary outcome tensor with shape(L, M, N)or matrix(L, M)(treated asN=1).damping (
float) – PageRank damping factordin(0, 1).max_iter (
int) – Positive maximum number of power iterations.tol (
float) – Positive L1 convergence tolerance.method (
Literal['competition','competition_max','dense','avg']) – Tie-handling rule passed toscorio.utils.rank_scores().return_scores (
bool) – IfTrue, return(ranking, scores).teleport (
ndarray|None) – Optional teleportation vectore(shape(L,), nonnegative, finite). IfNone, uses uniform teleportation(1/L) * 1.
- Return type:
- Returns:
Ranking array of shape
(L,). Ifreturn_scores=True, also returns PageRank scoresr(shape(L,)).
- Notation:
P_hat[i, j]is the tied-split empirical win probability of modeliagainst modelj.- Formula:
- \[P_{ij} = \frac{\widehat{P}_{i\succ j}} {\sum_{k\neq j}\widehat{P}_{k\succ j}}\]\[r = d P r + (1-d)\,e\]
where
eis a probability vector. The default choice is \(e = \frac{1}{L}\mathbf{1}\).
References
Page, L., et al. (1999). The PageRank Citation Ranking: Bringing Order to the Web. Stanford InfoLab.
Examples
>>> import numpy as np >>> from scorio import rank >>> R = np.array([ ... [[1, 1], [1, 1]], ... [[0, 0], [0, 0]], ... ]) >>> rank.pagerank(R).tolist() [1, 2]
- spectral(R, max_iter=10_000, tol=1e-12, method='competition', return_scores=False)[source]
Rank models by spectral centrality of pairwise win probabilities.
- Method context:
Form a nonnegative matrix
Wwith off-diagonal entriesW[i, j] = P_hat[i, j]and diagonal self-loop mass equal to row sum. The normalized dominant right eigenvector is the score vector. This is an eigenvector-based Perron-style spectral ranking heuristic.
- Parameters:
R (
ndarray) – Binary outcome tensor with shape(L, M, N)or matrix(L, M)(treated asN=1).max_iter (
int) – Positive max iterations for power iteration.tol (
float) – Positive L1 convergence tolerance.method (
Literal['competition','competition_max','dense','avg']) – Tie-handling rule passed toscorio.utils.rank_scores().return_scores (
bool) – IfTrue, return(ranking, scores).
- Return type:
- Returns:
Ranking array of shape
(L,). Ifreturn_scores=True, also returns normalized spectral scores (shape(L,)).
- Formula:
- \[W_{ij}=\widehat{P}_{i\succ j}\;(i\neq j),\quad W_{ii}=\sum_{j\neq i}W_{ij}\]\[v \propto W v,\quad \sum_i v_i = 1\]
References
Vigna, S. (2016). Spectral ranking. Network Science. Keener, J. P. (1993). The Perron-Frobenius theorem and the ranking of football teams. SIAM Review.
Examples
>>> import numpy as np >>> R = np.array([ ... [[1, 1], [1, 1]], ... [[0, 0], [0, 0]], ... ]) >>> _, scores = spectral(R, return_scores=True) >>> scores[0] > scores[1] True
- alpharank(R, alpha=1.0, population_size=50, max_iter=100_000, tol=1e-12, method='competition', return_scores=False)[source]
Rank models with single-population alpha-Rank.
- Method context:
Treat models as strategies in a symmetric constant-sum game with payoff
P_hat[i, j]. Build fixation probabilities under Fermi selection in a finite population and compute the stationary distribution of the induced Markov chain. This corresponds to the single-population setting from the AlphaRank framework (the paper’s general method also covers multi-population and asymmetric games).
- Parameters:
R (
ndarray) – Binary outcome tensor with shape(L, M, N)or matrix(L, M)(treated asN=1).alpha (
float) – Selection intensityalpha >= 0.population_size (
int) – Finite population sizem >= 2.max_iter (
int) – Positive max iterations for stationary-distribution iteration.tol (
float) – Positive L1 convergence tolerance.method (
Literal['competition','competition_max','dense','avg']) – Tie-handling rule passed toscorio.utils.rank_scores().return_scores (
bool) – IfTrue, return(ranking, scores).
- Return type:
- Returns:
Ranking array of shape
(L,). Ifreturn_scores=True, also returns alpha-Rank stationary distribution scores (shape(L,)).
- Formula:
- \[\begin{split}u = \alpha\frac{m}{m-1}\left(\widehat{P}_{r\succ s}-\frac12\right), \quad \rho_{r,s} = \begin{cases} \frac{1-e^{-u}}{1-e^{-mu}}, & u\neq 0 \\ \frac{1}{m}, & u=0 \end{cases}\end{split}\]\[C_{sr}=\frac{1}{L-1}\rho_{r,s},\quad C_{ss}=1-\sum_{r\neq s}C_{sr}\]
References
Omidshafiei, S., et al. (2019). α-Rank: Multi-Agent Evaluation by Evolution. Scientific Reports.
Examples
>>> import numpy as np >>> R = np.array([ ... [[1, 1], [1, 1]], ... [[0, 0], [0, 0]], ... ]) >>> _, scores = alpharank(R, return_scores=True) >>> scores[0] > scores[1] True
- nash(R, n_iter=100, temperature=0.1, solver='lp', score_type='vs_equilibrium', return_equilibrium=False, method='competition', return_scores=False)[source]
Rank models via Nash equilibrium on the zero-sum meta-game.
- Method context:
Construct antisymmetric payoff matrix
A = 2 * P_hat - 1(with zero diagonal), solve a maximin mixed strategyx, then score models by expected performance versusx(Nash-averaging style) or alternative score views.
- Parameters:
R (
ndarray) – Binary outcome tensor with shape(L, M, N)or matrix(L, M)(treated asN=1).n_iter (
int) – Unused when solver=”lp” (kept for backward compatibility).temperature (
float) – Unused when solver=”lp” (kept for backward compatibility).solver (
str) – Currently only “lp” is supported.score_type (
str) – Which score vector to rank by: - “vs_equilibrium”: expected win probability vs equilibrium opponent. - “equilibrium”: the equilibrium mixture itself. - “advantage_vs_equilibrium”: expected zero-sum advantage vs equilibrium.return_equilibrium (
bool) – If True, also return the equilibrium mixture.method (
Literal['competition','competition_max','dense','avg']) – Tie-handling rule passed toscorio.utils.rank_scores().return_scores (
bool) – IfTrue, return(ranking, scores).
- Return type:
ndarray|tuple[ndarray,ndarray] |tuple[ndarray,ndarray,ndarray]- Returns:
Ranking array of shape
(L,). Ifreturn_scores=True, also returns scores for the selectedscore_type. Ifreturn_equilibrium=True, also returns equilibrium mixturex.
- Formula:
- \[A_{ij} = 2\widehat{P}_{i\succ j} - 1,\quad A_{ii}=0\]\[x \in \arg\max_{x\in\Delta^{L-1}}\min_{y\in\Delta^{L-1}} x^\top A y\]\[s_i = \sum_j \widehat{P}_{i\succ j} x_j\]
Notes
The cited papers establish empirical Nash equilibria and Nash-based population evaluation in symmetric zero-sum games. The
score_typechoices here are practical per-model summaries derived from that equilibrium for ranking API use.References
Balduzzi, D., Garnelo, M., Bachrach, Y., Czarnecki, W. M., P{‘e}rolat, J., Jaderberg, M., & Graepel, T. (2019). Open-ended Learning in Symmetric Zero-sum Games. ICML. Balduzzi, D., Tuyls, K., P{‘e}rolat, J., & Graepel, T. (2018). Re-evaluating Evaluation. NeurIPS.
Examples
>>> import numpy as np >>> R = np.array([ ... [[1, 1], [1, 1]], ... [[0, 0], [0, 0]], ... ]) >>> ranks, scores, eq = nash(R, return_scores=True, return_equilibrium=True) >>> scores[0] > scores[1] True
Pairwise Methods
Sequential pairwise rating methods.
This module adapts Elo, TrueSkill, and Glicko to binary response tensors by streaming induced pairwise outcomes.
Notation
Let \(R \in \{0,1\}^{L \times M \times N}\). For each event \((m,n)\) and each pair \((i,j)\), define an observed pair score \(S_{ij}^{(m,n)} \in \{0, 0.5, 1\}\) according to the configured tie policy.
Each method updates latent rating states \(\Theta_i\) sequentially:
where \(s_i\) is the final score used for ranking.
- elo(R, K=32.0, initial_rating=1500.0, tie_handling='correct_draw_only', method='competition', return_scores=False)[source]
Rank models with sequential Elo updates on induced pairwise matches.
- Method context:
Each question-trial pair
(m, n)induces head-to-head outcomes for every model pair. Ratings are updated online, so final ratings depend on stream order.
References
Elo, A. E. (1978). The Rating of Chessplayers, Past and Present. Arco Publishing. https://archive.org/details/ratingofchesspla0000eloa
- Parameters:
R (
ndarray) – Binary outcome tensor with shape(L, M, N)or matrix(L, M)(treated asN=1).K (
float) – Positive Elo step size.initial_rating (
float) – Initial rating assigned to every model.tie_handling (
Literal['skip','draw','correct_draw_only']) – Policy for(1,1)and(0,0)outcomes: -"skip": ignore ties -"draw": treat both tie types as draws -"correct_draw_only": draw only for(1,1), skip(0,0)method (
Literal['competition','competition_max','dense','avg']) – Tie-handling rule passed toscorio.utils.rank_scores().return_scores (
bool) – IfTrue, return(ranking, ratings).
- Return type:
- Returns:
Ranking array of shape
(L,). Ifreturn_scores=True, also returns final Elo ratings of shape(L,).
- Notation:
r_iis modelirating, andS_{ij}is the observed score for modeliagainstjin one induced match.- Formula:
- \[E_{ij} = \frac{1}{1 + 10^{(r_j-r_i)/400}}\]\[r_i \leftarrow r_i + K(S_{ij} - E_{ij}), \quad r_j \leftarrow r_j + K((1-S_{ij}) - (1-E_{ij}))\]
Examples
>>> import numpy as np >>> from scorio import rank >>> R = np.array([ ... [[1, 1], [1, 1]], ... [[0, 0], [0, 0]], ... ]) >>> ranks, ratings = rank.elo(R, return_scores=True) >>> ranks.tolist() [1, 2] >>> float(ratings[0] > ratings[1]) 1.0
>>> # tie handling can change updates >>> rank.elo(np.array([[[1]], [[1]]]), tie_handling="skip").tolist() [1, 1]
Notes
The implementation processes pairs in index order for each
(trial, question)event.
- trueskill(R, mu_initial=25.0, sigma_initial=25.0 / 3, beta=25.0 / 6, tau=25.0 / 300, method='competition', return_scores=False, tie_handling='skip', draw_margin=0.0)[source]
Rank models with a two-player TrueSkill update stream.
- Method context:
Each model has latent skill belief \(\mathcal{N}(\mu_i, \sigma_i^2)\). Every decisive pairwise result from
Rapplies a two-player TrueSkill update. Ties can be skipped or treated as draw events with an optional draw margin. Ranking uses finalmu.
References
Herbrich, R., Minka, T., & Graepel, T. (2006). TrueSkill(TM): A Bayesian Skill Rating System. NeurIPS 19. https://proceedings.neurips.cc/paper_files/paper/2006/file/f44ee263952e65b3610b8ba51229d1f9-Paper.pdf
- Parameters:
R (
ndarray) – Binary outcome tensor with shape(L, M, N)or matrix(L, M)(treated asN=1).mu_initial (
float) – Initial mean skill.sigma_initial (
float) – Initial skill standard deviation. Must be positive.beta (
float) – Performance-noise scale. Must be positive.tau (
float) – Per-round dynamics scale for sigma inflation. Must be nonnegative.method (
Literal['competition','competition_max','dense','avg']) – Ranking variant passed toscorio.utils.rank_scores().return_scores (
bool) – IfTrue, return(ranking, mu).tie_handling (
Literal['skip','draw','correct_draw_only']) – Policy for tied outcomes(1,1)and(0,0): -"skip": ignore ties -"draw": treat both tie types as draws -"correct_draw_only": draw only for(1,1), skip(0,0)draw_margin (
float) – Nonnegative draw margin \(\epsilon\) in the performance-difference space.0recovers no-margin updates.
- Return type:
- Returns:
Ranking array of shape
(L,). Ifreturn_scores=True, also returns finalmuvalues (shape(L,)).
- Notation:
For one match between models
iandj:c = sqrt(2 beta^2 + sigma_i^2 + sigma_j^2),t = (mu_i - mu_j) / c,epsilon = draw_margin / c.- Formula:
- \[v_{win}(t,\epsilon) = \frac{\phi(t-\epsilon)}{\Phi(t-\epsilon)},\quad w_{win}(t,\epsilon)=v_{win}(t,\epsilon)\left(v_{win}(t,\epsilon)+t-\epsilon\right)\]\[\mu_i' = \mu_i + \frac{\sigma_i^2}{c} v_{win}(t,\epsilon),\quad \mu_j' = \mu_j - \frac{\sigma_j^2}{c} v_{win}(t,\epsilon)\]\[\sigma_i'^2 = \sigma_i^2\left(1 - \frac{\sigma_i^2}{c^2} w_{win}(t,\epsilon)\right),\quad \sigma_j'^2 = \sigma_j^2\left(1 - \frac{\sigma_j^2}{c^2} w_{win}(t,\epsilon)\right)\]\[v_{draw}(t,\epsilon) = \frac{\phi(-\epsilon-t)-\phi(\epsilon-t)}{\Phi(\epsilon-t)-\Phi(-\epsilon-t)}\]\[w_{draw}(t,\epsilon) = v_{draw}(t,\epsilon)^2 + \frac{(\epsilon-t)\phi(\epsilon-t)-(-\epsilon-t)\phi(-\epsilon-t)}{\Phi(\epsilon-t)-\Phi(-\epsilon-t)}\]\[\mu_i' = \mu_i + \frac{\sigma_i^2}{c} v_{draw}(t,\epsilon),\quad \mu_j' = \mu_j - \frac{\sigma_j^2}{c} v_{draw}(t,\epsilon)\]\[\sigma_i'^2 = \sigma_i^2\left(1 - \frac{\sigma_i^2}{c^2} w_{draw}(t,\epsilon)\right),\quad \sigma_j'^2 = \sigma_j^2\left(1 - \frac{\sigma_j^2}{c^2} w_{draw}(t,\epsilon)\right)\]
Examples
>>> import numpy as np >>> from scorio import rank >>> R = np.array([ ... [[1, 1], [1, 1]], ... [[0, 0], [0, 0]], ... ]) >>> ranks, mu = rank.trueskill(R, return_scores=True) >>> ranks.tolist() [1, 2] >>> float(mu[0] > mu[1]) 1.0
Notes
This is a simplified two-player update stream, not full multiplayer factor-graph inference from the original TrueSkill paper.
- glicko(R, initial_rating=1500.0, initial_rd=350.0, c=0.0, rd_max=350.0, tie_handling='correct_draw_only', return_deviation=False, method='competition', return_scores=False)[source]
Rank models with Glicko rating and rating-deviation updates.
- Method context:
Glicko extends Elo by tracking both rating
rand uncertaintyRD. Each(question, trial)is treated as one rating period containing all pairwise induced matches for that slice.
References
Glickman, M. E. (1999). Parameter Estimation in Large Dynamic Paired Comparison Experiments. Journal of the Royal Statistical Society: Series C, 48(3), 377-394. https://doi.org/10.1111/1467-9876.00159
Glickman, M. E. (n.d.). The Glicko System. https://www.glicko.net/glicko/glicko.pdf
- Parameters:
R (
ndarray) – Binary outcome tensor with shape(L, M, N)or matrix(L, M)(treated asN=1).initial_rating (
float) – Initial ratingrfor every model.initial_rd (
float) – Initial rating deviationRDfor every model.c (
float) – Nonnegative RD inflation parameter between rating periods.rd_max (
float) – Positive hard cap forRD.tie_handling (
Literal['skip','draw','correct_draw_only']) – Policy for tied outcomes: -"skip": ignore(1,1)and(0,0)-"draw": treat both as draws -"correct_draw_only": draw for(1,1), skip(0,0)return_deviation (
bool) – IfTrue, returnRDin addition to ratings.method (
Literal['competition','competition_max','dense','avg']) – Tie-handling rule forscorio.utils.rank_scores().return_scores (
bool) – IfTrue, return ratings with ranking.
- Returns:
ranking array of shape
(L,).If
return_deviation=Falseandreturn_scores=True:(ranking, rating)whereratinghas shape(L,).If
return_deviation=True:(ranking, rating, rd)with both vectors shape(L,).- Return type:
ndarray|tuple[ndarray,ndarray] |tuple[ndarray,ndarray,ndarray]
- Notation:
q = ln(10)/400,g(RD) = 1 / sqrt(1 + 3 q^2 RD^2 / pi^2). For modeliin one period with opponentsj:S_ijis observed score andE_ij = 1 / (1 + 10^{-g(RD_j)(r_i-r_j)/400}).- Formula:
- \[d_i^2 = \left(q^2 \sum_j g(RD_j)^2 E_{ij}(1-E_{ij})\right)^{-1}\]\[RD_i' = \left(\frac{1}{RD_i^2} + \frac{1}{d_i^2}\right)^{-1/2}, \quad r_i' = r_i + \frac{q}{\frac{1}{RD_i^2}+\frac{1}{d_i^2}} \sum_j g(RD_j)(S_{ij}-E_{ij})\]
Examples
>>> import numpy as np >>> from scorio import rank >>> R = np.array([ ... [[1, 1], [1, 1]], ... [[0, 0], [0, 0]], ... ]) >>> ranks, ratings, rds = rank.glicko(R, return_deviation=True) >>> ranks.tolist() [1, 2] >>> float(ratings[0] > ratings[1]) 1.0
Notes
Models with no games in a rating period keep rating unchanged but can still receive RD inflation when
c > 0.
Paired-Comparison Probabilistic Models
Paired-comparison (pairwise) probabilistic ranking models.
In Scorio’s binary tensor setting \(R \in \{0,1\}^{L \times M \times N}\),
each model pair (i, j) generates decisive wins and ties:
Paired-comparison models assume latent positive strengths \(\pi_i\) and map them to pairwise event probabilities. A common likelihood template is
where \(\eta\) denotes tie-related parameter(s) when present.
Main families implemented here:
Bradley-Terry (BT) (no explicit ties):
\[\Pr(i \succ j) = \frac{\pi_i}{\pi_i + \pi_j}.\]Davidson (1970) tie extension with \(\nu > 0\):
\[\Pr(i \sim j) = \frac{\nu\sqrt{\pi_i\pi_j}} {\pi_i + \pi_j + \nu\sqrt{\pi_i\pi_j}}.\]Rao-Kupper (1967) tie extension with \(\kappa \ge 1\):
\[\Pr(i \succ j) = \frac{\pi_i}{\pi_i + \kappa\pi_j}, \quad \Pr(i \sim j) = \frac{(\kappa^2-1)\pi_i\pi_j} {(\pi_i+\kappa\pi_j)(\kappa\pi_i+\pi_j)}.\]
Each family has ML and MAP variants (MAP adds a prior penalty on the latent log-strengths \(\theta_i=\log\pi_i\) for regularization and numerical stability).
- bradley_terry(R, method='competition', return_scores=False, max_iter=500)[source]
Rank models with Bradley-Terry maximum-likelihood strengths.
- Method context:
Pairwise decisive outcomes are modeled with positive strengths \(\pi_i\). Tied outcomes are ignored for BT-ML.
References
Bradley, R. A., & Terry, M. E. (1952). Rank Analysis of Incomplete Block Designs: I. The Method of Paired Comparisons. Biometrika. https://doi.org/10.1093/biomet/39.3-4.324
Hunter, D. R. (2004). MM algorithms for generalized Bradley-Terry models. The Annals of Statistics. https://doi.org/10.1214/aos/1079120141
- Parameters:
R (
ndarray) – Binary outcome tensor with shape(L, M, N)or matrix(L, M)(treated asN=1).method (
Literal['competition','competition_max','dense','avg']) – Tie-handling rule passed toscorio.utils.rank_scores().return_scores (
bool) – IfTrue, return(ranking, scores).max_iter (
int) – Positive maximum number of L-BFGS iterations.
- Return type:
- Returns:
Ranking array of shape
(L,). Ifreturn_scores=True, also returns BT strengthspi(shape(L,)).
- Notation:
W_ijis the number of decisive outcomes where modelibeats modelj.- Formula:
- \[\Pr(i \succ j) = \frac{\pi_i}{\pi_i + \pi_j}\]\[\log p(W\mid\pi) = \sum_{i\ne j} W_{ij} \left[\log \pi_i - \log(\pi_i+\pi_j)\right]\]
Examples
>>> import numpy as np >>> from scorio import rank >>> R = np.array([ ... [[1, 1], [1, 1]], ... [[0, 0], [0, 0]], ... ]) >>> ranks, scores = rank.bradley_terry(R, return_scores=True) >>> ranks.tolist() [1, 2] >>> float(scores[0] > scores[1]) 1.0
Notes
If there are no decisive outcomes at all, all strengths are returned equal because relative ability is unidentifiable.
- bradley_terry_map(R, prior=1.0, method='competition', return_scores=False, max_iter=500)[source]
Rank models with Bradley-Terry MAP estimation.
- Method context:
This method adds a prior penalty on centered log-strengths
theta_i = log(pi_i)to regularize BT estimation.
- Parameters:
R (
ndarray) – Binary outcome tensor with shape(L, M, N)or matrix(L, M)(treated asN=1).prior (
Prior|float) – Prior on log-strengths. Ifprioris a float, it is interpreted as Gaussian prior variance usingscorio.rank.GaussianPriorwith mean 0. Ifprioris aPriorinstance, its penalty is used directly.method (
Literal['competition','competition_max','dense','avg']) – Tie-handling rule passed toscorio.utils.rank_scores().return_scores (
bool) – IfTrue, return(ranking, scores).max_iter (
int) – Positive maximum number of L-BFGS iterations.
- Return type:
- Returns:
Ranking array of shape
(L,). Ifreturn_scores=True, also returns MAP strengthspi(shape(L,)).
- Notation:
theta_i = log(pi_i), andP(theta)denotes the prior.- Formula:
- \[\hat{\theta} = \arg\min_{\theta} \left[ -\log p(W\mid\theta) + \operatorname{penalty}(\theta) \right]\]\[\hat{\pi}_i = \exp(\hat{\theta}_i)\]
References
Caron, F., & Doucet, A. (2012). Efficient Bayesian inference for generalized Bradley-Terry models. Journal of Computational and Graphical Statistics. https://doi.org/10.1080/10618600.2012.638220
Examples
>>> import numpy as np >>> from scorio import rank >>> from scorio.rank import GaussianPrior >>> R = np.array([ ... [[1, 1], [1, 1]], ... [[0, 0], [0, 0]], ... ]) >>> ranks, scores = rank.bradley_terry_map(R, prior=1.0, return_scores=True) >>> ranks.tolist() [1, 2]
>>> prior = GaussianPrior(mean=0.0, var=0.5) >>> rank.bradley_terry_map(R, prior=prior).shape (2,)
Notes
With informative priors, MAP can remain identifiable in settings where BT-ML has weak or degenerate decisive information.
- bradley_terry_davidson(R, method='competition', return_scores=False, max_iter=500)[source]
Rank models with the Bradley-Terry-Davidson tie model (ML).
- Method context:
Davidson extends BT by introducing a tie parameter
nu > 0. In this benchmark setting, ties correspond to(1,1)or(0,0)outcomes for a model pair on the same question-trial event.
- Parameters:
R (
ndarray) – Binary outcome tensor with shape(L, M, N)or matrix(L, M)(treated asN=1).method (
Literal['competition','competition_max','dense','avg']) – Tie-handling rule passed toscorio.utils.rank_scores().return_scores (
bool) – IfTrue, return(ranking, scores).max_iter (
int) – Positive maximum number of L-BFGS iterations.
- Return type:
- Returns:
Ranking array of shape
(L,). Ifreturn_scores=True, also returns Davidson strengthspi(shape(L,)).
- Notation:
W_ijare decisive win counts andT_ijtie counts.- Formula:
- \[\Pr(i\succ j) = \frac{\pi_i}{\pi_i+\pi_j+\nu\sqrt{\pi_i\pi_j}}, \quad \Pr(i\sim j) = \frac{\nu\sqrt{\pi_i\pi_j}}{\pi_i+\pi_j+\nu\sqrt{\pi_i\pi_j}}\]
References
Davidson, R. R. (1970). On extending the Bradley-Terry model to accommodate ties in paired comparison experiments. Journal of the American Statistical Association. https://doi.org/10.1080/01621459.1970.10481082
Examples
>>> import numpy as np >>> from scorio import rank >>> R = np.array([ ... [[1, 1], [1, 0]], ... [[1, 0], [0, 0]], ... ]) >>> ranks, scores = rank.bradley_terry_davidson(R, return_scores=True) >>> ranks.tolist() [1, 2]
Notes
If there are ties but no decisive outcomes, strengths are set equal because relative ability is not identified from ties alone.
- bradley_terry_davidson_map(R, prior=1.0, method='competition', return_scores=False, max_iter=500)[source]
Rank models with the Bradley-Terry-Davidson tie model (MAP).
- Method context:
Adds a prior penalty on centered log-strengths on top of Davidson’s tie likelihood.
- Parameters:
R (
ndarray) – Binary outcome tensor with shape(L, M, N)or matrix(L, M)(treated asN=1).prior (
Prior|float) – Prior on log-strengths. -float: interpreted as Gaussian prior variance with mean 0. -Priorinstance: custom prior penalty.method (
Literal['competition','competition_max','dense','avg']) – Tie-handling rule passed toscorio.utils.rank_scores().return_scores (
bool) – IfTrue, return(ranking, scores).max_iter (
int) – Positive maximum number of L-BFGS iterations.
- Return type:
- Returns:
Ranking array of shape
(L,). Ifreturn_scores=True, also returns Davidson-MAP strengthspi(shape(L,)).
- Notation:
theta_i = log(pi_i)andnuis the Davidson tie parameter.- Formula:
- \[\hat{\theta},\hat{\nu} = \arg\min_{\theta,\nu>0} \left[ -\log p(W,T\mid\theta,\nu) + \operatorname{penalty}(\theta) \right]\]
References
Davidson, R. R. (1970). On extending the Bradley-Terry model to accommodate ties in paired comparison experiments. Journal of the American Statistical Association. https://doi.org/10.1080/01621459.1970.10481082
Caron, F., & Doucet, A. (2012). Efficient Bayesian inference for generalized Bradley-Terry models. https://doi.org/10.1080/10618600.2012.638220
Examples
>>> import numpy as np >>> from scorio import rank >>> R = np.array([ ... [[1, 1], [1, 0]], ... [[1, 0], [0, 0]], ... ]) >>> ranks, scores = rank.bradley_terry_davidson_map( ... R, prior=1.0, return_scores=True ... ) >>> ranks.tolist() [1, 2]
- rao_kupper(R, tie_strength=1.1, method='competition', return_scores=False, max_iter=500)[source]
Rank models with the Rao-Kupper tie model (ML).
- Method context:
Rao-Kupper augments paired comparison with a threshold parameter
kappa >= 1controlling tie prevalence. In this API,kappais a fixed hyperparameter (tie_strength), not estimated.
- Parameters:
R (
ndarray) – Binary outcome tensor with shape(L, M, N)or matrix(L, M)(treated asN=1).tie_strength (
float) – Rao-Kupper parameterkappa >= 1.kappa=1reduces to no-tie BT.method (
Literal['competition','competition_max','dense','avg']) – Tie-handling rule passed toscorio.utils.rank_scores().return_scores (
bool) – IfTrue, return(ranking, scores).max_iter (
int) – Positive maximum number of L-BFGS iterations.
- Return type:
- Returns:
Ranking array of shape
(L,). Ifreturn_scores=True, also returns Rao-Kupper strengthspi(shape(L,)).
- Notation:
W_ijare decisive win counts andT_ijtie counts.- Formula:
- \[\Pr(i\succ j) = \frac{\pi_i}{\pi_i + \kappa\pi_j}, \quad \Pr(j\succ i) = \frac{\pi_j}{\kappa\pi_i + \pi_j}\]\[\Pr(i\sim j)= \frac{(\kappa^2-1)\pi_i\pi_j} {(\pi_i+\kappa\pi_j)(\kappa\pi_i+\pi_j)}\]
References
Rao, P. V., & Kupper, L. L. (1967). Ties in paired-comparison experiments: A generalization of the Bradley-Terry model. Journal of the American Statistical Association. https://doi.org/10.1080/01621459.1967.10482901
Hunter, D. R. (2004). MM algorithms for generalized Bradley-Terry models. The Annals of Statistics. https://doi.org/10.1214/aos/1079120141
Examples
>>> import numpy as np >>> from scorio import rank >>> R = np.array([ ... [[1, 1], [1, 0]], ... [[1, 0], [0, 0]], ... ]) >>> ranks, scores = rank.rao_kupper(R, tie_strength=1.1, return_scores=True) >>> ranks.tolist() [1, 2]
Notes
If
tie_strength=1and tie counts are present, the model is inconsistent and raisesValueError.
- rao_kupper_map(R, tie_strength=1.1, prior=1.0, method='competition', return_scores=False, max_iter=500)[source]
Rank models with the Rao-Kupper tie model (MAP).
- Method context:
Adds a prior penalty on centered log-strengths to Rao-Kupper likelihood while keeping
kappafixed.
- Parameters:
R (
ndarray) – Binary outcome tensor with shape(L, M, N)or matrix(L, M)(treated asN=1).tie_strength (
float) – Rao-Kupper parameterkappa >= 1.prior (
Prior|float) – Prior on log-strengths. -float: interpreted as Gaussian prior variance with mean 0. -Priorinstance: custom prior penalty.method (
Literal['competition','competition_max','dense','avg']) – Tie-handling rule passed toscorio.utils.rank_scores().return_scores (
bool) – IfTrue, return(ranking, scores).max_iter (
int) – Positive maximum number of L-BFGS iterations.
- Return type:
- Returns:
Ranking array of shape
(L,). Ifreturn_scores=True, also returns Rao-Kupper MAP strengthspi(shape(L,)).
- Notation:
theta_i = log(pi_i)andkappais fixed.- Formula:
- \[\hat{\theta} = \arg\min_{\theta} \left[ -\log p(W,T\mid\theta,\kappa) + \operatorname{penalty}(\theta) \right]\]
References
Rao, P. V., & Kupper, L. L. (1967). Ties in paired-comparison experiments: A generalization of the Bradley-Terry model. Journal of the American Statistical Association. https://doi.org/10.1080/01621459.1967.10482901
Caron, F., & Doucet, A. (2012). Efficient Bayesian inference for generalized Bradley-Terry models. https://doi.org/10.1080/10618600.2012.638220
Bayesian Ranking Methods
Bayesian ranking methods.
These methods rank models using posterior summaries rather than point estimates alone.
Notation
Let \(R \in \{0,1\}^{L \times M \times N}\). Each method introduces latent model parameters \(\theta_l\), computes a posterior \(p(\theta \mid R)\), and ranks with posterior scores \(s_l\).
The generic form is
where \(\mathbb{Q}_q\) denotes a posterior quantile.
- thompson(R, n_samples=10000, prior_alpha=1.0, prior_beta=1.0, seed=42, method='competition', return_scores=False)[source]
Rank models by Thompson-sampling posterior expected rank.
- Method context:
This method assumes each model has one latent Bernoulli success probability over all
M*Noutcomes and uses the conjugate Beta-Binomial posterior. Ranking score is the negative Monte Carlo estimate of posterior expected rank.
- Parameters:
R (
ndarray) – Binary outcome tensor with shape(L, M, N)or matrix(L, M)(treated asN=1).n_samples (
int) – Positive number of posterior Monte Carlo samplesT.prior_alpha (
float) – Positive Beta prior alpha.prior_beta (
float) – Positive Beta prior beta.seed (
int) – Random seed for reproducibility.method (
Literal['competition','competition_max','dense','avg']) – Tie-handling rule passed toscorio.utils.rank_scores().return_scores (
bool) – IfTrue, return(ranking, scores).
- Return type:
- Returns:
Ranking array of shape
(L,). Ifreturn_scores=True, also returnsscores = - E[r_l | R]approximations (shape(L,)), where higher is better.
- Notation:
S_l = sum_{m,n} R[l,m,n]andT_tot = M*N.r_l^(t)is modellrank in posterior drawt.- Formula:
- \[p_l \mid R \sim \mathrm{Beta}(\alpha + S_l,\ \beta + T_{\mathrm{tot}} - S_l)\]\[s_l^{\mathrm{TS}} = -\frac{1}{T}\sum_{t=1}^{T} r_l^{(t)}\]
References
Thompson, W. R. (1933). On the Likelihood that One Unknown Probability Exceeds Another in View of the Evidence of Two Samples. Biometrika. https://doi.org/10.1093/biomet/25.3-4.285
Russo, D. J., et al. (2018). A Tutorial on Thompson Sampling. Foundations and Trends in Machine Learning. https://doi.org/10.1561/2200000070
Gelman, A., et al. (2013). Bayesian Data Analysis (3rd ed.). https://doi.org/10.1201/b16018
Examples
>>> import numpy as np >>> from scorio import rank >>> R = np.array([ ... [[1, 1], [1, 1]], ... [[0, 0], [0, 0]], ... ]) >>> ranks, scores = rank.thompson(R, n_samples=2000, return_scores=True) >>> ranks.tolist() [1, 2]
Notes
If all models have identical posterior Beta parameters, the exact expected ranks are equal and the implementation returns equal scores deterministically (instead of Monte Carlo tie-breaking noise).
- bayesian_mcmc(R, n_samples=5000, burnin=1000, prior_var=1.0, seed=42, method='competition', return_scores=False)[source]
Rank models via Bayesian Bradley-Terry posterior means from MCMC.
- Method context:
This method uses decisive pairwise counts with Bradley-Terry likelihood and independent Gaussian priors on log-strengths. Posterior inference is approximated by random-walk Metropolis-Hastings.
- Parameters:
R (
ndarray) – Binary outcome tensor with shape(L, M, N)or matrix(L, M)(treated asN=1).n_samples (
int) – Positive number of retained MCMC samples.burnin (
int) – Nonnegative number of warmup iterations.prior_var (
float) – Positive Gaussian prior variance ontheta.seed (
int) – Random seed for reproducibility.method (
Literal['competition','competition_max','dense','avg']) – Tie-handling rule passed toscorio.utils.rank_scores().return_scores (
bool) – IfTrue, return(ranking, scores).
- Return type:
- Returns:
Ranking array of shape
(L,). Ifreturn_scores=True, also returns posterior meanE[theta|W]estimates (shape(L,)).
- Notation:
W_ijis decisive wins where modelibeatsj.theta_iare log-strengths withpi_i = exp(theta_i).- Formula:
- \[\Pr(i\succ j\mid\theta)= \frac{\exp(\theta_i)}{\exp(\theta_i)+\exp(\theta_j)},\quad \theta_i\sim\mathcal{N}(0,\sigma^2)\]\[s_i^{\mathrm{MCMC}} = \mathbb{E}[\theta_i\mid W]\]
References
Bradley, R. A., & Terry, M. E. (1952). Rank Analysis of Incomplete Block Designs: I. The Method of Paired Comparisons. Biometrika. https://doi.org/10.1093/biomet/39.3-4.324
Metropolis, N., et al. (1953). Equation of State Calculations by Fast Computing Machines. The Journal of Chemical Physics. https://doi.org/10.1063/1.1699114
Hastings, W. K. (1970). Monte Carlo Sampling Methods Using Markov Chains and Their Applications. Biometrika. https://doi.org/10.1093/biomet/57.1.97
Caron, F., & Doucet, A. (2012). Efficient Bayesian inference for generalized Bradley-Terry models. https://doi.org/10.1080/10618600.2012.638220
Examples
>>> import numpy as np >>> from scorio import rank >>> R = np.array([ ... [[1, 1], [1, 1]], ... [[0, 0], [0, 0]], ... ]) >>> ranks, scores = rank.bayesian_mcmc( ... R, n_samples=2000, burnin=500, return_scores=True ... ) >>> ranks.tolist() [1, 2]
Notes
If there are no decisive outcomes, posterior means are exactly equal under the symmetric Gaussian prior, and zeros are returned directly.
Item Response Theory Methods
Item Response Theory (IRT) ranking methods.
This module estimates latent model abilities and question parameters under binary IRT families.
Notation
Let \(R \in \{0,1\}^{L \times M \times N}\) and \(k_{lm}=\sum_{n=1}^{N} R_{lmn}\). Model abilities are \(\theta_l\); item parameters include difficulty \(b_m\), discrimination \(a_m\), and optional pseudo-guessing \(c_m\).
A general binary IRT response model is
Special cases:
1PL (Rasch): \(a_m=1\), \(c_m=0\).
2PL: \(c_m=0\), free \(a_m\) and \(b_m\).
3PL: free \(a_m\), \(b_m\), and \(c_m\).
Rankings are induced by ability scores \(s_l\), typically \(s_l=\hat\theta_l\) or a posterior summary of \(\theta_l\).
The module includes maximum-likelihood and joint maximum-likelihood estimators, MAP variants with configurable priors, and MML-EAP estimators.
- rasch(R, method='competition', return_scores=False, max_iter=500, return_item_params=False)[source]
Rank models with Rasch (1PL) IRT via joint MLE.
- Method context:
Each model
lhas latent abilitytheta_land each questionmhas difficultyb_m. We estimate both by maximizing the binomial likelihood over per-question correct counts.
- Parameters:
R (
ndarray) – Binary outcome tensor with shape(L, M, N)or matrix(L, M)(treated asN=1).method (
Literal['competition','competition_max','dense','avg']) – Tie-handling rule passed toscorio.utils.rank_scores().return_scores (
bool) – IfTrue, return(ranking, scores).max_iter (
int) – Positive maximum number of L-BFGS iterations.return_item_params (
bool) – If True, also returns estimated item parameters (difficulty). Implies returning scores.
- Return type:
ndarray|tuple[ndarray,ndarray] |tuple[ndarray,ndarray,dict[str,ndarray]]- Returns:
Ranking array of shape
(L,). Ifreturn_scores=True, also returns ability scorestheta(shape(L,)). Ifreturn_item_params=True, also returns{"difficulty": b}(shape(M,)).
- Notation:
k_{lm} = sum_n R_{lmn}is the correct-count for modelland questionm.- Formula:
- \[k_{lm} \sim \mathrm{Binomial}\left(N,\sigma(\theta_l-b_m)\right)\]\[b \leftarrow b - \frac{1}{M}\sum_m b_m\]
References
Rasch, G. (1960). Probabilistic Models for Some Intelligence and Attainment Tests.
Examples
>>> import numpy as np >>> from scorio import rank >>> R = np.array([ ... [[1, 1], [1, 1]], ... [[0, 0], [0, 0]], ... ]) >>> ranks, scores = rank.rasch(R, return_scores=True) >>> ranks.tolist() [1, 2]
- rasch_map(R, prior=1.0, method='competition', return_scores=False, max_iter=500, return_item_params=False)[source]
Rank models with Rasch (1PL) IRT via MAP estimation.
- Method context:
Same likelihood as
rasch(), with an additional prior penalty on abilitiesthetafor shrinkage and numerical stability.
- Parameters:
R (
ndarray) – Binary outcome tensor with shape(L, M, N)or matrix(L, M)(treated asN=1).prior (
Prior|float) – Ability prior. Afloatis interpreted as Gaussian prior variance; otherwise must be aPriorinstance.method (
Literal['competition','competition_max','dense','avg']) – Tie-handling rule passed toscorio.utils.rank_scores().return_scores (
bool) – IfTrue, return(ranking, scores).max_iter (
int) – Positive maximum number of L-BFGS iterations.return_item_params (
bool) – If True, also returns estimated item parameters. Implies returning scores.
- Return type:
ndarray|tuple[ndarray,ndarray] |tuple[ndarray,ndarray,dict[str,ndarray]]- Returns:
Ranking array of shape
(L,). Ifreturn_scores=True, also returns MAP ability scorestheta. Ifreturn_item_params=True, also returns{"difficulty": b}.
- Formula:
- \[\hat\theta,\hat b = \arg\min_{\theta,b} \left[ -\sum_{l,m}\log p(k_{lm}\mid\theta_l,b_m) + \mathrm{penalty}(\theta) \right]\]
References
Mislevy, R. J. (1986). Bayes modal estimation in item response models. Psychometrika.
Examples
>>> import numpy as np >>> from scorio import rank >>> R = np.array([ ... [[1, 1], [1, 1]], ... [[0, 0], [0, 0]], ... ]) >>> rank.rasch_map(R, prior=1.0).tolist() [1, 2]
- rasch_mml(R, method='competition', return_scores=False, max_iter=100, em_iter=20, n_quadrature=21, return_item_params=False)[source]
Rank models with Rasch MML (EM + quadrature) and EAP scoring.
- Method context:
Integrates out abilities under a population prior (standard normal), estimates item difficulties by EM, then computes expected-a-posteriori (EAP) model abilities.
- Parameters:
R (
ndarray) – Binary outcome tensor with shape(L, M, N)or matrix(L, M)(treated asN=1).method (
Literal['competition','competition_max','dense','avg']) – Tie-handling rule passed toscorio.utils.rank_scores().return_scores (
bool) – IfTrue, return(ranking, scores).max_iter (
int) – Positive max optimizer iterations in each M-step item update.em_iter (
int) – Positive number of EM iterations.n_quadrature (
int) – Number of Gauss-Hermite nodes (integer>=2).return_item_params (
bool) – If True, also returns estimated item parameters. Implies returning scores.
- Return type:
ndarray|tuple[ndarray,ndarray] |tuple[ndarray,ndarray,dict[str,ndarray]]- Returns:
Ranking array of shape
(L,). Ifreturn_scores=True, also returns EAP ability scores. Ifreturn_item_params=True, also returns{"difficulty": b, "ability_sd": sd(theta|R)}.
- Formula:
- \[\hat\theta_l^{\mathrm{EAP}} = \sum_q w_{lq}\,\theta_q, \quad w_{lq} \propto p(k_l\mid\theta_q,b)\,w_q\]
References
Bock, R. D., & Aitkin, M. (1981). Marginal maximum likelihood estimation of item parameters: Application of an EM algorithm. Psychometrika, 46(4), 443-459.
Mislevy, R. J. (1986). Bayes modal estimation in item response models. Psychometrika, 51(2), 177-195.
Examples
>>> import numpy as np >>> from scorio import rank >>> R = np.array([ ... [[1, 1], [1, 1]], ... [[0, 0], [0, 0]], ... ]) >>> rank.rasch_mml(R).tolist() [1, 2]
- rasch_mml_credible(R, quantile=0.05, method='competition', return_scores=False, max_iter=100, em_iter=20, n_quadrature=21)[source]
Rank models by a posterior quantile under Rasch MML.
- Method context:
Uses the discrete posterior from
rasch_mml()and ranks by posterior quantileQ_q(theta_l | R). Lower quantiles provide conservative, uncertainty-aware ordering.
- Parameters:
R (
ndarray) – Binary outcome tensor with shape(L, M, N)or matrix(L, M)(treated asN=1).quantile (
float) – Posterior quantileqin(0, 1).method (
Literal['competition','competition_max','dense','avg']) – Tie-handling rule passed toscorio.utils.rank_scores().return_scores (
bool) – IfTrue, return(ranking, scores).max_iter (
int) – Positive max optimizer iterations in each M-step item update.em_iter (
int) – Positive number of EM iterations.n_quadrature (
int) – Number of Gauss-Hermite nodes (integer>=2).
- Return type:
- Returns:
Ranking array of shape
(L,). Ifreturn_scores=True, also returns posterior-quantile scores.
- Formula:
- \[s_l = Q_q(\theta_l\mid R)\]
- rasch_2pl(R, method='competition', return_scores=False, max_iter=500, return_item_params=False, reg_discrimination=0.01)[source]
Rank models with 2PL IRT via joint (optionally regularized) JMLE.
- Method context:
Extends Rasch with item discrimination
a_m > 0, so items can differ in how strongly they separate abilities. By default, a small L2 penalty is applied onlog(a)for numerical stability.
- Parameters:
R (
ndarray) – Binary outcome tensor with shape(L, M, N)or matrix(L, M)(treated asN=1).method (
Literal['competition','competition_max','dense','avg']) – Tie-handling rule passed toscorio.utils.rank_scores().return_scores (
bool) – IfTrue, return(ranking, scores).max_iter (
int) – Positive maximum number of L-BFGS iterations.return_item_params (
bool) – If True, also returns estimated item parameters (difficulty and discrimination). Implies returning scores.reg_discrimination (
float) – Non-negative L2 penalty weight onlog(a). Set to0.0for pure (unpenalized) JMLE.
- Return type:
ndarray|tuple[ndarray,ndarray] |tuple[ndarray,ndarray,dict[str,ndarray]]- Returns:
Ranking array of shape
(L,). Ifreturn_scores=True, also returns ability scorestheta. Ifreturn_item_params=True, also returns{"difficulty": b, "discrimination": a}.
- Formula:
- \[k_{lm} \sim \mathrm{Binomial} \left(N,\sigma\left(a_m(\theta_l-b_m)\right)\right)\]
References
Birnbaum, A. (1968). Some Latent Trait Models and Their Use in Inferring an Examinee’s Ability. In Statistical Theories of Mental Test Scores.
Examples
>>> import numpy as np >>> from scorio import rank >>> R = np.array([ ... [[1, 1], [1, 1]], ... [[0, 0], [0, 0]], ... ]) >>> rank.rasch_2pl(R).tolist() [1, 2]
- rasch_2pl_map(R, prior=1.0, method='competition', return_scores=False, max_iter=500, return_item_params=False, reg_discrimination=0.01)[source]
Rank models with 2PL IRT via MAP estimation.
- Method context:
Same 2PL likelihood as
rasch_2pl(), with a prior penalty on model abilitiesthetaand an optional L2 penalty onlog(a).
- Parameters:
R (
ndarray) – Binary outcome tensor with shape(L, M, N)or matrix(L, M)(treated asN=1).prior (
Prior|float) – Ability prior. Afloatis interpreted as Gaussian prior variance; otherwise must be aPriorinstance.method (
Literal['competition','competition_max','dense','avg']) – Tie-handling rule passed toscorio.utils.rank_scores().return_scores (
bool) – IfTrue, return(ranking, scores).max_iter (
int) – Positive maximum number of L-BFGS iterations.return_item_params (
bool) – If True, also returns estimated item parameters. Implies returning scores.reg_discrimination (
float) – Non-negative L2 penalty weight onlog(a). Set to0.0to remove item-discrimination regularization.
- Return type:
ndarray|tuple[ndarray,ndarray] |tuple[ndarray,ndarray,dict[str,ndarray]]- Returns:
Ranking array of shape
(L,). Ifreturn_scores=True, also returns MAP ability scorestheta. Ifreturn_item_params=True, also returns{"difficulty": b, "discrimination": a}.
Examples
>>> import numpy as np >>> from scorio import rank >>> R = np.array([ ... [[1, 1], [1, 1]], ... [[0, 0], [0, 0]], ... ]) >>> rank.rasch_2pl_map(R, prior=1.0).tolist() [1, 2]
- rasch_3pl(R, method='competition', return_scores=False, max_iter=500, fix_guessing=None, return_item_params=False, reg_discrimination=0.01, reg_guessing=0.1, guessing_upper=0.5)[source]
Rank models with 3PL IRT via joint (optionally regularized) JMLE.
- Method context:
Extends 2PL with item-specific pseudo-guessing
c_m. Estimated guessing is constrained to[0, guessing_upper]; optionally a fixed value can be used. By default, small L2 penalties are applied onlog(a)and guessing logits for numerical stability.
- Parameters:
R (
ndarray) – Binary outcome tensor with shape(L, M, N)or matrix(L, M)(treated asN=1).method (
Literal['competition','competition_max','dense','avg']) – Tie-handling rule passed toscorio.utils.rank_scores().return_scores (
bool) – IfTrue, return(ranking, scores).max_iter (
int) – Positive maximum number of L-BFGS iterations.fix_guessing (
float|None) – If provided, fixes the guessing parameter to this value for all questions; must lie in[0, guessing_upper].return_item_params (
bool) – If True, also returns estimated item parameters. Implies returning scores.reg_discrimination (
float) – Non-negative L2 penalty weight onlog(a). Set to0.0for pure (unpenalized) JMLE.reg_guessing (
float) – Non-negative L2 penalty weight on guessing logits. Set to0.0for pure (unpenalized) JMLE.guessing_upper (
float) – Upper bound for item guessingc_m. Must be in(0, 1). Default0.5is suitable for binary outcomes.
- Return type:
ndarray|tuple[ndarray,ndarray] |tuple[ndarray,ndarray,dict[str,ndarray]]- Returns:
Ranking array of shape
(L,). Ifreturn_scores=True, also returns ability scorestheta. Ifreturn_item_params=True, also returns{"difficulty": b, "discrimination": a, "guessing": c}.
- Formula:
- \[p_{lm} = c_m + (1-c_m)\sigma\left(a_m(\theta_l-b_m)\right)\]
References
Lord, F. M. (1980). Applications of Item Response Theory to Practical Testing Problems. Routledge.
Birnbaum, A. (1968). Some Latent Trait Models and Their Use in Inferring an Examinee’s Ability. In Statistical Theories of Mental Test Scores.
Examples
>>> import numpy as np >>> from scorio import rank >>> R = np.array([ ... [[1, 1], [1, 1]], ... [[0, 0], [0, 0]], ... ]) >>> rank.rasch_3pl(R, fix_guessing=0.25).tolist() [1, 2]
- rasch_3pl_map(R, prior=1.0, method='competition', return_scores=False, max_iter=500, fix_guessing=None, return_item_params=False, reg_discrimination=0.01, reg_guessing=0.1, guessing_upper=0.5)[source]
Rank models with 3PL IRT via MAP estimation.
- Method context:
Same 3PL likelihood as
rasch_3pl(), with prior regularization on model abilitiesthetaand optional L2 regularization on item parameters.
- Parameters:
R (
ndarray) – Binary outcome tensor with shape(L, M, N)or matrix(L, M)(treated asN=1).prior (
Prior|float) – Ability prior. Afloatis interpreted as Gaussian prior variance; otherwise must be aPriorinstance.method (
Literal['competition','competition_max','dense','avg']) – Tie-handling rule passed toscorio.utils.rank_scores().return_scores (
bool) – IfTrue, return(ranking, scores).max_iter (
int) – Positive maximum number of L-BFGS iterations.fix_guessing (
float|None) – Optional fixed guessing parameter in[0, guessing_upper].return_item_params (
bool) – IfTrue, also return item parameters.reg_discrimination (
float) – Non-negative L2 penalty weight onlog(a).reg_guessing (
float) – Non-negative L2 penalty weight on guessing logits.guessing_upper (
float) – Upper bound for item guessingc_min(0, 1). Default is0.5for binary outcomes.
- Return type:
ndarray|tuple[ndarray,ndarray] |tuple[ndarray,ndarray,dict[str,ndarray]]- Returns:
Ranking array of shape
(L,). Ifreturn_scores=True, also returns MAP ability scorestheta. Ifreturn_item_params=True, also returns{"difficulty": b, "discrimination": a, "guessing": c}.
- dynamic_irt(R, variant='linear', method='competition', return_scores=False, max_iter=500, return_item_params=False, time_points=None, score_target='final', slope_reg=0.01, state_reg=1.0, assume_time_axis=False)[source]
Rank models with dynamic (longitudinal) IRT variants.
- Method context:
variant="linear"is a static Rasch baseline over aggregated counts.variant="growth"fits a longitudinal logistic growth model with per-model baselinetheta0_land slopetheta1_l:\[\theta_{ln}=\theta_{0,l}+\theta_{1,l}t_n\]variant="state_space"fits a dynamic Rasch trajectory \(\theta_{ln}\) with random-walk smoothness regularization:\[P(R_{lmn}=1)=\sigma\left(\theta_{ln}-b_m\right)\]\[\mathrm{penalty}=\lambda\sum_{l,n>0} \frac{\left(\theta_{ln}-\theta_{l,n-1}\right)^2}{t_n-t_{n-1}}\]
- Parameters:
R (
ndarray) – Binary outcome tensor with shape(L, M, N)or matrix(L, M)(treated asN=1).variant (
Literal['linear','growth','state_space']) –"linear","growth", or"state_space".method (
Literal['competition','competition_max','dense','avg']) – Tie-handling rule passed toscorio.utils.rank_scores().return_scores (
bool) – IfTrue, return(ranking, scores).max_iter (
int) – Positive maximum number of L-BFGS iterations.return_item_params (
bool) – If True, also returns estimated item parameters. Implies returning scores.time_points (
ndarray|None) – Optional ordered measurement times of lengthN. IfNone, uses equally spaced times in[0, 1]. Used only for longitudinal variants.score_target (
Literal['initial','final','mean','gain','baseline','start','end','average','delta','trend']) – Longitudinal score extracted from ability paths for ranking in growth and state-space variants. One of{"initial", "final", "mean", "gain"}.slope_reg (
float) – Non-negative L2 regularization weight on growth slopes. Used only forvariant="growth".state_reg (
float) – Non-negative random-walk smoothness penalty invariant="state_space".assume_time_axis (
bool) – Safety switch for longitudinal variants. SetTrueto acknowledge that axis-2 ofRis ordered time, not i.i.d. sampling trials.
- Return type:
ndarray|tuple[ndarray,ndarray] |tuple[ndarray,ndarray,dict[str,ndarray]]- Returns:
Ranking array of shape
(L,). Ifreturn_scores=True, also returns scores:thetaforlinearand dynamic target scores forgrowth/state_space. Ifreturn_item_params=True, also returns{"difficulty": b}(linear), or for longitudinal variants:{"difficulty": b, "ability_path": theta_path, ...}.
- Formula:
- \[P(R_{lmn}=1) = \sigma\left(\theta_{0,l} + \theta_{1,l} t_n - b_m\right)\]
References
Verhelst, N. D., & Glas, C. A. (1993). A dynamic generalization of the Rasch model. Psychometrika.
Wang, C., & Nydick, S. W. (2020). On Longitudinal Item Response Theory Models: A Didactic. Journal of Educational and Behavioral Statistics.
Examples
>>> import numpy as np >>> from scorio import rank >>> R = np.array([ ... [[1, 1, 1], [1, 1, 1]], ... [[0, 0, 0], [0, 0, 0]], ... ]) >>> rank.dynamic_irt(R, variant="linear").tolist() [1, 2]
Voting Methods
Voting-based ranking methods.
This module adapts social-choice rules to model ranking from response tensors.
Notation
Let \(R \in \{0,1\}^{L \times M \times N}\) and define question-level grades
Each question \(m\) is treated as one voter over models based on \(k_{:m}\). Voting rules produce scores \(s_l\) from these per-question preferences, then ranks are derived from \(s\).
When \(N=1\), each question induces a two-level ordering (correct versus incorrect), so many voting rules reduce to simple majority-style behavior.
- borda(R, method='competition', return_scores=False)[source]
Rank models with Borda count on per-question rankings.
- Method context:
Each question acts as a voter that ranks models by per-question correct count
k[l, m]. Borda assigns positional points per question and sums them across questions; ties use averaged ranks.
- Parameters:
R (
ndarray) – Binary outcome tensor with shape(L, M, N)or matrix(L, M)(treated asN=1).method (
Literal['competition','competition_max','dense','avg']) – Tie-handling rule passed toscorio.utils.rank_scores().return_scores (
bool) – IfTrue, return(ranking, scores).
- Return type:
- Returns:
Ranking array of shape
(L,). Ifreturn_scores=True, also returns Borda scores (shape(L,)).
- Notation:
k_{lm} = sum_{n=1}^N R_{lmn}andr_{lm}is modell’s tie-averaged descending rank amongk_{:,m}(rank 1 is best).- Formula:
- \[s_l^{\mathrm{Borda}} = \sum_{m=1}^{M} (L - r_{lm})\]
References
de Borda, J.-C. (1781/1784). Mémoire sur les élections au scrutin. In Histoire de l’Académie Royale des Sciences.
Brandt, F., Conitzer, V., Endriss, U., Lang, J., & Procaccia, A. D. (2016). Handbook of Computational Social Choice. Cambridge University Press.
Examples
>>> import numpy as np >>> from scorio import rank >>> R = np.array([ ... [[1, 1, 1], [1, 0, 0]], # k: [3, 1] ... [[1, 1, 0], [0, 1, 0]], # k: [2, 1] ... [[0, 0, 0], [1, 1, 1]], # k: [0, 3] ... ]) >>> ranks, scores = rank.borda(R, return_scores=True) >>> ranks.tolist() [1, 3, 2] >>> scores.round(2).tolist() [2.5, 1.5, 2.0]
Notes
Adding a constant to every model’s score does not change the ranking.
- copeland(R, method='competition', return_scores=False)[source]
Rank models with Copeland pairwise-majority scores.
- Method context:
For each model pair
(i, j), count how many questions preferioverjby comparing per-question correct counts. A strict majority contributes+1to the winner and-1to the loser; tied pairwise contests contribute0.
- Parameters:
R (
ndarray) – Binary outcome tensor with shape(L, M, N)or matrix(L, M)(treated asN=1).method (
Literal['competition','competition_max','dense','avg']) – Tie-handling rule passed toscorio.utils.rank_scores().return_scores (
bool) – IfTrue, return(ranking, scores).
- Return type:
- Returns:
Ranking array of shape
(L,). Ifreturn_scores=True, also returns Copeland scores (shape(L,)).
- Notation:
k_{im} = sum_{n=1}^N R_{imn}andW^{(q)}_{ij} = sum_{m=1}^{M} 1[k_{im} > k_{jm}].- Formula:
- \[s_i^{\mathrm{Copeland}} = \sum_{j\neq i} \operatorname{sign}\!\left( W^{(q)}_{ij} - W^{(q)}_{ji} \right)\]
References
Copeland, A. H. (1951). A Reasonable Social Welfare Function. Seminar on Applications of Mathematics to the Social Sciences, University of Michigan.
Brandt, F., Conitzer, V., Endriss, U., Lang, J., & Procaccia, A. D. (2016). Handbook of Computational Social Choice. Cambridge University Press.
Examples
>>> import numpy as np >>> from scorio import rank >>> R = np.array([ ... [[1, 1], [0, 0], [1, 0]], # k: [2, 0, 1] ... [[1, 0], [1, 1], [0, 0]], # k: [1, 2, 0] ... [[0, 0], [1, 0], [1, 1]], # k: [0, 1, 2] ... ]) >>> ranks, scores = rank.copeland(R, return_scores=True) >>> ranks.tolist() [1, 1, 1] >>> scores.tolist() [0.0, 0.0, 0.0]
Notes
This implementation compares pairwise question-level majorities and does not use magnitude of per-question margins beyond the sign.
- win_rate(R, method='competition', return_scores=False)[source]
Rank models by pairwise question-level win rate.
- Method context:
For each model pair, count on how many questions model
ihas a higher per-question correct count than modelj. A model’s score is the fraction of decisive pairwise outcomes it wins.
- Parameters:
R (
ndarray) – Binary outcome tensor with shape(L, M, N)or matrix(L, M)(treated asN=1).method (
Literal['competition','competition_max','dense','avg']) – Tie-handling rule passed toscorio.utils.rank_scores().return_scores (
bool) – IfTrue, return(ranking, scores).
- Return type:
- Returns:
Ranking array of shape
(L,). Ifreturn_scores=True, also returns win-rate scores in[0, 1](shape(L,)).
- Notation:
k_{im} = sum_{n=1}^N R_{imn}andW^{(q)}_{ij} = sum_{m=1}^{M} 1[k_{im} > k_{jm}].- Formula:
- \[s_i^{\mathrm{winrate}} = \frac{\sum_{j\neq i} W^{(q)}_{ij}} {\sum_{j\neq i} \left(W^{(q)}_{ij} + W^{(q)}_{ji}\right)}\]
Examples
>>> import numpy as np >>> from scorio import rank >>> R = np.array([ ... [[1, 1], [1, 1]], ... [[0, 0], [0, 0]], ... ]) >>> ranks, scores = rank.win_rate(R, return_scores=True) >>> ranks.tolist() [1, 2] >>> scores.round(2).tolist() [1.0, 0.0]
Notes
If a model has no decisive pairwise outcomes against any opponent, its score is set to
0.5.
- minimax(R, variant='margin', tie_policy='half', method='competition', return_scores=False)[source]
Rank models with the Minimax (Simpson-Kramer) Condorcet rule.
- Method context:
Build pairwise question-level preference counts
P[i, j]from per-question correct counts. Each model is scored by its worst pairwise defeat; smaller worst defeat is better.
- Parameters:
R (
ndarray) – Binary outcome tensor with shape(L, M, N)or matrix(L, M)(treated asN=1).variant (
str) – Defeat-strength definition: -"margin": use margin of defeat -"winning_votes": use opponent’s winning-vote counttie_policy (
str) – How per-question ties contribute to pairwise counts:"ignore"or"half".method (
Literal['competition','competition_max','dense','avg']) – Tie-handling rule passed toscorio.utils.rank_scores().return_scores (
bool) – IfTrue, return(ranking, scores).
- Return type:
- Returns:
Ranking array of shape
(L,). Ifreturn_scores=True, also returns minimax scores (shape(L,)), where values are non-positive and larger is better.
- Notation:
P_{ij}is pairwise preference count andDelta_{ij} = P_{ij} - P_{ji}.- Formula:
- \[s_i^{\mathrm{minimax}} = -\max_{j\neq i} \max(0, \Delta_{ji})\]\[s_i^{\mathrm{wv}} = -\max_{j : P_{ji} > P_{ij}} P_{ji}\]
References
Brandt, F., Conitzer, V., Endriss, U., Lang, J., & Procaccia, A. D. (2016). Handbook of Computational Social Choice. Cambridge University Press.
Examples
>>> import numpy as np >>> from scorio import rank >>> R = np.array([ ... [[1, 1], [1, 1], [1, 1]], ... [[1, 0], [1, 0], [1, 0]], ... [[0, 0], [0, 0], [0, 0]], ... ]) >>> ranks, scores = rank.minimax(R, return_scores=True) >>> ranks.tolist() [1, 2, 2] >>> scores.tolist() [-0.0, -3.0, -3.0]
- schulze(R, tie_policy='half', method='competition', return_scores=False)[source]
Rank models with the Schulze beatpath Condorcet method.
- Method context:
Build pairwise preference counts
Pfrom per-question outcomes. Initialize direct victories, then run a strongest-path closure: path strength is the maximum bottleneck strength across all directed paths. Modeliis preferred tojwhenp[i, j] > p[j, i].
- Parameters:
R (
ndarray) – Binary outcome tensor with shape(L, M, N)or matrix(L, M)(treated asN=1).tie_policy (
str) – How per-question ties contribute to pairwise counts:"ignore"or"half".method (
Literal['competition','competition_max','dense','avg']) – Tie-handling rule passed toscorio.utils.rank_scores().return_scores (
bool) – IfTrue, return(ranking, scores).
- Return type:
- Returns:
Ranking array of shape
(L,). Ifreturn_scores=True, also returns level scores derived from the beatpath dominance relation (shape(L,)).
- Notation:
P_{ij}is pairwise preference count andp_{ij}is strongest path strength fromitoj.- Formula:
- \[\begin{split}p_{ij} = \begin{cases} P_{ij}, & P_{ij} > P_{ji} \\ 0, & \text{otherwise} \end{cases}\end{split}\]\[p_{jk} \leftarrow \max\bigl(p_{jk}, \min(p_{ji}, p_{ik})\bigr)\]
References
Schulze, M. (2010). A new monotonic, clone-independent, reversal symmetric, and Condorcet-consistent single-winner election method. Social Choice and Welfare.
Brandt, F., Conitzer, V., Endriss, U., Lang, J., & Procaccia, A. D. (2016). Handbook of Computational Social Choice. Cambridge University Press.
- ranked_pairs(R, strength='margin', tie_policy='half', method='competition', return_scores=False)[source]
Rank models with the Ranked Pairs (Tideman) Condorcet method.
- Method context:
Compute directed pairwise victories and sort them by strength. Lock victories in descending order, skipping any edge that would create a directed cycle. The locked acyclic graph induces the final ranking.
- Parameters:
R (
ndarray) – Binary outcome tensor with shape(L, M, N)or matrix(L, M)(treated asN=1).strength (
str) – Primary edge-strength key: -"margin": absolute pairwise margin|P_ij - P_ji|-"winning_votes": winning-vote countP_ijtie_policy (
str) – How per-question ties contribute to pairwise counts:"ignore"or"half".method (
Literal['competition','competition_max','dense','avg']) – Tie-handling rule passed toscorio.utils.rank_scores().return_scores (
bool) – IfTrue, return(ranking, scores).
- Return type:
- Returns:
Ranking array of shape
(L,). Ifreturn_scores=True, also returns topological level scores from the locked graph (shape(L,)).
- Notation:
P_{ij}is pairwise preference count andDelta_{ij} = P_{ij} - P_{ji}.- Formula:
Lock directed wins
i -> jin nonincreasing order of selected strength, accepting a lock only if it keeps the locked graph acyclic.
References
Tideman, T. N. (1987). Independence of clones as a criterion for voting rules. Social Choice and Welfare.
Brandt, F., Conitzer, V., Endriss, U., Lang, J., & Procaccia, A. D. (2016). Handbook of Computational Social Choice. Cambridge University Press.
- kemeny_young(R, tie_policy='half', method='competition', return_scores=False, time_limit=None, tie_aware=True)[source]
Rank models with Kemeny-Young rank aggregation via exact MILP.
- Method context:
Treat each question as a voter and aggregate pairwise preferences into
P. Kemeny-Young selects a total order maximizing agreement with pairwise counts.
- Parameters:
R (
ndarray) – Binary outcome tensor with shape(L, M, N)or matrix(L, M)(treated asN=1).tie_policy (
str) – How per-question ties contribute to pairwise counts:"ignore"or"half".method (
Literal['competition','competition_max','dense','avg']) – Tie-handling rule passed toscorio.utils.rank_scores().return_scores (
bool) – IfTrue, return(ranking, scores).time_limit (
float|None) – Optional positive MILP time limit in seconds.tie_aware (
bool) – IfTrue(default), derive a tie-aware preorder by checking which pairwise orders are forced across optimal Kemeny solutions. IfFalse, return the single optimal order selected by the MILP solver.
- Return type:
- Returns:
Ranking array of shape
(L,). Ifreturn_scores=True, also returns Kemeny scores (shape(L,)). Fortie_aware=True, these are level scores from the forced-order DAG over all optimal Kemeny solutions. Fortie_aware=False, scores are the number of opponents ranked below each model in the selected optimal order.
- Notation:
P_{ij}is the aggregated pairwise preference count.y_{ij} in {0,1}indicates whether modeliis ranked above modelj.- Formula:
- \[\max_{y} \sum_{i \ne j} P_{ij} y_{ij}\]\[y_{ij} + y_{ji} = 1, \quad y_{ij} + y_{jk} + y_{ki} \le 2 \;\; (\forall i,j,k \text{ distinct})\]
References
Kemeny, J. G. (1959). Mathematics without Numbers. Daedalus.
Young, H. P. (1977). Extending Condorcet’s rule. Journal of Economic Theory.
Brandt, F., Conitzer, V., Endriss, U., Lang, J., & Procaccia, A. D. (2016). Handbook of Computational Social Choice. Cambridge University Press.
Notes
Exact Kemeny optimization is NP-hard in general; this implementation uses
scipy.optimize.milpon the induced linear-ordering ILP. Tie-aware mode solves additional MILPs (up to one per model pair).
- nanson(R, rank_ties='average', method='competition', return_scores=False)[source]
Rank models with Nanson’s Borda-elimination rule.
- Method context:
Recompute Borda scores among currently active models each round, then eliminate all models scoring strictly below the round mean.
- Parameters:
R (
ndarray) – Binary outcome tensor with shape(L, M, N)or matrix(L, M)(treated asN=1).rank_ties (
str) – Tie rule passed toscipy.stats.rankdata()when computing per-question Borda ranks among active models.method (
Literal['competition','competition_max','dense','avg']) – Tie-handling rule passed toscorio.utils.rank_scores().return_scores (
bool) – IfTrue, return(ranking, scores).
- Return type:
- Returns:
Ranking array of shape
(L,). Ifreturn_scores=True, also returns survival-round scores (shape(L,)), where larger means eliminated later.
- Notation:
At round
t, active set isA_tand Borda score in that round iss_i^{(t)}fori in A_t.- Formula:
- \[E_t = \{ i \in A_t : s_i^{(t)} < \overline{s}^{(t)} \} ,\quad A_{t+1} = A_t \setminus E_t\]
References
Brandt, F., Conitzer, V., Endriss, U., Lang, J., & Procaccia, A. D. (2016). Handbook of Computational Social Choice. Cambridge University Press.
Notes
If no active model is strictly below the mean, elimination stops and all remaining models tie.
- baldwin(R, rank_ties='average', method='competition', return_scores=False)[source]
Rank models with Baldwin’s iterative Borda-elimination rule.
- Method context:
Recompute Borda scores among active models each round and eliminate the model(s) with the minimum score. This implementation removes all models tied at the minimum in a round.
- Parameters:
R (
ndarray) – Binary outcome tensor with shape(L, M, N)or matrix(L, M)(treated asN=1).rank_ties (
str) – Tie rule passed toscipy.stats.rankdata()when computing per-question Borda ranks among active models.method (
Literal['competition','competition_max','dense','avg']) – Tie-handling rule passed toscorio.utils.rank_scores().return_scores (
bool) – IfTrue, return(ranking, scores).
- Return type:
- Returns:
Ranking array of shape
(L,). Ifreturn_scores=True, also returns survival-round scores (shape(L,)), where larger means eliminated later.
- Notation:
At round
t, active set isA_tand Borda score iss_i^{(t)}fori in A_t.- Formula:
- \[E_t = \arg\min_{i \in A_t} s_i^{(t)} ,\quad A_{t+1} = A_t \setminus E_t\]
References
Brandt, F., Conitzer, V., Endriss, U., Lang, J., & Procaccia, A. D. (2016). Handbook of Computational Social Choice. Cambridge University Press.
Notes
If all active models tie at the minimum in a round, elimination stops and all remaining models tie.
- majority_judgment(R, method='competition', return_scores=False)[source]
Rank models with Majority Judgment on per-question trial counts.
- Method context:
Each model receives per-question integer grades
k_{lm} in {0, ..., N}. Models are first compared by median grade. Ties are broken with majority-gauge style iterative median removal.
- Parameters:
R (
ndarray) – Binary outcome tensor with shape(L, M, N)or matrix(L, M)(treated asN=1).method (
Literal['competition','competition_max','dense','avg']) – Tie-handling rule passed toscorio.utils.rank_scores().return_scores (
bool) – IfTrue, return(ranking, scores).
- Return type:
- Returns:
Ranking array of shape
(L,). Ifreturn_scores=True, also returns level scores induced by the MJ comparison preorder (shape(L,)).
- Notation:
k_{lm} = sum_{n=1}^N R_{lmn}is the grade assigned by questionmto modell.- Formula:
Rank by median grade, then for tied models iteratively remove one occurrence of the current median grade from each tied model and repeat median comparison.
References
Balinski, M., & Laraki, R. (2011). Majority Judgment: Measuring, Ranking, and Electing. MIT Press.
Notes
The implementation uses the lower median for even sample sizes.
Listwise and Setwise Choice Models
Listwise and setwise probabilistic choice models (Luce family).
In the binary tensor setting \(R \in \{0,1\}^{L \times M \times N}\), each event \((m,n)\) induces a winner set \(W_{mn}\) (correct models) and loser set \(L_{mn}\) (incorrect models).
Luce-family models assign positive strengths \(\pi_i\) and define selection probabilities over comparison sets:
For strict orderings \(i_1 \succ i_2 \succ \cdots \succ i_K\), the Plackett-Luce likelihood is
This module uses three constructions:
Pairwise reduction (wins only): reduce \(R\) to decisive pairwise counts and fit the Bradley-Terry form of Plackett-Luce.
Setwise likelihood with ties: treat each observed winner set as one tied choice event using Davidson-Luce normalization.
Setwise composite likelihood: convert each winner into a Luce choice from
{winner} union {losers}(Bradley-Terry-Luce rank breaking).
Estimation uses MM updates for the Plackett-Luce pairwise reduction and L-BFGS optimization for the setwise models.
References
Plackett, R. L. (1975). The Analysis of Permutations. Journal of the Royal Statistical Society: Series C.
Luce, R. D. (1959). Individual Choice Behavior: A Theoretical Analysis. John Wiley & Sons.
Hunter, D. R. (2004). MM algorithms for generalized Bradley-Terry models. The Annals of Statistics, 32(1), 384-406.
Firth, D., Kosmidis, I., & Turner, H. L. (2019). Davidson-Luce model for multi-item choice with ties. arXiv:1909.07123.
- plackett_luce(R, method='competition', return_scores=False, max_iter=500, tol=1e-8)[source]
Rank models with Plackett-Luce maximum likelihood.
- Method context:
In Scorio’s binary tensor setting, we reduce outcomes to decisive pairwise win counts and fit the Bradley-Terry form of a Plackett-Luce model using Hunter’s MM updates.
References
Plackett, R. L. (1975). The Analysis of Permutations. Luce, R. D. (1959). Individual Choice Behavior. Hunter, D. R. (2004). MM Algorithms for Generalized Bradley-Terry Models.
- Parameters:
R (
ndarray) – Binary tensor of shape(L, M, N)(or(L, M)treated asN=1).method (
Literal['competition','competition_max','dense','avg']) – Ranking method passed toscorio.utils.rank_scores.return_scores (
bool) – If True, return(ranking, scores).max_iter (
int) – Positive MM iteration budget.tol (
float) – Positive finite convergence tolerance on max parameter change.
- Return type:
- Returns:
Ranking array of shape
(L,). Ifreturn_scores=True, returns(ranking, scores)where scores are the estimated strengthspi.
- Notation:
W_ij: number of decisive outcomes where modelibeatsj.N_ij = W_ij + W_ji: total decisive pairwise comparisons.pi_i > 0: latent model strengths.- Formula:
- \[\pi_i^{(k+1)} = \frac{\sum_j W_{ij}} {\sum_{j \ne i} N_{ij} / (\pi_i^{(k)} + \pi_j^{(k)})}\]
followed by normalization to resolve scale non-identifiability.
Examples
>>> import numpy as np >>> from scorio import rank >>> R = np.array([ ... [[1, 1], [1, 1]], ... [[0, 0], [0, 0]], ... ]) >>> ranks = rank.plackett_luce(R) >>> ranks[0] < ranks[1] # Model 0 has better (lower) rank True
Notes
This implementation intentionally ignores within-outcome ties (both-correct or both-incorrect), matching pairwise decisive reduction.
- plackett_luce_map(R, prior=1.0, method='competition', return_scores=False, max_iter=500)[source]
Rank models with Plackett-Luce maximum a posteriori estimation.
- Method context:
Adds a prior penalty on centered log-strengths to the pairwise-reduced Plackett-Luce likelihood. Numeric priors are interpreted as Gaussian prior variances.
References
Luce, R. D. (1959). Individual Choice Behavior. Hunter, D. R. (2004). MM Algorithms for Generalized Bradley-Terry Models.
- Parameters:
R (
ndarray) – Binary tensor of shape(L, M, N)(or(L, M)treated asN=1).prior (
Prior|float) –Priorinstance or positive finite float variance forGaussianPrior(mean=0, var=prior).method (
Literal['competition','competition_max','dense','avg']) – Ranking method passed toscorio.utils.rank_scores.return_scores (
bool) – If True, return(ranking, scores).max_iter (
int) – Positive optimizer iteration budget.
- Return type:
- Returns:
Ranking array of shape
(L,). Ifreturn_scores=True, returns(ranking, scores).
- Formula:
Let
theta_i = log(pi_i)andP(theta)be the prior penalty.\[\hat\theta \in \arg\min_{\theta} \left[ -\sum_{i \ne j} W_{ij} \left(\theta_i - \log(e^{\theta_i}+e^{\theta_j})\right) + P(\theta - \bar\theta) \right]\]
Examples
>>> import numpy as np >>> from scorio import rank >>> R = np.array([ ... [[1, 1], [1, 1]], ... [[0, 0], [0, 0]], ... ]) >>> ranks = rank.plackett_luce_map(R, prior=1.0) >>> ranks[0] < ranks[1] True
Notes
The MAP objective is solved with L-BFGS-B over centered log-strengths.
- davidson_luce(R, method='competition', return_scores=False, max_iter=500, max_tie_order=None)[source]
Rank models with Davidson-Luce maximum likelihood (setwise ties).
- Method context:
Each question-trial induces a winner set
Wand loser setL. Davidson-Luce models tied winners directly with tie-order parameters. Normalization terms are computed with elementary symmetric polynomials.
References
Firth, D., Kosmidis, I., & Turner, H. L. (2019). Davidson-Luce model for multi-item choice with ties. https://arxiv.org/abs/1909.07123
- Parameters:
R (
ndarray) – Binary tensor of shape(L, M, N)(or(L, M)treated asN=1).method (
Literal['competition','competition_max','dense','avg']) – Ranking method passed toscorio.utils.rank_scores.return_scores (
bool) – If True, return(ranking, scores).max_iter (
int) – Positive optimizer iteration budget.max_tie_order (
int|None) – Maximum tie orderDused in normalization; default isL-1.
- Return type:
- Returns:
Ranking array of shape
(L,). Ifreturn_scores=True, returns(ranking, scores)where scores are strengthsalpha.
- Notation:
S = W \cup Lis the comparison set andt = |W|.delta_1 = 1anddelta_t > 0fort >= 2.g_t(T) = (\prod_{i\in T} alpha_i)^{1/t}.- Formula:
- \[\Pr(W \mid S) = \frac{\delta_t g_t(W)} {\sum_{t'=1}^{\min(D,|S|)} \delta_{t'} \sum_{|T|=t'} g_{t'}(T)}\]
Examples
>>> import numpy as np >>> from scorio import rank >>> R = np.array([ ... [[1, 1], [1, 1]], ... [[0, 0], [0, 0]], ... ]) >>> ranks = rank.davidson_luce(R) >>> ranks[0] < ranks[1] # Model 0 has better (lower) rank True
Notes
Events with all winners or all losers are dropped as uninformative.
- davidson_luce_map(R, prior=1.0, method='competition', return_scores=False, max_iter=500, max_tie_order=None)[source]
Rank models with Davidson-Luce MAP estimation.
- Method context:
Adds a prior penalty on centered log-strengths to the Davidson-Luce setwise tie likelihood.
- Parameters:
R (
ndarray) – Binary tensor of shape(L, M, N)(or(L, M)treated asN=1).prior (
Prior|float) –Priorinstance or positive finite float variance forGaussianPrior(mean=0, var=prior).method (
Literal['competition','competition_max','dense','avg']) – Ranking method passed toscorio.utils.rank_scores.return_scores (
bool) – If True, return(ranking, scores).max_iter (
int) – Positive optimizer iteration budget.max_tie_order (
int|None) – Maximum tie orderDused in normalization; default isL-1.
- Return type:
- Returns:
Ranking array of shape
(L,). Ifreturn_scores=True, returns(ranking, scores).
Examples
>>> import numpy as np >>> from scorio import rank >>> R = np.array([ ... [[1, 1], [1, 1]], ... [[0, 0], [0, 0]], ... ]) >>> ranks = rank.davidson_luce_map(R, prior=1.0) >>> ranks[0] < ranks[1] True
- bradley_terry_luce(R, method='competition', return_scores=False, max_iter=500)[source]
Rank models with Bradley-Terry-Luce composite-likelihood ML.
- Method context:
For each event
(W, L), each winneri in Wis treated as a Luce choice from{i} union L. This yields a rank-breaking composite likelihood objective (not a normalized probability for the whole winner setWas a single event).
References
Luce, R. D. (1959). Individual Choice Behavior.
- Parameters:
R (
ndarray) – Binary tensor of shape(L, M, N)(or(L, M)treated asN=1).method (
Literal['competition','competition_max','dense','avg']) – Ranking method passed toscorio.utils.rank_scores.return_scores (
bool) – If True, return(ranking, scores).max_iter (
int) – Positive optimizer iteration budget.
- Return type:
- Returns:
Ranking array of shape
(L,). Ifreturn_scores=True, returns(ranking, scores).
- Formula:
For event
(W, L), define per-winner choice setsC_i = {i} union Land optimize the composite log-likelihood\[\ell_{\mathrm{comp}}(\pi) = \sum_{(W,L)}\sum_{i\in W} \left[ \log \pi_i - \log\left(\pi_i + \sum_{j\in L}\pi_j\right) \right]\]
Notes
This objective is a Luce-style composite likelihood induced by rank-breaking, rather than a full normalized likelihood over all possible winner subsets.
- bradley_terry_luce_map(R, prior=1.0, method='competition', return_scores=False, max_iter=500)[source]
Rank models with Bradley-Terry-Luce composite-likelihood MAP estimation.
- Method context:
Adds a prior penalty on centered log-strengths to the BTL setwise-choice composite likelihood.
- Parameters:
R (
ndarray) – Binary tensor of shape(L, M, N)(or(L, M)treated asN=1).prior (
Prior|float) –Priorinstance or positive finite float variance forGaussianPrior(mean=0, var=prior).method (
Literal['competition','competition_max','dense','avg']) – Ranking method passed toscorio.utils.rank_scores.return_scores (
bool) – If True, return(ranking, scores).max_iter (
int) – Positive optimizer iteration budget.
- Return type:
- Returns:
Ranking array of shape
(L,). Ifreturn_scores=True, returns(ranking, scores).