Quick Start
This guide shows how to evaluate model performance with scorio, from the input data format through Bayesian estimates, Pass@k metrics, and simple averages.
Basic Concepts
Data Format
scorio works with outcome matrices:
R: M × N integer matrix where:
M = number of questions being evaluated
N = number of trials/samples per question
Entries are categories in {0, …, C}
w: Weight vector of length C+1 mapping categories to scores
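To make these shapes concrete, the sketch below builds a small outcome matrix and weight vector as plain Python lists and maps each entry to its score. The helper `check_outcomes` is hypothetical and not part of scorio; it only illustrates the constraints listed above.

```python
def check_outcomes(R, w):
    """Validate an M x N outcome matrix R against a weight vector w.

    Hypothetical helper for illustration; not part of scorio.
    """
    n_trials = len(R[0])
    n_categories = len(w)  # C + 1 categories, labelled 0..C
    for row in R:
        if len(row) != n_trials:
            raise ValueError("every question needs the same number of trials")
        for entry in row:
            if not (isinstance(entry, int) and 0 <= entry < n_categories):
                raise ValueError(
                    f"entry {entry!r} is not a category in 0..{n_categories - 1}"
                )
    return len(R), n_trials


# 2 questions, 5 trials each, categories 0..2 (so C = 2)
R = [[0, 1, 2, 2, 1],
     [1, 1, 0, 2, 2]]
w = [0.0, 0.5, 1.0]  # length C + 1

M, N = check_outcomes(R, w)
# Replace each category label with the score that w assigns to it
scores = [[w[entry] for entry in row] for row in R]
print(M, N)
print(scores[0])
```

The examples in this guide pass the same data as NumPy arrays.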
Binary Evaluation
For binary outcomes (correct/incorrect):
import numpy as np
from scorio import eval
# 2 questions, 5 trials each
# 0 = incorrect, 1 = correct
R = np.array([[0, 1, 1, 0, 1],
              [1, 1, 0, 1, 1]])
# Weight vector: 0→0.0, 1→1.0
w = np.array([0.0, 1.0])
# Bayesian evaluation
mu, sigma = eval.bayes(R, w)
print(f"Estimate: {mu:.4f} ± {sigma:.4f}")
Multi-Category Evaluation
For outcomes with multiple categories:
# 0 = incorrect, 1 = partial, 2 = correct
R = np.array([[0, 1, 2, 2, 1],
              [1, 1, 0, 2, 2]])
# Weight vector for 3 categories
w = np.array([0.0, 0.5, 1.0])
mu, sigma = eval.bayes(R, w)
print(f"Estimate: {mu:.4f} ± {sigma:.4f}")
Using Prior Knowledge
To incorporate prior outcomes, pass a matrix R0 of earlier trials as the third argument:
R = np.array([[0, 1, 2, 2, 1],
              [1, 1, 0, 2, 2]])
w = np.array([0.0, 0.5, 1.0])
# Prior outcomes (2 trials per question)
R0 = np.array([[0, 2],
               [1, 2]])
mu, sigma = eval.bayes(R, w, R0)
print(f"With prior: {mu:.4f} ± {sigma:.4f}")
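For intuition about how observed and prior outcomes can combine, here is a minimal sketch of a Bayesian point estimate. It assumes a uniform Dirichlet prior over categories for each question and treats the rows of R0 as extra pseudo-observations. The function `dirichlet_posterior_mean` is hypothetical, it is not scorio's implementation, and its output need not match `eval.bayes`; it also computes only a mean, not the uncertainty.

```python
def dirichlet_posterior_mean(R, w, R0=None):
    """Average posterior-mean score across questions.

    Assumption: uniform Dirichlet(1, ..., 1) prior per question, with
    prior outcomes R0 counted like additional trials. Hypothetical
    sketch; not scorio's implementation.
    """
    n_categories = len(w)
    per_question = []
    for i, row in enumerate(R):
        counts = [1.0] * n_categories  # one pseudo-count per category
        prior_row = R0[i] if R0 is not None else []
        for entry in list(row) + list(prior_row):
            counts[entry] += 1.0
        total = sum(counts)
        # Posterior mean of category k is counts[k] / total
        per_question.append(sum(wk * ck / total for wk, ck in zip(w, counts)))
    return sum(per_question) / len(per_question)


R = [[0, 1, 2, 2, 1],
     [1, 1, 0, 2, 2]]
w = [0.0, 0.5, 1.0]
R0 = [[0, 2],
      [1, 2]]

mu_no_prior = dirichlet_posterior_mean(R, w)
mu_with_prior = dirichlet_posterior_mean(R, w, R0)
print(f"Without prior: {mu_no_prior:.4f}")
print(f"With prior:    {mu_with_prior:.4f}")
```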
Pass@k Metrics
Standard Pass@k (at least one correct):
R = np.array([[0, 1, 1, 0, 1],
              [1, 1, 0, 1, 1]])
# Probability that at least 1 of 2 samples is correct
pass_2 = eval.pass_at_k(R, k=2)
print(f"Pass@2: {pass_2:.4f}")
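As a cross-check on what this metric measures, the sketch below computes the widely used combinatorial Pass@k estimator, 1 - C(n-c, k) / C(n, k), for each question and averages the results. That scorio uses exactly this estimator is an assumption, and `pass_at_k_sketch` is a hypothetical name, not the library function.

```python
from math import comb


def pass_at_k_sketch(R, k):
    """Mean over questions of 1 - C(n - c, k) / C(n, k).

    n = trials per question, c = number of correct trials (entries == 1).
    Hypothetical sketch; assumed, not confirmed, to match scorio.
    """
    per_question = []
    for row in R:
        n = len(row)
        c = sum(1 for x in row if x == 1)
        # comb(n - c, k) is 0 when fewer than k trials are incorrect
        per_question.append(1.0 - comb(n - c, k) / comb(n, k))
    return sum(per_question) / len(per_question)


R = [[0, 1, 1, 0, 1],
     [1, 1, 0, 1, 1]]
pass_2_sketch = pass_at_k_sketch(R, k=2)
print(f"Pass@2 (sketch): {pass_2_sketch:.4f}")
```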
Pass^k (all correct):
# Probability that both of 2 samples are correct
pass_hat_2 = eval.pass_hat_k(R, k=2)
print(f"Pass^2: {pass_hat_2:.4f}")
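The "all correct" counterpart can be sketched the same way: the chance that k trials drawn without replacement from the n observed ones are all correct is C(c, k) / C(n, k). Whether scorio computes Pass^k in exactly this form is an assumption, and `pass_hat_k_sketch` is a hypothetical name.

```python
from math import comb


def pass_hat_k_sketch(R, k):
    """Mean over questions of C(c, k) / C(n, k).

    n = trials per question, c = number of correct trials (entries == 1).
    Hypothetical sketch; assumed, not confirmed, to match scorio.
    """
    per_question = []
    for row in R:
        n = len(row)
        c = sum(1 for x in row if x == 1)
        # comb(c, k) is 0 when fewer than k trials are correct
        per_question.append(comb(c, k) / comb(n, k))
    return sum(per_question) / len(per_question)


R = [[0, 1, 1, 0, 1],
     [1, 1, 0, 1, 1]]
pass_hat_2_sketch = pass_hat_k_sketch(R, k=2)
print(f"Pass^2 (sketch): {pass_hat_2_sketch:.4f}")
```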
Generalized Pass@k with threshold:
# Probability at least 50% of k samples are correct
g_pass = eval.g_pass_at_k_tau(R, k=3, tau=0.5)
print(f"G-Pass@3(τ=0.5): {g_pass:.4f}")
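The threshold version sits between the two metrics above. One common formulation sums hypergeometric probabilities over every count j of correct samples with j >= ceil(tau * k). The sketch below implements that formulation for 0 < tau <= 1; that scorio uses it unchanged is an assumption, and `g_pass_at_k_tau_sketch` is a hypothetical name.

```python
from math import ceil, comb


def g_pass_at_k_tau_sketch(R, k, tau):
    """Mean over questions of P(at least ceil(tau * k) of k draws are correct).

    Draws are taken without replacement from the n observed trials, of
    which c are correct (entries == 1). Assumes 0 < tau <= 1.
    Hypothetical sketch; assumed, not confirmed, to match scorio.
    """
    threshold = ceil(tau * k)
    per_question = []
    for row in R:
        n = len(row)
        c = sum(1 for x in row if x == 1)
        # Hypergeometric tail: exactly j correct among the k drawn
        hits = sum(
            comb(c, j) * comb(n - c, k - j)
            for j in range(threshold, min(c, k) + 1)
        )
        per_question.append(hits / comb(n, k))
    return sum(per_question) / len(per_question)


R = [[0, 1, 1, 0, 1],
     [1, 1, 0, 1, 1]]
g_pass_sketch = g_pass_at_k_tau_sketch(R, k=3, tau=0.5)
print(f"G-Pass@3(tau=0.5) (sketch): {g_pass_sketch:.4f}")
```

With tau = 1.0 the threshold becomes k, so this sketch reduces to the "all correct" case.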
Simple Average
For basic accuracy:
R = np.array([[0, 1, 1, 0, 1],
              [1, 1, 0, 1, 1]])
avg, avg_sigma = eval.avg(R)
print(f"Average: {avg:.4f} ± {avg_sigma:.4f}")
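For binary outcomes, plain accuracy is the fraction of correct trials. The sketch below computes it per question and overall using only the standard library. The helper `mean_accuracy` is hypothetical, and it does not reproduce the uncertainty value returned alongside the average, since how that value is defined is not stated here.

```python
def mean_accuracy(R):
    """Per-question accuracy and its mean across questions.

    Hypothetical helper; assumes binary entries (0 = incorrect, 1 = correct).
    """
    per_question = [sum(row) / len(row) for row in R]
    overall = sum(per_question) / len(per_question)
    return per_question, overall


R = [[0, 1, 1, 0, 1],
     [1, 1, 0, 1, 1]]
per_question_acc, overall_acc = mean_accuracy(R)
print(per_question_acc)
print(f"Average (sketch): {overall_acc:.4f}")
```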
Next Steps
See Examples for more detailed use cases
Check scorio.eval for complete API documentation
Read the paper: https://arxiv.org/abs/2510.04265