Quick Start =========== This guide will help you get started with scorio for performance evaluation. Basic Concepts -------------- Data Format ~~~~~~~~~~~ scorio works with outcome matrices: - **R**: M × N integer matrix where: - M = number of questions being evaluated - N = number of trials/samples per question - Entries are categories in {0, ..., C} - **w**: Weight vector of length C+1 mapping categories to scores Binary Evaluation ----------------- For binary outcomes (correct/incorrect): .. code-block:: python import numpy as np from scorio import eval # 2 question, 5 trials each # 0 = incorrect, 1 = correct R = np.array([[0, 1, 1, 0, 1], [1, 1, 0, 1, 1]]) # Weight vector: 0→0.0, 1→1.0 w = np.array([0.0, 1.0]) # Bayesian evaluation mu, sigma = eval.bayes(R, w) print(f"Estimate: {mu:.4f} ± {sigma:.4f}") Multi-Category Evaluation -------------------------- For outcomes with multiple categories: .. code-block:: python # 0 = incorrect, 1 = partial, 2 = correct R = np.array([[0, 1, 2, 2, 1], [1, 1, 0, 2, 2]]) # Weight vector for 3 categories w = np.array([0.0, 0.5, 1.0]) mu, sigma = eval.bayes(R, w) print(f"Estimate: {mu:.4f} ± {sigma:.4f}") Using Prior Knowledge --------------------- Incorporate prior outcomes: .. code-block:: python R = np.array([[0, 1, 2, 2, 1], [1, 1, 0, 2, 2]]) w = np.array([0.0, 0.5, 1.0]) # Prior outcomes (2 trials per question) R0 = np.array([[0, 2], [1, 2]]) mu, sigma = eval.bayes(R, w, R0) print(f"With prior: {mu:.4f} ± {sigma:.4f}") Pass@k Metrics -------------- Standard Pass@k (at least one correct): .. code-block:: python R = np.array([[0, 1, 1, 0, 1], [1, 1, 0, 1, 1]]) # Probability at least 1 of 2 samples is correct pass_2 = eval.pass_at_k(R, k=2) print(f"Pass@2: {pass_2:.4f}") Pass^k (all correct): .. code-block:: python # Probability all 2 samples are correct pass_hat_2 = eval.pass_hat_k(R, k=2) print(f"Pass^2: {pass_hat_2:.4f}") Generalized Pass@k with threshold: .. code-block:: python # Probability at least 50% of k samples are correct g_pass = eval.g_pass_at_k_tau(R, k=3, tau=0.5) print(f"G-Pass@3(τ=0.5): {g_pass:.4f}") Simple Average -------------- For basic accuracy: .. code-block:: python R = np.array([[0, 1, 1, 0, 1], [1, 1, 0, 1, 1]]) avg, avg_sigma = eval.avg(R) print(f"Average: {avg:.4f} ± {avg_sigma:.4f}") Next Steps ---------- - See :doc:`examples` for more detailed use cases - Check :doc:`api/eval` for complete API documentation - Read the paper: https://arxiv.org/abs/2510.04265