scorio.utils
- rank_scores(scores_in_id_order, tol=1e-12, sigmas_in_id_order=None, confidence=0.95, ci_tie_method='zscore_adjacent')[source]
Convert scores to ranks using multiple ranking methods (with optional confidence-aware ties).
This is a utility that many evaluation scripts already use: it converts a list of scores into four standard rank conventions (competition/min, competition/max, dense, average/fractional). Higher scores get better (lower) ranks.
Extension: if sigmas_in_id_order is provided, this function ALSO returns a second set of ranks that treats two adjacent scores as tied when they are not separable at the requested confidence level.
- Parameters:
scores_in_id_order (
Sequence[float] |ndarray) – Scores aligned by ID order.tol (
float) – Tolerance threshold for treating scores as equal (default: 1e-12).sigmas_in_id_order (
Sequence[float] |ndarray|None) – Optional per-score uncertainty aligned by ID order. If provided, uncertainty-aware ranks are added under keys suffixed with “_ci”.confidence (
float) – Confidence used for the uncertainty-aware tie rule.ci_tie_method (
Literal['zscore_adjacent','ci_overlap_adjacent']) – How to decide ties when sigmas are provided: - “zscore_adjacent” (paper Sec. 2.9): tie if z < Φ^{-1}(confidence) - “ci_overlap_adjacent”: tie if confidence-level CIs overlap
- Returns:
- “competition”: Min-rank competition (1, 2, 2, 4)
”competition_max”: Max-rank competition (1, 3, 3, 4)
”dense”: Dense ranking (1, 2, 2, 3)
”avg”: Fractional/average ranking (1, 2.5, 2.5, 4)
- If sigmas_in_id_order is provided, four extra keys are included:
”competition_ci”
”competition_max_ci”
”dense_ci”
”avg_ci”
- Return type:
Examples
>>> import numpy as np >>> scores = [95.0, 87.5, 87.5, 80.0, 75.0] >>> ranks = rank_scores(scores) >>> ranks["competition"].tolist() == [1, 2, 2, 4, 5] True >>> ranks["dense"].tolist() == [1, 2, 2, 3, 4] True
With uncertainty: the top two are close and get tied at 95%:
>>> sigmas = [1.0, 1.0, 0.1, 0.1, 0.1] >>> ranks2 = rank_scores(scores, sigmas_in_id_order=sigmas, confidence=0.95) >>> "dense_ci" in ranks2 True
- compare_rankings(ranked_list_a, ranked_list_b, method='all')[source]
Compare two rankings using multiple correlation metrics.
Computes Kendall’s tau, Spearman’s rho, and weighted Kendall’s tau to measure agreement between two rankings.
- Parameters:
ranked_list_a – First ranking (numeric array or list).
ranked_list_b – Second ranking (numeric array or list).
method – Which metric to return: “kendall”, “spearman”, “weighted_kendall”, or “all” (default).
- Returns:
If method is not “all”, returns a
(statistic, pvalue)tuple. If method is “all”, returns a dictionary with:kendalltau,spearmanr,weighted_kendalltau,fraction_mismatched, andmax_disp.- Raises:
ValueError – If lists have different lengths or contain NaN/inf.
TypeError – If lists are not numeric.
Notes
scipy.stats.weightedtaudoes not compute p-values (pvalue is NaN). Rankings are compared element-wise at matching indices.Examples
>>> import numpy as np >>> rank_a = [1, 2, 3, 4, 5] >>> rank_b = [1, 3, 2, 4, 5] >>> tau, pval = compare_rankings(rank_a, rank_b, method="kendall") >>> round(tau, 2) 0.8 >>> results = compare_rankings(rank_a, rank_b, method="all") >>> round(results["fraction_mismatched"], 2) 0.4
- lehmer_hash(ranked_list)[source]
Convert a permutation to its Lehmer code (factorial number system).
The Lehmer code provides a bijection between permutations and integers in the range [0, n!-1], useful for hashing, indexing, or enumerating all possible permutations.
- Parameters:
ranked_list – Permutation of integers
0..n-1as a list or array. Must contain no ties (all values unique).- Returns:
Unique hash value in range [0, n!-1] where n = len(ranked_list).
- Return type:
Notes
Time complexity: O(n²) due to inversion counting
Space complexity: O(1)
This function does NOT handle ties. All elements must be distinct.
Pre-computes factorials for efficiency
- Algorithm:
For each position i, counts inversions (elements to the right that are smaller) and encodes them in the factorial number system.
Examples
>>> lehmer_hash([0, 1, 2]) 0 >>> lehmer_hash([2, 1, 0]) 5 >>> lehmer_hash([0, 2, 1]) 1 >>> lehmer_hash([1, 2, 0]) 3
References
Lehmer, D. H. (1960). Teaching combinatorial tricks to a computer. In Combinatorial Analysis (Proceedings of Symposia in Applied Mathematics, Vol. 10, pp. 179–193). American Mathematical Society. MR 0113289.
- lehmer_unhash(hash_value, n)[source]
Convert a Lehmer code (hash) back to its permutation.
Inverse operation of lehmer_hash. Reconstructs the original permutation from its integer representation in the factorial number system.
- Parameters:
hash_value – Integer in range [0, n!-1] representing a permutation.
n – Length of the permutation to generate.
- Returns:
Permutation of integers [0, 1, …, n-1] corresponding to hash_value.
- Return type:
- Raises:
ValueError – If hash_value >= n! (invalid hash for given n).
Notes
Time complexity: O(n²) due to element removal tracking
Space complexity: O(n)
Pre-computes factorials for efficiency
Examples
>>> lehmer_unhash(0, 3) [0, 1, 2] >>> lehmer_unhash(5, 3) [2, 1, 0] >>> lehmer_unhash(1, 3) [0, 2, 1] >>> lehmer_unhash(3, 3) [1, 2, 0]
References
Lehmer, D. H. (1960). Teaching combinatorial tricks to a computer. In Combinatorial Analysis (Proceedings of Symposia in Applied Mathematics, Vol. 10, pp. 179–193). American Mathematical Society. MR 0113289.
- ranking_hash(rank_list, tol=1e-12)[source]
Perfect collision-free hash for rankings with ties.
Encodes any ranking (with or without ties) into a unique integer using ordered Bell numbers (Fubini numbers). This is the theoretically optimal approach that fully preserves all ranking information.
- Parameters:
rank_list – List where rank_list[i] is the rank of item i. Lower values = better ranks. Ties are represented by equal values. Example: [0, 1, 1, 3] means item 0 is first, items 1 and 2 are tied for second, item 3 is fourth.
tol – Tolerance for treating float ranks as equal (default: 1e-12).
- Returns:
- Unique hash value in range [0, F(n)-1] where F(n) is the
n-th ordered Bell number. F(n) = exact count of all possible rankings with ties for n items.
- Return type:
Notes
Time complexity: O(n²) for encoding
Space complexity: O(n) for storing ordered Bell numbers
Collision-free: Different rankings always get different hashes
Complete encoding: Hash fully captures both order and tie structure
Optimal: Uses exactly log₂(F(n)) bits for F(n) possible rankings
- Algorithm:
Uses ordered Bell (Fubini) numbers to enumerate all weak orderings. For each tie group, encodes both the subset selection (which items) and the lexicographic ordering within the subset.
Examples
>>> ranking_hash([0, 1, 2]) # No ties 0 >>> ranking_hash([0, 1, 1]) # Items 1,2 tied at second place 2 >>> ranking_hash([0, 0, 1]) # Items 0,1 tied at first place 9 >>> ranking_hash([0, 0, 0]) # All tied 12 >>> ranking_hash([1, 0, 1]) # Items 0,2 tied at rank 1, item 1 at rank 0 5
- Comparison to factorial-based hashing:
For example,
ranking_hash([0, 1, 1]) == 2,ranking_hash([0, 0, 1]) == 9, andranking_hash([1, 0, 1]) == 5. These hashes differ, unlike simpler approaches that lose tie information.
References
Ordered Bell numbers (a.k.a. Fubini numbers), OEIS A000670: https://en.wikipedia.org/wiki/Ordered_Bell_number https://oeis.org/A000670
Weak orderings / total preorders (rankings with ties): https://en.wikipedia.org/wiki/Total_preorder
Combinatorial number system (ranking/unranking k-subsets): https://en.wikipedia.org/wiki/Combinatorial_number_system
See also
unhash_ranking(): Inverse operation to reconstruct ranking from hash.
- unhash_ranking(h, n)[source]
Reconstruct ranking with ties from its hash value.
Inverse operation of ranking_hash. Decodes a hash back to the original ranking, returning competition ranks (min-rank with gaps).
- Parameters:
- Returns:
- Ranking in competition format (min-rank with gaps).
Example: [1, 2, 2, 4] means item 0 is rank 1, items 1 and 2 are tied at rank 2, item 3 is rank 4.
- Return type:
- Raises:
ValueError – If h is out of valid range [0, F(n)-1]
Notes
Time complexity: O(n²) for decoding
Space complexity: O(n) for ordered Bell numbers
Returns competition/min-rank format with gaps after ties
Fully reconstructs original ranking including tie structure
Examples
>>> unhash_ranking(0, 3) # First possible ranking [1, 2, 3] >>> unhash_ranking(2, 3) # Items 1,2 tied [1, 2, 2] >>> unhash_ranking(9, 3) # Items 0,1 tied [1, 1, 3] >>> unhash_ranking(12, 3) # All tied [1, 1, 1] >>> unhash_ranking(5, 3) # Items 0,2 tied at rank 2 [2, 1, 2]
See also
ranking_hash(): Inverse operation to hash a ranking.