scorio.utils

rank_scores(scores_in_id_order, tol=1e-12, sigmas_in_id_order=None, confidence=0.95, ci_tie_method='zscore_adjacent')[source]

Convert scores to ranks using multiple ranking methods (with optional confidence-aware ties).

This is a utility that many evaluation scripts already use: it converts a list of scores into four standard rank conventions (competition/min, competition/max, dense, average/fractional). Higher scores get better (lower) ranks.

Extension: if sigmas_in_id_order is provided, this function ALSO returns a second set of ranks that treats two adjacent scores as tied when they are not separable at the requested confidence level.

Parameters:

scores_in_id_order (Sequence[float] | ndarray) – Scores aligned by ID order.
tol (float) – Tolerance threshold for treating scores as equal (default: 1e-12).
sigmas_in_id_order (Sequence[float] | ndarray | None) – Optional per-score uncertainty aligned by ID order. If provided, uncertainty-aware ranks are added under keys suffixed with “_ci”.
confidence (float) – Confidence used for the uncertainty-aware tie rule.
ci_tie_method (Literal['zscore_adjacent', 'ci_overlap_adjacent']) – How to decide ties when sigmas are provided: - “zscore_adjacent” (paper Sec. 2.9): tie if z < Φ^{-1}(confidence) - “ci_overlap_adjacent”: tie if confidence-level CIs overlap

Returns:

“competition”: Min-rank competition (1, 2, 2, 4)
- ”competition_max”: Max-rank competition (1, 3, 3, 4)
- ”dense”: Dense ranking (1, 2, 2, 3)
- ”avg”: Fractional/average ranking (1, 2.5, 2.5, 4)

If sigmas_in_id_order is provided, four extra keys are included:

”competition_ci”
”competition_max_ci”
”dense_ci”
”avg_ci”

Return type:

dict[str, ndarray]

Examples

>>> import numpy as np
>>> scores = [95.0, 87.5, 87.5, 80.0, 75.0]
>>> ranks = rank_scores(scores)
>>> ranks["competition"].tolist() == [1, 2, 2, 4, 5]
True
>>> ranks["dense"].tolist() == [1, 2, 2, 3, 4]
True

With uncertainty: the top two are close and get tied at 95%:

>>> sigmas = [1.0, 1.0, 0.1, 0.1, 0.1]
>>> ranks2 = rank_scores(scores, sigmas_in_id_order=sigmas, confidence=0.95)
>>> "dense_ci" in ranks2
True

compare_rankings(ranked_list_a, ranked_list_b, method='all')[source]

Compare two rankings using multiple correlation metrics.

Computes Kendall’s tau, Spearman’s rho, and weighted Kendall’s tau to measure agreement between two rankings.

Parameters:

ranked_list_a – First ranking (numeric array or list).
ranked_list_b – Second ranking (numeric array or list).
method – Which metric to return: “kendall”, “spearman”, “weighted_kendall”, or “all” (default).

Returns:

If method is not “all”, returns a (statistic, pvalue) tuple. If method is “all”, returns a dictionary with: kendalltau, spearmanr, weighted_kendalltau, fraction_mismatched, and max_disp.

Raises:

ValueError – If lists have different lengths or contain NaN/inf.
TypeError – If lists are not numeric.

Notes

scipy.stats.weightedtau does not compute p-values (pvalue is NaN). Rankings are compared element-wise at matching indices.

Examples

>>> import numpy as np
>>> rank_a = [1, 2, 3, 4, 5]
>>> rank_b = [1, 3, 2, 4, 5]
>>> tau, pval = compare_rankings(rank_a, rank_b, method="kendall")
>>> round(tau, 2)
0.8
>>> results = compare_rankings(rank_a, rank_b, method="all")
>>> round(results["fraction_mismatched"], 2)
0.4

lehmer_hash(ranked_list)[source]

Convert a permutation to its Lehmer code (factorial number system).

The Lehmer code provides a bijection between permutations and integers in the range [0, n!-1], useful for hashing, indexing, or enumerating all possible permutations.

Parameters:: ranked_list – Permutation of integers 0..n-1 as a list or array. Must contain no ties (all values unique).
Returns:: Unique hash value in range [0, n!-1] where n = len(ranked_list).
Return type:: int

Notes

Time complexity: O(n²) due to inversion counting
Space complexity: O(1)
This function does NOT handle ties. All elements must be distinct.
Pre-computes factorials for efficiency

Algorithm:: For each position i, counts inversions (elements to the right that are smaller) and encodes them in the factorial number system.

Examples

>>> lehmer_hash([0, 1, 2])
0
>>> lehmer_hash([2, 1, 0])
5
>>> lehmer_hash([0, 2, 1])
1
>>> lehmer_hash([1, 2, 0])
3

References

Lehmer, D. H. (1960). Teaching combinatorial tricks to a computer. In Combinatorial Analysis (Proceedings of Symposia in Applied Mathematics, Vol. 10, pp. 179–193). American Mathematical Society. MR 0113289.

lehmer_unhash(hash_value, n)[source]

Convert a Lehmer code (hash) back to its permutation.

Inverse operation of lehmer_hash. Reconstructs the original permutation from its integer representation in the factorial number system.

Parameters:

hash_value – Integer in range [0, n!-1] representing a permutation.
n – Length of the permutation to generate.

Returns:

Permutation of integers [0, 1, …, n-1] corresponding to hash_value.

Return type:

list

Raises:

ValueError – If hash_value >= n! (invalid hash for given n).

Notes

Time complexity: O(n²) due to element removal tracking
Space complexity: O(n)
Pre-computes factorials for efficiency

Examples

>>> lehmer_unhash(0, 3)
[0, 1, 2]
>>> lehmer_unhash(5, 3)
[2, 1, 0]
>>> lehmer_unhash(1, 3)
[0, 2, 1]
>>> lehmer_unhash(3, 3)
[1, 2, 0]

References

Lehmer, D. H. (1960). Teaching combinatorial tricks to a computer. In Combinatorial Analysis (Proceedings of Symposia in Applied Mathematics, Vol. 10, pp. 179–193). American Mathematical Society. MR 0113289.

ranking_hash(rank_list, tol=1e-12)[source]

Perfect collision-free hash for rankings with ties.

Encodes any ranking (with or without ties) into a unique integer using ordered Bell numbers (Fubini numbers). This is the theoretically optimal approach that fully preserves all ranking information.

Parameters:

rank_list – List where rank_list[i] is the rank of item i. Lower values = better ranks. Ties are represented by equal values. Example: [0, 1, 1, 3] means item 0 is first, items 1 and 2 are tied for second, item 3 is fourth.
tol – Tolerance for treating float ranks as equal (default: 1e-12).

Returns:

Unique hash value in range [0, F(n)-1] where F(n) is the: n-th ordered Bell number. F(n) = exact count of all possible rankings with ties for n items.

Return type:

int

Notes

Time complexity: O(n²) for encoding
Space complexity: O(n) for storing ordered Bell numbers
Collision-free: Different rankings always get different hashes
Complete encoding: Hash fully captures both order and tie structure
Optimal: Uses exactly log₂(F(n)) bits for F(n) possible rankings

Algorithm:: Uses ordered Bell (Fubini) numbers to enumerate all weak orderings. For each tie group, encodes both the subset selection (which items) and the lexicographic ordering within the subset.

Examples

>>> ranking_hash([0, 1, 2])  # No ties
0
>>> ranking_hash([0, 1, 1])  # Items 1,2 tied at second place
2
>>> ranking_hash([0, 0, 1])  # Items 0,1 tied at first place
9
>>> ranking_hash([0, 0, 0])  # All tied
12
>>> ranking_hash([1, 0, 1])  # Items 0,2 tied at rank 1, item 1 at rank 0
5

Comparison to factorial-based hashing:: For example, ranking_hash([0, 1, 1]) == 2, ranking_hash([0, 0, 1]) == 9, and ranking_hash([1, 0, 1]) == 5. These hashes differ, unlike simpler approaches that lose tie information.

References

Ordered Bell numbers (a.k.a. Fubini numbers), OEIS A000670: https://en.wikipedia.org/wiki/Ordered_Bell_number https://oeis.org/A000670

Weak orderings / total preorders (rankings with ties): https://en.wikipedia.org/wiki/Total_preorder

Combinatorial number system (ranking/unranking k-subsets): https://en.wikipedia.org/wiki/Combinatorial_number_system