scorio.utils

rank_scores(scores_in_id_order, tol=1e-12, sigmas_in_id_order=None, confidence=0.95, ci_tie_method='zscore_adjacent')[source]

Convert scores to ranks using multiple ranking methods (with optional confidence-aware ties).

This is a utility that many evaluation scripts already use: it converts a list of scores into four standard rank conventions (competition/min, competition/max, dense, average/fractional). Higher scores get better (lower) ranks.

Extension: if sigmas_in_id_order is provided, this function ALSO returns a second set of ranks that treats two adjacent scores as tied when they are not separable at the requested confidence level.

Parameters:
  • scores_in_id_order (Sequence[float] | ndarray) – Scores aligned by ID order.

  • tol (float) – Tolerance threshold for treating scores as equal (default: 1e-12).

  • sigmas_in_id_order (Sequence[float] | ndarray | None) – Optional per-score uncertainty aligned by ID order. If provided, uncertainty-aware ranks are added under keys suffixed with “_ci”.

  • confidence (float) – Confidence used for the uncertainty-aware tie rule.

  • ci_tie_method (Literal['zscore_adjacent', 'ci_overlap_adjacent']) – How to decide ties when sigmas are provided: - “zscore_adjacent” (paper Sec. 2.9): tie if z < Φ^{-1}(confidence) - “ci_overlap_adjacent”: tie if confidence-level CIs overlap

Returns:

  • “competition”: Min-rank competition (1, 2, 2, 4)
    • ”competition_max”: Max-rank competition (1, 3, 3, 4)

    • ”dense”: Dense ranking (1, 2, 2, 3)

    • ”avg”: Fractional/average ranking (1, 2.5, 2.5, 4)

If sigmas_in_id_order is provided, four extra keys are included:
  • ”competition_ci”

  • ”competition_max_ci”

  • ”dense_ci”

  • ”avg_ci”

Return type:

dict[str, ndarray]

Examples

>>> import numpy as np
>>> scores = [95.0, 87.5, 87.5, 80.0, 75.0]
>>> ranks = rank_scores(scores)
>>> ranks["competition"].tolist() == [1, 2, 2, 4, 5]
True
>>> ranks["dense"].tolist() == [1, 2, 2, 3, 4]
True

With uncertainty: the top two are close and get tied at 95%:

>>> sigmas = [1.0, 1.0, 0.1, 0.1, 0.1]
>>> ranks2 = rank_scores(scores, sigmas_in_id_order=sigmas, confidence=0.95)
>>> "dense_ci" in ranks2
True
compare_rankings(ranked_list_a, ranked_list_b, method='all')[source]

Compare two rankings using multiple correlation metrics.

Computes Kendall’s tau, Spearman’s rho, and weighted Kendall’s tau to measure agreement between two rankings.

Parameters:
  • ranked_list_a – First ranking (numeric array or list).

  • ranked_list_b – Second ranking (numeric array or list).

  • method – Which metric to return: “kendall”, “spearman”, “weighted_kendall”, or “all” (default).

Returns:

If method is not “all”, returns a (statistic, pvalue) tuple. If method is “all”, returns a dictionary with: kendalltau, spearmanr, weighted_kendalltau, fraction_mismatched, and max_disp.

Raises:
  • ValueError – If lists have different lengths or contain NaN/inf.

  • TypeError – If lists are not numeric.

Notes

scipy.stats.weightedtau does not compute p-values (pvalue is NaN). Rankings are compared element-wise at matching indices.

Examples

>>> import numpy as np
>>> rank_a = [1, 2, 3, 4, 5]
>>> rank_b = [1, 3, 2, 4, 5]
>>> tau, pval = compare_rankings(rank_a, rank_b, method="kendall")
>>> round(tau, 2)
0.8
>>> results = compare_rankings(rank_a, rank_b, method="all")
>>> round(results["fraction_mismatched"], 2)
0.4
lehmer_hash(ranked_list)[source]

Convert a permutation to its Lehmer code (factorial number system).

The Lehmer code provides a bijection between permutations and integers in the range [0, n!-1], useful for hashing, indexing, or enumerating all possible permutations.

Parameters:

ranked_list – Permutation of integers 0..n-1 as a list or array. Must contain no ties (all values unique).

Returns:

Unique hash value in range [0, n!-1] where n = len(ranked_list).

Return type:

int

Notes

  • Time complexity: O(n²) due to inversion counting

  • Space complexity: O(1)

  • This function does NOT handle ties. All elements must be distinct.

  • Pre-computes factorials for efficiency

Algorithm:

For each position i, counts inversions (elements to the right that are smaller) and encodes them in the factorial number system.

Examples

>>> lehmer_hash([0, 1, 2])
0
>>> lehmer_hash([2, 1, 0])
5
>>> lehmer_hash([0, 2, 1])
1
>>> lehmer_hash([1, 2, 0])
3

References

Lehmer, D. H. (1960). Teaching combinatorial tricks to a computer. In Combinatorial Analysis (Proceedings of Symposia in Applied Mathematics, Vol. 10, pp. 179–193). American Mathematical Society. MR 0113289.

lehmer_unhash(hash_value, n)[source]

Convert a Lehmer code (hash) back to its permutation.

Inverse operation of lehmer_hash. Reconstructs the original permutation from its integer representation in the factorial number system.

Parameters:
  • hash_value – Integer in range [0, n!-1] representing a permutation.

  • n – Length of the permutation to generate.

Returns:

Permutation of integers [0, 1, …, n-1] corresponding to hash_value.

Return type:

list

Raises:

ValueError – If hash_value >= n! (invalid hash for given n).

Notes

  • Time complexity: O(n²) due to element removal tracking

  • Space complexity: O(n)

  • Pre-computes factorials for efficiency

Examples

>>> lehmer_unhash(0, 3)
[0, 1, 2]
>>> lehmer_unhash(5, 3)
[2, 1, 0]
>>> lehmer_unhash(1, 3)
[0, 2, 1]
>>> lehmer_unhash(3, 3)
[1, 2, 0]

References

Lehmer, D. H. (1960). Teaching combinatorial tricks to a computer. In Combinatorial Analysis (Proceedings of Symposia in Applied Mathematics, Vol. 10, pp. 179–193). American Mathematical Society. MR 0113289.

ranking_hash(rank_list, tol=1e-12)[source]

Perfect collision-free hash for rankings with ties.

Encodes any ranking (with or without ties) into a unique integer using ordered Bell numbers (Fubini numbers). This is the theoretically optimal approach that fully preserves all ranking information.

Parameters:
  • rank_list – List where rank_list[i] is the rank of item i. Lower values = better ranks. Ties are represented by equal values. Example: [0, 1, 1, 3] means item 0 is first, items 1 and 2 are tied for second, item 3 is fourth.

  • tol – Tolerance for treating float ranks as equal (default: 1e-12).

Returns:

Unique hash value in range [0, F(n)-1] where F(n) is the

n-th ordered Bell number. F(n) = exact count of all possible rankings with ties for n items.

Return type:

int

Notes

  • Time complexity: O(n²) for encoding

  • Space complexity: O(n) for storing ordered Bell numbers

  • Collision-free: Different rankings always get different hashes

  • Complete encoding: Hash fully captures both order and tie structure

  • Optimal: Uses exactly log₂(F(n)) bits for F(n) possible rankings

Algorithm:

Uses ordered Bell (Fubini) numbers to enumerate all weak orderings. For each tie group, encodes both the subset selection (which items) and the lexicographic ordering within the subset.

Examples

>>> ranking_hash([0, 1, 2])  # No ties
0
>>> ranking_hash([0, 1, 1])  # Items 1,2 tied at second place
2
>>> ranking_hash([0, 0, 1])  # Items 0,1 tied at first place
9
>>> ranking_hash([0, 0, 0])  # All tied
12
>>> ranking_hash([1, 0, 1])  # Items 0,2 tied at rank 1, item 1 at rank 0
5
Comparison to factorial-based hashing:

For example, ranking_hash([0, 1, 1]) == 2, ranking_hash([0, 0, 1]) == 9, and ranking_hash([1, 0, 1]) == 5. These hashes differ, unlike simpler approaches that lose tie information.

References

Ordered Bell numbers (a.k.a. Fubini numbers), OEIS A000670: https://en.wikipedia.org/wiki/Ordered_Bell_number https://oeis.org/A000670

Weak orderings / total preorders (rankings with ties): https://en.wikipedia.org/wiki/Total_preorder

Combinatorial number system (ranking/unranking k-subsets): https://en.wikipedia.org/wiki/Combinatorial_number_system

See also

unhash_ranking(): Inverse operation to reconstruct ranking from hash.

unhash_ranking(h, n)[source]

Reconstruct ranking with ties from its hash value.

Inverse operation of ranking_hash. Decodes a hash back to the original ranking, returning competition ranks (min-rank with gaps).

Parameters:
  • h (int) – Hash value in range [0, F(n)-1]

  • n (int) – Number of items to rank

Returns:

Ranking in competition format (min-rank with gaps).

Example: [1, 2, 2, 4] means item 0 is rank 1, items 1 and 2 are tied at rank 2, item 3 is rank 4.

Return type:

list

Raises:

ValueError – If h is out of valid range [0, F(n)-1]

Notes

  • Time complexity: O(n²) for decoding

  • Space complexity: O(n) for ordered Bell numbers

  • Returns competition/min-rank format with gaps after ties

  • Fully reconstructs original ranking including tie structure

Examples

>>> unhash_ranking(0, 3)  # First possible ranking
[1, 2, 3]
>>> unhash_ranking(2, 3)  # Items 1,2 tied
[1, 2, 2]
>>> unhash_ranking(9, 3)  # Items 0,1 tied
[1, 1, 3]
>>> unhash_ranking(12, 3)  # All tied
[1, 1, 1]
>>> unhash_ranking(5, 3)  # Items 0,2 tied at rank 2
[2, 1, 2]

See also

ranking_hash(): Inverse operation to hash a ranking.