Ranking

From David's Wiki

Some notes on ranking techniques

Basics

Pointwise, Pairwise and Listwise Learning to Rank

Point-wise ranking

In point-wise ranking, each document \(\displaystyle x_i\) has a relevance score \(\displaystyle y_i\), so you can train your model \(\displaystyle f\) to predict these scores in a supervised manner (e.g. by regression) and then rank documents by their predicted scores.
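A minimal sketch of the point-wise idea, assuming each document is described by a single numeric feature and that a least-squares linear model is an acceptable stand-in for \(\displaystyle f\). The helper name `fit_linear_scorer` and the toy data are hypothetical.

```python
def fit_linear_scorer(xs, ys):
    """Least-squares fit of f(x) = w * x + b for one-dimensional features."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    w = cov_xy / var_x
    b = mean_y - w * mean_x
    return lambda x: w * x + b


# Toy training data (made up): one feature and one relevance score per document.
features = [0.0, 1.0, 2.0, 3.0]
scores = [1.0, 3.0, 5.0, 7.0]

f = fit_linear_scorer(features, scores)

# Rank document indices by predicted score, highest first.
ranking = sorted(range(len(features)), key=lambda i: f(features[i]), reverse=True)
```

Any regressor could replace the linear fit. The point is only that training looks at one document at a time and the ranking comes from sorting the predictions.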

Pair-wise ranking

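If your data is of the form \(\displaystyle y(x_a) \gt y(x_b)\), i.e. you only know that \(\displaystyle x_a\) should rank above \(\displaystyle x_b\), then you can train your model so that \(\displaystyle f(x_a)\) exceeds \(\displaystyle f(x_b)\) by a margin of at least 1 using a hinge loss: \(\displaystyle \begin{equation} L(x_a, x_b) = \max(0, 1-(f(x_a) - f(x_b))) \end{equation} \)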
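The hinge loss above can be sketched directly on model outputs. The function names and the `margin` parameter (fixed to 1 in the formula) are illustrative, and the scores are assumed to be precomputed values of \(\displaystyle f\).

```python
def pairwise_hinge_loss(score_a, score_b, margin=1.0):
    """Hinge loss for a pair where item a should be ranked above item b."""
    return max(0.0, margin - (score_a - score_b))


def mean_pairwise_hinge_loss(pairs, margin=1.0):
    """Average hinge loss over (score_preferred, score_other) pairs."""
    return sum(pairwise_hinge_loss(a, b, margin) for a, b in pairs) / len(pairs)
```

The loss is zero once the preferred item outscores the other by at least the margin, and grows linearly as the pair becomes more misordered.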

Listwise ranking

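Listwise methods define the loss over the entire ranked list rather than over single documents or pairs. One example is ListMLE, which minimizes the negative log-likelihood of the ground-truth ordering under a Plackett-Luce model.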
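As a rough sketch of what a ListMLE-style loss computes: given model scores and the ground-truth ordering, it sums the negative log-probability of picking each item first among the items that remain (the Plackett-Luce likelihood). The function name and argument layout are assumptions for illustration, not a reference implementation.

```python
import math


def list_mle_loss(scores, true_order):
    """Negative Plackett-Luce log-likelihood of the ground-truth ordering.

    scores: model score for each item.
    true_order: item indices sorted from most to least relevant.
    """
    ordered = [scores[i] for i in true_order]
    loss = 0.0
    for i in range(len(ordered)):
        tail = ordered[i:]
        # Numerically stable log-sum-exp over the items not yet placed.
        m = max(tail)
        log_sum_exp = m + math.log(sum(math.exp(s - m) for s in tail))
        loss += log_sum_exp - ordered[i]
    return loss
```

With equal scores for every item, every ordering is equally likely, so the loss for n items is log(n!). Scores that agree with the true ordering give a lower loss than scores that reverse it.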

Metrics

See https://medium.com/swlh/rank-aware-recsys-evaluation-metrics-5191bba16832

Cumulative Gain

Suppose you have a list of results \(\displaystyle x_1, \ldots, x_n\) with relevance \(\displaystyle r_1, \ldots, r_n\).
Then the cumulative gain at position \(\displaystyle p\) is the sum of the relevance of the first \(\displaystyle p\) results: \(\displaystyle \begin{equation} CG_p = \sum_{i=1}^{p} r_i \end{equation} \)

The discounted cumulative gain (DCG) takes the position into account, discounting lower-ranked results: \(\displaystyle \begin{equation} DCG_p = \sum_{i=1}^{p} \frac{r_i}{\log_2 (i+1)} \end{equation} \)

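The normalized discounted cumulative gain (NDCG) divides by the DCG of the best possible ranking, so a perfect ranking scores 1: \(\displaystyle \begin{equation} NDCG_p = \frac{DCG_p(\mathbf{r})}{\max_{\mathbf{r}'}DCG_p(\mathbf{r}')} \end{equation} \) where \(\displaystyle \mathbf{r}'\) ranges over all reorderings of \(\displaystyle \mathbf{r}\).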
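The three gain metrics can be sketched in a few lines. The function names are illustrative, and relevances are assumed to be non-negative and listed in ranked order (position 1 first). Returning 0 when the ideal DCG is 0 is a convention chosen here to avoid division by zero.

```python
import math


def cg_at(relevances, p):
    """Cumulative gain: sum of the first p relevances."""
    return sum(relevances[:p])


def dcg_at(relevances, p):
    """Discounted cumulative gain with a log2(i + 1) discount at position i."""
    return sum(r / math.log2(i + 1) for i, r in enumerate(relevances[:p], start=1))


def ndcg_at(relevances, p):
    """DCG divided by the DCG of the ideal ordering (relevances sorted descending)."""
    ideal = dcg_at(sorted(relevances, reverse=True), p)
    return dcg_at(relevances, p) / ideal if ideal > 0 else 0.0
```

For example, `ndcg_at([3, 2, 1], 3)` is 1 because the list is already in the best order, while `ndcg_at([1, 2, 3], 3)` is below 1.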

Mean Reciprocal Rank

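If you only have one correct answer, which is placed at rank \(\displaystyle i\), then the reciprocal rank is \(\displaystyle 1/i\).
For a set of queries \(\displaystyle Q\), the mean reciprocal rank is the average of the reciprocal ranks over all queries: \(\displaystyle \begin{equation} MRR = \frac{1}{|Q|} \sum_{q \in Q} \frac{1}{\operatorname{rank}_q} \end{equation} \)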
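A small sketch of the computation, assuming each query comes with a ranked result list and a single correct answer. The function names are hypothetical. Scoring a query as 0 when the correct answer is missing from the list is a common convention, but it is an assumption here.

```python
def reciprocal_rank(results, correct):
    """1 / (1-based position of the correct answer), or 0.0 if it is absent."""
    for position, item in enumerate(results, start=1):
        if item == correct:
            return 1.0 / position
    return 0.0


def mean_reciprocal_rank(queries):
    """Average reciprocal rank over (results, correct) pairs."""
    return sum(reciprocal_rank(results, correct) for results, correct in queries) / len(queries)
```

For instance, three queries whose correct answers land at ranks 1, 2, and 4 have reciprocal ranks 1, 1/2, and 1/4, giving an MRR of 1.75 / 3.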