knitr::opts_chunk$set(echo = TRUE, message = FALSE, warning = FALSE)
library(knitr)
library(data.table)
One can use data.table::frank() to rank the rows of a data.table or simply a vector. Compared to the base R function rank(), frank() is faster. Today I will show how to use this function.
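Before going further, here is a minimal sketch for checking the speed claim on your own machine. The input vector x is made up for illustration, and I deliberately show no timings because they depend on your hardware and data.table version.

```r
library(data.table)

# Hypothetical input: one million integers with many ties
set.seed(1)
x <- sample(1e5, 1e6, replace = TRUE)

# Both functions default to ties.method = "average"
r_base <- rank(x)
r_dt   <- frank(x)

# The two results should agree
stopifnot(isTRUE(all.equal(r_base, r_dt)))

# Compare the elapsed times yourself
system.time(rank(x))
system.time(frank(x))
```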
First, let’s generate an example data.table with 10 rows and 3 columns. For simplicity, the first two columns are integers and the last one is a character column. We sample with replacement so that some values are duplicated, which lets us see how tied values are ranked:
set.seed(123)
n <- 10
dt <- data.table(
a = sample(1:10, n, replace = TRUE),
b = sample(1:10, n, replace = TRUE),
c = sample(letters[1:5], n, replace = TRUE)
)
kable(dt, caption = "Example data.table")
a | b | c |
---|---|---|
3 | 5 | a |
3 | 3 | c |
10 | 9 | d |
2 | 9 | a |
6 | 9 | c |
5 | 3 | e |
4 | 8 | d |
6 | 10 | b |
9 | 7 | e |
10 | 10 | a |
Next, let’s see how to use frank() to rank the whole data.table.
dt[, rank := frank(.SD)]
kable(dt[order(rank)], caption = "Ranked data.table")
a | b | c | rank |
---|---|---|---|
2 | 9 | a | 1 |
3 | 3 | c | 2 |
3 | 5 | a | 3 |
4 | 8 | d | 4 |
5 | 3 | e | 5 |
6 | 9 | c | 6 |
6 | 10 | b | 7 |
9 | 7 | e | 8 |
10 | 9 | d | 9 |
10 | 10 | a | 10 |
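As you can see, the frank() function ranks the rows of the data.table by first comparing the first column, then using the second column to break ties, and finally the third column. Because no two rows here are identical across all three columns, every row gets a distinct rank.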
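To make the column-by-column comparison concrete, here is a small sketch that reproduces the same ranks with base R's order(). The table d and the names r_frank and r_order are made up for this illustration.

```r
library(data.table)

# Hypothetical toy table: column a has ties that column b must break
d <- data.table(a = c(2L, 1L, 2L, 1L), b = c(1L, 4L, 3L, 2L))

# Rank of each row, comparing columns left to right
r_frank <- frank(d, ties.method = "first")

# Same idea in base R: the position of each row in the order() permutation
r_order <- match(seq_len(nrow(d)), order(d$a, d$b))

r_frank
r_order
```

Both vectors should hold the same values, since order() also sorts by a first and falls back to b only when a is tied.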
One can also rank a data.table based on selected columns only. For example, let’s use the 2nd and 3rd columns to rank the data.table. A convenient way to do this is the variant frankv(), which takes the column names as a character vector:
dt[, rank := frankv(.SD, cols = c("b","c"))]
kable(dt[order(rank)], caption = "Ranked data.table by 2nd and 3rd columns")
a | b | c | rank |
---|---|---|---|
3 | 3 | c | 1 |
5 | 3 | e | 2 |
3 | 5 | a | 3 |
9 | 7 | e | 4 |
4 | 8 | d | 5 |
2 | 9 | a | 6 |
6 | 9 | c | 7 |
10 | 9 | d | 8 |
10 | 10 | a | 9 |
6 | 10 | b | 10 |
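As a side note, here is a hedged sketch of two related calls, based on my reading of the data.table documentation: frank() with unquoted column names, and the order argument of frankv() for descending ranks. The toy table d is made up for illustration.

```r
library(data.table)

# Hypothetical toy table; only b and c are used for ranking
d <- data.table(a = 1:4, b = c(2L, 1L, 2L, 1L), c = c("x", "y", "a", "b"))

# Column names given as a character vector
r_v <- frankv(d, cols = c("b", "c"))

# Unquoted column names passed directly to frank()
r_f <- frank(d, b, c)

# Descending on b, ascending on c
r_desc <- frankv(d, cols = c("b", "c"), order = c(-1L, 1L))

r_v
r_f
r_desc
```

The first two calls should return the same ranks, so the choice between them is mostly about whether the column names are available as strings.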
Finally, we would like to talk about the ties.method argument. To keep it simple, we will use only the 2nd column to rank the table so you can see the effect of the ties.method argument.
newDT <- dt[, .(b)]
newDT[, rankAverage := frank(b, ties.method = "average")] # the default
newDT[, rankFirst := frank(b, ties.method = "first")]
newDT[, rankLast := frank(b, ties.method = "last")]
newDT[, rankRandom := frank(b, ties.method = "random")]
newDT[, rankMax := frank(b, ties.method = "max")]
newDT[, rankMin := frank(b, ties.method = "min")]
newDT[, rankDense := frank(b, ties.method = "dense")]
kable(newDT[order(b)], caption = "Ranked data.table by 2nd column")
b | rankAverage | rankFirst | rankLast | rankRandom | rankMax | rankMin | rankDense |
---|---|---|---|---|---|---|---|
3 | 1.5 | 1 | 2 | 2 | 2 | 1 | 1 |
3 | 1.5 | 2 | 1 | 1 | 2 | 1 | 1 |
5 | 3.0 | 3 | 3 | 3 | 3 | 3 | 2 |
7 | 4.0 | 4 | 4 | 4 | 4 | 4 | 3 |
8 | 5.0 | 5 | 5 | 5 | 5 | 5 | 4 |
9 | 7.0 | 6 | 8 | 7 | 8 | 6 | 5 |
9 | 7.0 | 7 | 7 | 8 | 8 | 6 | 5 |
9 | 7.0 | 8 | 6 | 6 | 8 | 6 | 5 |
10 | 9.5 | 9 | 10 | 9 | 10 | 9 | 6 |
10 | 9.5 | 10 | 9 | 10 | 10 | 9 | 6 |
As you can see, here is how each value of the ties.method argument works:
- average: the average of the ranks of the tied values
- first: tied values are ranked in the order in which they appear in the data
- last: tied values are ranked in the reverse of the order in which they appear in the data
- random: a random order for the ties
- max: the maximum rank of the tied values
- min: the minimum rank of the tied values
- dense: the values in a tie set get the same rank, and the rank value increases by 1 when moving to the next tie set. This is a unique feature of frank() and is not available in base R's rank().
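The list above can be checked against base R with a short sketch. The vector x repeats the values of column b from the example table, and the match() workaround for dense is just one possible base R emulation, not something frank() itself uses.

```r
library(data.table)

# Same values as column b in the example table
x <- c(5, 3, 9, 9, 9, 3, 8, 10, 7, 10)

# Methods shared with base R should agree with rank()
for (m in c("average", "min", "max", "first")) {
  stopifnot(all(frank(x, ties.method = m) == rank(x, ties.method = m)))
}

# "dense" is not a rank() option; one workaround is to look up
# each value's position among the sorted unique values
dense_dt   <- frank(x, ties.method = "dense")
dense_base <- match(x, sort(unique(x)))

dense_dt
dense_base
```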
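When one wants to use the rank to choose the top N rows, it is important to know how the rank is computed. In this case, you may want to avoid the ties.method values max, min, and dense: they give tied rows the same rank, so a filter such as rank <= N can return more or fewer than N rows.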
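Here is a small sketch of that pitfall, using the values of column b again. The cutoff n_top = 7 is an arbitrary choice for illustration.

```r
library(data.table)

# Same values as column b in the example table
b <- c(5, 3, 9, 9, 9, 3, 8, 10, 7, 10)
n_top <- 7  # hypothetical N

# Count how many rows pass the filter rank <= N under each ties.method
counts <- sapply(
  c("first", "min", "max", "dense"),
  function(m) sum(frank(b, ties.method = m) <= n_top)
)

counts
```

With these values the counts should come out as 7, 8, 5, and 10 for first, min, max, and dense, so only first returns exactly N rows.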
Happy programming 😄
Last modified on 2025-04-12