Zhenguo Zhang's Blog
Sharing makes life better
[R] data.table's frank()
knitr::opts_chunk$set(echo = TRUE, message = FALSE, warning = FALSE)
library(knitr)
library(data.table)

One can use data.table::frank() to rank the rows of a data.table or simply a vector. Compared to the base R function rank(), frank() is faster. Today I will show how to use this function.
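
As a quick sanity check, the sketch below compares frank() with base rank() on a simulated vector. The vector x and its size are illustrative assumptions, not part of the original example, and the timings depend on the machine, so no numbers are claimed here.

```r
library(data.table)

set.seed(1)
# Hypothetical test vector with many tied values
x <- sample(1e5, 1e6, replace = TRUE)

r_base <- rank(x)   # base R, ties.method = "average" by default
r_dt   <- frank(x)  # data.table, same default ties.method

# Both functions should produce the same ranks
all.equal(r_base, r_dt)

# Rough timing comparison; results vary by machine and data
system.time(rank(x))
system.time(frank(x))
```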

First, let’s generate an example data.table with 10 rows and 3 columns. For simplicity, the first 2 columns are integers and the last one is a character column. Sampling with replacement produces some duplicated values, which lets us see how tied values are ranked:

set.seed(123)
n <- 10
dt <- data.table(
  a = sample(1:10, n, replace = TRUE),
  b = sample(1:10, n, replace = TRUE),
  c = sample(letters[1:5], n, replace = TRUE)
)
kable(dt, caption = "Example data.table")
Table 1: Example data.table
a   b   c
3   5   a
3   3   c
10  9   d
2   9   a
6   9   c
5   3   e
4   8   d
6   10  b
9   7   e
10  10  a

Now, let’s see how to use frank() to rank the whole data.table.

dt[, rank := frank(.SD)]
kable(dt[order(rank)], caption = "Ranked data.table")
Table 2: Ranked data.table
a   b   c   rank
2   9   a   1
3   3   c   2
3   5   a   3
4   8   d   4
5   3   e   5
6   9   c   6
6   10  b   7
9   7   e   8
10  9   d   9
10  10  a   10

As you can see, the frank() function ranks the rows of the data.table by first checking the first column, then the second column, and finally the third column.
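
To make the column-by-column comparison concrete, here is a minimal sketch on a small made-up table (toy, r_frank, and r_base are illustrative names, not from the example above). With ties.method = "first", ranking all columns should match the inverse of the permutation returned by base order() on the same columns.

```r
library(data.table)

# Hypothetical 4-row table; column a has ties that column b breaks
toy <- data.table(a = c(2, 1, 2, 1), b = c(1, 3, 0, 2))

# Rank rows by a first, then by b
r_frank <- frank(toy, ties.method = "first")

# Base R equivalent: order() sorts by a then b, and order() of that
# permutation turns sorted positions back into per-row ranks
r_base <- order(order(toy$a, toy$b))

r_frank
r_base
```

Row 4 holds the smallest (a, b) pair, (1, 2), so it gets rank 1, and row 1, (2, 1), gets rank 4.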

One can also rank a data.table based on selected columns. For example, let’s use the 2nd and 3rd columns. frank() accepts unquoted column names for this, as in frank(.SD, b, c); here we use its variant frankv(), which takes the column names as a character vector:

dt[, rank := frankv(.SD, cols = c("b","c"))]
kable(dt[order(rank)], caption = "Ranked data.table by 2nd and 3rd columns")
Table 3: Ranked data.table by 2nd and 3rd columns
a   b   c   rank
3   3   c   1
5   3   e   2
3   5   a   3
9   7   e   4
4   8   d   5
2   9   a   6
6   9   c   7
10  9   d   8
10  10  a   9
6   10  b   10
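
The sketch below shows, on a small made-up table (toy is a hypothetical name), that frank() with unquoted column names and frankv() with a character vector should give the same ranks. It also shows descending order on one column: a minus sign in frank(), or the order argument in frankv().

```r
library(data.table)

# Hypothetical table, unrelated to the dt object above
toy <- data.table(
  a = c(1, 2, 3, 4, 5),
  b = c(9, 3, 9, 3, 5),
  c = c("d", "e", "a", "c", "a")
)

# Rank by b, then c: unquoted names vs. a character vector
r_frank  <- frank(toy, b, c, ties.method = "first")
r_frankv <- frankv(toy, cols = c("b", "c"), ties.method = "first")
all(r_frank == r_frankv)

# Descending b, then ascending c
r_desc  <- frank(toy, -b, c, ties.method = "first")
r_descv <- frankv(toy, cols = c("b", "c"), order = c(-1L, 1L),
                  ties.method = "first")
all(r_desc == r_descv)
```

frankv() is convenient when the column names are stored in a variable, for example when they come from a function argument.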

Finally, let’s look at the ties.method argument. To keep it simple, we will rank the table by the 2nd column only, so the effect of each ties.method value is easy to see.

newDT <- dt[, .(b)]
newDT[, rankAverage := frank(b, ties.method = "average")] # the default
newDT[, rankFirst := frank(b, ties.method = "first")]
newDT[, rankLast := frank(b, ties.method = "last")]
newDT[, rankRandom := frank(b, ties.method = "random")]
newDT[, rankMax := frank(b, ties.method = "max")]
newDT[, rankMin := frank(b, ties.method = "min")]
newDT[, rankDense := frank(b, ties.method = "dense")]
kable(newDT[order(b)], caption = "Ranked data.table by 2nd column")
Table 4: Ranked data.table by 2nd column
b   rankAverage  rankFirst  rankLast  rankRandom  rankMax  rankMin  rankDense
3   1.5          1          2         2           2        1        1
3   1.5          2          1         1           2        1        1
5   3.0          3          3         3           3        3        2
7   4.0          4          4         4           4        4        3
8   5.0          5          5         5           5        5        4
9   7.0          6          8         7           8        6        5
9   7.0          7          7         8           8        6        5
9   7.0          8          6         6           8        6        5
10  9.5          9          10        9           10       9        6
10  9.5          10         9         10          10       9        6

As you can see, here is how each value of the ties.method argument works:

  • average: the average of the ranks of the tied values
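  • first: tied values get increasing ranks in the order in which they appear in the data
  • last: tied values get decreasing ranks in the order in which they appear in the data, so the value that appears last gets the lowest rank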
  • random: a random order for the ties
  • max: the maximum rank of the tied values
  • min: the minimum rank of the tied values
  • dense: the values in a tie set get the same rank, and the rank increases by 1 when moving to the next tie set, so there are no gaps between ranks. This option is specific to frank() and is not available in base R’s rank().
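
Since dense is the one option that base rank() lacks, here is a minimal sketch of one way to reproduce it in base R, using match() against the sorted unique values. The vector x is a made-up example, and this workaround is only one possibility, not something taken from the data.table documentation.

```r
library(data.table)

# Hypothetical vector with two tie sets
x <- c(3, 3, 5, 9, 9, 9)

# Dense ranks from data.table
dense_dt <- frank(x, ties.method = "dense")

# Base R workaround: position of each value among the sorted unique values
dense_base <- match(x, sort(unique(x)))

dense_dt    # 1 1 2 3 3 3
dense_base  # 1 1 2 3 3 3

# Base rank() rejects "dense" as a ties.method
base_fails <- inherits(try(rank(x, ties.method = "dense"), silent = TRUE),
                       "try-error")
base_fails
```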

When you use the rank to select the top N rows, it matters how ties are handled. With average, max, min, or dense, tied rows share the same rank, so a filter such as rank <= N can return more or fewer than N rows. If you need exactly N rows, use first, last, or random instead.
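
The effect on top-N selection can be sketched as follows. The table scores is a hypothetical copy of column b from the example above, and top_n is an illustrative name; the code counts how many rows survive a rank <= 6 filter under different ties.method values.

```r
library(data.table)

# Hypothetical copy of column b from the example table
scores <- data.table(b = c(5, 3, 9, 9, 9, 3, 8, 10, 7, 10))
top_n <- 6

# Number of rows kept by the filter rank <= top_n
n_first <- scores[frank(b, ties.method = "first") <= top_n, .N]
n_min   <- scores[frank(b, ties.method = "min")   <= top_n, .N]
n_max   <- scores[frank(b, ties.method = "max")   <= top_n, .N]
n_dense <- scores[frank(b, ties.method = "dense") <= top_n, .N]

c(first = n_first, min = n_min, max = n_max, dense = n_dense)
# first = 6, min = 8, max = 5, dense = 10
```

Only first returns exactly 6 rows here: the three tied 9s all receive rank 6 under min and rank 8 under max, while dense ranks never exceed 6 in this data.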

Happy programming 😄


Last modified on 2025-04-12
