ROUGE-N Score

  • ROUGE = Recall-Oriented Understudy for Gisting Evaluation
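  • Because ROUGE counts how many of the reference's n-grams show up in the generated text, it is a recall-oriented metric (BLEU, by contrast, is precision-oriented)
  • ROUGE-N is the recall of n-grams of length N (e.g. ROUGE-1 for unigrams, ROUGE-2 for bigrams)
  • Heavily used in text summarization; also commonly reported alongside the BLEU score in machine translation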
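A minimal sketch of why the metric is called recall-oriented, assuming whitespace tokenization and a made-up sentence pair. A generation that copies the reference and then appends unrelated words still gets full unigram recall, while unigram precision drops. Unigram precision here is only one ingredient of BLEU, which also uses higher-order n-grams and a brevity penalty; the helper name `unigram_overlap` is illustrative.

```python
from collections import Counter

def unigram_overlap(candidate: list[str], reference: list[str]) -> int:
    # Clipped overlap: each reference word is matched at most as often as it occurs
    cand, ref = Counter(candidate), Counter(reference)
    return sum(min(count, cand[word]) for word, count in ref.items())

reference = "the cat sat on the mat".split()
# Made-up generation: copies the reference, then adds unrelated words
candidate = "the cat sat on the mat while the dog barked loudly".split()

overlap = unigram_overlap(candidate, reference)
recall = overlap / len(reference)      # what ROUGE-1 measures
precision = overlap / len(candidate)   # unigram precision, as used inside BLEU

print(f"recall={recall:.2f} precision={precision:.2f}")  # recall=1.00 precision=0.55
```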

[!def] ROUGE-N Score
$$
\text{ROUGE-N} = \text{Recall}_N
$$
$$
\text{Recall}_n = \frac{\#\ \text{n-grams appearing in both the generation and the reference}}{\#\ \text{n-grams in the reference}}
$$
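The definition above can be sketched in Python. This is a minimal single-reference version under stated assumptions: lowercase whitespace tokenization, no stemming, and matched counts clipped so an n-gram is credited at most as many times as it occurs in the reference. The names `ngrams` and `rouge_n` are illustrative, not a library API; established ROUGE packages apply their own tokenization and options, so their numbers can differ.

```python
from collections import Counter

def ngrams(tokens: list[str], n: int) -> Counter:
    # Multiset of all contiguous n-grams, stored as tuples
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def rouge_n(generation: str, reference: str, n: int) -> float:
    # Assumption: lowercase whitespace tokenization
    gen = ngrams(generation.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    total = sum(ref.values())  # denominator: n-grams in the reference
    if total == 0:
        return 0.0
    # Numerator: clipped matches, each n-gram counted at most as often as in the reference
    matched = sum(min(count, gen[g]) for g, count in ref.items())
    return matched / total

reference = "the cat sat on the mat"
generation = "the cat lay on the mat"
print(round(rouge_n(generation, reference, 1), 3))  # 0.833
print(rouge_n(generation, reference, 2))            # 0.6
```

ROUGE-1 is 5/6 because only "sat" is missed; ROUGE-2 is 3/5 because the bigrams "cat sat" and "sat on" are missed.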

Problems with ROUGE Score

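  1. Doesn't consider semantic meaning: only surface-level n-gram overlap is counted
  2. Hard to compare scores computed with different tokenizers, since tokenization changes which n-grams exist
  3. Doesn't consider synonyms or paraphrases (e.g. "film" gets no credit for "movie")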
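A small illustration of the problems listed above, using a hypothetical `rouge1_recall` helper (ROUGE-1 recall with clipped counts) and made-up sentences. The regex tokenizer `\w+` is just one assumed alternative to whitespace splitting, chosen to show that the same text pair can score differently.

```python
import re
from collections import Counter

def rouge1_recall(generation: list[str], reference: list[str]) -> float:
    # ROUGE-1 recall with clipped counts on pre-tokenized input
    gen, ref = Counter(generation), Counter(reference)
    total = sum(ref.values())
    if total == 0:
        return 0.0
    return sum(min(c, gen[w]) for w, c in ref.items()) / total

# Problems 1 and 3: a paraphrase with the same meaning scores low...
reference = "the movie was not good".split()
paraphrase = "the film was bad".split()
# ...while a sentence with the opposite meaning scores higher
opposite = "the movie was good".split()
print(rouge1_recall(paraphrase, reference))  # 0.4
print(rouge1_recall(opposite, reference))    # 0.8

# Problem 2: the same text pair under two tokenizers
ref_text, gen_text = "the cat's toy.", "the cat's toy"
whitespace = rouge1_recall(gen_text.split(), ref_text.split())
word_regex = rouge1_recall(re.findall(r"\w+", gen_text.lower()),
                           re.findall(r"\w+", ref_text.lower()))
print(round(whitespace, 3), word_regex)  # 0.667 1.0
```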