• ROUGE = Recall-Oriented Understudy for Gisting Evaluation
  • ROUGE measures how much of the reference (target) text is covered by the generation, so it is analogous to Recall
  • ROUGE-L reflects sentence-level structure better than the ROUGE-N Score, since it matches words in order without requiring them to be contiguous
  • Heavily used in Text Summarization; also commonly used in Machine Translation alongside the BLEU Score

[!def] ROUGE-L Score
\text{ROUGE-L} = \frac{\text{Length of the LCS (longest common subsequence) of generation and reference}}{\text{\# of words in reference}}
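A minimal sketch of the recall-style definition above, assuming plain lowercase whitespace tokenization (real implementations usually add their own tokenization and stemming, and often report precision and an F-measure as well). The function names `lcs_length` and `rouge_l_recall` are made up for illustration.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    # Row-by-row dynamic programming: prev[j] holds the LCS length of the
    # tokens of `a` processed so far and the first j tokens of `b`.
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_recall(generation, reference):
    """LCS length divided by the number of words in the reference."""
    # Assumption: lowercase + whitespace split stands in for a real tokenizer.
    gen_tokens = generation.lower().split()
    ref_tokens = reference.lower().split()
    if not ref_tokens:
        return 0.0
    return lcs_length(gen_tokens, ref_tokens) / len(ref_tokens)


reference = "the cat sat on the mat"
generation = "the cat is on the mat"
# LCS is "the cat on the mat" (5 tokens); the reference has 6 tokens.
print(rouge_l_recall(generation, reference))  # 5/6, about 0.833
```

Note that the matched words ("the cat ... on the mat") do not have to be adjacent; they only have to appear in the same order in both texts.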

Problems with ROUGE Score

  1. Scores are hard to compare across different tokenizers
  2. Doesn't consider synonyms
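Both problems can be illustrated with a small sketch. It assumes a recall-style ROUGE-L (LCS length over reference length) and two toy tokenizers, `whitespace_tokenize` and `punct_tokenize`, which are hypothetical helpers written for this note and not taken from any library.

```python
import re


def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_recall(gen_tokens, ref_tokens):
    """Recall-style ROUGE-L on already-tokenized text."""
    return lcs_length(gen_tokens, ref_tokens) / len(ref_tokens) if ref_tokens else 0.0


def whitespace_tokenize(text):
    # Toy tokenizer 1: punctuation stays glued to the word ("great,").
    return text.lower().split()


def punct_tokenize(text):
    # Toy tokenizer 2: punctuation becomes its own token.
    return re.findall(r"\w+|[^\w\s]", text.lower())


# Problem 1: the same text pair gets different scores under different tokenizers.
reference = "The movie was great, really."
generation = "The movie was great really"
ws_score = rouge_l_recall(whitespace_tokenize(generation), whitespace_tokenize(reference))
punct_score = rouge_l_recall(punct_tokenize(generation), punct_tokenize(reference))
print(ws_score)     # 3 of 5 reference tokens matched -> 0.6
print(punct_score)  # 5 of 7 reference tokens matched, about 0.714

# Problem 2: a synonym is treated as a plain mismatch.
ref = whitespace_tokenize("the film was great")
synonym_score = rouge_l_recall(whitespace_tokenize("the movie was great"), ref)
opposite_score = rouge_l_recall(whitespace_tokenize("the film was terrible"), ref)
print(synonym_score, opposite_score)  # 0.75 0.75
```

In this toy case the paraphrase with a synonym ("movie" for "film") scores exactly the same as a sentence with the opposite meaning, because only surface token overlap is counted.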