ROUGE-L Score
- ROUGE = Recall-Oriented Understudy for Gisting Evaluation
- Because ROUGE measures how much of the reference (target) text is covered by the generation, it is often likened to Recall
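- Better than ROUGE-N at capturing sentence-level structure, since the LCS (longest common subsequence) rewards words that appear in the same order even when they are not contiguous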
- Heavily used in text summarization; also commonly used in machine translation alongside the BLEU score
[!def] ROUGE-L Score
$$
\text{ROUGE-L} = \frac{\text{Length of the LCS shared by generation and reference}}{\text{\# of words in reference}}
$$
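The definition above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function names `lcs_length` and `rouge_l_recall` are made up for this note, and it assumes tokens are lowercase, whitespace-separated words. It computes only the recall-style form given in the formula.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    # dp[i][j] holds the LCS length of a[:i] and b[:j].
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, start=1):
        for j, y in enumerate(b, start=1):
            if x == y:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[len(a)][len(b)]


def rouge_l_recall(generation, reference):
    """LCS length divided by the number of words in the reference.

    Assumption: lowercase whitespace tokenization (hypothetical choice).
    """
    gen_tokens = generation.lower().split()
    ref_tokens = reference.lower().split()
    if not ref_tokens:
        return 0.0
    return lcs_length(gen_tokens, ref_tokens) / len(ref_tokens)


reference = "the cat sat on the mat"
generation = "the cat lay on the mat"
# LCS is "the cat on the mat" (5 words); the reference has 6 words.
print(rouge_l_recall(generation, reference))  # 5/6, about 0.83
```

Note that the LCS skips over "sat"/"lay" without breaking the match, which is what distinguishes it from contiguous n-gram matching.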
Problems with ROUGE Score
- Hard to compare with different tokenizers
- Doesn't consider synonyms
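
Both problems can be seen in a small experiment. The sketch below is only illustrative: the two tokenizers (`whitespace_tokens`, `punct_tokens`) and the helper names are hypothetical choices for this note, not what any particular ROUGE package does. The same sentence pair gets different scores under the two tokenizers, and "big" vs. "large" never counts as a match under either.

```python
import re


def lcs_len(a, b):
    """LCS length of two token lists, keeping one DP row at a time."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, start=1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def rouge_l(gen_tokens, ref_tokens):
    """LCS length divided by the number of reference tokens."""
    return lcs_len(gen_tokens, ref_tokens) / len(ref_tokens)


def whitespace_tokens(text):
    # Hypothetical tokenizer 1: split on whitespace only.
    return text.lower().split()


def punct_tokens(text):
    # Hypothetical tokenizer 2: punctuation marks become separate tokens.
    return re.findall(r"\w+|[^\w\s]", text.lower())


reference = "It's a big house."
generation = "It's a large house"

for tokenize in (whitespace_tokens, punct_tokens):
    score = rouge_l(tokenize(generation), tokenize(reference))
    print(tokenize.__name__, round(score, 3))
```

With whitespace splitting, "house." and "house" are different tokens, so only "it's a" matches (2 of 4 reference tokens). With punctuation splitting, "it ' s a house" matches (5 of 7 reference tokens). Scores computed under different tokenizers are therefore not directly comparable.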