Greedy Decoding

  • A method decoders use to generate text
  • At each step, the decoder picks the word that is most likely (highest probability) given the previously generated words
  • Main issue: most of the time it does not give the globally optimal result, because choosing the locally best word at each step does not guarantee the most probable sequence overall
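
A small sketch of why the locally best choice can miss the best sequence overall. The two-step probability tables below are made-up numbers for illustration, not output from any real model, and the names `FIRST`, `SECOND`, `greedy`, and `exhaustive` are hypothetical.

```python
from itertools import product

# Made-up toy model: P(first word) and P(second word | first word).
FIRST = {"A": 0.6, "B": 0.4}
SECOND = {
    "A": {"x": 0.55, "y": 0.45},
    "B": {"x": 0.9, "y": 0.1},
}


def sequence_prob(first, second):
    """Joint probability of a two-word sequence under the toy model."""
    return FIRST[first] * SECOND[first][second]


def greedy():
    """Pick the most likely word at each step, never revisiting a choice."""
    first = max(FIRST, key=FIRST.get)
    second = max(SECOND[first], key=SECOND[first].get)
    return (first, second)


def exhaustive():
    """Score every possible two-word sequence and keep the best one."""
    candidates = product(FIRST, ["x", "y"])
    return max(candidates, key=lambda seq: sequence_prob(*seq))


if __name__ == "__main__":
    g, e = greedy(), exhaustive()
    print("greedy:    ", g, sequence_prob(*g))
    print("exhaustive:", e, sequence_prob(*e))
```

Here greedy commits to "A" because 0.6 > 0.4, but the sequence starting with "B" ends up more probable (about 0.36 versus 0.33), which is the kind of case the last bullet describes.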

[!def] Greedy Decoding Formula
\hat{y}_t = \arg\max_{w} P_\theta(y_t = w \mid y_{1:t-1}, X)
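
The formula can be sketched as a loop that applies the argmax once per step. This is a minimal sketch, not a reference implementation: `next_word_probs` is a hypothetical callable standing in for P_θ, `toy_model` is a made-up stand-in that just copies the input, and the end-of-sequence token `"</s>"` and the `max_len` cap are assumptions.

```python
from typing import Callable, Dict, List


def greedy_decode(
    next_word_probs: Callable[[List[str], List[str]], Dict[str, float]],
    source: List[str],
    eos: str = "</s>",
    max_len: int = 20,
) -> List[str]:
    """Generate words one at a time, always taking the most probable one."""
    generated: List[str] = []
    for _ in range(max_len):
        # Distribution over the next word given the prefix and the input X.
        probs = next_word_probs(generated, source)
        # argmax over words w of P(y_t = w | y_{1:t-1}, X)
        word = max(probs, key=probs.get)
        if word == eos:
            break
        generated.append(word)
    return generated


def toy_model(prefix: List[str], source: List[str]) -> Dict[str, float]:
    """Hypothetical stand-in model: copies the source, then ends the sequence."""
    position = len(prefix)
    if position < len(source):
        return {source[position]: 0.8, "</s>": 0.2}
    return {"</s>": 1.0}


if __name__ == "__main__":
    print(greedy_decode(toy_model, ["the", "cat", "sat"]))
```

With a real decoder, `next_word_probs` would wrap a forward pass that returns the distribution over the vocabulary; the loop itself stays the same.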