Greedy Decoding
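- It's a method decoders use to generate text
- In greedy decoding, at each step the decoder predicts the most likely word (the one with the highest probability) given the previously generated words
- The main issue is that, most of the time, it doesn't give the globally optimal result: the locally best word at one step can lead to a sequence whose overall probability is lower than that of another sequence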
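
To make the last point concrete, here is a small sketch with a made-up two-step model. The probability tables `first` and `second` are invented for illustration only and do not come from any real model. Greedy picks the best first word and then the best second word, while the exhaustive search scores every full sequence.

```python
from itertools import product

# Hypothetical toy model (made-up numbers):
# P(first word) and P(second word | first word).
first = {"A": 0.6, "B": 0.4}
second = {
    "A": {"x": 0.55, "y": 0.45},
    "B": {"x": 0.9, "y": 0.1},
}


def sequence_prob(w1, w2):
    """Joint probability of the two-word sequence (w1, w2)."""
    return first[w1] * second[w1][w2]


def greedy():
    """Pick the most likely word at each step, one step at a time."""
    w1 = max(first, key=first.get)
    w2 = max(second[w1], key=second[w1].get)
    return (w1, w2), sequence_prob(w1, w2)


def exhaustive():
    """Score every full sequence and return the most likely one."""
    best = max(product(first, ["x", "y"]), key=lambda s: sequence_prob(*s))
    return best, sequence_prob(*best)


greedy_seq, greedy_p = greedy()
best_seq, best_p = exhaustive()
print("greedy:    ", greedy_seq, round(greedy_p, 2))
print("exhaustive:", best_seq, round(best_p, 2))
```

In this toy setup greedy commits to `A` because it is the more likely first word, but the sequence starting with `B` has the higher overall probability.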
[!def] Greedy Decoding Formula
$$
\hat{y}_t = \arg\max_{w} P_\theta (y_t = w \mid \hat{y}_{1:t-1}, X)
$$
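
The formula can be read as a simple loop: ask the model for the next-word distribution, take the argmax, append it, and repeat. The sketch below assumes a hypothetical `next_token_probs(prefix, source)` callable that returns a dict mapping each candidate word to its probability; the `toy_model` function, the `<eos>` marker, and the `max_len` limit are illustrative assumptions, not part of any particular library.

```python
def greedy_decode(next_token_probs, source, eos="<eos>", max_len=20):
    """Generate a sequence by taking the argmax word at every step.

    next_token_probs: hypothetical callable (prefix, source) -> {word: prob},
    standing in for P_theta(y_t = w | y_{1:t-1}, X).
    """
    output = []
    for _ in range(max_len):
        probs = next_token_probs(output, source)
        # argmax over the candidate words w
        token = max(probs, key=probs.get)
        if token == eos:
            break
        output.append(token)
    return output


def toy_model(prefix, source):
    """Made-up stand-in model: favors copying the source word by word,
    then emits the end-of-sequence marker."""
    if len(prefix) < len(source):
        return {source[len(prefix)]: 0.8, "<eos>": 0.2}
    return {"<eos>": 1.0}


print(greedy_decode(toy_model, ["hello", "world"]))
```

Note that the loop never revisits an earlier choice, which is why greedy decoding is cheap but can miss the most probable full sequence.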