Beam Search

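  • It's a decoding method that generates text by keeping the most probable candidate sequences at each step
  • Depends on the beam size B (the number of candidates kept)
  • Usually better than Greedy Decoding because it tracks multiple candidates instead of just 1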
    • When B = 1, it's Greedy Decoding
    • Larger B: usually better results, slower decoding
    • Smaller B: usually worse results, faster decoding
  • Beam Search is mostly used at inference time, but it can also be used during training [1]
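
  To make "based on the probability" concrete, the score of an output sequence is usually written as a sum of log-probabilities. This is the standard formulation, not something stated in these notes; here $y_t$ is the word at step $t$, $Y_{t-1}$ is the output generated before it, and $T$ is the output length:

  $$\log P(Y \mid X) = \sum_{t=1}^{T} \log P(y_t \mid Y_{t-1}, X)$$

  Greedy Decoding commits to the single best $y_t$ at every step, while Beam Search keeps the B partial sequences with the highest running sum.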


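  1. Start with the <SOS> token as the only candidate sequence
  2. For each time step t,
    1. For every candidate sequence, compute the probability of each possible next word, given the encoded input X and the output generated up to the previous step, $Y_{t-1}$
    2. Score each extended sequence by its cumulative (log-)probability and keep the top B
    3. Repeat Step 2 until the kept sequences end in <EOS> (or a maximum length is reached)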
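
The steps above can be sketched in Python roughly as follows. This is a minimal illustration, not a reference implementation: `next_token_probs` is a hypothetical stand-in for a real model (which would also condition on the encoded input X), `TOY` is a made-up distribution, and `max_len` is an assumed safety limit. A finished sequence is simply set aside once it ends in <EOS>, which is one of several possible ways to handle termination.

```python
import math
from typing import Callable, Dict, List, Tuple

SOS, EOS = "<SOS>", "<EOS>"

# Hypothetical interface: maps the tokens generated so far to a
# distribution {next_token: probability}.
NextTokenFn = Callable[[List[str]], Dict[str, float]]


def beam_search(
    next_token_probs: NextTokenFn, beam_size: int, max_len: int = 20
) -> List[Tuple[List[str], float]]:
    """Return hypotheses as (tokens, log-probability), best first."""
    beams = [([SOS], 0.0)]  # Step 1: start from <SOS>
    finished = []

    for _ in range(max_len):  # Step 2: one iteration per time step
        # Step 2.1: extend every live hypothesis by every possible next token.
        candidates = []
        for tokens, score in beams:
            for token, prob in next_token_probs(tokens).items():
                if prob > 0.0:
                    candidates.append((tokens + [token], score + math.log(prob)))
        if not candidates:
            break

        # Step 2.2: keep the top B by cumulative log-probability.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for tokens, score in candidates[:beam_size]:
            if tokens[-1] == EOS:
                finished.append((tokens, score))  # Step 2.3: hypothesis is done
            else:
                beams.append((tokens, score))
        if not beams:
            break

    finished.extend(beams)  # hypotheses cut off by max_len, without <EOS>
    finished.sort(key=lambda c: c[1], reverse=True)
    return finished


# Made-up toy distribution in which the greedy first choice ("a") is a trap.
TOY = {
    (SOS,): {"a": 0.6, "b": 0.4},
    (SOS, "a"): {"x": 0.5, "y": 0.5},
    (SOS, "b"): {"z": 0.9, "w": 0.1},
}


def toy_model(tokens: List[str]) -> Dict[str, float]:
    # Any prefix not listed in TOY ends the sequence with certainty.
    return TOY.get(tuple(tokens), {EOS: 1.0})


print(beam_search(toy_model, beam_size=1)[0][0])  # ['<SOS>', 'a', 'x', '<EOS>']
print(beam_search(toy_model, beam_size=2)[0][0])  # ['<SOS>', 'b', 'z', '<EOS>']
```

With B = 1 the search behaves like Greedy Decoding and ends on a sequence with probability 0.6 × 0.5 = 0.3. With B = 2 it keeps "b" alive after the first step and finds the sequence with probability 0.4 × 0.9 = 0.36.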