• LSTM = Long Short-Term Memory
  • LSTM units share the same weights across all time steps
    • That is the main reason LSTM can work with variable-length inputs and outputs
  • It can handle longer sequences than a plain RNN
  • LSTM uses both the Sigmoid and the Tanh activation functions
  • The problem is that, whatever the input length, the context vector passed to the decoder is fixed-size, so information gets lost
    • Solution: Transformer
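The weight-sharing point above can be sketched in a few lines: one recurrent cell with a single weight matrix is reused at every time step, so sequences of any length collapse into a fixed-size state. This is a minimal toy cell, not a real LSTM; the sizes and weights are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
hidden = 4
# One shared weight matrix, reused at every time step (assumed toy sizes)
W = rng.normal(size=(hidden, hidden + 1))

def step(h, x):
    # The same W is applied regardless of the position in the sequence
    return np.tanh(W @ np.concatenate([h, [x]]))

for seq in ([0.1, 0.5], [0.2, -0.3, 0.7, 0.1]):  # two different lengths
    h = np.zeros(hidden)
    for x in seq:
        h = step(h, x)
    # Any input length yields the same fixed-size state vector
    print(len(seq), h.shape)
```

Note that the fixed-size final state `h` is exactly the "fixed context vector" bottleneck mentioned above: a long sequence must be squeezed into the same few numbers as a short one.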

[!def] LSTM Unit Steps

  1. Forget Gate: what percent of the previous Long Term Memory to remember
  2. Input Gate:
    • Calculate the potential Long Term Memory for this unit
    • Decide what percent of that potential memory to add to the Long Term Memory
  3. Output Gate:
    • Calculate the potential Short Term Memory (tanh of the updated Long Term Memory)
    • Decide what percent of it to release as the new Short Term Memory
  4. Output = the new Short Term Memory; both it and the updated Long Term Memory are passed to the next unit
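The steps above can be sketched as one LSTM cell update. This is a minimal sketch with NumPy; the parameter names (`Wf`, `Wi`, `Wc`, `Wo` and their biases) and sizes are assumptions for illustration, not a reference implementation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, p):
    """One LSTM unit update. p holds per-gate weights (assumed layout)."""
    z = np.concatenate([h_prev, x])           # short-term memory + current input
    f = sigmoid(p["Wf"] @ z + p["bf"])        # forget gate: % of old long-term memory to keep
    i = sigmoid(p["Wi"] @ z + p["bi"])        # input gate: % of the candidate to add
    c_tilde = np.tanh(p["Wc"] @ z + p["bc"])  # potential long-term memory for this unit
    c = f * c_prev + i * c_tilde              # updated long-term memory (cell state)
    o = sigmoid(p["Wo"] @ z + p["bo"])        # output gate: % of memory to release
    h = o * np.tanh(c)                        # new short-term memory (hidden state)
    return h, c

# Toy sizes and random weights, purely for demonstration
rng = np.random.default_rng(1)
n_h, n_x = 3, 2
p = {f"W{g}": rng.normal(size=(n_h, n_h + n_x)) for g in "fico"}
p.update({f"b{g}": np.zeros(n_h) for g in "fico"})
h, c = lstm_step(rng.normal(size=n_x), np.zeros(n_h), np.zeros(n_h), p)
print(h.shape, c.shape)
```

Note how the sigmoid gates (`f`, `i`, `o`) all produce values in (0, 1), which is why the steps are phrased as "what percent to remember/release", while tanh produces the candidate values themselves in (-1, 1).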

![[Pasted image 20231022012330.png]]