Training A Deep Neural Network

  1. Data: Get as much data as possible
  2. Hidden Units: It is better to have too many hidden units than too few; with too few, the model is prone to underfitting
  3. Weights: Follow Weight Initialization
  4. Activation Function: ReLU is the rule of thumb for hidden units. For the output layer, use sigmoid or softmax depending on the output type (for example, sigmoid for a binary output, softmax for mutually exclusive classes)
  5. Learning Rate: Use a learning rate scheduler, or simply a low learning rate. Avoid a high learning rate, because the updates can overshoot the minimum and fail to converge
  6. Debug: Follow Debugging Neural Network
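
Points 4 and 5 above can be illustrated with a minimal sketch in plain Python. Everything here is an assumption made for illustration and not taken from any particular library: the helper names (`relu`, `sigmoid`, `softmax`, `step_decay`, `gradient_descent`), the step-decay rule, and the toy objective f(x) = x^2. It shows the usual activation choices and how a low learning rate converges where a high one overshoots.

```python
import math

def relu(x):
    """ReLU for hidden units: max(0, x)."""
    return max(0.0, x)

def sigmoid(x):
    """Sigmoid for a single binary output: maps a real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def softmax(logits):
    """Softmax for multi-class output: probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def step_decay(initial_lr, epoch, drop=0.5, every=10):
    """Hypothetical scheduler: multiply the rate by `drop` every `every` epochs."""
    return initial_lr * (drop ** (epoch // every))

def gradient_descent(lr, steps=20, x0=1.0):
    """Minimize the toy objective f(x) = x^2 (gradient 2x) with a fixed rate."""
    x = x0
    for _ in range(steps):
        x -= lr * 2.0 * x
    return x

low = gradient_descent(lr=0.1)   # each step scales x by 0.8, so x shrinks toward 0
high = gradient_descent(lr=1.1)  # each step scales x by -1.2, so |x| keeps growing
```

With the low rate, `low` ends close to the minimum at 0; with the high rate, `high` moves further away on every step, which is the overshoot described in point 5. In a real project the framework's own scheduler and activation layers would normally replace these hand-written helpers.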