

Generalization: the ability to apply what was learned during training to new (test) data

Reasons for bad generalization

  • Overfitting/Overtraining (trained too long)
  • Too little training material
  • Too many Parameters (weights) or an inappropriate network architecture

Prevent Overfitting

  • The obvious best approach: Collect More Data! 💪
  • If Data is Limited
    • Simplest Method for Good Generalization: Early Stopping (stop training once the error on a held-out validation set starts to rise)
    • Optimize Parameters/Architecture
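Early stopping can be sketched as follows. This is a minimal illustration, assuming the validation loss has been recorded after every epoch and assuming a "patience" rule (stop after `patience` epochs without improvement). The function name and the `patience` parameter are hypothetical, not from the notes. The epoch with the lowest validation loss is the one whose weights would be kept.

```python
from typing import Sequence


def early_stopping_epoch(val_losses: Sequence[float], patience: int = 3) -> int:
    """Return the (0-based) epoch whose weights early stopping would keep.

    Training stops once the validation loss has not improved for `patience`
    consecutive epochs; the best epoch seen up to that point is returned.
    """
    best_epoch = 0
    best_loss = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss = loss
            best_epoch = epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break  # validation error keeps rising: the net starts to overfit
    return best_epoch


# Hypothetical validation losses: falling first, then rising (overfitting).
val = [0.90, 0.70, 0.55, 0.50, 0.52, 0.56, 0.61, 0.70]
print(early_stopping_epoch(val, patience=3))  # → 3
```

In practice the weights are saved whenever the validation loss improves and restored after stopping. A larger `patience` tolerates noisy validation curves at the cost of extra training epochs.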

Destructive Methods

Reduce Complexity of Network through Regularization

Optimal Brain Damage

  • 💡Idea: Certain connections are removed from the network to reduce complexity and to avoid overfitting
  • Remove those connections that have the least effect on the Error (MSE, ...), i.e. the least important ones.
    • But this is time-consuming (difficult) 🤪: the effect of every single connection on the error has to be estimated
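One way to make "least effect on the error" concrete is the saliency from LeCun et al.'s Optimal Brain Damage. Expanding the error to second order and keeping only the diagonal of the Hessian, deleting weight w_k increases the error by roughly s_k = ½·h_kk·w_k², where h_kk = ∂²E/∂w_k². The sketch below assumes a single linear unit trained with E = 1/(2N)·Σ_p (w·x_p − t_p)², for which h_kk = (1/N)·Σ_p x_{p,k}² has a closed form. In a real multilayer network the Hessian diagonal must be estimated by an extra backpropagation-like pass, which is part of the cost. All helper names and data are hypothetical.

```python
from typing import List, Sequence


def diag_hessian_mse_linear(X: Sequence[Sequence[float]]) -> List[float]:
    """Hessian diagonal of E = 1/(2N) * sum_p (w.x_p - t_p)^2 for a linear unit.

    d^2E/dw_k^2 = (1/N) * sum_p x_{p,k}^2, independent of the weights and targets.
    """
    n = len(X)
    return [sum(x[k] ** 2 for x in X) / n for k in range(len(X[0]))]


def obd_saliencies(weights: Sequence[float], h_diag: Sequence[float]) -> List[float]:
    """s_k = 0.5 * h_kk * w_k^2: estimated increase of the error if w_k is set to 0."""
    return [0.5 * h * w ** 2 for w, h in zip(weights, h_diag)]


def prune_least_salient(weights: Sequence[float], saliencies: Sequence[float],
                        n_remove: int) -> List[float]:
    """Remove (set to 0) the n_remove connections with the smallest saliency."""
    order = sorted(range(len(weights)), key=lambda k: saliencies[k])
    to_remove = set(order[:n_remove])
    return [0.0 if k in to_remove else w for k, w in enumerate(weights)]


X = [[1.0, 0.1, 2.0], [1.0, -0.1, -2.0], [1.0, 0.2, 1.0]]  # 3 patterns, 3 inputs
w = [0.5, 1.0, 0.2]  # assumed already-trained weights
h = diag_hessian_mse_linear(X)
s = obd_saliencies(w, h)
print(prune_least_salient(w, s, n_remove=1))  # → [0.5, 0.0, 0.2]
```

Here OBD removes w_1 even though it has a larger magnitude than w_2: its input barely varies, so the error is insensitive to it. In the original procedure this step alternates with retraining (train, compute saliencies, delete low-saliency weights, retrain, repeat).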

Constructive Methods

Iteratively Increasing/Growing a Network (constructive), starting from a very small one

Cascade Correlation

  • Adding a hidden unit

    • Input connections from all input units and from all already existing hidden units
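    • First, only these incoming connections are adapted (all other weights stay fixed)
    • They are trained to maximize the correlation between the activation of the candidate unit and the residual error of the net
    • Then the unit's input weights are frozen, the unit is added to the network, and the output connections are retrained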
  • Not necessary to determine the number of hidden units empirically

  • Can produce deep networks without dramatic slowdown (bottom up, constructive learning)

  • At each step, only one layer of connections is trained

  • Learning is fast

  • Learning is incremental
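The quantity a candidate unit maximizes can be written, following Fahlman and Lebiere's Cascade-Correlation, as S = Σ_o |Σ_p (V_p − V̄)(E_{p,o} − Ē_o)|. Here V_p is the candidate's activation on pattern p and E_{p,o} is the residual error at output o (strictly a covariance, not a normalized correlation). The sketch computes S for a small pool of candidates with fixed, hypothetical input weights and picks the best one. In the actual algorithm each candidate's input weights are first trained by gradient ascent on S; that step is omitted here. The tanh activation, the data, and all function names are assumptions.

```python
import math
from typing import List, Sequence, Tuple


def candidate_activations(weights: Sequence[float],
                          inputs: Sequence[Sequence[float]]) -> List[float]:
    """V_p = tanh(w . x_p); x_p holds the net inputs and all existing hidden outputs."""
    return [math.tanh(sum(w * x for w, x in zip(weights, xp))) for xp in inputs]


def correlation_score(V: Sequence[float], E: Sequence[Sequence[float]]) -> float:
    """S = sum_o | sum_p (V_p - mean(V)) * (E[p][o] - mean_o(E)) |."""
    n_patterns = len(V)
    v_mean = sum(V) / n_patterns
    score = 0.0
    for o in range(len(E[0])):
        e_mean = sum(E[p][o] for p in range(n_patterns)) / n_patterns
        cov = sum((V[p] - v_mean) * (E[p][o] - e_mean) for p in range(n_patterns))
        score += abs(cov)  # the sign does not matter: the output weight can flip it
    return score


def best_candidate(pool: Sequence[Sequence[float]],
                   inputs: Sequence[Sequence[float]],
                   E: Sequence[Sequence[float]]) -> Tuple[int, List[float]]:
    """Index of the candidate whose activation best tracks the residual error."""
    scores = [correlation_score(candidate_activations(w, inputs), E) for w in pool]
    best = max(range(len(pool)), key=lambda i: scores[i])
    return best, scores


inputs = [[1.0, 0.0], [1.0, 1.0], [0.0, 1.0], [0.0, 0.0]]  # 4 patterns
E = [[0.5, -0.2], [0.0, 0.0], [-0.5, 0.2], [0.0, 0.0]]     # residual errors, 2 outputs
pool = [[2.0, -2.0], [0.0, 0.0], [1.0, 1.0]]               # candidate input weights
best, scores = best_candidate(pool, inputs, E)
print(best, [round(sc, 3) for sc in scores])
```

Because the existing network is frozen while candidates are evaluated, the residual errors E can be computed once per round, which is one reason learning stays fast.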

Dropout

  • Popular and very effective method for generalization

  • 💡Idea

    • Randomly drop out (zero) hidden units and input features during training
    • Prevents co-adaptation of features: a unit cannot rely on particular other units being present
  • Illustration

    [Figure: dropout illustration]
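  • Dropout training & test

    • Training: each unit is kept with probability p and dropped otherwise, so every training case sees a different thinned network
    • Test: no units are dropped; each unit's outgoing weights are multiplied by p, so the next layer receives the expected value of its training-time input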
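A minimal sketch of standard dropout for one layer, assuming p_keep is the probability of keeping a unit. During training a fresh random mask zeroes units; at test time nothing is dropped and outputs are scaled by p_keep, which is equivalent to multiplying the outgoing weights by p_keep. The function name and the activation values are hypothetical.

```python
import random
from typing import List, Sequence


def dropout(activations: Sequence[float], p_keep: float, training: bool,
            rng: random.Random) -> List[float]:
    """Standard (non-inverted) dropout applied to one layer's activations."""
    if training:
        # Keep each unit with probability p_keep, zero it otherwise (new mask per call).
        return [a if rng.random() < p_keep else 0.0 for a in activations]
    # Test: keep all units, scale by p_keep (same as scaling the outgoing weights).
    return [a * p_keep for a in activations]


rng = random.Random(0)
hidden = [1.0, 2.0, 3.0, 4.0]  # hypothetical hidden-layer activations
train_out = dropout(hidden, p_keep=0.5, training=True, rng=rng)
test_out = dropout(hidden, p_keep=0.5, training=False, rng=rng)
print(train_out)  # some units randomly zeroed
print(test_out)   # → [0.5, 1.0, 1.5, 2.0]

# Averaging over many training-time masks approaches the test-time output.
n = 10000
avg = [0.0] * len(hidden)
for _ in range(n):
    for i, a in enumerate(dropout(hidden, 0.5, True, rng)):
        avg[i] += a / n
print([round(x, 2) for x in avg])
```

Many libraries use "inverted" dropout instead: kept activations are scaled by 1/p_keep during training and the network is left unchanged at test time (for example, PyTorch's `torch.nn.Dropout` scales by 1/(1 − p), where its `p` is the drop probability). Both variants give the same expected activations.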