Generalization

Generalization: the ability to apply what was learned during training to new (test) data

Reasons for poor generalization

  • Overfitting/Overtraining (trained too long)
  • Too little training data
  • Too many parameters (weights) or an inappropriate network architecture

Prevent Overfitting

  • The obvious best approach: collect more data! 💪
  • If data is limited:
    • Simplest method for best generalization: early stopping (see the sketch below)
    • Optimize parameters/architecture
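
A minimal sketch of early stopping: train while monitoring error on a held-out validation set and roll back to the best checkpoint once it stops improving. The `train_epoch` and `val_loss` methods are hypothetical placeholders for whatever training/evaluation routines the model actually provides:

```python
import copy

def fit_with_early_stopping(model, train_data, val_data, max_epochs=200, patience=10):
    """Train until the validation error stops improving."""
    best_loss, best_weights, wait = float("inf"), None, 0
    for epoch in range(max_epochs):
        model.train_epoch(train_data)        # one pass of weight updates
        loss = model.val_loss(val_data)      # monitor held-out data, not training error
        if loss < best_loss:                 # improvement: remember this checkpoint
            best_loss = loss
            best_weights = copy.deepcopy(model.weights)
            wait = 0
        else:
            wait += 1                        # no improvement this epoch
            if wait >= patience:             # stop after `patience` bad epochs in a row
                break
    model.weights = best_weights             # roll back to the best weights seen
    return model
```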

Destructive Methods

Reduce the complexity of the network through regularization, e.g. by penalizing large weights (see the sketch below)
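
As one concrete example, L2 regularization adds a penalty on the squared weights to the data error, pushing unneeded weights toward zero. A minimal sketch; `lam` is an illustrative hyperparameter, not a value from the notes:

```python
import numpy as np

def l2_penalized_error(data_error, weights, lam=1e-4):
    """Total error = data error + lam * sum of squared weights.

    The penalty shrinks the effective complexity of the network.
    """
    return data_error + lam * sum(np.sum(w ** 2) for w in weights)

# In gradient descent the penalty appears as "weight decay":
#     w <- w - lr * (dE_data/dw + 2 * lam * w)
```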

Optimal Brain Damage

  • 💡 Idea: certain connections are removed from the network to reduce complexity and to avoid overfitting
  • Remove those connections that have the least effect on the error (MSE, …), i.e. those that are least important
    • But determining this is time-consuming (difficult) 🤪; a simplified sketch follows below
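
A deliberately simplified sketch of the pruning step. OBD proper ranks connections by the saliency s_k ≈ 0.5 · H_kk · w_k², estimated from the diagonal of the Hessian of the error; plain weight magnitude is used below as a cheap stand-in, which is an assumption of this sketch rather than the method from the notes:

```python
import numpy as np

def prune_least_salient(weights, frac=0.2):
    """Remove the `frac` of connections judged least important.

    Weight magnitude is used here as a crude proxy for OBD's
    Hessian-based saliency estimate.
    """
    flat = np.concatenate([w.ravel() for w in weights])
    cutoff = np.quantile(np.abs(flat), frac)          # magnitude threshold
    # Zero out the weakest connections; the pruned net is then retrained.
    return [np.where(np.abs(w) > cutoff, w, 0.0) for w in weights]
```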

Constructive Methods

Iteratively growing a network (constructive), starting from a very small one

Cascade Correlation

  • Adding a hidden unit

    • Input connections from all input units and from all already existing hidden units
    • First only these connections are adapted
    • Maximize the correlation between the activation of the candidate units and the residual error of the net (see the formula after this list)
  • Not necessary to determine the number of hidden units empirically

  • Can produce deep networks without dramatic slowdown (bottom up, constructive learning)

  • At each point only one layer of connections is trained

  • Learning is fast

  • Learning is incremental
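
For reference, the quantity each candidate unit is trained to maximize in Fahlman and Lebiere's formulation is, as best I recall from the original paper, the summed magnitude of the covariance between the candidate's activation and the residual output error:

$$
S = \sum_{o} \left| \sum_{p} \left( V_p - \bar{V} \right) \left( E_{p,o} - \bar{E}_o \right) \right|
$$

Here $p$ runs over training patterns, $o$ over output units, $V_p$ is the candidate's activation, $E_{p,o}$ the residual error at output $o$, and bars denote averages over patterns. The candidate's input weights are adjusted by gradient ascent on $S$ and frozen once the unit is installed.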

Dropout

  • A popular and very effective method for improving generalization

  • 💡 Idea

    • Randomly drop out (zero) hidden units and input features during training
    • Prevents feature co-adaptation
  • Dropout at training vs. test time

    • Training: each unit is kept with probability p and zeroed otherwise
    • Test: no units are dropped; activations or weights are rescaled so the expected input to each unit matches training (see the sketch below)
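
A minimal sketch of inverted dropout on a single layer's activations. With the inverted variant the rescaling happens during training, so nothing needs to change at test time; the original paper instead kept all units and scaled the weights at test time:

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout(h, p_drop=0.5, training=True):
    """Inverted dropout on a layer's activations `h` (a NumPy array)."""
    if not training:
        return h                               # test time: no units are dropped
    mask = rng.random(h.shape) >= p_drop       # keep each unit with prob 1 - p_drop
    return h * mask / (1.0 - p_drop)           # rescale survivors to keep the
                                               # expected activation unchanged
```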