TL;DR Problem: During training, updating a lower layer changes the input distribution for the next layer β next layer constantly needs to adapt to changing inputs π‘Idea: mean/variance normalization step between layers
2020-08-16
2020-08-16
Resource Calculus on Computational Graphs: Backpropagation
2020-07-31