Jun 1, 2021
If I'm not wrong, there is no condition for the equivalence of weight decay and L2 regularization: as per your derivation, in both cases we end up with (1 - learning_rate * lambda) * w (interested only in the first term).
You have used alpha as the learning rate in one case and η in the other.
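A minimal numeric sketch of the point above, assuming plain (vanilla) SGD; the values of `lr`, `lam`, `w`, and `grad` are illustrative, not from the post:

```python
import numpy as np

# Illustrative values for one SGD step
lr, lam = 0.1, 0.01
w = np.array([1.0, -2.0, 3.0])
grad = np.array([0.5, 0.5, 0.5])   # gradient of the loss alone

# L2 regularization: add lam * w to the gradient, then step
w_l2 = w - lr * (grad + lam * w)

# Weight decay: shrink w by (1 - lr*lam), then take the gradient step
w_wd = (1 - lr * lam) * w - lr * grad

# For vanilla SGD both reduce to (1 - lr*lam)*w - lr*grad
print(np.allclose(w_l2, w_wd))
```

For plain SGD the two updates coincide (up to rescaling lambda), which is why the first term looks identical in both derivations; the distinction matters for adaptive optimizers like Adam, where the L2 gradient term gets rescaled by the adaptive step but decoupled weight decay does not.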