Rakshith V.
Jun 1, 2021

If I'm not wrong, there is no condition needed for the equivalence of weight decay and L2 regularization: as per your derivation, in both cases we end up with (1 - learning_rate * lambda) w (I am interested only in the first term).

You have used alpha as the learning rate in one case and η in the other.
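
To make the point concrete, here is a minimal numerical sketch rather than the article's own derivation. It assumes plain SGD, an L2 penalty of the form (lambda / 2) * ||w||^2, and a weight-decay update written as (1 - learning_rate * lambda) * w - learning_rate * grad. The function names and sample values are hypothetical, chosen only for illustration.

```python
def l2_sgd_step(w, grad, lr, lam):
    """One SGD step on loss + (lam / 2) * ||w||^2 (assumed penalty form).

    The penalty adds lam * w to the gradient of the unregularized loss.
    """
    return [wi - lr * (gi + lam * wi) for wi, gi in zip(w, grad)]


def weight_decay_step(w, grad, lr, lam):
    """One SGD step with the weights shrunk by (1 - lr * lam) (assumed decay form)."""
    return [(1 - lr * lam) * wi - lr * gi for wi, gi in zip(w, grad)]


# Hypothetical weights and gradients of the unregularized loss.
w = [0.5, -1.2, 2.0]
grad = [0.1, -0.3, 0.05]
lr, lam = 0.1, 0.01

a = l2_sgd_step(w, grad, lr, lam)
b = weight_decay_step(w, grad, lr, lam)

# Largest element-wise gap between the two updates (floating-point noise only).
max_diff = max(abs(x - y) for x, y in zip(a, b))
print(max_diff)
```

Under these assumptions the two updates agree up to rounding, whichever symbol (alpha or η) is used for the learning rate. The check covers plain SGD only; if the decay term is written without the learning-rate factor, the coefficients would have to be rescaled to match.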
