Gradient Descent
Videos:
- Gradient Descent
What is Gradient Descent?
Simply put, gradient descent is an algorithm for minimizing the loss function of a model during training.
- J, the cost function, is a convex function. This refers to the shape of J's output: it's a bowl shape, not a squiggly shape. That means that no matter where we start, we eventually end up in the same area.
- As mentioned, regardless of the starting point (the initial values of the parameters), J always puts us at the same level in the end for the overall cost function. We call that "level" in the "end" the global optimum.
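For reference, the J in question here is the squared-error cost used in the course (my assumption, since these notes don't write it out):

```latex
J(w, b) = \frac{1}{2m} \sum_{i=1}^{m} \left( f_{w,b}\left(x^{(i)}\right) - y^{(i)} \right)^2
```

This J is convex in w and b, which is where the bowl shape comes from.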
Here’s Andrew Ng’s illustration of gradient descent, using mathematical notation:
repeat {
    w := w - 𝛼(dJ(w)/dw)
}
- 𝛼 is the learning rate
- w is the weights and biases
- Gradient descent works by iteratively adjusting w and b in the opposite direction of the gradient of J, until the loss function eventually hits the "global optimum", the lowest it will go before it starts to go up again. This is what we know as "convergence". (A small code sketch of this loop follows below.)
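Here's a minimal sketch of that loop in Python, assuming a single-feature linear model f(x) = wx + b and the squared-error cost above; the function name, stopping rule, and toy data are my own, not from the video:

```python
import numpy as np

def gradient_descent(x, y, alpha=0.01, num_iters=1000):
    """Fit f(x) = w*x + b by repeatedly stepping w and b
    opposite to the gradient of the squared-error cost J."""
    m = len(x)
    w, b = 0.0, 0.0
    for _ in range(num_iters):
        pred = w * x + b                           # current predictions
        dj_dw = (1 / m) * np.sum((pred - y) * x)   # dJ/dw
        dj_db = (1 / m) * np.sum(pred - y)         # dJ/db
        w -= alpha * dj_dw                         # step downhill
        b -= alpha * dj_db
    return w, b

# Toy usage: recovers roughly w = 2, b = 1 from noiseless data.
x = np.array([1.0, 2.0, 3.0, 4.0])
y = 2.0 * x + 1.0
w, b = gradient_descent(x, y, alpha=0.05, num_iters=5000)
print(w, b)
```

Because J is convex, the same (w, b) comes out no matter what the starting values are; only the number of iterations needed to get there changes.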