Module 2 Book Prose#
Gradients, backprop, and computational graphs#
Training requires computing gradients of a loss with respect to parameters. Backpropagation applies the chain rule layer by layer; modern frameworks build dynamic computational graphs so practitioners focus on architecture and data.
Vanishing and Exploding Gradients#
Deep stacks amplify gradient products across layers. Architecture (residual connections, normalization), activations, and optimization choices interact to keep training stable.
Framework Practice#
PyTorch and similar libraries implement reverse-mode autodiff. Students should spend less time deriving every partial derivative by hand and more time interpreting gradient flow and debugging training.