Computational Graphs#
Nodes: operations and tensors
Edges: data dependencies
Autograd records operations for reverse-mode AD
Backpropagation#
Forward pass computes activations
Backward pass applies chain rule
Vanishing/exploding gradients motivate architecture choices