Module 3 Assignment: Optimizer and regularization comparison
Theme
Optimization, loss, and regularization
Exercises
Train the same small model with SGD and Adam on a synthetic dataset.
Compare at least two regularization settings such as weight decay, dropout, or early stopping.
Plot or summarize loss curves and final accuracy for each run.
Recommend a training configuration and justify the tradeoff between fit and generalization.
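One way to approach the regularization comparison is sketched below. It reuses the starter experiment's data and model shape, and it assumes several things the assignment does not specify: a 120/40 train/validation split, a hypothetical helper named `train_early_stopping`, a dropout layer placed after the ReLU, and a patience of 10 epochs. Treat it as an illustration of the three settings named above, not as the expected solution.

```python
import copy

import torch
from torch import nn

torch.manual_seed(3)
X = torch.randn(160, 6)
y = ((X[:, :3].sum(dim=1) + 0.35 * torch.randn(160)) > 0).long()

# Assumption: hold out the last 40 rows for validation (not part of the assignment text).
X_train, y_train = X[:120], y[:120]
X_val, y_val = X[120:], y[120:]


def train_early_stopping(dropout_p=0.0, weight_decay=0.0, patience=10, max_epochs=200):
    """Hypothetical helper: Adam training with optional dropout, weight decay, and early stopping."""
    model = nn.Sequential(
        nn.Linear(6, 16), nn.ReLU(), nn.Dropout(dropout_p), nn.Linear(16, 2)
    )
    opt = torch.optim.Adam(model.parameters(), lr=0.04, weight_decay=weight_decay)
    loss_fn = nn.CrossEntropyLoss()
    best_val, best_state, best_epoch = float("inf"), None, 0

    for epoch in range(max_epochs):
        model.train()  # dropout is active during the update
        opt.zero_grad()
        loss_fn(model(X_train), y_train).backward()
        opt.step()

        model.eval()  # dropout is disabled for evaluation
        with torch.no_grad():
            val_loss = loss_fn(model(X_val), y_val).item()

        if val_loss < best_val:
            # Remember the best weights seen so far.
            best_val, best_epoch = val_loss, epoch
            best_state = copy.deepcopy(model.state_dict())
        elif epoch - best_epoch >= patience:
            # No improvement for `patience` epochs: stop early.
            break

    # Restore the best checkpoint before measuring accuracy.
    model.load_state_dict(best_state)
    model.eval()
    with torch.no_grad():
        val_acc = (model(X_val).argmax(dim=1) == y_val).float().mean().item()
    return {
        "stopped_epoch": epoch,
        "best_epoch": best_epoch,
        "val_loss": best_val,
        "val_acc": val_acc,
    }


# Three illustrative settings: no regularization, weight decay only, dropout only.
for p, wd in [(0.0, 0.0), (0.0, 1e-3), (0.3, 0.0)]:
    print(f"dropout={p} weight_decay={wd}", train_early_stopping(p, wd))
```

Because the validation set here has only 40 examples, differences of a few percentage points in `val_acc` may not be meaningful; the memo should say so if it relies on a split this small.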
Submission
Submit a 600-900 word technical memo plus any code, plots, or shape traces needed to support your claims. Use the starter cell as a minimum reproducible experiment, then make at least one meaningful modification.
Rubric
Correct use of module vocabulary and notation
Clear connection between design choices and data/problem structure
Evidence from the starter experiment or your own extension
Concise reflection on limitations, failure modes, or next steps
import torch
from torch import nn

torch.manual_seed(3)

# Synthetic binary task: the label depends on the noisy sum of the first three features.
X = torch.randn(160, 6)
y = ((X[:, :3].sum(dim=1) + 0.35 * torch.randn(160)) > 0).long()


def train(optimizer_name="Adam", weight_decay=0.0):
    model = nn.Sequential(nn.Linear(6, 16), nn.ReLU(), nn.Linear(16, 2))
    # Any name other than "Adam" falls back to SGD.
    opt_cls = torch.optim.Adam if optimizer_name == "Adam" else torch.optim.SGD
    opt = opt_cls(model.parameters(), lr=0.04, weight_decay=weight_decay)
    loss_fn = nn.CrossEntropyLoss()
    losses = []
    # Full-batch training for 90 steps.
    for _ in range(90):
        opt.zero_grad()
        loss = loss_fn(model(X), y)
        loss.backward()
        opt.step()
        losses.append(loss.item())
    # Final training loss only; there is no held-out evaluation in this cell.
    return losses[-1]


for name in ["SGD", "Adam"]:
    print(name, train(name, weight_decay=1e-3))
SGD 0.38726890087127686
Adam 0.010066337883472443
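The starter cell reports only the final training loss, which says little about generalization. The sketch below is one possible extension that records per-epoch training and validation loss and reports accuracy on both splits. The 120/40 split, the `seed` argument, and the helper names `accuracy` and `train_with_history` are assumptions introduced for illustration and are not part of the starter code.

```python
import torch
from torch import nn

torch.manual_seed(3)
X = torch.randn(160, 6)
y = ((X[:, :3].sum(dim=1) + 0.35 * torch.randn(160)) > 0).long()

# Assumption: first 120 rows for training, last 40 held out for validation.
X_train, y_train, X_val, y_val = X[:120], y[:120], X[120:], y[120:]


def accuracy(model, inputs, targets):
    """Fraction of rows whose highest-scoring class matches the target."""
    with torch.no_grad():
        return (model(inputs).argmax(dim=1) == targets).float().mean().item()


def train_with_history(optimizer_name="Adam", weight_decay=0.0, epochs=90, seed=0):
    # Reseeding gives every configuration the same initial weights,
    # so differences come from the optimizer and regularization settings.
    torch.manual_seed(seed)
    model = nn.Sequential(nn.Linear(6, 16), nn.ReLU(), nn.Linear(16, 2))
    opt_cls = torch.optim.Adam if optimizer_name == "Adam" else torch.optim.SGD
    opt = opt_cls(model.parameters(), lr=0.04, weight_decay=weight_decay)
    loss_fn = nn.CrossEntropyLoss()
    history = {"train_loss": [], "val_loss": []}

    for _ in range(epochs):
        opt.zero_grad()
        loss = loss_fn(model(X_train), y_train)
        loss.backward()
        opt.step()
        with torch.no_grad():
            val_loss = loss_fn(model(X_val), y_val).item()
        history["train_loss"].append(loss.item())
        history["val_loss"].append(val_loss)

    history["train_acc"] = accuracy(model, X_train, y_train)
    history["val_acc"] = accuracy(model, X_val, y_val)
    return history


results = {}
for name in ["SGD", "Adam"]:
    for wd in [0.0, 1e-3]:
        h = train_with_history(name, weight_decay=wd)
        results[(name, wd)] = h
        print(
            f"{name:4s} wd={wd:g} "
            f"train_loss={h['train_loss'][-1]:.3f} val_loss={h['val_loss'][-1]:.3f} "
            f"train_acc={h['train_acc']:.2f} val_acc={h['val_acc']:.2f}"
        )
```

The `train_loss` and `val_loss` lists in each history can be passed to any plotting library to draw the loss curves. A growing gap between the two curves is one sign that a configuration is fitting the training set at the expense of generalization.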
Reflection prompts
What changed when you modified the starter experiment?
Which result surprised you, and what diagnostic would you run next?
What assumption would you document before handing this model to another practitioner?
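As one example of a follow-up diagnostic, the sketch below repeats the starter configuration over several initialization seeds and summarizes the spread of the final training loss. The starter cell does not reseed between the SGD and Adam runs, so a single pair of numbers mixes the optimizer effect with the effect of different initial weights. The helper name `final_loss` and the choice of five seeds are assumptions made for illustration.

```python
import statistics

import torch
from torch import nn

# Same synthetic data as the starter experiment.
torch.manual_seed(3)
X = torch.randn(160, 6)
y = ((X[:, :3].sum(dim=1) + 0.35 * torch.randn(160)) > 0).long()


def final_loss(optimizer_name, seed, weight_decay=1e-3, epochs=90):
    """Hypothetical helper: final training loss for one initialization seed."""
    torch.manual_seed(seed)  # here the seed controls only the weight initialization
    model = nn.Sequential(nn.Linear(6, 16), nn.ReLU(), nn.Linear(16, 2))
    opt_cls = torch.optim.Adam if optimizer_name == "Adam" else torch.optim.SGD
    opt = opt_cls(model.parameters(), lr=0.04, weight_decay=weight_decay)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss = loss_fn(model(X), y)
        loss.backward()
        opt.step()
    # Loss from the last forward pass, matching what the starter cell records.
    return loss.item()


summary = {}
for name in ["SGD", "Adam"]:
    losses = [final_loss(name, seed) for seed in range(5)]
    summary[name] = {"mean": statistics.mean(losses), "std": statistics.stdev(losses)}
    print(f"{name:4s} mean={summary[name]['mean']:.4f} std={summary[name]['std']:.4f}")
```

If the standard deviation across seeds is comparable to the gap between the two optimizers, the comparison needs more runs before it can support a recommendation.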