Module 8 Lab: GPU profiling checklist
Profile device placement, runtime metadata, and batch-size effects for a tiny training loop.
Run the setup cell, inspect the printed diagnostics, and then complete the exercises at the end. The lab is intentionally small enough to run in GitHub Codespaces without a GPU.
import time

import torch
from torch import nn

torch.manual_seed(18)
# Use the GPU when one is available; otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 2)).to(device)
loss_fn = nn.CrossEntropyLoss()
opt = torch.optim.Adam(model.parameters(), lr=0.01)

def timed_step(batch_size):
    # Create the random batch directly on the target device, before the timer starts.
    X = torch.randn(batch_size, 32, device=device)
    y = torch.randint(0, 2, (batch_size,), device=device)
    start = time.perf_counter()
    opt.zero_grad()
    loss = loss_fn(model(X), y)
    loss.backward()
    opt.step()
    # CUDA kernels launch asynchronously; wait for them before reading the clock.
    if device.type == "cuda":
        torch.cuda.synchronize()
    return loss.item(), time.perf_counter() - start

# Time one training step per batch size and print the diagnostics.
for batch in [16, 64, 256]:
    loss, seconds = timed_step(batch)
    print(f"device={device} batch={batch:3d} loss={loss:.3f} step_seconds={seconds:.5f}")
print("Document hardware, seed, batch size, package versions, and metric definitions.")
device=cpu batch= 16 loss=0.700 step_seconds=0.00193
device=cpu batch= 64 loss=0.680 step_seconds=0.00063
device=cpu batch=256 loss=0.724 step_seconds=0.00064
Document hardware, seed, batch size, package versions, and metric definitions.
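The reminder printed at the end of the run can be acted on with a small metadata record. The following is a minimal sketch that uses only the Python standard library. The function name `collect_run_metadata`, the field names, the default package list, and the wording of the metric definitions are illustrative assumptions, not part of the lab.

```python
import json
import platform
import sys
from importlib import metadata


def collect_run_metadata(seed, batch_sizes, packages=("torch", "numpy")):
    """Gather the context needed to interpret a timing result later.

    Hypothetical helper: the field names are illustrative, not a standard schema.
    """
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # package is not installed in this environment
    return {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
        "machine": platform.machine(),
        "seed": seed,
        "batch_sizes": list(batch_sizes),
        "package_versions": versions,
        # Assumed wording; adjust to match what your own loop actually measures.
        "metric_definitions": {
            "step_seconds": "wall-clock time for one forward, backward, and optimizer step",
            "loss": "cross-entropy on one random batch",
        },
    }


record = collect_run_metadata(seed=18, batch_sizes=[16, 64, 256])
print(json.dumps(record, indent=2))
```

When a GPU is present, the device name could be added to the record as well, for example with `torch.cuda.get_device_name(0)`. Saving the record next to the timing output makes later comparisons across machines easier to interpret.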
Lab exercises
Change one model or data parameter and rerun the lab.
Record whether the metric you are tracking (loss or step_seconds) improved, worsened, or stayed roughly the same.
Add one sentence connecting the result to GPU workflows, scale, and deployment.
Identify one limitation of this toy setup before applying the idea to a real dataset.
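Deciding whether a timing "improved" is easier with repeated measurements than with a single step. In the sample output above, the first step (batch=16) is about three times slower than the later ones, which is plausibly one-time warm-up cost rather than a batch-size effect. The sketch below is one possible approach: it discards a few warm-up calls, takes the median of the remaining timings, and labels a change using a relative tolerance. The helper names `median_seconds` and `classify_change`, the 10% tolerance, and the stand-in workload are all assumptions made for illustration.

```python
import statistics
import time


def median_seconds(fn, warmup=3, repeats=10):
    """Time fn() several times and return the median, after untimed warm-up calls."""
    for _ in range(warmup):
        fn()  # untimed: absorbs one-time costs such as lazy initialization
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def classify_change(baseline, candidate, tolerance=0.10, lower_is_better=True):
    """Label a change as improved, worsened, or roughly the same.

    The 10% default tolerance is an arbitrary assumption, not a recommended value.
    """
    relative = (candidate - baseline) / baseline
    if abs(relative) <= tolerance:
        return "roughly the same"
    got_smaller = relative < 0
    return "improved" if got_smaller == lower_is_better else "worsened"


# Stand-in workload; in the lab this could be a call such as lambda: timed_step(64).
baseline = median_seconds(lambda: sum(range(1_000)))
print(f"median_seconds={baseline:.6f}")

# Two pairs of step_seconds values taken from the sample output above.
print(classify_change(0.00064, 0.00063))
print(classify_change(0.00193, 0.00063))
```

On a GPU, the function passed to `median_seconds` needs to synchronize before it returns, as `timed_step` already does. Otherwise the timer would stop before the queued kernels have finished.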
# Reflection workspace
observation = ""
next_experiment = ""
print({"observation": observation, "next_experiment": next_experiment})
{'observation': '', 'next_experiment': ''}