Module 8 Assignment: Reproducible training workflow checklist
Theme
GPU workflows, scale, and deployment
Exercises
Write a reproducibility checklist for a small deep learning experiment.
Record environment, device, random seed, data split, metrics, and model checkpoint assumptions.
Use the starter cell to report whether the runtime has CUDA and where tensors are placed.
Explain how you would scale the experiment from notebook exploration to a repeatable training job.
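One possible way to turn the checklist items above into something a training job can write out on every run is sketched below. It is a minimal illustration, not a required format: `RunConfig`, `split_indices`, `build_run_record`, the field names, and the `run_record.json` and `checkpoint.pt` file names are all hypothetical. The sketch assumes PyTorch may or may not be installed and records `None` for the version when it is missing.

```python
import argparse
import json
import platform
import random
import sys
from dataclasses import asdict, dataclass


@dataclass
class RunConfig:
    """Hypothetical config for one run; field names are illustrative only."""
    seed: int = 0
    n_samples: int = 100
    val_fraction: float = 0.2
    checkpoint_path: str = "checkpoint.pt"


def split_indices(n_samples, val_fraction, seed):
    """Shuffle indices with a private RNG so the split depends only on the seed."""
    rng = random.Random(seed)
    indices = list(range(n_samples))
    rng.shuffle(indices)
    n_val = int(n_samples * val_fraction)
    return indices[n_val:], indices[:n_val]  # (train, validation)


def build_run_record(config):
    """Collect environment, device, seed, and split facts into one dict."""
    train_idx, val_idx = split_indices(
        config.n_samples, config.val_fraction, config.seed
    )
    try:
        import torch
        torch_version = torch.__version__
        device = "cuda" if torch.cuda.is_available() else "cpu"
    except ImportError:
        # Assumption: a missing PyTorch install is recorded, not treated as fatal.
        torch_version, device = None, "unknown"
    return {
        "config": asdict(config),
        "python": sys.version.split()[0],
        "platform": platform.platform(),
        "torch": torch_version,
        "device": device,
        "n_train": len(train_idx),
        "n_val": len(val_idx),
        "val_indices_head": val_idx[:5],  # quick check that the split matches
    }


def main(argv=None):
    """Command-line entry point: same arguments in, same record out."""
    parser = argparse.ArgumentParser(description="Write a run record.")
    parser.add_argument("--seed", type=int, default=0)
    parser.add_argument("--out", default="run_record.json")
    args = parser.parse_args(argv)
    record = build_run_record(RunConfig(seed=args.seed))
    with open(args.out, "w") as f:
        json.dump(record, f, indent=2)
    return record
```

In a notebook you might call `build_run_record(RunConfig(seed=0))` directly. In a script, calling `main()` from the entry point lets the same seed and output path be passed on the command line, which is one step toward a repeatable job.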
Submission
Submit a 600-900 word technical memo, plus any code, plots, or shape traces needed to support your claims. Use the starter cell as a minimal reproducible experiment, then make at least one meaningful modification.
Rubric
Correct use of module vocabulary and notation
Clear connection between design choices and data/problem structure
Evidence from the starter experiment or your own extension
Concise reflection on limitations, failure modes, or next steps
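import torch

# Use the GPU when CUDA is available; otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Create the tensor directly on the chosen device.
x = torch.randn(4, 4, device=device)

# Report the facts a reader needs to reproduce this run.
print("torch version:", torch.__version__)
print("device:", device)
print("tensor device:", x.device)
print("Set seeds, log package versions, and save configs before claiming reproducibility.")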
torch version: 2.12.0+cu130
device: cpu
tensor device: cpu
Set seeds, log package versions, and save configs before claiming reproducibility.
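The reminder printed by the starter cell can be made concrete with a small seeding helper. The sketch below is one way to do it, not the course's prescribed method: `seed_everything` is a hypothetical name, and the helper assumes NumPy and PyTorch are optional, skipping whichever is not installed.

```python
import random


def seed_everything(seed):
    """Seed the RNGs this sketch knows about and return the names it seeded."""
    seeded = ["random"]
    random.seed(seed)
    try:
        import numpy as np
        np.random.seed(seed)  # seeds NumPy's legacy global generator
        seeded.append("numpy")
    except ImportError:
        pass
    try:
        import torch
        torch.manual_seed(seed)
        if torch.cuda.is_available():
            # Explicitly seed the generators on every visible GPU as well.
            torch.cuda.manual_seed_all(seed)
        seeded.append("torch")
    except ImportError:
        pass
    return seeded


# Usage: the same seed gives the same first draw from Python's RNG.
seed_everything(123)
first = random.random()
seed_everything(123)
second = random.random()
print("draws match:", first == second)
```

Seeding alone does not guarantee bitwise-identical results, since some GPU kernels are nondeterministic. If that matters for your memo, note it as a limitation, and consider whether `torch.use_deterministic_algorithms(True)` is appropriate for your experiment.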
Reflection prompts
What changed when you modified the starter experiment?
Which result surprised you, and what diagnostic would you run next?
What assumption would you document before handing this model to another practitioner?