Module 8 Assignment: Reproducible training workflow checklist

Theme

GPU workflows, scale, and deployment

Exercises

  1. Write a reproducibility checklist for a small deep learning experiment.

  2. Record environment, device, random seed, data split, metrics, and model checkpoint assumptions.

  3. Use the starter cell to report whether the runtime has CUDA and where tensors are placed.

  4. Explain how you would scale the experiment from notebook exploration to a repeatable training job.
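One way to approach exercise 4 is to move the notebook's hard-coded values into command-line arguments and write the resolved configuration to disk before any training starts. The sketch below is a minimal illustration under stated assumptions, not a prescribed solution: the names `make_split`, `fingerprint`, and `main`, the flags, and the `config.json` file name are all hypothetical, and the training loop itself is left out. It uses only the Python standard library.

```python
import argparse
import hashlib
import json
import random
import tempfile
from pathlib import Path


def make_split(n_items: int, val_fraction: float, seed: int) -> dict:
    """Shuffle item indices with a private, seeded RNG and cut off a validation set."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)  # does not touch the global RNG state
    n_val = int(n_items * val_fraction)
    return {"val": sorted(indices[:n_val]), "train": sorted(indices[n_val:])}


def fingerprint(obj) -> str:
    """Short, stable hash of a JSON-serializable object, handy for logs and memos."""
    payload = json.dumps(obj, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()[:12]


def main(argv=None) -> dict:
    # Hypothetical flags; a real job would add model and optimizer settings.
    parser = argparse.ArgumentParser(description="Repeatable training job (sketch)")
    parser.add_argument("--seed", type=int, default=0)
    parser.add_argument("--n-items", type=int, default=100)
    parser.add_argument("--val-fraction", type=float, default=0.2)
    parser.add_argument("--out-dir", type=Path, default=Path("runs/example"))
    args = parser.parse_args(argv)

    split = make_split(args.n_items, args.val_fraction, args.seed)
    config = {
        "seed": args.seed,
        "n_items": args.n_items,
        "val_fraction": args.val_fraction,
        "split_fingerprint": fingerprint(split),
    }

    # Write the config before training so that a crashed run still leaves a record.
    args.out_dir.mkdir(parents=True, exist_ok=True)
    (args.out_dir / "config.json").write_text(json.dumps(config, indent=2))
    # The training loop would go here (omitted in this sketch).
    return config


# Notebook-friendly usage: pass arguments explicitly instead of reading sys.argv.
with tempfile.TemporaryDirectory() as tmp:
    config = main(["--seed", "0", "--out-dir", tmp])
    print(config)
```

Passing the argument list explicitly keeps the same entry point usable from both a notebook and a shell. The split fingerprint gives the memo a compact way to state which train/validation partition a result refers to.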

Submission

Submit a 600- to 900-word technical memo, plus any code, plots, or shape traces needed to support your claims. Treat the starter cell as a minimal reproducible experiment, then make at least one meaningful modification to it.

Rubric

  • Correct use of module vocabulary and notation

  • Clear connection between design choices and data/problem structure

  • Evidence from the starter experiment or your own extension

  • Concise reflection on limitations, failure modes, or next steps

Starter cell

import torch

# Select the GPU when a CUDA runtime is available; otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Create the tensor directly on the selected device.
x = torch.randn(4, 4, device=device)

print("torch version:", torch.__version__)
print("device:", device)
print("tensor device:", x.device)
print("Set seeds, log package versions, and save configs before claiming reproducibility.")

Output

torch version: 2.12.0+cu130
device: cpu
tensor device: cpu
Set seeds, log package versions, and save configs before claiming reproducibility.
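The final message of the starter cell names three habits: setting seeds, logging package versions, and saving configs. A minimal sketch of the first two follows. It assumes a single-process experiment, and the helper names `seed_everything` and `build_run_record` are hypothetical rather than part of the course materials. The `torch` import is guarded so the sketch also runs in an environment without PyTorch installed.

```python
import json
import platform
import random
import sys

try:  # torch is optional here so the sketch also runs in a bare environment
    import torch
except ImportError:
    torch = None


def seed_everything(seed: int) -> None:
    """Seed the Python RNG and, if PyTorch is installed, its RNGs as well."""
    random.seed(seed)
    if torch is not None:
        torch.manual_seed(seed)  # seeds the CPU generator and the CUDA generators


def build_run_record(seed: int, split: dict, notes: str = "") -> dict:
    """Collect environment, device, seed, and split into one JSON-serializable dict."""
    has_cuda = bool(torch is not None and torch.cuda.is_available())
    return {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
        "torch": str(torch.__version__) if torch is not None else None,
        "device": "cuda" if has_cuda else "cpu",
        "seed": seed,
        "split": split,  # for example fractions, or paths to saved index files
        "notes": notes,
    }


seed_everything(0)
record = build_run_record(seed=0, split={"train": 0.8, "val": 0.1, "test": 0.1})
print(json.dumps(record, indent=2))
```

Seeding alone may not make GPU runs bit-for-bit identical, since some CUDA kernels are nondeterministic by default. PyTorch provides `torch.use_deterministic_algorithms(True)` for stricter behavior, usually at some cost in speed. Whichever setting is used is worth recording in the memo.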

Reflection prompts

  • What changed when you modified the starter experiment?

  • Which result surprised you, and what diagnostic would you run next?

  • What assumption would you document before handing this model to another practitioner?