Module 4 Assignment: CNN architecture memo
Theme
Convolutional neural networks for vision
Exercises
Design a CNN for a small grayscale image classification problem.
Specify convolution, activation, pooling, flattening, and classification-head choices.
Use the starter model to inspect output shapes after each stage.
Explain why local receptive fields and weight sharing fit image data.
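One way to ground the weight-sharing argument in the last exercise is to count parameters. The sketch below uses plain Python arithmetic to compare a 3x3 convolution (1 input channel, 4 output channels) with a hypothetical fully connected layer that maps the same 16x16 grayscale image to an output of the same size (4 x 16 x 16). The helper names `conv2d_params` and `dense_params` are illustrative assumptions, not part of any library.

```python
def conv2d_params(in_ch: int, out_ch: int, kernel: int) -> int:
    """Weights plus biases of a square-kernel 2D convolution."""
    return out_ch * in_ch * kernel * kernel + out_ch


def dense_params(in_features: int, out_features: int) -> int:
    """Weights plus biases of a fully connected layer."""
    return in_features * out_features + out_features


height = width = 16  # assumed input size for this illustration

# The same 3x3 filters are reused at every pixel position.
conv_count = conv2d_params(in_ch=1, out_ch=4, kernel=3)

# A dense layer needs a separate weight for every input-output pair.
dense_count = dense_params(1 * height * width, 4 * height * width)

print(f"conv:  {conv_count} parameters")
print(f"dense: {dense_count} parameters")
```

The convolution's parameter count does not depend on the image size, while the dense layer's count grows with the product of input and output sizes. That gap is one concrete piece of evidence you could cite in the memo.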
Submission
Submit a technical memo of 600 to 900 words, plus any code, plots, or shape traces needed to support your claims. Treat the starter cell as a minimal reproducible experiment, then make at least one meaningful modification and report what it changed.
Rubric
Correct use of module vocabulary and notation
Clear connection between design choices and data/problem structure
Evidence from the starter experiment or your own extension
Concise reflection on limitations, failure modes, or next steps
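import torch
from torch import nn

# Two conv stages followed by global average pooling and a 2-class linear head.
model = nn.Sequential(
    nn.Conv2d(1, 4, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(4, 8, kernel_size=3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d((1, 1)),
    nn.Flatten(), nn.Linear(8, 2)
)

# Batch of 5 random grayscale images, shape (batch, channels, height, width).
x = torch.randn(5, 1, 16, 16)

# Pass the batch through one layer at a time and print the shape after each stage.
for layer in model:
    x = layer(x)
    print(f"{layer.__class__.__name__:>18}: {tuple(x.shape)}")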
            Conv2d: (5, 4, 16, 16)
              ReLU: (5, 4, 16, 16)
         MaxPool2d: (5, 4, 8, 8)
            Conv2d: (5, 8, 8, 8)
              ReLU: (5, 8, 8, 8)
 AdaptiveAvgPool2d: (5, 8, 1, 1)
           Flatten: (5, 8)
            Linear: (5, 2)
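The spatial sizes in this trace can be checked by hand with the standard output-size formula for convolution and pooling windows, floor((size + 2 * padding - kernel) / stride) + 1, assuming dilation 1. The sketch below is plain Python; `window_out` is a hypothetical helper name, and the stride of 2 for the pooling step reflects the `nn.MaxPool2d` default of stride equal to kernel size.

```python
def window_out(size: int, kernel: int, stride: int = 1, padding: int = 0) -> int:
    """Output length along one spatial axis for a conv or pooling window (dilation 1)."""
    return (size + 2 * padding - kernel) // stride + 1


size = 16  # height and width of the starter input

after_conv1 = window_out(size, kernel=3, padding=1)        # 3x3 conv, padding 1
after_pool = window_out(after_conv1, kernel=2, stride=2)   # 2x2 max pool, stride 2
after_conv2 = window_out(after_pool, kernel=3, padding=1)  # 3x3 conv, padding 1

print(after_conv1, after_pool, after_conv2)
```

With `kernel_size=3` and `padding=1` the convolutions preserve height and width, so only the pooling layers change the spatial size. The adaptive average pool then reduces whatever remains to 1 x 1, which is why the linear head takes 8 input features.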
Reflection prompts
What changed when you modified the starter experiment?
Which result surprised you, and what diagnostic would you run next?
What assumption would you document before handing this model to another practitioner?