PyTorch 101: From Tensors to Neural Networks
Welcome to this interactive PyTorch tutorial! By the end, you'll understand:
- Tensors - PyTorch's fundamental data structure
- Autograd - automatic differentiation for backpropagation
- Neural Networks - building and training your first model
Every code cell below is interactive - feel free to modify and re-run!
1. What is PyTorch?
PyTorch is a deep learning framework that provides:
- A NumPy-like tensor library with GPU acceleration
- Automatic differentiation for building neural networks
- A dynamic computation graph (define-by-run)
Let's start by importing it:
import torch
import torch.nn as nn
import torch.nn.functional as F
print(f"PyTorch version: {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")
PyTorch version: 2.1.0
CUDA available: False
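If a GPU is available, the usual pattern is to pick a device once and move tensors (and later, models) onto it with .to(). Here's a minimal sketch - it simply falls back to the CPU when CUDA isn't available, as on this machine:
# Choose a device once, then reuse it everywhere
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Using device: {device}")

# Tensors and models can be moved with .to(device)
x = torch.ones(2, 2).to(device)
print(x.device)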
2. Tensors: The Building Blocks
Tensors are multi-dimensional arrays, similar to NumPy's ndarray, but with GPU support and automatic differentiation capabilities.
Creating Tensors
# Different ways to create tensors
# From Python list
tensor_from_list = torch.tensor([1, 2, 3, 4, 5])
# Zeros and ones
zeros = torch.zeros(3, 4)
ones = torch.ones(2, 3)
# Random tensors
random_tensor = torch.rand(3, 3) # Uniform [0, 1)
randn_tensor = torch.randn(3, 3) # Normal distribution
# Like NumPy
arange_tensor = torch.arange(0, 10, 2)
linspace_tensor = torch.linspace(0, 1, 5)
print("From list:", tensor_from_list)
print("\nZeros (3x4):\n", zeros)
print("\nRandom (3x3):\n", random_tensor)
print("\nArange:", arange_tensor)
From list: tensor([1, 2, 3, 4, 5])
Zeros (3x4):
tensor([[0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.]])
Random (3x3):
tensor([[0.4963, 0.7682, 0.0885],
        [0.1320, 0.3074, 0.6341],
        [0.4901, 0.8964, 0.4556]])
Arange: tensor([0, 2, 4, 6, 8])
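Because tensors are so close to NumPy arrays, converting between the two is cheap. A short sketch - note that for CPU tensors, torch.from_numpy() shares memory with the source array, so edits are visible on both sides:
import numpy as np

np_array = np.array([1.0, 2.0, 3.0])
t = torch.from_numpy(np_array)   # NumPy -> tensor (shares memory on CPU)
back = t.numpy()                 # tensor -> NumPy

t[0] = 99.0
print(np_array)                  # [99.  2.  3.] -- same underlying buffer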
Tensor Attributes
Every tensor has three key attributes:
t = torch.rand(3, 4)
print(f"Shape: {t.shape}") # Dimensions
print(f"Dtype: {t.dtype}") # Data type
print(f"Device: {t.device}") # CPU or GPU
Shape: torch.Size([3, 4])
Dtype: torch.float32
Device: cpu
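All three attributes can be changed with .to() (or dtype shortcuts like .long()). A small sketch - the GPU move only runs when CUDA is actually available:
t = torch.rand(3, 4)
t64 = t.to(torch.float64)        # change dtype
t_int = t.long()                 # shortcut for .to(torch.int64)
print(t64.dtype, t_int.dtype)    # torch.float64 torch.int64

if torch.cuda.is_available():
    t_gpu = t.to("cuda")         # change device
    print(t_gpu.device)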
3. Tensor Operations
PyTorch supports all the operations you'd expect from a numerical library:
a = torch.tensor([[1, 2], [3, 4]], dtype=torch.float32)
b = torch.tensor([[5, 6], [7, 8]], dtype=torch.float32)
print("a:\n", a)
print("\nb:\n", b)
print("\n--- Operations ---")
print("\na + b:\n", a + b)
print("\na * b (element-wise):\n", a * b)
print("\na @ b (matrix multiply):\n", a @ b)
print("\na.T (transpose):\n", a.T)
print("\na.sum():", a.sum().item())
print("a.mean():", a.mean().item())
a:
tensor([[1., 2.],
        [3., 4.]])
b:
tensor([[5., 6.],
        [7., 8.]])
--- Operations ---
a + b:
tensor([[ 6.,  8.],
        [10., 12.]])
a * b (element-wise):
tensor([[ 5., 12.],
        [21., 32.]])
a @ b (matrix multiply):
tensor([[19., 22.],
        [43., 50.]])
a.T (transpose):
tensor([[1., 3.],
        [2., 4.]])
a.sum(): 10.0
a.mean(): 2.5
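Two more operations you'll use constantly are reshaping and broadcasting. A brief sketch:
m = torch.arange(6, dtype=torch.float32)  # tensor([0., 1., 2., 3., 4., 5.])

print(m.reshape(2, 3))                    # same data viewed as a 2x3 matrix
print(m.reshape(3, -1))                   # -1 lets PyTorch infer that dimension

# Broadcasting: a (2, 3) matrix plus a (3,) vector adds the vector to every row
row = torch.tensor([10., 20., 30.])
print(m.reshape(2, 3) + row)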
4. Autograd: Automatic Differentiation
This is where PyTorch shines. Setting requires_grad=True tells PyTorch to track all operations on a tensor for automatic differentiation.
The Basics
# Create a tensor with gradient tracking
x = torch.tensor([2.0, 3.0], requires_grad=True)
# Perform operations
y = x ** 2 # y = x^2
z = y.sum() # z = sum(y) = x1^2 + x2^2
print(f"x = {x}")
print(f"y = x^2 = {y}")
print(f"z = sum(y) = {z}")
# Compute gradients
z.backward()
# dz/dx = 2x
print(f"\nGradient dz/dx = 2x = {x.grad}")
x = tensor([2., 3.], requires_grad=True)
y = x^2 = tensor([4., 9.], grad_fn=<PowBackward0>)
z = sum(y) = tensor(13., grad_fn=<SumBackward0>)
Gradient dz/dx = 2x = tensor([4., 6.])
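Two practical details are worth knowing before we train anything: gradients accumulate across backward() calls (which is why training loops zero them out), and tracking can be switched off entirely for inference. A small sketch:
w = torch.tensor([1.0, 2.0], requires_grad=True)

(w ** 2).sum().backward()
print(w.grad)            # tensor([2., 4.]) -- d/dw of sum(w^2) is 2w

(w ** 2).sum().backward()
print(w.grad)            # tensor([4., 8.]) -- gradients were added, not replaced

w.grad.zero_()           # reset before the next backward pass

with torch.no_grad():    # no computation graph is built inside this block
    y = (w * 3).sum()
print(y.requires_grad)   # False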
5. Building a Neural Network
Now let's put it all together and build a simple neural network!
The Dataset
We'll use a classic: learning the XOR function.
# XOR dataset
X_train = torch.tensor([
    [0, 0],
    [0, 1],
    [1, 0],
    [1, 1],
], dtype=torch.float32)
y_train = torch.tensor([
    [0],
    [1],
    [1],
    [0],
], dtype=torch.float32)
print("XOR Truth Table:")
print("X1 X2 | Y")
print("-" * 14)
for inputs, output in zip(X_train, y_train):
    print(f" {int(inputs[0])} {int(inputs[1])} | {int(output[0])}")
XOR Truth Table:
X1 X2 | Y
--------------
0 0 | 0
0 1 | 1
1 0 | 1
1 1 | 0
The Model
XOR isn't linearly separable, so we need a hidden layer:
class XORNet(nn.Module):
    def __init__(self, hidden_size=8):
        super().__init__()
        self.layer1 = nn.Linear(2, hidden_size)
        self.layer2 = nn.Linear(hidden_size, 1)

    def forward(self, x):
        x = torch.sigmoid(self.layer1(x))  # Hidden layer with sigmoid
        x = torch.sigmoid(self.layer2(x))  # Output layer with sigmoid
        return x
# Quick test
_model = XORNet()
print(_model)
print(f"\nTotal parameters: {sum(p.numel() for p in _model.parameters())}")
XORNet(
  (layer1): Linear(in_features=2, out_features=8, bias=True)
  (layer2): Linear(in_features=8, out_features=1, bias=True)
)
Total parameters: 33
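To close the loop, here's one way to train XORNet on the four XOR samples. Treat it as a sketch rather than a tuned recipe: nn.BCELoss matches the sigmoid output, and the learning rate and epoch count are simply values that work for this tiny problem.
model = XORNet()
criterion = nn.BCELoss()                                  # binary cross-entropy for the sigmoid output
optimizer = torch.optim.Adam(model.parameters(), lr=0.1)

for epoch in range(2000):
    pred = model(X_train)                 # forward pass
    loss = criterion(pred, y_train)

    optimizer.zero_grad()                 # clear accumulated gradients
    loss.backward()                       # backpropagate
    optimizer.step()                      # update the weights

    if (epoch + 1) % 500 == 0:
        print(f"Epoch {epoch + 1}: loss = {loss.item():.4f}")

# Sanity check: predictions should round to the XOR truth table
with torch.no_grad():
    print(model(X_train).round().squeeze())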
6. Key Takeaways
Tensors are the fundamental data structure - like NumPy arrays but with GPU support and autograd.
Autograd automatically computes gradients by tracking operations in a computation graph.
Neural networks are built by:
- Defining layers in __init__
- Defining the forward pass in forward
- PyTorch handles the backward pass automatically!
Training loop:
for epoch in range(epochs):
    output = model(input)              # Forward
    loss = criterion(output, target)
    optimizer.zero_grad()              # Clear gradients
    loss.backward()                    # Backward
    optimizer.step()                   # Update weights
What's Next?
Now that you have the fundamentals, explore:
- CNNs for image data (nn.Conv2d)
- RNNs/Transformers for sequences
- GPU training with .to('cuda')
- Datasets & DataLoaders for real data (see the sketch below)
- Pretrained models from torchvision and transformers
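As a small taste of the DataLoader item above, here's a minimal sketch that wraps the XOR tensors from earlier; the same pattern handles batching and shuffling for real datasets:
from torch.utils.data import TensorDataset, DataLoader

dataset = TensorDataset(X_train, y_train)        # pairs up inputs and targets
loader = DataLoader(dataset, batch_size=2, shuffle=True)

for batch_X, batch_y in loader:
    print(batch_X.shape, batch_y.shape)          # torch.Size([2, 2]) torch.Size([2, 1])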
Happy learning!