PyTorch 101: From Tensors to Neural Networks

Welcome to this interactive PyTorch tutorial! By the end, you'll understand:

  • Tensors - PyTorch's fundamental data structure
  • Autograd - automatic differentiation for backpropagation
  • Neural Networks - building and training your first model

Every code cell below is interactive - feel free to modify and re-run!


1. What is PyTorch?

PyTorch is a deep learning framework that provides:

  1. A NumPy-like tensor library with GPU acceleration
  2. Automatic differentiation for building neural networks
  3. A dynamic computation graph (define-by-run) - see the sketch just after the imports

Let's start by importing it:

import torch
import torch.nn as nn
import torch.nn.functional as F

print(f"PyTorch version: {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")
PyTorch version: 2.1.0
CUDA available: False
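
Before moving on, here's a tiny demonstration of point 3 above. Because the graph is defined by running your Python code, ordinary control flow (if, for, while) can shape the computation - this sketch uses a branch:

x = torch.tensor(3.0, requires_grad=True)

if x > 0:                 # plain Python branching decides the graph's shape
    y = x * 2
else:
    y = x ** 2

y.backward()
print(x.grad)             # tensor(2.) - the gradient of the branch actually taken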

2. Tensors: The Building Blocks

Tensors are multi-dimensional arrays, similar to NumPy's ndarray, but with GPU support and automatic differentiation capabilities.

Creating Tensors

# Different ways to create tensors

# From Python list
tensor_from_list = torch.tensor([1, 2, 3, 4, 5])

# Zeros and ones
zeros = torch.zeros(3, 4)
ones = torch.ones(2, 3)

# Random tensors
random_tensor = torch.rand(3, 3)  # Uniform [0, 1)
randn_tensor = torch.randn(3, 3)  # Normal distribution

# Like NumPy
arange_tensor = torch.arange(0, 10, 2)
linspace_tensor = torch.linspace(0, 1, 5)

print("From list:", tensor_from_list)
print("\nZeros (3x4):\n", zeros)
print("\nRandom (3x3):\n", random_tensor)
print("\nArange:", arange_tensor)
From list: tensor([1, 2, 3, 4, 5])

Zeros (3x4):
 tensor([[0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.]])

Random (3x3):
 tensor([[0.4963, 0.7682, 0.0885],
        [0.1320, 0.3074, 0.6341],
        [0.4901, 0.8964, 0.4556]])

Arange: tensor([0, 2, 4, 6, 8])

Tensor Attributes

Every tensor has three key attributes:

t = torch.rand(3, 4)

print(f"Shape: {t.shape}")        # Dimensions
print(f"Dtype: {t.dtype}")        # Data type
print(f"Device: {t.device}")      # CPU or GPU
Shape: torch.Size([3, 4])
Dtype: torch.float32
Device: cpu
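
These attributes aren't fixed: .to() returns a new tensor with a different dtype or device. A small sketch - it only moves to the GPU when one is actually present (we saw CUDA available: False above):

t64 = t.to(torch.float64)          # change the data type
print(t64.dtype)                   # torch.float64

device = "cuda" if torch.cuda.is_available() else "cpu"
t_dev = t.to(device)               # move to the GPU only if available
print(t_dev.device)                # cpu (on this machine)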

3. Tensor Operations

PyTorch supports all the operations you'd expect from a numerical library:

a = torch.tensor([[1, 2], [3, 4]], dtype=torch.float32)
b = torch.tensor([[5, 6], [7, 8]], dtype=torch.float32)

print("a:\n", a)
print("\nb:\n", b)
print("\n--- Operations ---")
print("\na + b:\n", a + b)
print("\na * b (element-wise):\n", a * b)
print("\na @ b (matrix multiply):\n", a @ b)
print("\na.T (transpose):\n", a.T)
print("\na.sum():", a.sum().item())
print("a.mean():", a.mean().item())
a:
 tensor([[1., 2.],
        [3., 4.]])

b:
 tensor([[5., 6.],
        [7., 8.]])

--- Operations ---

a + b:
 tensor([[ 6.,  8.],
        [10., 12.]])

a * b (element-wise):
 tensor([[ 5., 12.],
        [21., 32.]])

a @ b (matrix multiply):
 tensor([[19., 22.],
        [43., 50.]])

a.T (transpose):
 tensor([[1., 3.],
        [2., 4.]])

a.sum(): 10.0
a.mean(): 2.5
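
One more NumPy-like behavior worth knowing: broadcasting. When shapes differ but are compatible, PyTorch stretches the smaller tensor across the larger one instead of raising an error. A quick sketch, reusing a from above:

v = torch.tensor([10., 20.])       # shape (2,)

print(a + v)                       # v is added to each row of a
# tensor([[11., 22.],
#         [13., 24.]])

print(a * 2)                       # scalars broadcast too
# tensor([[2., 4.],
#         [6., 8.]])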

4. Autograd: Automatic Differentiation

This is where PyTorch shines. Setting requires_grad=True tells PyTorch to track all operations on a tensor for automatic differentiation.

The Basics

# Create a tensor with gradient tracking
x = torch.tensor([2.0, 3.0], requires_grad=True)

# Perform operations
y = x ** 2        # y = x^2
z = y.sum()       # z = sum(y) = x1^2 + x2^2

print(f"x = {x}")
print(f"y = x^2 = {y}")
print(f"z = sum(y) = {z}")

# Compute gradients
z.backward()

# dz/dx = 2x
print(f"\nGradient dz/dx = 2x = {x.grad}")
x = tensor([2., 3.], requires_grad=True)
y = x^2 = tensor([4., 9.], grad_fn=<PowBackward0>)
z = sum(y) = tensor(13., grad_fn=<SumBackward0>)

Gradient dz/dx = 2x = tensor([4., 6.])
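
One gotcha worth demonstrating: .grad accumulates across backward() calls rather than being overwritten, which is why training loops clear gradients before every backward pass. A quick sketch:

x = torch.tensor([2.0, 3.0], requires_grad=True)

(x ** 2).sum().backward()
print(x.grad)        # tensor([4., 6.])

(x ** 2).sum().backward()
print(x.grad)        # tensor([8., 12.]) - added to the previous gradients!

x.grad.zero_()       # reset in place before the next backward pass
print(x.grad)        # tensor([0., 0.])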

5. Building a Neural Network

Now let's put it all together and build a simple neural network!

The Dataset

We'll use a classic: learning the XOR function.

# XOR dataset
X_train = torch.tensor([
    [0, 0],
    [0, 1],
    [1, 0],
    [1, 1],
], dtype=torch.float32)

y_train = torch.tensor([
    [0],
    [1],
    [1],
    [0],
], dtype=torch.float32)

print("XOR Truth Table:")
print("X1  X2  |  Y")
print("-" * 14)
for inputs, output in zip(X_train, y_train):
    print(f" {int(inputs[0])}   {int(inputs[1])}  |  {int(output[0])}")
XOR Truth Table:
X1  X2  |  Y
--------------
 0   0  |  0
 0   1  |  1
 1   0  |  1
 1   1  |  0

The Model

XOR isn't linearly separable: no single straight line can split the 1 outputs from the 0 outputs, so one linear layer alone can't learn it. We need a hidden layer with a nonlinear activation:

class XORNet(nn.Module):
    def __init__(self, hidden_size=8):
        super().__init__()
        self.layer1 = nn.Linear(2, hidden_size)
        self.layer2 = nn.Linear(hidden_size, 1)

    def forward(self, x):
        x = torch.sigmoid(self.layer1(x))  # Hidden layer with sigmoid
        x = torch.sigmoid(self.layer2(x))  # Output layer with sigmoid
        return x

# Quick test
_model = XORNet()
print(_model)
print(f"\nTotal parameters: {sum(p.numel() for p in _model.parameters())}")
XORNet(
  (layer1): Linear(in_features=2, out_features=8, bias=True)
  (layer2): Linear(in_features=8, out_features=1, bias=True)
)

Total parameters: 33
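
Training the Model

Now let's train it. The specific choices below are ours, not the only ones that work: nn.BCELoss matches the sigmoid output, and Adam with lr=0.05 for 2000 epochs usually converges on this toy problem. Treat it as a minimal sketch and experiment:

torch.manual_seed(0)                 # illustrative seed, for reproducibility

model = XORNet()
criterion = nn.BCELoss()             # binary cross-entropy for 0/1 targets
optimizer = torch.optim.Adam(model.parameters(), lr=0.05)

for epoch in range(2000):
    output = model(X_train)          # forward pass
    loss = criterion(output, y_train)
    optimizer.zero_grad()            # clear gradients accumulated last step
    loss.backward()                  # backward pass (autograd at work)
    optimizer.step()                 # update the weights

# Inference: no gradient tracking needed
with torch.no_grad():
    preds = model(X_train).round()
print("Predictions:", preds.flatten().tolist())

If the printed predictions match the truth table's [0, 1, 1, 0], the network has learned XOR. Occasionally a run lands in a poor local minimum; re-running with a different seed is fair game.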

6. Key Takeaways

  1. Tensors are the fundamental data structure - like NumPy arrays but with GPU support and autograd.

  2. Autograd automatically computes gradients by tracking operations in a computation graph.

  3. Neural networks are built by:

    • Defining layers in __init__
    • Defining the forward pass in forward
    • PyTorch handles the backward pass automatically!

  4. Training loop (the same pattern as the full XOR example in Section 5):

    for epoch in range(epochs):
        output = model(inputs)             # Forward
        loss = criterion(output, targets)  # Compute loss
        optimizer.zero_grad()              # Clear gradients
        loss.backward()                    # Backward
        optimizer.step()                   # Update weights

What's Next?

Now that you have the fundamentals, explore:

  • CNNs for image data (nn.Conv2d)
  • RNNs/Transformers for sequences
  • GPU training with .to('cuda')
  • Datasets & DataLoaders for real data
  • Pretrained models from torchvision and transformers

Happy learning!