Building Your First Neural Network in Python from Scratch
The best way to understand neural networks is to build one yourself. In this tutorial, we'll create a complete neural network from scratch using only NumPy — no PyTorch, no TensorFlow, just pure math and Python.
Prerequisites
Basic Python knowledge and an understanding of linear algebra (matrix multiplication). Familiarity with derivatives is helpful but not required.
Step 1: Define the Network Structure
We'll build a simple 3-layer network: an input layer with two units, one hidden layer with four neurons, and an output layer with a single neuron. Our goal is to train it to learn the XOR function, a classic problem that a single perceptron cannot solve.
import numpy as np
# XOR dataset
X = np.array([[0,0], [0,1], [1,0], [1,1]])
y = np.array([[0], [1], [1], [0]])
# Initialize weights randomly
np.random.seed(42)
W1 = np.random.randn(2, 4) * 0.5
b1 = np.zeros((1, 4))
W2 = np.random.randn(4, 1) * 0.5
b2 = np.zeros((1, 1))
Step 2: Define Activation Functions
We need the sigmoid function for our activations and its derivative for backpropagation:
def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def sigmoid_derivative(z):
    s = sigmoid(z)
    return s * (1 - s)
Step 3: The Training Loop
Now we put it all together — forward pass, loss computation, backward pass, and weight update — repeated for many iterations:
learning_rate = 1.0
for epoch in range(10000):
    # Forward pass
    z1 = X @ W1 + b1
    a1 = sigmoid(z1)
    z2 = a1 @ W2 + b2
    a2 = sigmoid(z2)
    # Compute loss (MSE)
    loss = np.mean((a2 - y) ** 2)
    # Backward pass
    dz2 = (a2 - y) * sigmoid_derivative(z2)
    dW2 = a1.T @ dz2 / 4
    db2 = np.sum(dz2, axis=0) / 4
    dz1 = (dz2 @ W2.T) * sigmoid_derivative(z1)
    dW1 = X.T @ dz1 / 4
    db1 = np.sum(dz1, axis=0) / 4
    # Update weights
    W2 -= learning_rate * dW2
    b2 -= learning_rate * db2
    W1 -= learning_rate * dW1
    b1 -= learning_rate * db1
print(f"Predictions: {a2.round(2).flatten()}")
# Output: [0.02 0.98 0.98 0.02] ≈ XOR!
Understanding What Happened
After 10,000 iterations, our network has learned a near-perfect approximation of XOR: every prediction is within a few hundredths of its target. The hidden layer learned intermediate features that, when combined by the output layer, produce the correct XOR output.
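To make the idea of intermediate features concrete, here is a hand-constructed (not learned) set of weights for the same kind of network, shrunk to two hidden neurons: one approximates OR, the other AND, and the output combines them as "OR and not AND". The specific weight values are illustrative choices for saturating the sigmoid, not what training finds:

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])

# Hidden neuron 1 approximates OR(x1, x2); neuron 2 approximates AND(x1, x2).
# Weight magnitude 20 just pushes the sigmoid into saturation.
W1 = np.array([[20.0, 20.0],
               [20.0, 20.0]])
b1 = np.array([[-10.0, -30.0]])   # OR threshold, AND threshold

# Output fires when OR is on and AND is off: XOR = OR AND (NOT AND)
W2 = np.array([[20.0], [-20.0]])
b2 = np.array([[-10.0]])

h = sigmoid(X @ W1 + b1)
out = sigmoid(h @ W2 + b2)
print(out.round().flatten())  # -> [0. 1. 1. 0.]
```

Training discovers features like these on its own; they usually won't be this clean, but the division of labor between layers is the same.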
This is the fundamental power of neural networks: by stacking layers of simple operations, they can learn to represent complex, non-linear functions. A single layer couldn't solve XOR, but two layers can — and deeper networks can solve far more complex problems.
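You can check the single-layer limitation directly: the same training loop with the hidden layer removed (one sigmoid unit reading the raw inputs) gets stuck, because no linear decision boundary separates the XOR classes. A minimal sketch:

```python
import numpy as np

# XOR dataset, as above
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([[0], [1], [1], [0]])

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

np.random.seed(42)
W = np.random.randn(2, 1) * 0.5   # single layer: inputs straight to output
b = np.zeros((1, 1))

for epoch in range(10000):
    a = sigmoid(X @ W + b)
    dz = (a - y) * a * (1 - a)    # a * (1 - a) is the sigmoid derivative
    W -= 1.0 * (X.T @ dz / 4)
    b -= 1.0 * (np.sum(dz, axis=0) / 4)

loss = np.mean((a - y) ** 2)
print(loss)  # stuck near 0.25 -- the best a linear boundary can do on XOR
```

An MSE of 0.25 is what you get by predicting 0.5 for everything, i.e. no better than guessing.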
Key Takeaways
- Neural networks are just chains of matrix multiplications and non-linear activation functions.
- Weight initialization matters — random initialization breaks symmetry and allows different neurons to learn different features.
- The training loop (forward → loss → backward → update) is the same whether you have 4 parameters or 175 billion.
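One habit worth adopting early is verifying backpropagation with a numerical gradient check: perturb one weight, measure how the loss changes, and compare against the analytic gradient. One detail to notice: the update rules above compute the gradient of ½ · MSE (the factor of 2 from differentiating the square was dropped, which is harmless because it is absorbed into the learning rate), so the sketch below checks against that scaled loss:

```python
import numpy as np

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([[0], [1], [1], [0]])

np.random.seed(42)
W1 = np.random.randn(2, 4) * 0.5
b1 = np.zeros((1, 4))
W2 = np.random.randn(4, 1) * 0.5
b2 = np.zeros((1, 1))

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def loss_fn(W2):
    a1 = sigmoid(X @ W1 + b1)
    a2 = sigmoid(a1 @ W2 + b2)
    return 0.5 * np.mean((a2 - y) ** 2)  # the loss our updates actually descend

# Analytic gradient for W2, as in the training loop (sigmoid' = a2 * (1 - a2))
a1 = sigmoid(X @ W1 + b1)
a2 = sigmoid(a1 @ W2 + b2)
dz2 = (a2 - y) * a2 * (1 - a2)
dW2 = a1.T @ dz2 / 4

# Central-difference estimate for one entry of W2
eps = 1e-5
Wp, Wm = W2.copy(), W2.copy()
Wp[0, 0] += eps
Wm[0, 0] -= eps
numerical = (loss_fn(Wp) - loss_fn(Wm)) / (2 * eps)

print(abs(numerical - dW2[0, 0]))  # agreement to many decimal places
```

If the two values disagree by more than roughly 1e-6, there is almost certainly a bug in the backward pass.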
Next Steps
Now that you've built a network from scratch, try modifying it! Add more hidden layers, experiment with different learning rates, or try a different dataset. Then explore our backpropagation deep-dive for the mathematical foundations.
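As a starting point for the first suggestion, here is a sketch with a second hidden layer; the width of 4 for both hidden layers is an arbitrary choice. Notice that each extra layer adds exactly one matmul-plus-activation to the forward pass and one propagation step to the backward pass:

```python
import numpy as np

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([[0], [1], [1], [0]])

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

np.random.seed(42)
W1 = np.random.randn(2, 4) * 0.5; b1 = np.zeros((1, 4))
W2 = np.random.randn(4, 4) * 0.5; b2 = np.zeros((1, 4))
W3 = np.random.randn(4, 1) * 0.5; b3 = np.zeros((1, 1))

learning_rate = 1.0
for epoch in range(10000):
    # Forward: one extra layer means one extra matmul + activation
    a1 = sigmoid(X @ W1 + b1)
    a2 = sigmoid(a1 @ W2 + b2)
    a3 = sigmoid(a2 @ W3 + b3)
    # Backward: the same dz -> dW -> propagate pattern, once per layer
    dz3 = (a3 - y) * a3 * (1 - a3)
    dz2 = (dz3 @ W3.T) * a2 * (1 - a2)
    dz1 = (dz2 @ W2.T) * a1 * (1 - a1)
    W3 -= learning_rate * (a2.T @ dz3 / 4); b3 -= learning_rate * dz3.sum(0) / 4
    W2 -= learning_rate * (a1.T @ dz2 / 4); b2 -= learning_rate * dz2.sum(0) / 4
    W1 -= learning_rate * (X.T @ dz1 / 4);  b1 -= learning_rate * dz1.sum(0) / 4

print(np.mean((a3 - y) ** 2))
```

Convergence isn't guaranteed for every seed, depth, or learning rate, so expect to tune those as you experiment; that tuning is itself a useful lesson.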